polyaxon.experiments.rl_experiment.RLExperiment(agent, env, train_steps=None, train_episodes=None, first_update=5000, update_frequency=15, eval_steps=10, train_hooks=None, eval_hooks=None, eval_delay_secs=0, continuous_eval_throttle_secs=60, eval_every_n_steps=1, delay_workers_by_global_step=False, export_strategies=None, train_steps_per_iteration=100)

Experiment is a class containing all information needed to train an agent.

After an experiment is created (by passing an Agent for training and evaluation), an Experiment instance knows how to invoke training and eval loops in a sensible fashion for distributed training.

None of the functions passed to this constructor are executed at construction time. They are stored and used only when a method that requires them is executed.

  • Args:

    • agent: Object implementing an Agent.
    • train_steps: Perform this many steps of training. default: None, means train forever.
    • train_episodes: Perform this many episodes of training. default: None, means train forever.
    • first_update: First timestep to calculate loss and train_op. This is related to the global_timestep variable, number of timesteps in episodes.
    • update_frequency: The frequency at which we should calculate loss and train_op. This frequency is related to the global_step, which is incremented every time we update the network.
    • eval_steps: evaluate runs until input is exhausted (or another exception is raised), or for eval_steps steps, if specified.
    • train_hooks: A list of monitors to pass to the Agent's fit function.
    • eval_hooks: A list of SessionRunHook hooks to pass to the Agent's evaluate function.
    • eval_delay_secs: Start evaluating after waiting for this many seconds.
    • continuous_eval_throttle_secs: Do not re-evaluate unless the last evaluation was started at least this many seconds ago for continuous_eval().
    • eval_every_n_steps: (applies only to train_and_evaluate). The minimum number of steps between evaluations. Evaluation does not occur if no new snapshot is available, so this is only a minimum.
    • delay_workers_by_global_step: If True, delays training workers based on global step instead of time.
    • export_strategies: A list of ExportStrategys, or a single one, or None.
    • train_steps_per_iteration: (applies only to continuous_train_and_evaluate). Perform this (integer) number of train steps for each training-evaluation iteration. With a small value, the model will be evaluated more frequently and more checkpoints will be saved. If None, a default value is used (which is smaller than train_steps, if provided).
  • Raises:

    • ValueError: if estimator does not implement Estimator interface, or if export_strategies has the wrong type.
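
The interaction between first_update and update_frequency can be sketched in plain Python. This is only an illustration, not polyaxon's implementation: the helper should_update is hypothetical, and it assumes that no update happens before first_update timesteps and that updates then recur every update_frequency timesteps.

```python
def should_update(global_timestep, first_update=5000, update_frequency=15):
    """Return True if a network update would run at this timestep.

    Hypothetical helper: assumes no update before `first_update`
    timesteps, then one update every `update_frequency` timesteps.
    """
    if global_timestep < first_update:
        return False
    return (global_timestep - first_update) % update_frequency == 0


# Timesteps in a short window at which an update would fire
# under the assumption stated above.
update_steps = [t for t in range(4990, 5050) if should_update(t)]
print(update_steps)
```

With the default values from the constructor signature, nothing fires during the first 5000 timesteps, which gives the agent time to collect experience before the first loss computation.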


reset_export_strategies(self, new_export_strategies=None)

Resets the export strategies with the new_export_strategies.

  • Args:

    • new_export_strategies: A new list of ExportStrategys, or a single one, or None.
  • Returns: The old export strategies.
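
The swap-and-return behaviour described above can be sketched as follows. The class ExportStrategyHolder and its _normalize helper are hypothetical stand-ins written for this example, and plain strings stand in for real ExportStrategy objects.

```python
class ExportStrategyHolder:
    """Hypothetical stand-in for the part of an experiment that stores export strategies."""

    def __init__(self, export_strategies=None):
        self._export_strategies = self._normalize(export_strategies)

    @staticmethod
    def _normalize(strategies):
        # Accept None, a single strategy, or a list/tuple of strategies.
        if strategies is None:
            return []
        if isinstance(strategies, (list, tuple)):
            return list(strategies)
        return [strategies]

    def reset_export_strategies(self, new_export_strategies=None):
        # Swap in the new strategies and hand back the previous ones.
        old = self._export_strategies
        self._export_strategies = self._normalize(new_export_strategies)
        return old


# Strings are placeholders for ExportStrategy instances.
holder = ExportStrategyHolder(export_strategies="strategy_a")
previous = holder.reset_export_strategies(["strategy_b", "strategy_c"])
```

Returning the old strategies makes it easy to restore them later by passing them back to reset_export_strategies.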


extend_eval_hooks(self, additional_hooks)

Extends the hooks for evaluation.



train(self, delay_secs=None)

Fit the agent.

Train the agent for self._train_steps steps, after waiting for delay_secs seconds. If self._train_steps is None, train forever.

  • Args:

    • delay_secs: Start training after this many seconds.
  • Returns: The trained estimator.


evaluate(self, delay_secs=None)

Evaluate on the evaluation data.

Runs evaluation on the evaluation data and returns the result. Runs for self._eval_steps steps, or if it's None, then run until input is exhausted or another exception is raised. Start the evaluation after delay_secs seconds, or if it's None, defaults to using self._eval_delay_secs seconds.

  • Args:

    • delay_secs: Start evaluating after this many seconds. If None, defaults to using self._eval_delay_secs.
  • Returns: The result of the evaluate call to the Estimator.


continuous_eval(self, delay_secs=None, throttle_delay_secs=None, evaluate_checkpoint_only_once=True, continuous_eval_predicate_fn=None)


continuous_eval_on_train_data(self, delay_secs=None, throttle_delay_secs=None, continuous_eval_predicate_fn=None)



train_and_evaluate(self)

Interleaves training and evaluation.

The frequency of evaluation is controlled by the constructor arg eval_every_n_steps. When this parameter is None or 0, evaluation happens only after training has completed. Note that evaluation cannot happen more frequently than checkpoints are taken. If no new snapshots are available when evaluation is supposed to occur, then evaluation doesn't happen for another eval_every_n_steps steps (assuming a checkpoint is available at that point). Thus, setting eval_every_n_steps to 1 means that the model will be evaluated every time there is a new checkpoint.

This is particularly useful for a "Master" task in the cloud, whose responsibility it is to take checkpoints, evaluate those checkpoints, and write out summaries. Participating in training as the supervisor allows such a task to accomplish the first and last items, while performing evaluation allows for the second.

  • Returns: The result of the evaluate call to the Estimator as well as the export results using the specified ExportStrategy.
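
The relationship between eval_every_n_steps and checkpoint availability can be illustrated with a small simulation. The function evaluation_steps below is hypothetical and is not part of the library; it assumes that an evaluation is attempted every eval_every_n_steps steps and is skipped when no checkpoint newer than the last evaluated one exists. It models only the evaluations that happen during training.

```python
def evaluation_steps(checkpoint_steps, train_steps, eval_every_n_steps):
    """Return the training steps at which an evaluation would run.

    Hypothetical simulation: `checkpoint_steps` lists the steps at which
    checkpoints were written. An evaluation is attempted every
    `eval_every_n_steps` steps and skipped if no new checkpoint exists.
    """
    checkpoints = sorted(checkpoint_steps)
    evaluated = []
    last_evaluated_checkpoint = None
    if not eval_every_n_steps:
        # None or 0: no evaluation while training is still running.
        return evaluated
    for step in range(eval_every_n_steps, train_steps + 1, eval_every_n_steps):
        available = [c for c in checkpoints if c <= step]
        latest = available[-1] if available else None
        if latest is not None and latest != last_evaluated_checkpoint:
            evaluated.append(step)
            last_evaluated_checkpoint = latest
    return evaluated


# Checkpoints written every 100 steps over a 300-step run.
every_50 = evaluation_steps([100, 200, 300], 300, 50)
every_1 = evaluation_steps([100, 200, 300], 300, 1)
every_150 = evaluation_steps([100, 200, 300], 300, 150)
```

In this simulation a value of 1 evaluates exactly once per new checkpoint, while a value larger than the checkpoint interval (150 here) skips some checkpoints entirely.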


continuous_train_and_evaluate(self, continuous_eval_predicate_fn=None)

Interleaves training and evaluation.

The frequency of evaluation is controlled by the constructor arg train_steps_per_iteration. The model is first trained for train_steps_per_iteration steps and then evaluated, and the two phases alternate.

This differs from train_and_evaluate as follows:

1. The procedure alternates between training and evaluation. The model is trained for a number of steps (usually smaller than train_steps, if provided) and then evaluated. train_and_evaluate trains the model for train_steps in one go (no small training iterations).

2. Because of this different approach, the schedule leads to two differences in resource control. First, the resources (e.g., memory) used by training are released before evaluation (train_and_evaluate takes double the resources). Second, more checkpoints are saved, as a checkpoint is generated at the end of each small training iteration.

  • Args:

    • continuous_eval_predicate_fn: A predicate function determining whether to continue after each iteration. predicate_fn takes the evaluation results as its arguments. At the beginning of evaluation, the passed eval results will be None so it's expected that the predicate function handles that gracefully. When predicate_fn is not specified, this will run in an infinite loop or exit when global_step reaches train_steps.
  • Returns: A tuple of the result of the evaluate call to the Estimator and the export results using the specified ExportStrategy.

  • Raises:

    • ValueError: if continuous_eval_predicate_fn is neither None nor callable.
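
The alternating schedule and the role of continuous_eval_predicate_fn can be sketched with a standalone loop. This is a simplified illustration under stated assumptions, not the library's code: train_fn and evaluate_fn are hypothetical callables supplied by the caller, the step counter is kept locally, and the sketch returns only the last evaluation result rather than a tuple with export results.

```python
def continuous_train_and_evaluate(train_fn, evaluate_fn, train_steps=None,
                                  train_steps_per_iteration=100,
                                  continuous_eval_predicate_fn=None):
    """Hypothetical sketch of alternating train/evaluate iterations.

    `train_fn(num_steps)` and `evaluate_fn()` are stand-ins supplied
    by the caller.
    """
    if (continuous_eval_predicate_fn is not None
            and not callable(continuous_eval_predicate_fn)):
        raise ValueError("continuous_eval_predicate_fn must be None or callable")

    global_step = 0
    eval_result = None  # the predicate sees None before the first evaluation
    while True:
        if (continuous_eval_predicate_fn is not None
                and not continuous_eval_predicate_fn(eval_result)):
            break
        if train_steps is not None and global_step >= train_steps:
            break
        steps = train_steps_per_iteration
        if train_steps is not None:
            # Do not train past the overall step budget.
            steps = min(steps, train_steps - global_step)
        train_fn(steps)
        global_step += steps
        eval_result = evaluate_fn()
    return eval_result


trained = []  # records the size of each training iteration


def fake_train(num_steps):
    trained.append(num_steps)


def fake_evaluate():
    # Pretend the reward equals the number of steps trained so far.
    return {"reward": float(sum(trained))}


def keep_going(eval_result):
    # Must tolerate None, which is passed before the first evaluation.
    return eval_result is None or eval_result["reward"] < 250


result = continuous_train_and_evaluate(
    fake_train, fake_evaluate, train_steps=1000,
    train_steps_per_iteration=100,
    continuous_eval_predicate_fn=keep_going)
```

Here the predicate stops the loop after three iterations, well before train_steps is reached. Without a predicate and without train_steps, the loop in this sketch never terminates, which mirrors the infinite-loop behaviour described above.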



Starts a TensorFlow server and joins the serving thread.

Typically used for parameter servers.

  • Raises:
    • ValueError: if not enough information is available in the estimator's config to create a server.



Tests training, evaluating and exporting the estimator for a single step.

  • Returns: The result of the evaluate call to the Estimator.



Creates a new reinforcement learning Experiment instance.

  • Args:
    • experiment_config: the config to use for creating the experiment.