RLPGModels

VPGModel

polyaxon.models.rl.vpg.VPGModel(mode, graph_fn, num_states, num_actions, loss_config=None, optimizer_config=None, eval_metrics_config=None, is_deterministic=False, is_continuous=False, summaries='all', clip_gradients=0.5, clip_embed_gradients=0.1, name='Model')

Implements a vanilla policy gradient model.

  • Args:
      • mode: str. Specifies whether this is training, evaluation, or prediction. See Modes.
      • graph_fn: Graph function. Follows the signature:
          • Args:
              • mode: Specifies whether this is training, evaluation, or prediction. See Modes.
              • inputs: the feature inputs.
      • loss_config: An instance of LossConfig.
      • num_states: int. The number of states.
      • num_actions: int. The number of actions.
      • optimizer_config: An instance of OptimizerConfig. Defaults to Adam.
      • eval_metrics_config: a list of MetricConfig instances.
      • is_continuous: bool. Whether the model is built for a continuous or a discrete space.
      • summaries: str or list. The verbosity of the TensorBoard visualization. Possible values: all, activations, loss, learning_rate, variables, gradients.
      • clip_gradients: float. Clips gradients by global norm.
      • clip_embed_gradients: float. Clips embedding gradients to the specified value.
      • name: str. The name of this model; everything will be encapsulated inside this scope.

  • Returns: EstimatorSpec
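
As a rough illustration of the objective a vanilla policy gradient model minimizes in the discrete case (is_continuous=False), the sketch below computes the negative mean of log pi(a|s) weighted by an advantage signal. It is plain Python rather than Polyaxon or TensorFlow code; softmax, vpg_loss, and the sample values are hypothetical names chosen for this example, and each row of logits is assumed to have num_actions entries.

```python
import math


def softmax(logits):
    """Turn a list of action logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def vpg_loss(logits_batch, actions, advantages):
    """Negative mean of log pi(a|s) * advantage over a batch (hypothetical helper)."""
    terms = []
    for logits, action, adv in zip(logits_batch, actions, advantages):
        log_prob = math.log(softmax(logits)[action])
        terms.append(log_prob * adv)
    return -sum(terms) / len(terms)


# Two states, two actions (num_actions = 2), uniform policy.
logits_batch = [[0.0, 0.0], [0.0, 0.0]]
loss = vpg_loss(logits_batch, actions=[0, 1], advantages=[1.0, -1.0])
```

With a uniform policy and advantages that cancel out, the loss is zero. A positive advantage on the chosen action gives a positive loss, which gradient descent reduces by raising that action's probability.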

TRPOModel

polyaxon.models.rl.base.TRPOModel(mode, graph_fn, num_states, num_actions, loss_config=None, optimizer_config=None, eval_metrics_config=None, is_deterministic=False, is_continuous=False, summaries='all', clip_gradients=0.5, clip_embed_gradients=0.1, name='Model')

Implements a trust region policy optimization model.

  • Args:
      • mode: str. Specifies whether this is training, evaluation, or prediction. See Modes.
      • graph_fn: Graph function. Follows the signature:
          • Args:
              • mode: Specifies whether this is training, evaluation, or prediction. See Modes.
              • inputs: the feature inputs.
      • loss_config: An instance of LossConfig.
      • num_states: int. The number of states.
      • num_actions: int. The number of actions.
      • optimizer_config: An instance of OptimizerConfig. Defaults to Adam.
      • eval_metrics_config: a list of MetricConfig instances.
      • is_continuous: bool. Whether the model is built for a continuous or a discrete space.
      • summaries: str or list. The verbosity of the TensorBoard visualization. Possible values: all, activations, loss, learning_rate, variables, gradients.
      • clip_gradients: float. Clips gradients by global norm.
      • clip_embed_gradients: float. Clips embedding gradients to the specified value.
      • name: str. The name of this model; everything will be encapsulated inside this scope.

  • Returns: EstimatorSpec
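
Both models take clip_gradients, which the argument list describes as clipping by global norm. The sketch below shows what that operation usually means: all gradients are rescaled by one common factor so that their joint L2 norm does not exceed the threshold. This is a plain-Python illustration with a hypothetical helper name, not the library's implementation, and it does not cover clip_embed_gradients.

```python
import math


def clip_by_global_norm(gradients, clip_norm):
    """Rescale a list of gradient vectors so their joint L2 norm is at most clip_norm.

    Hypothetical helper for illustration; returns (clipped_gradients, global_norm).
    """
    global_norm = math.sqrt(sum(g * g for grad in gradients for g in grad))
    # One shared scale for every gradient; equals 1.0 when the norm is already small enough.
    scale = clip_norm / max(global_norm, clip_norm)
    clipped = [[g * scale for g in grad] for grad in gradients]
    return clipped, global_norm


# Two gradient vectors whose joint norm is sqrt(3^2 + 4^2) = 5.0.
grads = [[3.0, 0.0], [0.0, 4.0]]
clipped, norm = clip_by_global_norm(grads, clip_norm=0.5)  # 0.5 is the documented default
```

Because every gradient is scaled by the same factor, the direction of the overall update is preserved and only its magnitude shrinks.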