Base RL PG Model



polyaxon.models.rl.base.BasePGModel(mode, graph_fn, num_states, num_actions, loss_config=None, optimizer_config=None, eval_metrics_config=None, is_deterministic=False, is_continuous=False, summaries='all', clip_gradients=0.5, clip_embed_gradients=0.1, name='Model')

Base reinforcement learning policy gradient model class.

  • Args:

    • mode: str. Specifies whether this is training, evaluation or prediction. See Modes.
    • graph_fn: Graph function. Follows the signature:
      • Args:
        • mode: Specifies whether this is training, evaluation or prediction. See Modes.
        • inputs: The feature inputs.
    • loss_config: An instance of LossConfig.
    • num_states: int. The number of states.
    • num_actions: int. The number of actions.
    • optimizer_config: An instance of OptimizerConfig. Defaults to Adam.
    • eval_metrics_config: A list of MetricConfig instances.
    • is_continuous: bool. Whether the model is built for a continuous space rather than a discrete one.
    • summaries: str or list. The verbosity of the TensorBoard visualization. Possible values: all, activations, loss, learning_rate, variables, gradients.
    • clip_gradients: float. Clips gradients by global norm.
    • clip_embed_gradients: float. Clips embedding gradients to the specified value.
    • name: str. The name of this model; everything is encapsulated inside this scope.
  • Returns: EstimatorSpec
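
The clip_gradients argument above refers to clipping by global norm. As a minimal sketch of that technique in plain Python (the helper name clip_by_global_norm and the list-of-lists gradient layout are assumptions for illustration, not Polyaxon's implementation), every gradient is rescaled by the same factor so that their joint L2 norm does not exceed the limit:

```python
import math


def clip_by_global_norm(grads, clip_norm):
    """Scale all gradients so their joint L2 norm is at most clip_norm.

    Hypothetical helper for illustration; `grads` is a list of flat lists.
    """
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # The scale is 1.0 when the norm is already within the limit.
    scale = clip_norm / max(global_norm, clip_norm)
    clipped = [[g * scale for g in grad] for grad in grads]
    return clipped, global_norm


grads = [[3.0, 0.0], [0.0, 4.0]]  # joint L2 norm is 5.0
clipped, norm = clip_by_global_norm(grads, clip_norm=0.5)
```

Because all gradients share one scale factor, their relative directions are preserved; only the overall step size shrinks.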



Creates the chosen action without sampling.

If inference mode is used, the actions are chosen directly without sampling.
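
The difference between sampling and choosing directly can be sketched for a discrete action space as follows. This is an illustrative sketch only: the function choose_action, its deterministic flag, and the use of the most probable action as the direct choice are assumptions, not the library's code.

```python
import random


def choose_action(probs, deterministic, rng=random):
    """Pick an action index from a list of action probabilities.

    Hypothetical helper for illustration.
    """
    if deterministic:
        # Inference: take the most probable action, with no sampling.
        return max(range(len(probs)), key=probs.__getitem__)
    # Otherwise: sample an action index in proportion to its probability.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


probs = [0.1, 0.7, 0.2]
greedy = choose_action(probs, deterministic=True)  # index 1
sampled = choose_action(probs, deterministic=False, rng=random.Random(0))
```

The direct choice is repeatable for a given policy output, while the sampled choice keeps some exploration during training.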


_build_distribution(self, values)



Creates the new graph_fn based on the one specified by the user.

  • Returns: function. The graph function, which must return a PGModelSpec.


_call_graph_fn(self, inputs)

Calls graph function.

First creates one or two graphs, i.e. the train and target graphs. Returns the optimal action given an exploration policy.

If is_dueling is set to True, another layer is added that represents the state value.

  • Args:
    • inputs: Tensor or dict of tensors


_preprocess(self, features, labels)

Model specific preprocessing.

  • Args:
    • features: array, Tensor or dict. The environment states. If a dict, it must contain a state key.
    • labels: dict. A dictionary containing action, reward, and advantage.
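
As a rough sketch of the input shapes described above, the following plain-Python helpers check the state key in features and the three keys in labels. The names extract_state and check_labels are hypothetical and only illustrate the contract; they are not part of the library.

```python
REQUIRED_LABEL_KEYS = ("action", "reward", "advantage")


def extract_state(features):
    """Return the environment states from an array-like or a dict (hypothetical helper)."""
    if isinstance(features, dict):
        if "state" not in features:
            raise KeyError("dict features must contain a 'state' key")
        return features["state"]
    return features


def check_labels(labels):
    """Ensure labels carries action, reward and advantage (hypothetical helper)."""
    missing = [key for key in REQUIRED_LABEL_KEYS if key not in labels]
    if missing:
        raise KeyError("labels is missing keys: %s" % ", ".join(missing))
    return labels


features = {"state": [[0.0, 1.0], [1.0, 0.0]]}
labels = {"action": [1, 0], "reward": [1.0, 0.0], "advantage": [0.5, -0.5]}
states = extract_state(features)
check_labels(labels)
```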