Explorations

constant

constant(value=0.5)

Builds a constant exploration.

  • Args:

    • value: float. The exploration constant to use.
  • Returns: function. The exploration function logic.


greedy

greedy()

Builds a greedy exploration that never selects random values, i.e. random() < 0 is always False.

  • Returns: function. The exploration function logic.

random

random()

Builds a random exploration that always selects random values, i.e. random() < 1 is always True.

  • Returns: function. The exploration function logic.
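
As a rough illustration of how the three builders above relate, the sketch below treats each one as an epsilon-greedy gate with a fixed threshold: 0 for greedy, 1 for random, and value for constant. The helper names should_explore and select_action are hypothetical and are not part of this library; the functions the builders actually return may be structured differently.

```python
import random


def should_explore(epsilon, rng=random):
    """Return True when a uniform draw from [0, 1) falls below epsilon."""
    return rng.random() < epsilon


def select_action(q_values, epsilon, rng=random):
    """Hypothetical helper: pick a random action index with probability
    epsilon, otherwise the index of the largest estimated value."""
    if should_explore(epsilon, rng):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


q_values = [0.1, 0.9, 0.3]
greedy_choice = select_action(q_values, epsilon=0.0)    # never explores
random_choice = select_action(q_values, epsilon=1.0)    # always explores
constant_choice = select_action(q_values, epsilon=0.5)  # explores about half the time
```

Because random.random() returns values in [0, 1), a threshold of 0 can never be exceeded from below and a threshold of 1 always is, which is why the two extremes behave deterministically.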

decay

decay(exploration_rate=0.15, decay_type='polynomial_decay', start_decay_at=0, stop_decay_at=1000000000.0, decay_rate=0.0, staircase=False, decay_steps=100000, min_exploration_rate=0)

Builds a decaying exploration.

Decays epsilon based on the number of states and the decay_type.

  • Args:

    • exploration_rate: float or list of float. The initial value of the exploration rate.
    • decay_type: A decay function name defined in exploration_decay. Possible values: exponential_decay, inverse_time_decay, natural_exp_decay, piecewise_constant, polynomial_decay.
    • start_decay_at: int. When to start the decay.
    • stop_decay_at: int. When to stop the decay.
    • decay_rate: A Python number. The decay rate.
    • staircase: Whether to apply decay in a discrete staircase, as opposed to continuous, fashion.
    • decay_steps: How often to apply decay.
    • min_exploration_rate: float. Don't decay below this number.
  • Returns: function. The exploration logic operation.
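
To make the interaction of exploration_rate, start_decay_at, decay_steps and min_exploration_rate more concrete, here is a minimal sketch. It assumes the default polynomial_decay option follows the common polynomial schedule with a power of 1 (a straight line from the initial rate to an end rate), which this page does not confirm. The functions polynomial_decay and decayed_epsilon below are hypothetical stand-ins, not the library's own code.

```python
def polynomial_decay(initial_rate, step, decay_steps, end_rate=0.0, power=1.0):
    """Interpolate from initial_rate to end_rate over decay_steps steps."""
    step = min(max(step, 0), decay_steps)  # clamp: no change before 0 or after decay_steps
    fraction_left = 1.0 - step / decay_steps
    return (initial_rate - end_rate) * fraction_left ** power + end_rate


def decayed_epsilon(step, exploration_rate=0.15, start_decay_at=0,
                    decay_steps=100000, min_exploration_rate=0.0):
    """Hypothetical helper: epsilon after a given number of steps."""
    steps_since_start = step - start_decay_at  # negative until decay begins
    rate = polynomial_decay(exploration_rate, steps_since_start, decay_steps)
    return max(rate, min_exploration_rate)  # never drop below the floor


for step in (0, 50000, 100000, 200000):
    print(step, decayed_epsilon(step))
```

In this sketch the rate stays at exploration_rate until start_decay_at, shrinks over the next decay_steps steps, and is then held at min_exploration_rate or the schedule's end rate, whichever is larger.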


random_decay

random_decay(num_actions=None, decay_type='polynomial_decay', start_decay_at=0, stop_decay_at=1000000000.0, decay_rate=0.0, staircase=False, decay_steps=10000, min_exploration_rate=0)

Builds a random decaying exploration.

Decays a random value based on the number of states and the decay_type.

  • Args:

    • num_actions: int or None. If discrete, num_actions must be None.
    • decay_type: A decay function name defined in exploration_decay. Possible values: exponential_decay, inverse_time_decay, natural_exp_decay, piecewise_constant, polynomial_decay.
    • start_decay_at: int. When to start the decay.
    • stop_decay_at: int. When to stop the decay.
    • decay_rate: A Python number. The decay rate.
    • staircase: Whether to apply decay in a discrete staircase, as opposed to continuous, fashion.
    • decay_steps: How often to apply decay.
    • min_exploration_rate: float. Don't decay below this number.
  • Returns: function. The exploration logic operation.
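
The difference between staircase and continuous decay can be sketched with an exponential schedule. The example assumes the exponential_decay option has the common form rate * decay_rate ** (step / decay_steps), with integer division in staircase mode; this page does not spell out the formula, so treat both the formula and the hypothetical exponential_decay helper below as assumptions. The decay_rate of 0.5 is chosen only for illustration.

```python
def exponential_decay(initial_rate, step, decay_steps, decay_rate, staircase=False):
    """Hypothetical helper: multiply the rate by decay_rate every decay_steps steps."""
    if staircase:
        exponent = step // decay_steps  # whole intervals only: the rate changes in jumps
    else:
        exponent = step / decay_steps   # fractional intervals: the rate changes smoothly
    return initial_rate * decay_rate ** exponent


for step in (0, 5000, 10000, 25000):
    smooth = exponential_decay(1.0, step, 10000, 0.5)
    stepped = exponential_decay(1.0, step, 10000, 0.5, staircase=True)
    print(step, smooth, stepped)
```

Under these assumptions the two variants agree at exact multiples of decay_steps, and the staircase variant holds its value in between.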


ornsteinuhlenbeck_process

ornsteinuhlenbeck_process(num_actions, sigma=0.3, mu=0, theta=0.15)

Builds an exploration based on the Ornstein-Uhlenbeck process.

The process adds time-correlated noise to the actions taken by the deterministic policy. The OU process satisfies the following stochastic differential equation: dx_t = theta * (mu - x_t) * dt + sigma * dW_t, where W_t denotes the Wiener process.
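
One way to simulate this equation is an Euler-Maruyama discretisation, in which each step adds theta * (mu - x) * dt plus Gaussian noise scaled by sigma * sqrt(dt). The sketch below is a plain-Python illustration under that assumption. The class OrnsteinUhlenbeckNoise, the function noisy_action, and the dt, x0 and seed parameters are hypothetical and do not come from this library.

```python
import math
import random


class OrnsteinUhlenbeckNoise:
    """Hypothetical helper: one independent OU process per action dimension."""

    def __init__(self, num_actions, sigma=0.3, mu=0.0, theta=0.15,
                 dt=1.0, x0=None, seed=None):
        self.sigma, self.mu, self.theta, self.dt = sigma, mu, theta, dt
        self._rng = random.Random(seed)
        start = mu if x0 is None else x0  # assumption: start at the mean by default
        self.state = [float(start)] * num_actions

    def sample(self):
        """Advance every dimension by one time step and return the new noise."""
        noise_scale = self.sigma * math.sqrt(self.dt)
        self.state = [
            x + self.theta * (self.mu - x) * self.dt
            + noise_scale * self._rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)


def noisy_action(action, noise):
    """Add the current OU noise to a deterministic action vector."""
    return [a + n for a, n in zip(action, noise.sample())]


noise = OrnsteinUhlenbeckNoise(num_actions=2, seed=0)
print(noisy_action([0.5, -0.5], noise))
```

Because each sample starts from the previous state, consecutive noise values are correlated, and the theta term pulls the state back toward mu over time.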