utils

get_global_episode

get_global_episode(graph=None)

get_or_create_global_episode

get_or_create_global_episode(graph=None)

create_global_episode

create_global_episode(graph=None)

get_global_timestep

get_global_timestep(graph=None)

get_or_create_global_timestep

get_or_create_global_timestep(graph=None)

create_global_timestep

create_global_timestep(graph=None)

get_global_counter

get_global_counter(collection, name, graph=None)

Get the global counter tensor.

The global counter tensor must be an integer variable. The lookup first searches the collection and then falls back to the name.

  • Args:

    • collection: the counter's collection.
    • name: the counter's name.
    • graph: The graph to find the global counter in. If missing, use default graph.
  • Returns: The global counter variable, or None if none was found.

  • Raises:

    • TypeError: If the global counter tensor has a non-integer type, or if it is not a Variable.

get_or_create_global_counter

get_or_create_global_counter(collection, name, graph=None)

Returns the global counter tensor, creating it if necessary.

  • Args:

    • collection: the counter's collection.
    • name: the counter's name.
    • graph: The graph in which to create the global counter tensor. If missing, use default graph.
  • Returns: The global counter tensor.


create_global_counter

create_global_counter(collection, name, graph=None)

Creates the global counter tensor in the graph.

  • Args:

    • collection: the counter's collection.
    • name: the counter's name.
    • graph: The graph in which to create the global counter tensor. If missing, use default graph.
  • Returns: The global counter tensor.

  • Raises:

    • ValueError: if global counter tensor is already defined.
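
The contract shared by the three counter functions above (get returns None when nothing is found, create raises ValueError on a duplicate, get_or_create combines the two) can be modelled in plain Python. This is only an illustrative sketch: `GraphSketch`, `get_counter`, `create_counter` and `get_or_create_counter` are hypothetical stand-ins, a one-element list plays the role of the integer variable, and none of this is the library's implementation.

```python
class GraphSketch:
    """Hypothetical stand-in for a graph: collection name -> {counter name: counter}."""

    def __init__(self):
        self.collections = {}


def get_counter(graph, collection, name):
    # Mirrors get_global_counter: return the counter, or None if none was found.
    return graph.collections.get(collection, {}).get(name)


def create_counter(graph, collection, name):
    # Mirrors create_global_counter: refuse to define the same counter twice.
    if get_counter(graph, collection, name) is not None:
        raise ValueError("counter %r is already defined" % name)
    counter = [0]  # assumption: a mutable integer cell stands in for the variable
    graph.collections.setdefault(collection, {})[name] = counter
    return counter


def get_or_create_counter(graph, collection, name):
    # Mirrors get_or_create_global_counter: look up first, create only if missing.
    counter = get_counter(graph, collection, name)
    if counter is None:
        counter = create_counter(graph, collection, name)
    return counter
```

With this model, calling `get_or_create_counter` twice with the same arguments yields the same object, while a second `create_counter` call raises ValueError.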

assert_global_counter

assert_global_counter(global_counter_tensor)

Asserts global_counter_tensor is a scalar int Variable or Tensor.

  • Args:
    • global_counter_tensor: Tensor to test.

get_cumulative_rewards

get_cumulative_rewards(reward, done, discount=0.99)

Computes the cumulative rewards R(s,a) (a.k.a. G(s,a) in Sutton '16).

R_t = r_t + gamma*r_{t+1} + gamma^2*r_{t+2} + ...

A simple way to compute the cumulative rewards is to iterate from the last time step to the first, applying the recursion R_t = r_t + gamma*R_{t+1}.

  • Args:
    • reward: list. A list of immediate rewards r(s,a) for the given episodes.
    • done: list. A list of flags marking the terminal states of the given episodes.
    • discount: float. The discount factor.
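
The backward recursion R_t = r_t + gamma*R_{t+1} can be sketched in plain Python as follows. This is a minimal sketch, not the library's implementation: the name `cumulative_rewards_sketch` is hypothetical, and it assumes that `done[t]` being true marks step t as the last step of its episode, so the running return is reset there.

```python
def cumulative_rewards_sketch(reward, done, discount=0.99):
    """Discounted returns computed from the last time step back to the first."""
    returns = [0.0] * len(reward)
    running = 0.0  # plays the role of R_{t+1}; zero beyond the end of the data
    for t in reversed(range(len(reward))):
        if done[t]:
            # Assumption: step t ends an episode, so no later reward belongs to it.
            running = 0.0
        running = reward[t] + discount * running
        returns[t] = running
    return returns


# Two episodes of two steps each, with discount 0.5.
print(cumulative_rewards_sketch([1, 1, 1, 1], [False, True, False, True], 0.5))
# prints [1.5, 1.0, 1.5, 1.0]
```

Iterating backwards keeps the cost linear in the number of steps, since each return reuses the one computed for the following step.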

conjugate_gradient

conjugate_gradient(fn, b, iterations=50, residual_tolerance=1e-10)

Conjugate gradient solver.

  • Args:

    • fn: a function that computes the product Ax for the linear system Ax = b.
    • b: the right-hand side b of Ax = b.
  • Returns: Approximate solution to linear system.
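
For orientation, a textbook conjugate gradient iteration with the same signature might look like the sketch below. It is written in plain Python on lists and rests on assumptions not stated above: the matrix A behind `fn` is symmetric positive definite, the solve starts from x = 0, and `residual_tolerance` is compared against the squared residual norm. The names `dot` and `conjugate_gradient_sketch` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient_sketch(fn, b, iterations=50, residual_tolerance=1e-10):
    x = [0.0] * len(b)   # initial guess x = 0
    r = list(b)          # residual b - A x, which equals b when x = 0
    p = list(b)          # first search direction
    rs_old = dot(r, r)
    for _ in range(iterations):
        if rs_old < residual_tolerance:
            break        # assumption: stop on a small squared residual norm
        Ap = fn(p)       # only matrix-vector products are needed, never A itself
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Hypothetical 2x2 system: A = [[4, 1], [1, 3]], b = [1, 2].
A = [[4.0, 1.0], [1.0, 3.0]]
solution = conjugate_gradient_sketch(lambda v: [dot(row, v) for row in A], [1.0, 2.0])
```

Passing `fn` rather than the matrix itself lets the caller supply the product Ax without ever forming A, which is the usual reason to prefer this solver for large systems.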


line_search

line_search(fn, initial_x, full_step, expected_improve_rate, max_backtracks=10, accept_ratio=0.1)

Backtracking line search, where expected_improve_rate is the slope dy/dx at the initial point.
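
A backtracking line search with this signature is often written along the following lines. The sketch makes several assumptions that the description above does not confirm: `fn` is a loss to be minimised, the step is halved on each backtrack, a step is accepted when the ratio of actual to expected improvement exceeds `accept_ratio`, and the result is a `(success, x)` pair. The name `line_search_sketch` is hypothetical.

```python
def line_search_sketch(fn, initial_x, full_step, expected_improve_rate,
                       max_backtracks=10, accept_ratio=0.1):
    initial_value = fn(initial_x)
    for k in range(max_backtracks):
        step_frac = 0.5 ** k  # assumption: try 1, 1/2, 1/4, ... of the full step
        candidate = [x + step_frac * s for x, s in zip(initial_x, full_step)]
        actual_improve = initial_value - fn(candidate)
        # Improvement predicted by the slope at the initial point for this step size.
        expected_improve = expected_improve_rate * step_frac
        if (expected_improve > 0 and actual_improve > 0
                and actual_improve / expected_improve > accept_ratio):
            return True, candidate
    # No acceptable step was found: fall back to the starting point.
    return False, initial_x


# Minimise f(x) = x^2 from x = 2 with an overshooting step of -8.
# The gradient at x = 2 is 4, so the full step predicts an improvement of 4 * 8 = 32.
ok, x_new = line_search_sketch(lambda x: x[0] ** 2, [2.0], [-8.0], 32.0)
print(ok, x_new)  # prints True [0.0]
```

In this example the full step and the half step both fail to lower the loss, and the quarter step is accepted.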