Introduction to Polyaxon

This post tries to answer a couple of practical questions about the project and the intent behind its creation.

What is Polyaxon

Polyaxon is a library and a platform built on top of TensorFlow for building deep learning and reinforcement learning experiments. It provides popular DL and RL modules that can be easily customized and assembled for creating real-world machine learning models.

Why another TensorFlow library

To build an end-to-end system and a platform for managing experiments, we wanted complete control over the APIs and future design decisions, which made creating a new library an important choice. We first tried to base the platform on Keras and/or tf.contrib.learn, but it was not easy: we had to tweak many parts, and eventually we decided to create a new library. In fact, if you have a chance to read the code, you will find that many programming decisions were directly inspired by these libraries, and in some places we only patched or mirrored their code.

Having said that, using Polyaxon does not exclude using other libraries: every module created with Polyaxon can be mixed with and extended by other libraries.

How can I start using Polyaxon

Polyaxon is very modular and provides a simple interface for creating deep learning algorithms.

There are two ways to use Polyaxon:

  • By creating Python objects programmatically. Polyaxon provides high-level Layers and Models based on GraphModules, high-level objects that make it easy to share variables.

  • By creating configuration files (JSON or YAML). In this case, the library is meant to be used in applications that rely on an intelligent system to achieve some task. These configurations can then be customized through an interface (web or CLI), letting the user tweak the parameters without caring about the underlying implementation.

An example of a graph defined in a JSON file:

    {
       "name":"MyModel",
       "output_dir":"/tmp/polyaxon-log/mymodel",
       "eval_every_n_steps":10,
       "train_steps_per_iteration":1000,
       "run_config":{
          "save_checkpoints_steps":100
       },
       "train_input_data_config":{
          "pipeline_config":{
             "module":"TFRecordImagePipeline",
             "batch_size":64,
             "num_epochs":5,
             "shuffle":true,
             "dynamic_pad":false,
             "params":{
                "data_files":"/path-to-my-data/mnist_train.tfrecord",
                "meta_data_file":"/path-to-my-data/meta_data.json"
             }
          }
       },
       "eval_input_data_config":{
          "pipeline_config":{
             "module":"TFRecordImagePipeline",
             "batch_size":32,
             "num_epochs":1,
             "shuffle":true,
             "dynamic_pad":false,
             "params":{
                "data_files":"/path-to-my-data/mnist_eval.tfrecord",
                "meta_data_file":"/path-to-my-data/meta_data.json"
             }
          }
       },
       "estimator_config":{
          "output_dir":"/tmp/polyaxon-log/mymodel"
       },
       "model_config":{
          "summaries":[
             "loss",
             "gradients",
             "activations"
          ],
          "module":"Classifier",
          "loss_config":{
             "module":"softmax_cross_entropy"
          },
          "eval_metrics_config":[
             {
                "module":"streaming_true_positives"
             },
             {
                "module":"streaming_true_negatives"
             },
             {
                "module":"streaming_false_positives"
             },
             {
                "module":"streaming_false_negatives"
             },
             {
                "module":"streaming_recall"
             },
             {
                "module":"streaming_auc"
             },
             {
                "module":"streaming_accuracy"
             },
             {
                "module":"streaming_precision"
             }
          ],
          "optimizer_config":{
             "module":"adam",
             "learning_rate":0.007,
             "decay_type":"exponential_decay",
             "decay_rate":0.2
          },
          "one_hot_encode":true,
          "n_classes":10,
          "graph_config":{
             "name":"lenet",
             "features":[
                "image"
             ],
             "definition":[
                [
                   "Conv2d",
                   {
                      "regularizer":"l2_regularizer",
                      "num_filter":32,
                      "filter_size":5,
                      "strides":1
                   }
                ],
                [
                   "MaxPool2d",
                   {
                      "kernel_size":2
                   }
                ],
                [
                   "Conv2d",
                   {
                      "num_filter":64,
                      "filter_size":5,
                      "regularizer":"l2_regularizer"
                   }
                ],
                [
                   "MaxPool2d",
                   {
                      "kernel_size":2
                   }
                ],
                [
                   "FullyConnected",
                   {
                      "num_units":1024,
                      "activation":"tanh"
                   }
                ],
                [
                   "FullyConnected",
                   {
                      "num_units":10
                   }
                ]
             ]
          }
       }
    }
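
To make the shape of such a file more concrete, here is a minimal sketch that uses only Python's standard library (no Polyaxon calls) to read a configuration shaped like the one above and list the layers in its graph definition. The helper name `summarize_graph` and the trimmed inline config are assumptions made for illustration; this does not show how Polyaxon itself parses these files.

```python
import json

# A trimmed copy of the configuration above, inlined so the sketch is self-contained.
CONFIG_TEXT = """
{
  "name": "MyModel",
  "model_config": {
    "module": "Classifier",
    "n_classes": 10,
    "graph_config": {
      "name": "lenet",
      "features": ["image"],
      "definition": [
        ["Conv2d", {"num_filter": 32, "filter_size": 5, "strides": 1}],
        ["MaxPool2d", {"kernel_size": 2}],
        ["FullyConnected", {"num_units": 10}]
      ]
    }
  }
}
"""


def summarize_graph(config):
    """Return one (layer_name, params) tuple per entry in the graph definition.

    Hypothetical helper: it only walks the dictionary and does not build a model.
    """
    graph = config["model_config"]["graph_config"]
    return [(layer_name, dict(params)) for layer_name, params in graph["definition"]]


config = json.loads(CONFIG_TEXT)
layers = summarize_graph(config)
for layer_name, params in layers:
    print(layer_name, params)
```

Each entry in `definition` is a pair of a layer name and its parameters, so the order of the list is the order in which the layers are stacked.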

An example of a graph defined in a Python file:

    import polyaxon as plx
    import tensorflow as tf

    # Gym environment the agent is trained on.
    env = plx.envs.GymEnvironment('CartPole-v0')


    def graph_fn(mode, inputs):
        # A single fully connected layer applied to the environment state.
        return plx.layers.FullyConnected(mode, num_units=512)(inputs['state'])


    def model_fn(features, labels, mode):
        # DDQN model built on top of graph_fn, with its loss, optimizer
        # and exploration settings passed as config objects.
        model = plx.models.DDQNModel(
            mode,
            graph_fn=graph_fn,
            loss_config=plx.configs.LossConfig(module='huber_loss'),
            num_states=env.num_states,
            num_actions=env.num_actions,
            optimizer_config=plx.configs.OptimizerConfig(module='sgd', learning_rate=0.01),
            exploration_config=plx.configs.ExplorationConfig(module='decay'),
            target_update_frequency=10,
            summaries='all')
        return model(features, labels)


    # Memory handed to the agent, which is then trained on the environment.
    memory = plx.rl.memories.Memory()
    estimator = plx.estimators.Agent(
        model_fn=model_fn, memory=memory, model_dir="/tmp/polyaxon_logs/ddqn_cartpole")

    estimator.train(env)
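
The example above names two standard techniques without spelling them out: double DQN (`DDQNModel`) and the Huber loss (`huber_loss`). As a rough illustration of what these compute, here is a pure-Python sketch of the textbook formulas. It is not Polyaxon's implementation; the function names `huber_loss` and `ddqn_target`, the discount factor `gamma`, and the threshold `delta` are assumptions chosen for this sketch.

```python
def huber_loss(error, delta=1.0):
    """Textbook Huber loss: quadratic for small errors, linear for large ones."""
    abs_error = abs(error)
    if abs_error <= delta:
        return 0.5 * error ** 2
    return delta * (abs_error - 0.5 * delta)


def ddqn_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Textbook double DQN target for a single transition.

    The online network's Q-values pick the next action, and the target
    network's Q-values evaluate that action.
    """
    if done:
        # No future value once the episode has ended.
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Illustrative numbers only: two actions, Q-values made up for the example.
target = ddqn_target(
    reward=1.0,
    done=False,
    next_q_online=[0.2, 0.9],
    next_q_target=[0.5, 0.3],
)
current_q = 0.8
print(target, huber_loss(target - current_q))
```

Here the online values select the second action, so the target uses the target network's value for that action (0.3) rather than its own maximum (0.5); decoupling selection from evaluation in this way is what distinguishes double DQN from plain DQN.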

What do you support currently in Polyaxon

The current Polyaxon version offers a rich set of modular layers, deep learning models, reinforcement learning models, and estimators/agents to construct your algorithms with.

For training and evaluating an algorithm, you can use Polyaxon built-in experiments or use custom scripts with training loops and evaluation.

Can I train my models in a distributed way

Yes. As mentioned before, the project was initially built on top of tf.contrib.learn, but that API was constantly changing and breaking. This is why we decided to roll our own estimator and experiment modules, with the idea that in the future we can interchangeably use TensorFlow’s built-in estimators/experiments once they become stable.

How can I deploy and monitor my algorithms

First, we hope that you will find a useful use case for Polyaxon in your projects. We are currently working on an architecture that will make it easy to manage, monitor, and compare experiments, and finally to deploy an API to the cloud. We are also trying different approaches and concepts in an internal version, in order to have a consistent, simple, and stable way to create useful applications of deep learning to real-world problems.