Polyaxon lets you schedule TensorFlow experiments and distributed TensorFlow experiments, and supports tracking metrics, outputs, and models.
With Polyaxon you can:
- log hyperparameters for every run
- see learning curves for losses and metrics during training
- see hardware consumption and stdout/stderr output during training
- log images, charts, and other assets
- log git commit information
- log environment information
- log the model
- …
Tracking API
Polyaxon provides a tracking API to track experiments and report metrics, artifacts, logs, and results to the Polyaxon dashboard.
You can use the tracking API to create a custom tracking experience with TensorFlow.
Setup
To use Polyaxon tracking with TensorFlow, you need to install the Polyaxon library:
pip install polyaxon
Initialize your script with Polyaxon
This step is optional. It is only needed if you want to perform some manual tracking or track some information before passing the callback.
from polyaxon import tracking
tracking.init(...)
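One common use of this manual step is recording hyperparameters before training starts. The sketch below is an illustration rather than part of Polyaxon: parse_hyperparams and its three flags are hypothetical names, and the final tracking call appears only as a comment because it assumes an initialized run and that tracking.log_inputs accepts the values as keyword arguments.

```python
import argparse
from typing import Dict, List, Optional


def parse_hyperparams(argv: Optional[List[str]] = None) -> Dict[str, object]:
    """Parse command-line hyperparameters into a plain dict.

    The flag names and defaults are hypothetical; replace them with your own.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--learning-rate", type=float, default=0.01)
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--epochs", type=int, default=1)
    # vars() turns the Namespace into a dict keyed by the argument names.
    return vars(parser.parse_args(argv))


# An explicit argument list keeps the example independent of sys.argv.
params = parse_hyperparams(["--learning-rate", "0.001", "--epochs", "5"])

# Assumption: after tracking.init(), the values could be recorded with
#     tracking.log_inputs(**params)
```

Keeping the hyperparameters in one dict means the same values can be logged once and then reused to configure the model.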
TensorFlow Callback
Polyaxon provides a TensorFlow callback that you can use with your experiment to report metrics automatically:
from polyaxon.tracking.contrib.tensorflow import PolyaxonCallback
...
estimator.train(hooks=[PolyaxonCallback(...)])
...
Customizing the callback
Polyaxon’s callback can be customized to alter the default behavior:
- It will use the currently initialized run unless you pass a different run
- You can enable image logging
- You can enable histogram logging
- You can enable tensor logging
PolyaxonCallback(run=run, log_image=True, log_histo=True, log_tensor=True)
Manual logging
If you want more control, you can use Polyaxon to log metrics in your custom TensorFlow training loops:
import tensorflow as tf
from polyaxon import tracking

with tf.GradientTape() as tape:
    # Get the probabilities
    predictions = model(features)
    # Calculate the loss
    loss = loss_func(labels, predictions)

# Log your metrics
tracking.log_metrics(loss=loss.numpy())
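In long custom loops, reporting on every step can produce more points than you need. The sketch below shows one way to throttle reporting. It is not part of Polyaxon's API: make_step_logger and its every parameter are hypothetical names, and the logging function is injected so that the snippet runs without a Polyaxon run. In a real script you would pass tracking.log_metrics instead of the stand-in, assuming it accepts a step keyword alongside the metric values.

```python
from typing import Callable, Dict, List, Tuple


def make_step_logger(log_fn: Callable[..., None], every: int = 10) -> Callable[..., bool]:
    """Return a function that forwards metrics to log_fn once every `every` steps."""
    if every < 1:
        raise ValueError("every must be >= 1")

    def log_step(step: int, **metrics) -> bool:
        if step % every != 0:
            return False
        # Convert values (for example NumPy scalars from loss.numpy()) to plain floats.
        log_fn(step=step, **{name: float(value) for name, value in metrics.items()})
        return True

    return log_step


# Stand-in for tracking.log_metrics so the example runs anywhere.
# Assumption: the real function takes step=... plus metric keyword arguments.
recorded: List[Tuple[int, Dict[str, float]]] = []


def fake_log_metrics(step=None, **metrics):
    recorded.append((step, metrics))


log_step = make_step_logger(fake_log_metrics, every=5)
for step in range(12):
    loss = 1.0 / (step + 1)  # placeholder for a real loss value
    log_step(step, loss=loss)
```

Passing the step explicitly keeps the x-axis of the learning curve aligned with the training loop even though most steps are skipped.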
Logging the model
To make sure the model is uploaded to your artifacts store, use the path returned by tracking.get_outputs_path("model_rel_path", is_dir=True) as your checkpoint directory:
from polyaxon import tracking
from polyaxon.tracking.contrib.tensorflow import PolyaxonCallback
...
tracking.init()
...
model_dir = tracking.get_outputs_path("model", is_dir=True)
classifier = tf.estimator.LinearClassifier(
    model_dir=model_dir,
    feature_columns=[...],
    n_classes=2
)
tracking.log_model_ref(model_dir, framework="tensorflow", ...)
...
classifier.train(input_fn=train_input_fn, steps=100000, hooks=[PolyaxonCallback()])
...