Polyaxon v0.4: Integrations, Machine Learning CI, Better support of external repos, Better support of cloud storages...

For over a year now, Polyaxon has been delivering software that enables many teams and organizations to be more productive, iterate faster on their research and ideas, and ship robust models to production.

So what’s new in Polyaxon v0.4?

Apart from the work done to make the core components of Polyaxon more secure and stable, i.e. robust orchestration and scheduling, optimization, tracking, and visualization, the new v0.4 release introduces:

Integrations and an abstraction plugin, allowing Polyaxon to work with almost any system that data science teams use in their daily work.
An alpha release of a CI system, to automate training of experiments based on external events.
A private Beta of Polyaxon tracking, a system running on any container platform to track machine learning experiments.
Improved support for cloud storage, i.e. S3, GCS, and Azure Storage, in addition to mounted volumes, for accessing data and for storing and managing the outputs and logs of experiments and jobs.
Experiment Group Selections for creating a selection of experiments to compare and document.
Beta release of scheduling TPUs.
Improved build process with support of different backends, support of dockerfiles, support of contexts, and conda environments.
Archiving and restoring experiments, jobs, projects, …
Improved support for external repos on GitHub, GitLab, and Bitbucket.
Improved Jupyter Notebooks, with the possibility of switching the backend to JupyterLab.
Improved dashboard, with the objective of allowing users to do most interactions directly in the dashboard in v0.5.
Alpha support of external registries to manage build images.

Integrations

Polyaxon v0.4 includes a lot of integrations, for scheduling, automation, storage, experimentation, visualization, …

Polyaxon’s integrations are an interface that the platform exposes to simplify users’ workflows. Our objective is to turn these integrations into actionable, more granular steps in the data scientist’s workflow (events -> actions).

You can check the list of integrations that the platform currently supports.

Polyaxon CI

Polyaxon v0.4 introduces a new feature, PolyaxonCI. This CI system automates training of experiments based on external events, e.g. git commits.

Since the platform relies on the Polyaxonfile specification to turn a code repo into an experiment, a hyperparameter tuning group, a job, a build, a notebook, a tensorboard, and so on, it was natural to take advantage of this specification to bring more automation to the data scientist’s workflow.

PolyaxonCI works with both the in-cluster git server and external code platforms such as GitHub, GitLab, and Bitbucket.

Once a user enables PolyaxonCI on a project, every time there is a new commit, for instance, the platform looks for the specification file and starts experiments based on that file.

Since the platform is in the process of exposing different integrations and an event/action abstraction, PolyaxonCI will expose a configuration to customize its behaviour and subscribe to more events.
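
To make the event -> action flow described above more concrete, here is a minimal sketch in plain Python. It is not Polyaxon's implementation or API: the class names, the event kinds, and the specification file names are all hypothetical and only illustrate the idea of "CI is enabled, a subscribed event arrives, the platform looks for the specification file".

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Assumption: illustrative file names for the specification file.
SPEC_FILENAMES = ("polyaxonfile.yml", "polyaxonfile.yaml")


@dataclass
class RepoEvent:
    """Hypothetical external event, e.g. a git commit."""
    kind: str          # e.g. "commit"
    commit: str        # commit identifier
    files: List[str]   # paths present in the repo at that commit


@dataclass
class ProjectCI:
    """Hypothetical per-project CI settings."""
    enabled: bool = False
    subscribed: Set[str] = field(default_factory=lambda: {"commit"})

    def handle(self, event: RepoEvent) -> Optional[str]:
        """Return the path of the specification file to run, or None."""
        if not self.enabled or event.kind not in self.subscribed:
            return None
        for path in event.files:
            # Compare only the file name, ignoring the directory part.
            if path.rsplit("/", 1)[-1].lower() in SPEC_FILENAMES:
                return path
        return None


ci = ProjectCI(enabled=True)
event = RepoEvent(kind="commit", commit="abc123",
                  files=["README.md", "polyaxonfile.yml"])
spec_path = ci.handle(event)  # the file an experiment would be started from
```

Subscribing to more events would, in this sketch, simply mean adding more kinds to the `subscribed` set.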

Polyaxon Tracking

Polyaxon exposes a rich tracking API, coupled with a user management system, a knowledge center for insight, and a dashboard. It was natural to decouple this part of Polyaxon and make it a standalone, deployable piece of software capable of running on any container platform: Docker, Docker Compose, Nomad, Kubernetes…

We made this change because many teams are still unable to run or manage a Kubernetes cluster, and in other situations they rely on a different system for orchestrating containers. Polyaxon can work with different environments to track experiments and report their results to a dashboard where teams can analyze their performance.

Experiment Group Selection

Polyaxon provides two experiment group types: experiment groups for running hyperparameter tuning, and selections.

Polyaxon users are already familiar with the built-in optimization engine that the platform exposes, i.e. grid search, random search, Hyperband, and Bayesian optimization.
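
For readers less familiar with these algorithms, the sketch below shows the two simplest ones, grid search and random search, as plain Python over a small search space. It is a generic illustration and not Polyaxon's optimization engine; the function names and the example search space are assumptions made for this example.

```python
import itertools
import random
from typing import Any, Dict, List, Sequence

SearchSpace = Dict[str, Sequence[Any]]


def grid_search(space: SearchSpace) -> List[Dict[str, Any]]:
    """Enumerate every combination of the listed values."""
    keys = list(space)
    return [dict(zip(keys, values))
            for values in itertools.product(*(space[k] for k in keys))]


def random_search(space: SearchSpace, n: int, seed: int = 0) -> List[Dict[str, Any]]:
    """Sample n configurations, picking each value independently at random."""
    rng = random.Random(seed)
    return [{k: rng.choice(list(v)) for k, v in space.items()}
            for _ in range(n)]


# Hypothetical search space: 2 learning rates x 3 batch sizes.
space = {"lr": [0.1, 0.01], "batch_size": [32, 64, 128]}
all_configs = grid_search(space)        # 6 configurations
sampled = random_search(space, n=4)     # 4 randomly sampled configurations
```

Grid search grows multiplicatively with every added hyperparameter, which is why random search and the more advanced algorithms are usually preferred for larger spaces.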

Polyaxon now provides a different type of experiment grouping, the selection, for comparing experiments and measuring their performance.

Archiving & Restoring projects, experiments, jobs, builds

Users can choose between immediately deleting items and scheduling them for deletion by archiving them.

An archived item can always be restored, and the archiving period is configurable.
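
The archive-then-delete behaviour can be pictured as a soft delete with a retention period. The sketch below is only an illustration under that assumption: the `Item` class, its methods, and the 7-day period are hypothetical and are not taken from Polyaxon's code or configuration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Item:
    """Hypothetical archivable item (project, experiment, job, build)."""
    name: str
    archived_at: Optional[datetime] = None

    def archive(self, now: datetime) -> None:
        # Archiving only marks the item; nothing is deleted yet.
        self.archived_at = now

    def restore(self) -> None:
        # Restoring clears the mark, so the item is no longer scheduled
        # for deletion.
        self.archived_at = None

    def is_due_for_deletion(self, now: datetime, retention: timedelta) -> bool:
        # An item is deleted only once it has stayed archived for the
        # whole (configurable) retention period.
        return (self.archived_at is not None
                and now - self.archived_at >= retention)


retention = timedelta(days=7)          # assumed value, configurable
item = Item("experiment-42")
item.archive(datetime(2019, 1, 1))
due = item.is_due_for_deletion(datetime(2019, 1, 3), retention)  # False
```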

Better build process

Building containers is one of the most important steps in Polyaxon. We have received a lot of feedback, requests, and concerns about how to improve it.

In Polyaxon v0.4, the build process has been improved in terms of performance, security, and extensibility.

The build process detects multiple types of environments (pip, conda, Dockerfiles) and supports two backends: a native build process and Kaniko, with the possibility of supporting other systems such as Buildah (https://github.com/containers/buildah) and BuildKit (https://github.com/moby/buildkit).
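
One simple way to picture environment detection is to look for well-known marker files in the build context. The sketch below does exactly that; it is an assumption about how such detection could work, not a description of Polyaxon's build process, and the marker file names are illustrative.

```python
from typing import Dict, Iterable, List, Tuple

# Assumption: marker files commonly associated with each environment type.
MARKERS: Dict[str, Tuple[str, ...]] = {
    "dockerfile": ("Dockerfile",),
    "conda": ("environment.yml", "conda_env.yml"),
    "pip": ("requirements.txt",),
}


def detect_environments(files: Iterable[str]) -> List[str]:
    """Return the environment types whose marker files appear in `files`."""
    # Keep only the file names, ignoring directories.
    names = {path.rsplit("/", 1)[-1] for path in files}
    return [env for env, markers in MARKERS.items()
            if any(marker in names for marker in markers)]


detected = detect_environments(["src/train.py", "requirements.txt"])
# detected == ["pip"]
```

The result could then be used to pick how the image is built, independently of which backend (native or Kaniko) performs the build.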

Notebooks

We improved the Notebook experience on Polyaxon:

Users can set a default Docker image, so that starting a notebook no longer requires creating a Polyaxonfile.
We added support for both JupyterLab and Jupyter Notebook as backends.
We improved the tracking of experiments running on Notebooks.
We are working on adding support for Papermill.

Other notable changes

The list of improvements is still long, but one thing that we are continuously working on is the documentation. We are trying to make the experience of deploying and using Polyaxon much better, and we think that providing a better documentation experience is an important part of that. We restructured our references and created new sections to expose different aspects of the platform’s functionality.

What to expect in v0.5

One of the things that v0.5 will focus on is extending the list of integrations. We are working on many different extensions to make an end-to-end machine learning process possible for every team using Polyaxon, while keeping all our interfaces as simple as they are now. We believe that data scientists should be able to work and iterate on their models as fast as possible without needing to learn about DevOps or how other systems work. We would like to make Polyaxon more hackable by all users, not only advanced users with Kubernetes and engineering backgrounds.

One of the aspects that the platform has excelled at until now is streamlining the process of training experiments. Although the current version of Polyaxon exposes the generic job primitive, its usage is very limited. Moving forward, Polyaxon will try to simplify and automate the whole process of going from raw data through preprocessing and experimentation to a deployed model, followed by the feedback loop, by exposing native processes as well as integrating with projects in the open source community.

Finally, by supporting multiple storage backends, we noticed that users are still having issues referencing their datasets and the outputs of other experiments and jobs. Although the tracking API makes it possible to calculate a hash of a dataset to detect when it has changed, we think that by the time an experiment is scheduled it is already too late. We are working on extending our knowledge center to better handle metadata about datasets and about experiments’ outputs, artifacts, and models. Not only is this important for simplifying the deployment process, but we also believe that it is an important aspect of ensuring reproducibility.
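
As an illustration of the dataset-hash idea mentioned above, the following sketch hashes a dataset file and compares the result against a previously recorded value. It uses only Python's standard `hashlib` and is not the Polyaxon tracking API; the function names and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib


def dataset_hash(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so that large datasets do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(path: str, recorded_hash: str) -> bool:
    """True if the dataset no longer matches the hash recorded earlier."""
    return dataset_hash(path) != recorded_hash
```

Recording such a hash alongside each experiment makes it possible to tell, after the fact, whether two runs saw the same data, which is the kind of metadata a knowledge center could keep.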

Conclusion

Polyaxon will keep improving and providing the simplest machine learning layer on top of Kubernetes as well as other container platforms. We hope that these updates will improve your workflows and increase your productivity, and again, thank you for your continued feedback and support.