Overview

Polyaxon lets you resume, restart, or restart in copy mode a previous operation run, with or without modifying the original manifest.

Resuming operations

Polyaxon can resume an operation (an experiment, job, or service) that has already stopped.

polyaxon ops resume

To resume an operation, Polyaxon automatically loads all artifacts generated by the previous run and reuses a configuration similar to the one used to schedule the original operation.

Users can also resume an operation with an updated environment or updated parameters by providing one or more presets. The CLI, Client, and UI all accept presets, which are submitted with the resume API call.

polyaxon ops resume -f preset1.yaml -f preset2.yaml

If you use an operation to train a machine learning or deep learning model, your code must include the logic to save checkpoints and to resume training from them.
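The checkpointing requirement can be sketched as follows. This is a minimal, framework-agnostic illustration rather than a Polyaxon API: the function names (`load_checkpoint`, `save_checkpoint`, `train`), the JSON state layout, and the checkpoint path are all hypothetical, and the "training step" is a placeholder. It assumes the checkpoint file lives among the artifacts that are available again when the run is resumed.

```python
import json
import os
from pathlib import Path


def load_checkpoint(path: Path) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"epoch": 0, "loss": None}


def save_checkpoint(path: Path, state: dict) -> None:
    """Write the state via a temp file so a stopped run never leaves a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    os.replace(tmp, path)  # atomic rename on the same filesystem


def train(checkpoint_path: Path, total_epochs: int) -> dict:
    """Run (or continue) training, skipping epochs already recorded in the checkpoint."""
    state = load_checkpoint(checkpoint_path)
    for epoch in range(state["epoch"], total_epochs):
        loss = 1.0 / (epoch + 1)  # placeholder for a real training step
        state = {"epoch": epoch + 1, "loss": loss}
        save_checkpoint(checkpoint_path, state)
    return state
```

Calling `train` a second time with the same checkpoint path continues from the last completed epoch instead of starting over. In a real model you would store weights and optimizer state with your framework's own save and load utilities instead of a JSON file.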

Restarting operations

Sometimes resuming is not an option, or you may want to keep the original operation's run record immutable. In that case, restarting the operation is the better choice. By default, Polyaxon restarts the operation with the same configuration:

polyaxon ops restart

Users can also restart an operation with an updated environment or updated parameters by providing one or more presets. The CLI, Client, and UI all accept presets, which are submitted with the restart API call.

polyaxon ops restart -f preset1.yaml -f preset2.yaml

Restarting operations with copy mode

To restart an operation and automatically load the artifacts generated by the original run, enable copy mode:

polyaxon ops restart --copy

Copy mode also works with presets and can be enabled from the CLI, Client, and UI:

polyaxon ops restart -f preset1.yaml -f preset2.yaml --copy