Overview
Polyaxon lets you resume, restart, or restart with copy mode a previous operation run, with or without modifying the original manifest.
Resuming operations
Polyaxon can resume an operation that has already stopped (an experiment, a job, or a service).
polyaxon ops resume
Polyaxon resumes an operation by automatically loading all artifacts generated by the previous run and by using a configuration similar to the one used to schedule the original operation.
Users can also resume an operation with an updated environment or updated parameters by providing one or more presets. The CLI, Client, and UI each provide a way to pass presets that are submitted with the resume API call.
polyaxon ops resume -f preset1.yaml -f preset2.yaml
If the operation trains a machine learning or deep learning model, your code must include the logic for checkpointing and for resuming training from the last checkpoint.
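The checkpointing logic mentioned above could look roughly like the following minimal sketch. It uses only the Python standard library and makes several assumptions that are not part of Polyaxon: the file name `checkpoint.json`, the functions `load_checkpoint`, `save_checkpoint`, and `train`, and the `artifacts_dir` argument are all hypothetical. A real training script would point `artifacts_dir` at the directory where the run's artifacts are stored and would save model weights rather than a small JSON state.

```python
import json
import os
import tempfile
from pathlib import Path

# Hypothetical file name chosen for this sketch; not a Polyaxon convention.
CHECKPOINT_NAME = "checkpoint.json"


def load_checkpoint(artifacts_dir: Path) -> dict:
    """Return the saved training state, or a fresh state if none exists."""
    path = artifacts_dir / CHECKPOINT_NAME
    if path.exists():
        return json.loads(path.read_text())
    return {"epoch": 0, "loss": None}


def save_checkpoint(artifacts_dir: Path, state: dict) -> None:
    """Write the state to a temporary file, then rename it into place,
    so a run stopped mid-write does not leave a partial checkpoint."""
    artifacts_dir.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=artifacts_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_name, artifacts_dir / CHECKPOINT_NAME)


def train(artifacts_dir: Path, total_epochs: int) -> dict:
    """Run the remaining epochs, starting after the last completed one."""
    state = load_checkpoint(artifacts_dir)
    for epoch in range(state["epoch"], total_epochs):
        loss = 1.0 / (epoch + 1)  # placeholder for a real training step
        state = {"epoch": epoch + 1, "loss": loss}
        save_checkpoint(artifacts_dir, state)
    return state
```

With this structure, a resumed run that finds an existing checkpoint in its artifacts continues from the recorded epoch, while a first run starts from epoch 0.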
Restarting operations
Sometimes resuming is not an option, or you may want to preserve the original operation’s run record in an immutable fashion. In that case, restarting the operation is a better option. By default, Polyaxon restarts the operation with the same configuration:
polyaxon ops restart
Users can also restart an operation with an updated environment or updated parameters by providing one or more presets. The CLI, Client, and UI each provide a way to pass presets that are submitted with the restart API call.
polyaxon ops restart -f preset1.yaml -f preset2.yaml
Restarting operations with copy mode
To restart an operation and automatically load the artifacts generated by the original run, users can enable copy mode:
polyaxon ops restart --copy
Copy mode also works with presets and can be enabled from the CLI, Client, or UI:
polyaxon ops restart -f preset1.yaml -f preset2.yaml --copy
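When these commands are launched from a script, the argument list can be assembled programmatically. The sketch below is illustrative only: `build_ops_command` is a hypothetical helper, not part of the Polyaxon client, and it uses only the subcommands and flags shown in this section (`resume`, `restart`, `-f`, `--copy`). Any options needed to select a specific run are omitted, as they are in the commands above.

```python
import shlex
from typing import List, Sequence


def build_ops_command(
    action: str, presets: Sequence[str] = (), copy: bool = False
) -> List[str]:
    """Build the argument list for `polyaxon ops resume` or `polyaxon ops restart`.

    Hypothetical helper: it only knows the flags shown in this section.
    """
    if action not in ("resume", "restart"):
        raise ValueError(f"unsupported action: {action!r}")
    if copy and action != "restart":
        # This section only shows --copy together with restart.
        raise ValueError("copy mode is only used with restart")
    command = ["polyaxon", "ops", action]
    for preset in presets:
        command += ["-f", preset]  # one -f flag per preset file
    if copy:
        command.append("--copy")
    return command


command = build_ops_command("restart", ["preset1.yaml", "preset2.yaml"], copy=True)
print(shlex.join(command))
# prints: polyaxon ops restart -f preset1.yaml -f preset2.yaml --copy
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where the Polyaxon CLI is installed and configured.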