polyaxon.polyflow.run.kubeflow.pytorch_job.V1PytorchJob(kind='pytorch_job', clean_pod_policy=None, scheduling_policy=None, master=None, worker=None)
Kubeflow Pytorch-Job provides an interface to train distributed experiments with Pytorch.
run:
  kind: pytorchjob
  cleanPodPolicy:
  schedulingPolicy:
  master:
  worker:
from polyaxon.polyflow import V1KFReplica, V1PytorchJob
from polyaxon.k8s import k8s_schemas

pytorch_job = V1PytorchJob(
    clean_pod_policy='All',
    master=V1KFReplica(...),
    worker=V1KFReplica(...),
)
The kind signals to the CLI, client, and other tools that this component's runtime is a pytorchjob.
If you are using the Python client to create the runtime, this field is not required and is set by default.
run:
  kind: pytorchjob
cleanPodPolicy controls the deletion of pods when a job terminates.
The policy can be one of the following values: [All, Running, None]
run:
  kind: pytorchjob
  cleanPodPolicy: 'All'
  ...
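To make the effect of each policy value concrete, the following is a minimal sketch of the selection logic, not the operator's actual implementation. It assumes the Kubeflow semantics where `All` removes every pod, `Running` removes only pods that are still running, and `None` removes nothing. The helper `pods_to_delete` and the pod names are hypothetical and are not part of Polyaxon or Kubeflow.

```python
from typing import Dict, List

# Assumed set of accepted values, following Kubeflow's CleanPodPolicy.
CLEAN_POD_POLICIES = ("All", "Running", "None")


def pods_to_delete(policy: str, pod_phases: Dict[str, str]) -> List[str]:
    """Return the names of pods that would be removed after the job terminates.

    `pod_phases` maps a pod name to its Kubernetes phase (e.g. "Running").
    This is an illustrative helper only, not a Polyaxon or Kubeflow function.
    """
    if policy not in CLEAN_POD_POLICIES:
        raise ValueError(f"cleanPodPolicy must be one of {CLEAN_POD_POLICIES}, got {policy!r}")
    if policy == "All":
        return sorted(pod_phases)
    if policy == "Running":
        return sorted(name for name, phase in pod_phases.items() if phase == "Running")
    # "None": keep every pod, for example to inspect logs afterwards.
    return []


# Hypothetical pod states at the moment the job finishes.
phases = {"master-0": "Succeeded", "worker-0": "Running", "worker-1": "Failed"}
for policy in CLEAN_POD_POLICIES:
    print(policy, pods_to_delete(policy, phases))
```

Keeping pods around with `None` can help when debugging a failed replica, while `All` frees cluster resources as soon as the job ends.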
SchedulingPolicy encapsulates various scheduling policies of the distributed training job, for example minAvailable for gang-scheduling.
run:
  kind: pytorchjob
  schedulingPolicy:
    ...
  ...
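Gang-scheduling means that the job's pods are started together or not at all, which avoids a partially scheduled job holding resources while it waits for the rest of its replicas. The sketch below illustrates that all-or-nothing rule under the assumption that `minAvailable` is the number of pods that must be schedulable at the same time. The function `gang_can_start` is a hypothetical illustration, not a scheduler API.

```python
def gang_can_start(min_available: int, schedulable_pods: int) -> bool:
    """Return True when at least `min_available` pods can be placed together.

    Illustrative only: a real gang scheduler also checks resources, queues,
    and priorities before admitting the group of pods.
    """
    if min_available < 0 or schedulable_pods < 0:
        raise ValueError("pod counts must be non-negative")
    return schedulable_pods >= min_available


# One master plus three workers, all required before training begins.
replicas = {"master": 1, "worker": 3}
min_available = sum(replicas.values())

# With room for only three pods the whole job waits instead of starting partially.
print(gang_can_start(min_available, schedulable_pods=3))
# Once four pods fit, the whole group can be admitted.
print(gang_can_start(min_available, schedulable_pods=4))
```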
The master replica in the distributed PytorchJob.
run:
  kind: pytorchjob
  master:
    replicas: 1
    container:
      ...
  ...
The workers do the actual work of training the model.
run:
  kind: pytorchjob
  worker:
    replicas: 3
    container:
      ...
  ...
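To show how the fields described above fit together, here is a minimal sketch that assembles the `run` section as a plain Python dictionary and prints it as JSON (which is also valid YAML). It uses only the standard library and does not call the Polyaxon client. The helper `build_pytorchjob_run` and the image name `example/train:latest` are hypothetical, and the container is reduced to an `image` key on the assumption that it follows the usual Kubernetes container schema.

```python
import json
from typing import Any, Dict


def build_pytorchjob_run(
    worker_replicas: int,
    image: str,
    clean_pod_policy: str = "All",
) -> Dict[str, Any]:
    """Assemble a `run` section with one master and `worker_replicas` workers.

    Illustrative helper only; key names mirror the YAML fields shown above.
    """
    if worker_replicas < 0:
        raise ValueError("worker_replicas must be non-negative")
    return {
        "run": {
            "kind": "pytorchjob",
            "cleanPodPolicy": clean_pod_policy,
            # Separate dicts so master and worker can be customized independently.
            "master": {"replicas": 1, "container": {"image": image}},
            "worker": {"replicas": worker_replicas, "container": {"image": image}},
        }
    }


# Hypothetical image name, used purely for illustration.
spec = build_pytorchjob_run(worker_replicas=3, image="example/train:latest")

# Total number of pods the job would request across both replica types.
total_replicas = sum(spec["run"][role]["replicas"] for role in ("master", "worker"))

print(json.dumps(spec, indent=2))
print("total replicas:", total_replicas)
```

Building the specification as data first makes it easy to vary the worker count or cleanup policy per experiment before handing the result to whichever tool submits the run.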