polyaxon.polyflow.run.kubeflow.mpi_job.V1MPIJob(kind='mpi_job', clean_pod_policy=None, slots_per_worker=None, launcher=None, worker=None)
Kubeflow MPI-Job provides an interface to train distributed experiments with PyTorch.
run:
  kind: mpijob
  cleanPodPolicy:
  slots_per_worker:
  launcher:
  worker:
from polyaxon.polyflow import V1KFReplica, V1MPIJob
from polyaxon.k8s import k8s_schemas

mpi_job = V1MPIJob(
    clean_pod_policy='All',
    launcher=V1KFReplica(...),
    worker=V1KFReplica(...),
)
The kind signals to the CLI, client, and other tools that this component's runtime is an mpijob.
If you are using the Python client to create the runtime, this field is not required and is set by default.
run:
  kind: mpijob
cleanPodPolicy controls the deletion of pods when a job terminates.
The policy can be one of the following values: [
run:
  kind: mpijob
  cleanPodPolicy: 'All'
  ...
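A minimal sketch of checking a cleanPodPolicy value before it goes into a spec. The helper name `validate_clean_pod_policy` is hypothetical and not part of Polyaxon. The allowed set (`All`, `Running`, `None`) is an assumption based on the clean pod policy values used by Kubeflow, so confirm it against your Polyaxon version.

```python
# Assumption: these values mirror Kubeflow's clean pod policy options.
# They are not taken from this page; verify them for your Polyaxon version.
ASSUMED_CLEAN_POD_POLICIES = ("All", "Running", "None")


def validate_clean_pod_policy(policy):
    """Hypothetical helper: return the policy if unset or allowed, else raise."""
    if policy is None:
        # The field is optional, so an unset value passes through unchanged.
        return None
    if policy not in ASSUMED_CLEAN_POD_POLICIES:
        raise ValueError(
            "cleanPodPolicy must be one of %s, got %r"
            % (", ".join(ASSUMED_CLEAN_POD_POLICIES), policy)
        )
    return policy
```

The comparison is case-sensitive on purpose, so a value such as `'all'` is rejected rather than silently normalized.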
The launcher replica in the distributed mpijob.
run:
  kind: mpijob
  launcher:
    replicas: 1
    container:
      ...
  ...
The workers do the actual work of training the model.
run:
  kind: mpijob
  worker:
    replicas: 3
    container:
      ...
  ...
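To show how the launcher and worker sections fit together, here is a rough sketch that assembles the `run` section as a plain Python dictionary mirroring the YAML keys above. The function `build_mpijob_run` is hypothetical and does not use the Polyaxon client. The container definitions are left out because they are elided in the examples above.

```python
import json


def build_mpijob_run(launcher_replicas=1, worker_replicas=3, clean_pod_policy=None):
    """Hypothetical helper: build a dict shaped like the YAML `run` section."""
    run = {"kind": "mpijob"}
    if clean_pod_policy is not None:
        # Only emit the key when a policy is explicitly requested.
        run["cleanPodPolicy"] = clean_pod_policy
    # Container specs are intentionally omitted; add them per replica as needed.
    run["launcher"] = {"replicas": launcher_replicas}
    run["worker"] = {"replicas": worker_replicas}
    return {"run": run}


spec = build_mpijob_run(clean_pod_policy="All")
print(json.dumps(spec, indent=2))
```

Building the section as a dictionary keeps the sketch free of dependencies. In practice the `V1MPIJob` and `V1KFReplica` classes would take the place of the raw dictionaries.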