V1MXJob
polyaxon._flow.run.kubeflow.mx_job.V1MXJob()
Kubeflow MXNet-Job provides an interface to train distributed experiments with MXNet.
- Args:
    - kind: str, should be equal to mxjob
    - clean_pod_policy: str, one of [All, Running, None]
    - scheduling_policy: V1SchedulingPolicy, optional
    - mode: str, one of [MXTrain, MXTune]
    - scheduler: V1KFReplica, optional
    - server: V1KFReplica, optional
    - worker: V1KFReplica, optional
    - tuner: V1KFReplica, optional
    - tuner_tracker: V1KFReplica, optional
    - tuner_server: V1KFReplica, optional
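The argument list above can be mirrored by a plain dataclass that checks the enumerated values. This is a minimal sketch for illustration only: `MXJobSpec` and its `validate` method are hypothetical names, not part of the Polyaxon API, and replicas are represented as plain dicts rather than `V1KFReplica` objects.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values taken from the argument list above.
CLEAN_POD_POLICIES = {"All", "Running", "None"}
MODES = {"MXTrain", "MXTune"}


@dataclass
class MXJobSpec:
    """Hypothetical stand-in for V1MXJob; replicas are plain dicts here."""

    kind: str = "mxjob"  # set by default, as the Python client does
    clean_pod_policy: Optional[str] = None
    scheduling_policy: Optional[dict] = None
    mode: Optional[str] = None
    scheduler: Optional[dict] = None
    server: Optional[dict] = None
    worker: Optional[dict] = None
    tuner: Optional[dict] = None
    tuner_tracker: Optional[dict] = None
    tuner_server: Optional[dict] = None

    def validate(self) -> None:
        """Raise ValueError if an enumerated field holds an unknown value."""
        if self.kind != "mxjob":
            raise ValueError(f"kind must be 'mxjob', got {self.kind!r}")
        if self.clean_pod_policy is not None and self.clean_pod_policy not in CLEAN_POD_POLICIES:
            raise ValueError(f"invalid clean_pod_policy: {self.clean_pod_policy!r}")
        if self.mode is not None and self.mode not in MODES:
            raise ValueError(f"invalid mode: {self.mode!r}")
```

For example, `MXJobSpec(clean_pod_policy="All", mode="MXTrain").validate()` returns without error, while an unknown mode raises `ValueError`. Note that the policy value "None" is the string, not Python's `None`.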
YAML usage
run:
  kind: mxjob
  cleanPodPolicy:
  schedulingPolicy:
  mode:
  scheduler:
  server:
  worker:
  tuner:
  tunerTracker:
  tunerServer:
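The YAML keys are the camelCase forms of the Python argument names: clean_pod_policy becomes cleanPodPolicy and tuner_tracker becomes tunerTracker. The hypothetical helpers below reproduce that renaming. They are only a sketch of the naming convention, not how Polyaxon serializes specs internally.

```python
def to_camel(name: str) -> str:
    """Convert a snake_case argument name to its camelCase YAML key."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def to_yaml_keys(args: dict) -> dict:
    """Rename top-level keys to camelCase and drop unset (None) fields."""
    return {to_camel(key): value for key, value in args.items() if value is not None}
```

Single-word names such as kind or mode pass through unchanged.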
Python usage
from polyaxon.schemas import V1KFReplica, V1MXJob
mx_job = V1MXJob(
    clean_pod_policy='All',
    mode='MXTrain',
    scheduler=V1KFReplica(...),
    server=V1KFReplica(...),
    worker=V1KFReplica(...),
)
Fields
kind
The kind signals to the CLI, client, and other tools that this component's runtime is an mxjob.
If you are using the Python client to create the runtime, this field is not required and is set by default.
run:
  kind: mxjob
cleanPodPolicy
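Controls the deletion of pods when a job terminates.
The policy can be one of the following values: [All, Running, None]. All deletes every pod of the job, Running deletes only the pods that are still running, and None leaves all pods in place.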
run:
  kind: mxjob
  cleanPodPolicy: 'All'
  ...
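To make the three policies concrete, the function below computes which pods a clean-up would remove, given each pod's phase. It is an illustration only: `pods_to_delete` is a hypothetical helper, and it treats only the Running phase as "still running", whereas the operator itself decides which phases count.

```python
from typing import Dict, List


def pods_to_delete(policy: str, pod_phases: Dict[str, str]) -> List[str]:
    """Return the sorted pod names that `policy` would delete (illustrative)."""
    if policy == "All":
        return sorted(pod_phases)
    if policy == "Running":
        # Assumption: only pods in the "Running" phase count as running.
        return sorted(name for name, phase in pod_phases.items() if phase == "Running")
    if policy == "None":
        return []
    raise ValueError(f"unknown cleanPodPolicy: {policy!r}")
```

With a finished worker and a still-running server and scheduler, All removes all three pods, Running removes the server and scheduler, and None removes nothing.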
schedulingPolicy
SchedulingPolicy encapsulates various scheduling policies of the distributed training job, for example minAvailable for gang-scheduling.
run:
  kind: mxjob
  schedulingPolicy:
    ...
  ...
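Gang scheduling starts the pods of a job together or not at all, so a distributed job never sits half-started waiting for its peers. The sketch below illustrates the all-or-nothing check behind minAvailable. The function name, the one-slot-per-pod model, and the default of requiring every pod when min_available is unset are simplifying assumptions, not the behavior of any particular scheduler.

```python
from typing import Dict, List, Optional


def gang_schedule(
    replicas: Dict[str, int], free_slots: int, min_available: Optional[int] = None
) -> List[str]:
    """Return the pods to start, or [] if fewer than min_available pods fit.

    Hypothetical model: each pod needs one slot, and min_available defaults
    to the total number of pods in the job.
    """
    pods = [f"{role}-{i}" for role, count in replicas.items() for i in range(count)]
    needed = len(pods) if min_available is None else min_available
    if free_slots < needed:
        return []  # not enough room: start nothing rather than a partial job
    return pods[:free_slots]
```

For a job with one scheduler, two servers, and two workers, four free slots start nothing, while five start every pod.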
mode
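The kind of MXJob to schedule. Each mode runs a different set of replicas: MXTrain uses scheduler, server, and worker, while MXTune uses tuner, tunerTracker, and tunerServer.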
run:
  kind: mxjob
  mode: 'MXTrain'
  ...
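A spec that mixes replicas from both modes is easy to write by accident. The hypothetical check below rejects replica fields that do not belong to the selected mode. The mode-to-replica mapping follows the Kubeflow MXNet operator's split between training and tuning jobs; treat it as an assumption to verify against the operator version you run.

```python
from typing import Dict, Set

# Replica fields used by each mode (assumption: Kubeflow MXNet operator's
# training/tuning split; verify against your operator version).
MODE_REPLICAS: Dict[str, Set[str]] = {
    "MXTrain": {"scheduler", "server", "worker"},
    "MXTune": {"tuner", "tunerTracker", "tunerServer"},
}
ALL_REPLICAS: Set[str] = set().union(*MODE_REPLICAS.values())


def check_replicas(run: dict) -> None:
    """Raise ValueError if `run` sets a replica field outside its mode."""
    mode = run.get("mode")
    if mode not in MODE_REPLICAS:
        raise ValueError(f"unknown or missing mode: {mode!r}")
    extra = sorted(k for k in run if k in ALL_REPLICAS and k not in MODE_REPLICAS[mode])
    if extra:
        raise ValueError(f"{mode} does not use replica field(s): {', '.join(extra)}")
```

A run section with mode MXTrain and a tuner entry is rejected, and the error message names the offending tuner field.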
scheduler
The scheduler replica in the distributed MXJob.
run:
  kind: mxjob
  scheduler:
    replicas: 1
    container:
      ...
  ...
server
The server replica in the distributed MXJob.
run:
  kind: mxjob
  server:
    replicas: 2
    container:
      ...
  ...
worker
The worker replica in the distributed MXJob.
run:
  kind: mxjob
  worker:
    replicas: 2
    container:
      ...
  ...
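In MXTrain mode the scheduler, server, and worker replicas form MXNet's parameter-server setup: the scheduler coordinates the cluster, servers hold the model parameters, and workers compute gradients. Each process learns its role and the cluster size from MXNet's DMLC_* environment variables. The sketch below builds those variables for one pod. It assumes the operator injects equivalent values, and `dmlc_env` as well as the host name and port in the tests are hypothetical.

```python
from typing import Dict

ROLES = {"scheduler", "server", "worker"}


def dmlc_env(role: str, scheduler_host: str, port: int, num_server: int, num_worker: int) -> Dict[str, str]:
    """Build the DMLC_* variables MXNet's parameter server reads for one pod."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return {
        "DMLC_ROLE": role,                   # which part this process plays
        "DMLC_PS_ROOT_URI": scheduler_host,  # every pod contacts the scheduler
        "DMLC_PS_ROOT_PORT": str(port),
        "DMLC_NUM_SERVER": str(num_server),  # e.g. server replicas
        "DMLC_NUM_WORKER": str(num_worker),  # e.g. worker replicas
    }
```

All pods share the same root URI, port, and counts; only DMLC_ROLE differs between them, which is also why a single scheduler replica is the usual choice.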
tuner
The tuner replica in the distributed MXJob.
run:
  kind: mxjob
  tuner:
    replicas: 1
    container:
      ...
  ...
tunerTracker
The tuner tracker replica in the distributed MXJob.
run:
  kind: mxjob
  tunerTracker:
    replicas: 1
    container:
      ...
  ...
tunerServer
The tuner server replica in the distributed MXJob.
run:
  kind: mxjob
  tunerServer:
    replicas: 1
    container:
      ...
  ...
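The three tuning replicas are normally declared together under mode MXTune. The hypothetical helper below assembles such a run section and emits it as JSON; since YAML 1.2 is designed as a superset of JSON, the output can be pasted into a YAML spec. Container definitions are deliberately omitted, and the replica counts are parameters, not recommendations.

```python
import json


def tune_run_section(tuner: int = 1, tracker: int = 1, server: int = 1) -> str:
    """Serialize an MXTune `run` section; container specs are omitted here."""
    run = {
        "kind": "mxjob",
        "mode": "MXTune",
        "tuner": {"replicas": tuner},
        "tunerTracker": {"replicas": tracker},
        "tunerServer": {"replicas": server},
    }
    return json.dumps({"run": run}, indent=2)
```

Calling `tune_run_section(server=3)` yields a run section with one tuner, one tracker, and three tuner servers.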