Polyaxon provides a list of options to select which nodes should be used for running operations.
Every component in Polyaxon can set an environment section, which exposes many pod-level options.
A component's environment section can be patched by the operation to override the default environment for a given execution.
The environment section can also be used to place a particular job of a distributed experiment on a specific node, since every replica of a distributed job comes with its own environment section.
Node Name
Node name is the simplest form of node selection constraint, but due to its limitations it is typically not used.
environment:
  nodeName:
Node Selector
Node selector is the simplest recommended form of node selection constraint.
environment:
  nodeSelector:
For example, if you have some GPU nodes, you may want to only use them for training your experiments. In this case you should label your nodes:
kubectl label nodes <node-name> <label-key>=<label-value>
And use that label for running experiments.
Example:
kubectl label nodes worker_1 worker_2 polyaxon.com=experiments
And then in your Polyaxonfile:
environment:
  nodeSelector:
    polyaxon.com: experiments
This restricts scheduling of this particular experiment to the node(s) carrying that label.
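To make the matching rule concrete, here is a minimal Python sketch of how a node selector filters nodes: a node is eligible only if it carries every key/value pair listed in `nodeSelector`. The function `matches_node_selector`, the third node `worker_3`, and the `disk` label are hypothetical and only serve the illustration; the actual filtering is performed by the Kubernetes scheduler.

```python
def matches_node_selector(node_labels, node_selector):
    """Return True if the node carries every label required by the selector."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# Hypothetical cluster: only worker_1 and worker_2 were labeled for experiments.
nodes = {
    "worker_1": {"polyaxon.com": "experiments"},
    "worker_2": {"polyaxon.com": "experiments", "disk": "ssd"},
    "worker_3": {"disk": "ssd"},
}
selector = {"polyaxon.com": "experiments"}

# Keep the nodes whose labels satisfy the selector.
eligible = sorted(
    name for name, labels in nodes.items() if matches_node_selector(labels, selector)
)
print(eligible)
```

Extra labels on a node do not prevent a match, and an empty selector matches every node.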
Tolerations
Tolerations are applied to pods, and allow (but do not require) the pods to schedule onto nodes with matching taints.
Similar to node selector, it’s very easy to provide tolerations in the Polyaxonfile:
environment:
  tolerations:
    ...
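As a rough illustration of "allow but do not require", the following Python sketch models how a toleration is matched against a node taint. It is a simplification of the Kubernetes rules, not the scheduler's code: the helpers `tolerates` and `can_schedule` and the `dedicated=gpu` taint are hypothetical names chosen for this example.

```python
def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint."""
    operator = toleration.get("operator", "Equal")  # Kubernetes defaults to Equal
    key = toleration.get("key")
    # A toleration without an effect matches taints of any effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if operator == "Exists":
        # An empty key combined with Exists matches every taint key.
        return not key or key == taint["key"]
    if operator == "Equal":
        return key == taint["key"] and toleration.get("value", "") == taint.get("value", "")
    return False


def can_schedule(tolerations, taints):
    """Untolerated NoSchedule/NoExecute taints block scheduling; PreferNoSchedule does not."""
    return all(
        any(tolerates(toleration, taint) for toleration in tolerations)
        for taint in taints
        if taint["effect"] in ("NoSchedule", "NoExecute")
    )


# Hypothetical taint on a dedicated GPU node.
gpu_taint = {"key": "dedicated", "value": "gpu", "effect": "NoSchedule"}
gpu_toleration = {"key": "dedicated", "operator": "Equal", "value": "gpu", "effect": "NoSchedule"}

print(can_schedule([], [gpu_taint]))                # pod without toleration is rejected
print(can_schedule([gpu_toleration], [gpu_taint]))  # tolerating pod is allowed
print(can_schedule([gpu_toleration], []))           # and it can still run on untainted nodes
```

The last call shows why a toleration alone does not pin a pod to the tainted nodes: it only removes the restriction.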
Affinity
The affinity/anti-affinity feature greatly expands the types of constraints you can express.
environment:
  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            ...
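The `weight` field in a preferred rule can be read as a score contribution: for each candidate node, the scheduler adds up the weights of the preferred terms that the node satisfies and favors higher totals. The sketch below is a deliberately simplified model of that idea, assuming each term is just a set of pod labels that must already be present in the node's topology domain. The function `affinity_score`, the `app` labels, and the node names are hypothetical, and the real scheduler combines this score with other scoring criteria.

```python
def affinity_score(preferred_terms, pods_in_domain):
    """Sum the weights of the preferred terms satisfied by a candidate node.

    preferred_terms: list of (weight, required_pod_labels) pairs.
    pods_in_domain: label dicts of pods already running in the node's topology domain.
    """
    score = 0
    for weight, required_labels in preferred_terms:
        # A term is satisfied if at least one existing pod carries all required labels.
        satisfied = any(
            all(pod.get(key) == value for key, value in required_labels.items())
            for pod in pods_in_domain
        )
        if satisfied:
            score += weight
    return score


# Hypothetical preference: run next to pods labeled app=cache, with weight 100.
preferred_terms = [(100, {"app": "cache"})]
candidates = {
    "node-a": [{"app": "cache"}],
    "node-b": [{"app": "web"}],
}

scores = {name: affinity_score(preferred_terms, pods) for name, pods in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Because the rule is "preferred", a node with a score of zero remains a valid target; it is simply ranked lower.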
Spot instances / Preemptible VMs
If you are using a cloud provider, you can leverage spot instances to reduce your ML training cost.
To configure spot instances or preemptible VMs, follow the guide provided by your cloud provider.
For example, following this guide from GKE, we can configure Polyaxon operations to use a preemptible VM node pool.
...
environment:
  nodeSelector:
    cloud.google.com/gke-preemptible: "true"
...
In Python:
from polyaxon.schemas import V1Environment
# node_selector is the Python counterpart of the nodeSelector field
environment = V1Environment(node_selector={"cloud.google.com/gke-preemptible": "true"})
Additionally, if you have a tainted node for preemptible VMs, you can configure a toleration to schedule to that node.
...
environment:
  nodeSelector:
    cloud.google.com/gke-preemptible: "true"
  tolerations:
    - key: cloud.google.com/gke-preemptible
      operator: Equal
      value: "true"
      effect: NoSchedule
...
In Python:
from polyaxon.schemas import V1Environment
environment = V1Environment(
    # node_selector is the Python counterpart of the nodeSelector field
    node_selector={"cloud.google.com/gke-preemptible": "true"},
    tolerations=[{
        "key": "cloud.google.com/gke-preemptible",
        "operator": "Equal",
        "value": "true",
        "effect": "NoSchedule",
    }],
)
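The node selector and the toleration play different roles here: the selector pins the operation to the preemptible pool, while the toleration lets it pass the pool's taint. The self-contained sketch below illustrates why both are needed on a tainted pool. The helper `is_schedulable` and the node descriptions are hypothetical, and the check is simplified to the `Equal` operator and `NoSchedule` taints.

```python
KEY = "cloud.google.com/gke-preemptible"


def is_schedulable(environment, node):
    """Simplified check: selector labels must match and every NoSchedule taint must be tolerated."""
    selector = environment.get("nodeSelector", {})
    if any(node["labels"].get(key) != value for key, value in selector.items()):
        return False
    tolerated = {
        (t["key"], t["value"], t["effect"]) for t in environment.get("tolerations", [])
    }
    return all(
        (taint["key"], taint["value"], taint["effect"]) in tolerated
        for taint in node["taints"]
        if taint["effect"] == "NoSchedule"
    )


# Hypothetical nodes: a labeled and tainted preemptible node, and a regular node.
preemptible_node = {
    "labels": {KEY: "true"},
    "taints": [{"key": KEY, "value": "true", "effect": "NoSchedule"}],
}
regular_node = {"labels": {}, "taints": []}

selector_only = {"nodeSelector": {KEY: "true"}}
toleration_only = {
    "tolerations": [{"key": KEY, "operator": "Equal", "value": "true", "effect": "NoSchedule"}]
}
both = {**selector_only, **toleration_only}

# Each entry is (fits preemptible node, fits regular node).
results = {
    name: (is_schedulable(env, preemptible_node), is_schedulable(env, regular_node))
    for name, env in [
        ("selector_only", selector_only),
        ("toleration_only", toleration_only),
        ("both", both),
    ]
}
print(results)
```

In this model, the selector alone leaves the operation unschedulable on a tainted pool, the toleration alone lets it run anywhere, and only the combination targets the preemptible nodes exclusively.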