Polyaxon schedules workloads on Kubernetes, which means you can enable GPUs, TPUs, or any other resource supported in your cluster for running your operations.
Prerequisites
If you have not seen this article about node scheduling, we suggest that you check it out for more details about the options provided to select which nodes should be used for running your operations.
Using GPUs
To enable GPUs for your operations, set the GPU limits in the container's resources section, just as you would in Kubernetes.
```yaml
run:
  kind: job
  container:
    ...
    resources:
      limits:
        nvidia.com/gpu: "2"
    ...
```
In Python

```python
from polyaxon import k8s

container = k8s.V1Container(
    name="job",
    image="busybox:1.28",
    resources=k8s.V1ResourceRequirements(limits={"nvidia.com/gpu": "2"}),
    command=['sh', '...'],
)
```
If the cluster has multiple node pools with different GPU types, you can specify the GPU type by using a node selector in the environment section, e.g. on GKE:
```yaml
environment:
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-p4
  ...
run:
  kind: job
  container:
    ...
    resources:
      limits:
        nvidia.com/gpu: "2"
    ...
```
In Python

```python
from polyaxon import k8s
from polyaxon.schemas import V1Environment

environment = V1Environment(
    node_selector={"cloud.google.com/gke-accelerator": "nvidia-tesla-p4"}
)
container = k8s.V1Container(
    name="job",
    image="busybox:1.28",
    resources=k8s.V1ResourceRequirements(limits={"nvidia.com/gpu": "2"}),
    command=['sh', '...'],
)
```
Note: You might need to install the NVIDIA device plugin on your cluster.
Using TPUs
To use TPUs for your Polyaxon workloads on GKE, set the TPU limits in the container's resources section, just as you would in Kubernetes. In addition, you need to set the required annotations in the environment section.
```yaml
environment:
  annotations:
    tf-version.cloud-tpus.google.com: "1.12"
  ...
container:
  ...
  resources:
    limits:
      cloud-tpus.google.com/v2: "8"
  ...
```
In Python

```python
from polyaxon import k8s
from polyaxon.schemas import V1Environment

environment = V1Environment(
    annotations={"tf-version.cloud-tpus.google.com": "1.12"}
)
container = k8s.V1Container(
    name="job",
    image="busybox:1.28",
    resources=k8s.V1ResourceRequirements(limits={"cloud-tpus.google.com/v2": "8"}),
    command=['sh', '...'],
)
```
Using other resources
If your cluster exposes other special resources that are schedulable with Kubernetes, you can use them with Polyaxon in the same way, for instance AMD GPUs.
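As a sketch, assuming the AMD device plugin is installed and advertises the `amd.com/gpu` resource on your nodes, an operation could request it the same way as an NVIDIA GPU:

```yaml
run:
  kind: job
  container:
    ...
    resources:
      limits:
        amd.com/gpu: "1"
    ...
```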
Sharing GPUs
Aliyun provides a device plugin for GPU sharing. You need to deploy the plugin and request the `aliyun.com/gpu-mem` resource instead of the default `nvidia.com/gpu`.
For more details please check this user guide.