Polyaxon supports running Ray and Dask clusters for distributed computing workloads via Kubernetes operators.
By default, Polyaxon does not deploy the operators required for running clusters to keep the deployment process lightweight. In order to use a cluster operator, you need to make sure that your namespace/cluster has the operator deployed or you should deploy the operator(s) before starting an execution.
When you start a cluster, Polyaxon will stream and archive logs from all replicas.
Cluster operators
In order to run RayCluster/DaskCluster you will need to deploy their operators before scheduling a cluster from Polyaxon.
Ray
In order to run Ray clusters you will need to deploy the KubeRay operator before scheduling a cluster from Polyaxon.
version: 1.1
kind: component
name: ray-cluster
run:
kind: raycluster
rayVersion: '2.9.0'
head:
container:
image: rayproject/ray:2.9.0
workers:
small:
replicas: 2
minReplicas: 1
maxReplicas: 3
container:
image: rayproject/ray:2.9.0The same example in Python:
from polyaxon.schemas import V1Component, V1RayCluster, V1RayReplica
from polyaxon import k8s
ray_cluster = V1RayCluster(
ray_version="2.9.0",
head=V1RayReplica(
container=k8s.V1Container(image="rayproject/ray:2.9.0"),
),
workers={
"small": V1RayReplica(
replicas=2,
min_replicas=1,
max_replicas=3,
container=k8s.V1Container(image="rayproject/ray:2.9.0"),
),
},
)
component = V1Component(name="ray-cluster", run=ray_cluster)RayCluster with Autoscaling
version: 1.1
kind: component
name: ray-cluster-autoscaling
run:
kind: raycluster
rayVersion: '2.9.0'
enableInTreeAutoscaling: true
head:
rayStartParams:
dashboard-host: '0.0.0.0'
container:
image: rayproject/ray:2.9.0
resources:
limits:
cpu: "1"
memory: 2Gi
requests:
cpu: 500m
memory: 1Gi
workers:
cpu-workers:
replicas: 2
minReplicas: 1
maxReplicas: 5
container:
image: rayproject/ray:2.9.0
resources:
limits:
cpu: "1"
memory: 2Gi
requests:
cpu: 500m
memory: 1GiRayCluster with Runtime Environment
version: 1.1
kind: component
name: ray-cluster-runtime
run:
kind: raycluster
rayVersion: '2.9.0'
entrypoint: python train.py
runtimeEnv:
pip: ["requests==2.26.0", "pandas==2.0.0"]
env_vars:
MY_ENV_VAR: "value"
head:
container:
image: rayproject/ray:2.9.0
workers:
default:
replicas: 2
container:
image: rayproject/ray:2.9.0Specification
Please check the RayCluster guide to learn about all details for running Ray clusters in Polyaxon.
Dask
In order to run Dask clusters you will need to deploy the dask-kubernetes operator before scheduling a cluster from Polyaxon.
version: 1.1
kind: component
name: dask-cluster
run:
kind: daskcluster
worker:
replicas: 2
container:
image: ghcr.io/dask/dask:latest
scheduler:
container:
image: ghcr.io/dask/dask:latestThe same example in Python:
from polyaxon.schemas import V1Component, V1DaskCluster, V1DaskReplica
from polyaxon import k8s
dask_cluster = V1DaskCluster(
worker=V1DaskReplica(
replicas=2,
container=k8s.V1Container(image="ghcr.io/dask/dask:latest"),
),
scheduler=V1DaskReplica(
container=k8s.V1Container(image="ghcr.io/dask/dask:latest"),
),
)
component = V1Component(name="dask-cluster", run=dask_cluster)DaskCluster with Autoscaling
version: 1.1
kind: component
name: dask-cluster-autoscaling
run:
kind: daskcluster
minReplicas: 1
maxReplicas: 10
worker:
replicas: 2
container:
image: ghcr.io/dask/dask:latest
resources:
requests:
memory: 2Gi
cpu: 1
limits:
memory: 4Gi
cpu: 2
scheduler:
container:
image: ghcr.io/dask/dask:latest
resources:
requests:
memory: 1Gi
cpu: 500mSpecification
Please check the DaskCluster guide to learn about all details for running Dask clusters in Polyaxon.