How to schedule runs on Polyaxon

Scheduling Strategies

Overview

Teams often have several environments, each with different resources and accessed by different users. Allocating operations to the right resources while keeping queueing fair becomes especially important as your workload scales.

Polyaxon provides several interfaces designed to achieve fairness when a limited resource is shared. For example, they can prevent a hyperparameter tuning job with a large search space, or a set of parallel executions, from consuming more cluster resources than other workflows and operations.

Features

Polyaxon provides several tools to:

Prevent workflows from running too many concurrent operations.
Prioritize some important operations.
Route operations that require special resources to the right node(s), namespace, or cluster.
Split your workload over several nodes and clusters.

Concepts

There are several distinct features involved in the scheduling strategies:

Node scheduling: A feature that leverages the Kubernetes API to select nodes for running your operations.
Resources scheduling: A feature that leverages the Kubernetes API to enable GPU/TPU, or other special resources for your operations.
Queue concurrency: A feature to throttle the number of operations on a queue based on parallelism.
Queue resources and cost quota: A feature to throttle the number of operations on a queue based on resources (CPU/Memory/GPU/…) or operations’ costs.
Queue agent: A feature to route operations on a queue to a namespace or cluster.
Concurrency management: A feature to limit the number of operations queued.
Resume & Restart: A feature to schedule operations by resuming, restarting, or copying previous operation runs.
Conditional scheduling: A feature to start operations on nodes or queues based on input data, or to skip scheduling an operation entirely.
Manual approval: A feature to pause and suspend operations and pipelines and wait for human approval to resume the work.
Operation cache layer: A feature to reduce the cost and execution time by avoiding and skipping similar work.
External scheduling: A feature to schedule and submit operations from external systems.
Handling termination: A feature to handle failures and termination and to enforce SLAs.
Managing Priority: A feature to prioritize important operations and to enforce preemption.
Cost estimation: A feature to estimate the cost of running operations or to enforce quota based on complex environment definitions.
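
To make the queue concurrency and priority concepts above more concrete, here is a minimal Python sketch of the general idea. It is not Polyaxon's implementation or API: the class ThrottledQueue, its methods, and the operation names are hypothetical and exist only for illustration. The sketch assumes a queue that starts at most a fixed number of operations at once and starts higher-priority operations first.

```python
import heapq
from itertools import count
from typing import List


class ThrottledQueue:
    """Toy queue (hypothetical, not a Polyaxon class).

    At most `concurrency` operations run at once, and pending
    operations with a higher priority are started first.
    """

    def __init__(self, concurrency: int) -> None:
        self.concurrency = concurrency
        self._pending = []      # heap of (-priority, submit order, name)
        self._running = set()   # names of operations currently running
        self._order = count()   # tie-breaker: FIFO among equal priorities

    def submit(self, name: str, priority: int = 0) -> None:
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        heapq.heappush(self._pending, (-priority, next(self._order), name))

    def schedule(self) -> List[str]:
        """Start pending operations until the concurrency limit is reached."""
        started = []
        while self._pending and len(self._running) < self.concurrency:
            _, _, name = heapq.heappop(self._pending)
            self._running.add(name)
            started.append(name)
        return started

    def finish(self, name: str) -> None:
        """Mark an operation as done, freeing one slot on the queue."""
        self._running.discard(name)


# Usage: four tuning trials share the queue with one urgent operation.
queue = ThrottledQueue(concurrency=2)
for i in range(4):
    queue.submit(f"trial-{i}")
queue.submit("urgent-eval", priority=10)

first = queue.schedule()    # ['urgent-eval', 'trial-0']
queue.finish("urgent-eval")
second = queue.schedule()   # ['trial-1']
```

In this sketch the tuning trials cannot occupy more than two slots at a time, and the urgent operation starts first even though it was submitted last. The remaining trials wait until a slot is freed.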
