Teams often have several environments, each with different resources and different user access. Allocating operations to the right resources while ensuring fair queueing becomes increasingly important as you scale your workload.

Polyaxon provides several interfaces designed to achieve fairness when a limited resource is shared. For example, they can prevent a hyperparameter tuning job with a large search space, or a set of parallel executions, from consuming more cluster resources than other workflows and operations.


Polyaxon provides several tools to:

  • Prevent workflows from running a large number of concurrent operations.
  • Prioritize some important operations.
  • Route operations that require special resources to the right node(s), namespace, or cluster.
  • Split your workload over several nodes and clusters.
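
The first two goals, limiting concurrency and prioritizing important operations, can be pictured with a small toy model. The sketch below is only an illustration, not Polyaxon's implementation or API: `ToyQueue`, its methods, and the operation names are hypothetical, and it assumes a single in-memory queue where higher priority numbers run first.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyQueue:
    """Hypothetical in-memory queue: higher priority first, bounded concurrency."""

    concurrency: int
    running: set = field(default_factory=set)
    _heap: list = field(default_factory=list)
    _order: count = field(default_factory=count)

    def submit(self, name: str, priority: int = 0) -> None:
        # heapq is a min-heap, so the priority is negated;
        # the counter keeps submission order when priorities tie.
        heapq.heappush(self._heap, (-priority, next(self._order), name))

    def dispatch(self) -> list:
        """Start queued operations until the concurrency limit is reached."""
        started = []
        while self._heap and len(self.running) < self.concurrency:
            _, _, name = heapq.heappop(self._heap)
            self.running.add(name)
            started.append(name)
        return started

    def finish(self, name: str) -> None:
        """Mark an operation as done, freeing one concurrency slot."""
        self.running.remove(name)


queue = ToyQueue(concurrency=2)
for i in range(4):
    queue.submit(f"tuning-trial-{i}")  # default priority 0
queue.submit("production-retrain", priority=10)

first_batch = queue.dispatch()   # the high-priority operation starts first
queue.finish("production-retrain")
second_batch = queue.dispatch()  # one slot was freed, so one more trial starts
```

Even though the tuning trials were submitted first, the high-priority operation is dispatched ahead of them, and the tuning job never holds more than the two slots the queue allows.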


There are several distinct features involved in the scheduling strategies:

  • Node scheduling: A feature that leverages the Kubernetes API to select nodes for running your operations.
  • Resources scheduling: A feature that leverages the Kubernetes API to enable GPUs, TPUs, or other special resources for your operations.
  • Queue priority: A feature to prioritize operations on a queue.
  • Queue concurrency: A feature to throttle the number of operations on a queue based on parallelism.
  • Queue Resources (Roadmap): A feature to throttle the number of operations on a queue based on resources (CPU/Memory/GPU/...).
  • Queue agent: A feature to route operations on a queue to a namespace or cluster.
  • Concurrency management: A feature to limit the number of operations queued.
  • Scheduling presets: A feature for injecting information into operations at compilation time to preset the configuration for node scheduling, queue routing, resource requirements and definitions, connections, and access level control.
  • Defining a catalog of machines: By combining Queues and Presets, users can expose their cluster(s) as an organized and easy-to-use catalog of machines.
  • Resume & Restart: A feature to schedule operations by resuming, restarting, or copying previous operation runs.
  • Conditional scheduling: A feature to start an operation on specific nodes or queues based on input data, or to skip scheduling the operation entirely.
  • Manual approval: A feature to pause or suspend operations and pipelines until a human approves resuming the work.
  • Operation cache layer: A feature to reduce cost and execution time by skipping work similar to what has already been run.
  • External scheduling: A feature to schedule and submit operations from external systems.
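
To make the idea behind an operation cache layer more concrete, here is a minimal sketch of the general technique: fingerprint an operation by its component name and inputs, and reuse the stored result when the same fingerprint appears again. This is an assumption-laden illustration, not how Polyaxon computes cache hits; `cache_key`, `ToyCache`, and `train` are hypothetical names introduced only for this example.

```python
import hashlib
import json


def cache_key(component: str, inputs: dict) -> str:
    """Hypothetical fingerprint: component name plus its inputs, order-independent."""
    payload = json.dumps({"component": component, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ToyCache:
    """Hypothetical cache that skips execution when the fingerprint was seen before."""

    def __init__(self):
        self._results = {}
        self.executions = 0  # how many times work was actually performed

    def run(self, component, inputs, fn):
        key = cache_key(component, inputs)
        if key in self._results:
            return self._results[key]  # cache hit: no new execution
        self.executions += 1
        result = fn(**inputs)
        self._results[key] = result
        return result


def train(lr, epochs):
    # Stand-in for an expensive operation.
    return {"lr": lr, "epochs": epochs}


cache = ToyCache()
a = cache.run("train", {"lr": 0.1, "epochs": 10}, train)
b = cache.run("train", {"epochs": 10, "lr": 0.1}, train)  # same inputs, different key order
c = cache.run("train", {"lr": 0.01, "epochs": 10}, train)  # different inputs, runs again
```

In this sketch the second call is served from the cache, so only two of the three submissions execute. A real cache layer would likely consider more than the inputs, such as the component version or the container image.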