Polyaxon provides several features for scaling and automating your workflows.

There are different ways to scale your operations:

  • Running distributed jobs
  • Running hyperparameter tuning and parallel jobs
  • Automating a large and a complex process

Polyaxon has APIs and clients that you can use with your favorite scheduler. It also comes with built-in support for distributed jobs, parallel executions, an optimization and a flow engine.

Automation with DAGs

DAGs are one of the runtimes supported by Polyaxon.

The file dag.yaml contains a DAG definition, it automates the journey we built manually in this tutorial:

version: 1.1
kind: component
name: automated-process
description: runs an experiment, if the loss is higher than max_loss start a hyperparameter tuning process, and then print the best models
inputs:
- {name: max_loss, type: float, value: 0.01, isOptional: true, description: "max loss to start a tuning job."}
- {name: top, type: int, value: 5, isOptional: true, description: "top jobs."}
run:
  kind: dag
  operations:
  - name: build
    hubRef: dokerizer
    params:
      destination:
        connection: docker-connection
        value: polyaxon/polyaxon-quick-start
    runPatch:
      init:
      - dockerfile:
          image: "tensorflow/tensorflow:2.2.0"
          run:
          - 'pip3 install --no-cache-dir -U polyaxon'
          langEnv: 'en_US.UTF-8'
  - name: experiment
    urlRef: "https://raw.githubusercontent.com/polyaxon/polyaxon-quick-start/master/experimentation/typed.yaml"
    dependencies: [build]
    params:
      learning_rate:
        value: 0.005
      epochs:
        value: 10
  - name: tune
    urlRef: "https://raw.githubusercontent.com/polyaxon/polyaxon-quick-start/master/experimentation/typed.yaml"
    params:
      upstream_loss:
        ref: ops.experiment
        value: outputs.loss
        contextOnly: true
      max_loss:
        ref: dag
        value: inputs.max_loss
        contextOnly: true
    conditions: "{{ upstream_loss > max_loss }}"
    matrix:
      kind: random
      concurrency: 2
      numRuns: 20
      params:
        learning_rate:
          kind: linspace
          value: 0.001:0.1:5
        dropout:
          kind: choice
          value: [0.25, 0.3]
        conv_activation:
          kind: pchoice
          value: [[relu, 0.1], [sigmoid, 0.8]]
        epochs:
          kind: choice
          value: [5, 10]
  - name: best_model
    dependencies: [experiment, tune]
    trigger: all_done
    params:
      top:
        ref: dag
        value: inputs.top
        contextOnly: true
    component:
      run:
        kind: job
        init:
        - git: {url: "https://github.com/polyaxon/polyaxon-quick-start"}
        container:
          image: polyaxon/polyaxon-quick-start
          workingDir: "{{ globals.artifacts_path }}/polyaxon-quick-start"
          command: [python3, best_models.py]
          args: ["--project={{ globals.project_name }}", "--top={{ top }}"]

This DAG will start an experiment, if the experiment has a loss > max_loss it will start a tuning job based on a random search algorithm. Finally, it will run a container to print the best 5 models.

The DAG itself is parameterized, we can pass different values for max_loss and top.

polyaxon run --url https://raw.githubusercontent.com/polyaxon/polyaxon-quick-start/master/dags/dag.yaml -P loss=0.002 -P top=10

A DAG definition is also managed internally by a pipeline, which means you can also leverage all the pipeline tools to control the caching for runs with similar details, concurrency for managing the number of parallel jobs, and early stopping strategies.

To learn more about DAGs.

Learn More

You can check the automation section for more details about all the automation features.