Operation - Polyaxon Specification

V1Operation

polyaxon._flow.operations.operation.V1Operation()

An operation is how Polyaxon executes a component by passing parameters, connections, and a run environment.

With an operation users can:

Pass the parameters for required inputs or override the default values of optional inputs.
Patch the definition of the component to set environments, initializers, and resources.
Set termination logic and retries.
Set trigger logic to start a component in a pipeline context.
Parallelize or map the component over a matrix of parameters.
Put an operation on a schedule.
Subscribe a component to events to trigger executions automatically.

After resolution and compilation, Polyaxon will prepare an executable that will be scheduled on Kubernetes:

Figure: polyaxonfile operation

Args:
- version: str
- kind: str, should be equal to operation
- patch_strategy: str, optional, defaults to post_merge
- is_preset: bool, optional
- is_approved: bool, optional
- name: str, optional
- description: str, optional
- tags: List[str], optional
- presets: List[str], optional
- queue: str, optional
- namespace: str, optional
- cache: V1Cache, optional
- termination: V1Termination, optional
- plugins: V1Plugins, optional
- params: Dict[str, V1Param], optional
- schedule: Union[V1CronSchedule, V1IntervalSchedule, V1DateTimeSchedule], optional
- events: List[V1EventTrigger], optional
- build: V1Build, optional
- hooks: List[V1Hook], optional
- matrix: Union[V1Mapping, V1GridSearch, V1RandomSearch, V1Hyperband, V1Bayes, V1Hyperopt, V1Iterative], optional
- joins: List[V1Join], optional
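- dependencies: List[str], optional (see the flow engine's dependencies section)
- trigger: str, optional (see the flow engine's trigger section)
- conditions: str, optional (see the flow engine's conditions section)
- skip_on_upstream_skip: bool, optional (see the flow engine's skip_on_upstream_skip section)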
- run_patch: Dict, optional
- hub_ref: str, optional
- dag_ref: str, optional
- url_ref: str, optional
- path_ref: str, optional
- component: V1Component, optional
- template: V1Template, optional

YAML usage

operation:
  version: 1.1
  kind: operation
  patchStrategy:
  isPreset:
  isApproved:
  name:
  description:
  tags:
  presets:
  queue:
  namespace:
  cache:
  termination:
  plugins:
  events:
  hooks:
  params:
  schedule:
  matrix:
  joins:
  dependencies:
  trigger:
  conditions:
  skipOnUpstreamSkip:
  build:
  runPatch:
  hubRef:
  dagRef:
  urlRef:
  pathRef:
  component:
  template:

Python usage

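from polyaxon.schemas import (
    V1Build,
    V1Cache,
    V1Component,
    V1EventTrigger,
    V1Hook,
    V1Operation,
    V1Param,
    V1PatchStrategy,
    V1Plugins,
    V1Termination,
)

operation = V1Operation(
    patch_strategy=V1PatchStrategy.REPLACE,
    name="test",
    description="test",
    tags=["test"],
    presets=["test"],
    queue="test",
    namespace="test",
    cache=V1Cache(disable=False, ttl=100),
    termination=V1Termination(max_retries=2),
    plugins=V1Plugins(auth=False, collect_logs=False),
    # events expects V1EventTrigger objects, not plain strings.
    events=[V1EventTrigger(...)],  # placeholder: fill in the trigger's fields
    hooks=[V1Hook(...)],  # placeholder: fill in the hook's fields
    # params (not outputs) maps each input/output name to a V1Param.
    params={"param1": V1Param(value=1.1)},
    build=V1Build(hub_ref="kaniko"),
    component=V1Component(...),  # placeholder: the component definition
)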

Fields

version

The Polyaxon specification version used to validate the operation.

operation:
  version: 1.1

kind

The kind signals to the CLI, client, and other tools that this is an operation.

If you are using the Python client to create an operation, this field is not required and is set by default.

operation:
  kind: operation

patchStrategy

Defines how the compiler combines keys defined on both the operation and the component, and how multiple presets are merged when using the override behavior -f.

There are four strategies:

replace: replaces all keys with new values if provided.
isnull: only applies new values if the keys have empty/None values.
post_merge (default): applies a deep merge where newer values are applied last, so they take precedence.
pre_merge: applies a deep merge where newer values are applied first, so existing values take precedence.
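To make the difference between the four strategies concrete, here is a minimal sketch on plain Python dicts. It is a hypothetical illustration, not Polyaxon's compiler: the apply_patch and _deep_merge helpers, treating None and empty values as "not set", and the sample keys are all assumptions.

```python
from typing import Any, Dict

# Hypothetical illustration of the four patch strategies on plain dicts;
# this is not Polyaxon's compiler code.


def _is_empty(value: Any) -> bool:
    """Treat None and empty strings, lists, and dicts as 'not set'."""
    return value is None or (isinstance(value, (str, list, dict)) and len(value) == 0)


def _deep_merge(low: Dict[str, Any], high: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where values from `high` win, recursing into nested dicts."""
    result = dict(low)
    for key, value in high.items():
        if value is None:
            continue  # an unset value never overrides anything
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            result[key] = _deep_merge(result[key], value)
        else:
            result[key] = value
    return result


def apply_patch(current: Dict[str, Any], new: Dict[str, Any], strategy: str) -> Dict[str, Any]:
    """Combine `current` (e.g. the component) with `new` (e.g. the operation)."""
    if strategy == "replace":
        # Top-level keys provided in `new` replace the current values wholesale.
        return {**current, **{k: v for k, v in new.items() if v is not None}}
    if strategy == "isnull":
        # New values only fill keys that are currently empty or missing.
        merged = dict(current)
        for key, value in new.items():
            if _is_empty(merged.get(key)):
                merged[key] = value
        return merged
    if strategy == "post_merge":
        # Deep merge with newer values applied last, so they win on conflicts.
        return _deep_merge(current, new)
    if strategy == "pre_merge":
        # Deep merge with newer values applied first, so existing values win.
        return _deep_merge(new, current)
    raise ValueError(f"Unknown patch strategy: {strategy}")


component = {"name": "train", "tags": None, "termination": {"maxRetries": 1, "timeout": 600}}
operation = {"tags": ["test"], "termination": {"maxRetries": 2}}

for strategy in ("replace", "isnull", "post_merge", "pre_merge"):
    print(strategy, apply_patch(component, operation, strategy))
```

The termination key shows the main contrast: replace drops the component's timeout, isnull keeps the component's whole section, and the two merge strategies keep both keys but disagree on which maxRetries wins.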

isPreset

This flag tells Polyaxon whether this operation must be validated as a complete specification, or whether it is only a preset used with the override behavior to inject extra information into the main operation specification.

For instance, a user might want to apply the same scheduling behavior to several operations and components. Instead of repeating the environment section in every operation, the user can extract that logic into an operation preset:

isPreset: true
runPatch:
  environment:
    nodeSelector:
      node_label: node_value

and use the override behavior to inject that section dynamically:

polyaxon run -f component -f scheduling-preset.yaml

Note: Please check this in-depth section about presets.

name

The name to use for this operation run. If provided, it overrides the component’s name; otherwise, the component’s name is used if it exists.

operation:
  name: test

description

The description to use for this operation run. If provided, it overrides the component’s description; otherwise, the component’s description is used if it exists.

operation:
  description: test

presets

The presets to use for this operation run. If provided, they override the component’s presets; otherwise, the component’s presets are used if they exist.

operation:
  presets: [test]

queue

The queue to use for this operation run. If provided, it overrides the component’s queue; otherwise, the component’s queue is used if it exists.

operation:
  queue: agent-name/queue-name

If the agent name is not specified, Polyaxon will resolve the name of the queue based on the default agent.

operation:
  queue: queue-name
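The two queue forms can be told apart by the optional agent prefix. The sketch below is a hypothetical parser: resolve_queue, QueueRef, and the rejection of empty parts are assumptions for illustration, not Polyaxon's implementation.

```python
from typing import NamedTuple

# Hypothetical helper: "agent-name/queue-name" selects an explicit agent,
# while a bare "queue-name" falls back to the default agent.


class QueueRef(NamedTuple):
    agent: str
    queue: str


def resolve_queue(value: str, default_agent: str) -> QueueRef:
    """Split an operation's queue field into (agent, queue)."""
    agent, sep, queue = value.partition("/")
    if not sep:
        # No agent prefix: the whole value is the queue name.
        return QueueRef(agent=default_agent, queue=value)
    if not agent or not queue:
        raise ValueError(f"Invalid queue reference: {value!r}")
    return QueueRef(agent=agent, queue=queue)


print(resolve_queue("agent-name/queue-name", default_agent="default"))
print(resolve_queue("queue-name", default_agent="default"))
```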

namespace

Note: This field is only available in some commercial editions.

The namespace to use for this operation run. If provided, it overrides the component’s namespace; otherwise, the component’s namespace is used if it exists, and if neither is set, it defaults to the agent’s namespace.

operation:
  namespace: polyaxon
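The override-or-inherit rule shared by scalar fields such as name, description, queue, and namespace can be sketched in plain Python. This is a hypothetical helper, not Polyaxon's resolver: the resolve_field and resolve_scalars names, the use of None to mean "not set", and the sample values are assumptions. Dict-valued fields such as cache or termination are left out because how they combine depends on the patchStrategy.

```python
from typing import Any, Dict, Optional

# Hypothetical sketch: the operation's value wins, otherwise the component's
# value is inherited. None stands for "not set" (an assumption).

SCALAR_FIELDS = ("name", "description", "queue", "namespace")


def resolve_field(operation: Dict[str, Any], component: Dict[str, Any],
                  field: str, default: Optional[Any] = None) -> Any:
    """Prefer the operation's value, then the component's, then `default`."""
    if operation.get(field) is not None:
        return operation[field]
    if component.get(field) is not None:
        return component[field]
    return default


def resolve_scalars(operation: Dict[str, Any], component: Dict[str, Any],
                    agent_namespace: str) -> Dict[str, Any]:
    resolved = {field: resolve_field(operation, component, field) for field in SCALAR_FIELDS}
    # namespace has one extra fallback: the agent's namespace.
    resolved["namespace"] = resolve_field(operation, component, "namespace",
                                          default=agent_namespace)
    return resolved


component = {"name": "train", "description": "Train a model", "queue": "gpu-queue"}
operation = {"name": "debug"}

print(resolve_scalars(operation, component, agent_namespace="polyaxon"))
```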

cache

The cache to use for this operation run. If provided, it overrides the component’s cache; otherwise, the component’s cache is used if it exists.

operation:
  cache:
    disable: false
    ttl: 100

termination

The termination to use for this operation run. If provided, it overrides the component’s termination; otherwise, the component’s termination is used if it exists.

operation:
  termination:
    maxRetries: 2

plugins

The plugins to use for this operation run. If provided, they override the component’s plugins; otherwise, the component’s plugins are used if they exist.

operation:
  name: debug
  ...
  plugins:
    auth: false
    collectLogs: false
  ...

params

The params to pass to the component; they are validated against the component’s inputs and outputs. If a param is passed and the component does not define a corresponding input or output, a validation error is raised unless the param has the contextOnly flag enabled.

operation:
  params:
    param1: {value: 1.1}
    param2: {value: test}
    param3: {ref: ops.upstream-operation, value: outputs.metric}
  ...
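A minimal sketch of the validation rule described above, assuming params are plain objects with a value, an optional ref, and a context-only flag. Param and validate_params are hypothetical stand-ins for illustration, not Polyaxon's V1Param or its validator, and only the unknown-param check is modeled.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional

# Hypothetical stand-in mirroring the YAML keys value, ref, and contextOnly;
# this is not Polyaxon's V1Param.


@dataclass
class Param:
    value: Any = None
    ref: Optional[str] = None
    context_only: bool = False


def validate_params(params: Dict[str, Param], inputs: Iterable[str],
                    outputs: Iterable[str]) -> None:
    """Raise if a param matches no declared input/output and is not context-only."""
    declared = set(inputs) | set(outputs)
    unknown = sorted(name for name, param in params.items()
                     if name not in declared and not param.context_only)
    if unknown:
        raise ValueError(f"Params not declared by the component: {unknown}")


params = {
    "param1": Param(value=1.1),
    "param2": Param(value="test"),
    "param3": Param(ref="ops.upstream-operation", value="outputs.metric"),
    # Not declared by the component, but allowed because it is context-only.
    "extra": Param(value="only for templating", context_only=True),
}
validate_params(params, inputs=["param1", "param2", "param3"], outputs=[])
```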

build

Note: Please check V1Build for more details.

This section defines whether this operation should build a container image before starting its main logic. If a build section is provided, Polyaxon keeps the main operation in a pending state until the build finishes, then uses the resulting Docker image to start the main container.

operation:
  ...
  build:
    hubRef: kaniko
  ...

runPatch

The run patch provides a way to override information about the component’s run section, for example the container’s resources or the environment section.

The run patch is a dictionary that can modify most of the runtime information and will be resolved against the corresponding run kind:

V1Job: for running batch jobs, model training experiments, data processing jobs, …
V1Service: for running tensorboards, notebooks, streamlit, custom services or an API.
V1TFJob: for running a distributed TensorFlow training job.
V1PytorchJob: for running a distributed PyTorch training job.
V1PaddleJob: for running a distributed PaddlePaddle training job.
V1MXJob: for running a distributed MXNet training job.
V1XGBoostJob: for running a distributed XGBoost training job.
V1MPIJob: for running a distributed MPI job.
V1RayJob: for running a Ray job.
V1DaskJob: for running a Dask job.
V1Dag: for running a DAG/workflow.

For example, if we define a generic component for running Jupyter Notebook:

version: 1.1
kind: component
name: notebook
run:
  kind: service
  ports: [8888]
  container:
    image: "jupyter/tensorflow-notebook"
    command: ["jupyter", "lab"]
    args: [
      "--no-browser",
      "--ip=0.0.0.0",
      "--port={{globals.ports[0]}}",
      "--allow-root",
      "--NotebookApp.allow_origin=*",
      "--NotebookApp.trust_xheaders=True",
      "--NotebookApp.token=",
      "--NotebookApp.base_url={{globals.base_url}}",
      "--LabApp.base_url={{globals.base_url}}"
    ]

This component is generic and does not define any resource requirements. If, for instance, the component is hosted on GitHub and you want to request a GPU for the notebook without modifying the component, you can patch the run:

version: 1.1
kind: operation
urlRef: https://raw.githubusercontent.com/org/repo/master/components/notebook.yaml
runPatch:
  container:
    resources:
      limits:
        nvidia.com/gpu: 1

By applying a run patch you can effectively share components while having full control over customizable details.
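Conceptually, the run patch in the notebook example is merged into the component's run section. The following sketch assumes a recursive dict merge in which patch values win, in the spirit of the default post_merge strategy; deep_merge is a hypothetical helper, not Polyaxon's resolver.

```python
from typing import Any, Dict

# Hypothetical sketch: merge a runPatch into a component's run section,
# recursing into nested dicts and letting patch values win.


def deep_merge(base: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    merged = dict(base)
    for key, value in patch.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Trimmed version of the notebook component's run section.
run = {
    "kind": "service",
    "ports": [8888],
    "container": {"image": "jupyter/tensorflow-notebook", "command": ["jupyter", "lab"]},
}
run_patch = {"container": {"resources": {"limits": {"nvidia.com/gpu": 1}}}}

patched = deep_merge(run, run_patch)
print(patched["container"])
```

The container keeps its image and command from the shared component and only gains the GPU limit from the operation.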

hubRef

Polyaxon provides a Component Hub for hosting versioned components with an access control system to improve the productivity of your team.

To run a component hosted on the Polyaxon Component Hub, use hubRef:

version: 1.1
kind: operation
hubRef: myComponent:v1.1
...

dagRef

If you are building a DAG and have a component that several operations can use, you can define the component once and reuse it in all those operations with dagRef. Please check the flow engine section of Polyaxon automation for more details.

urlRef

You can host your components on an accessible URL, e.g. GitHub, and reference them without downloading them manually.

version: 1.1
kind: operation
urlRef: https://raw.githubusercontent.com/org/repo/master/components/my-component.yaml
...

Please note that you can only use this reference when using the CLI tool.

pathRef

In many situations, components can be placed in different folders within a project, e.g. data-processing, data-exploration, ml-modeling, …

You can define operations without the need to change the directory by referencing a path to that component:

version: 1.1
kind: operation
pathRef: ../data-processing/component-clean.yaml
...

Please note that you can only use this reference when using the CLI tool.

component

If you are still in the development phase or if you are building a singleton operation that can be executed in a unique way, you can define the component inline inside the operation:

version: 1.1
kind: operation
component:
  run:
    kind: job
    container:
      image: foo:latest
      command: ["train", "--lr=0.01"]
...

isApproved

This flag triggers human validation before an operation is queued and scheduled. The default is true, even when the field is not set, i.e. no validation is required. To require human validation before scheduling an operation, set this field to false:

isApproved: false
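The defaulting rule can be summarized in a few lines. requires_approval is a hypothetical helper used only to illustrate that an unset field behaves like true; it is not part of Polyaxon's API.

```python
from typing import Optional


def requires_approval(is_approved: Optional[bool] = None) -> bool:
    """Return True when the run must wait for human validation (hypothetical helper)."""
    if is_approved is None:
        # Field not set: behaves like isApproved: true, so no validation is needed.
        is_approved = True
    return not is_approved


for value in (None, True, False):
    print(value, requires_approval(value))
```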

cost

A field that defines the cost of running the operation. The value is a float that can follow a cost-estimation convention agreed on by your team, or map directly to the cost of the environment where the operation runs.

cost: 2.2
