V1Init

polyaxon.polyflow.init.V1Init(artifacts=None, git=None, dockerfile=None, file=None, model=None, connection=None, path=None, container=None)

The Polyaxon init section exposes an interface for running init containers before the main container, which holds the logic for training models or processing data.

The Polyaxon init section is an extension of Kubernetes init containers.

It provides special handlers for several connection types. Users can also supply their own containers to run custom init steps, such as utilities or setup scripts that are not present in the main container.

By default, all built-in handlers will mount and initialize data under the path /plx-context/artifacts/{{connection-name}} unless the user passes a custom path.
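
The default-path rule above can be illustrated with a minimal sketch. The helper `resolve_init_path` and the constant `DEFAULT_CONTEXT_ROOT` are hypothetical names introduced here for illustration; they are not part of the Polyaxon API and only mirror the rule as stated.

```python
from typing import Optional

# Assumed default root, copied from the sentence above.
DEFAULT_CONTEXT_ROOT = "/plx-context/artifacts"


def resolve_init_path(connection_name: str, path: Optional[str] = None) -> str:
    """Return the directory where a built-in handler would place its data.

    Hypothetical helper: it only mirrors the documented rule and is not
    part of the Polyaxon API.
    """
    if path:
        # A user-supplied path overrides the default location.
        return path
    return f"{DEFAULT_CONTEXT_ROOT}/{connection_name}"


print(resolve_init_path("gcs-large-datasets"))
print(resolve_init_path("s3-datasets", path="/s3-path"))
```

Under this sketch, an entry with only a connection name lands under the default root, while an entry with `path` set uses that path unchanged.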

YAML usage

Each init entry can use only one of the built-in handlers; otherwise an exception is raised. You can still customize the container that runs a built-in handler.

version: 1.1
kind: component
run:
  kind: job
  init:
  - artifacts:
      dirs: ["path/on/the/default/artifacts/store"]
  - model: "modelName:version"
  - connection: gcs-large-datasets
    artifacts:
      dirs: ["data"]
    container:
      resources:
        requests:
          memory: "256Mi"
          cpu: "500m"
  - connection: s3-datasets
    path: "/s3-path"
    artifacts:
      files: ["data1", "path/to/data2"]
  - connection: repo1
  - git:
      revision: branch2
    connection: repo2
  - dockerfile:
      image: test
      run: ["pip install package1"]
      env: {'KEY1': 'en_US.UTF-8', 'KEY2':2}
  - file:
      name: script.sh
      chmod: "+x"
      content: |
        echo test
  - container:
      name: myapp-container
      image: busybox:1.28
      command: ['sh', '-c', 'echo custom init container']

  container:
    ...

Python usage

As in the YAML example, an init entry that sets more than one handler raises an exception. You can still customize the container that runs a built-in handler.

from polyaxon.polyflow import V1Component, V1Init, V1Job
from polyaxon.schemas.types import V1ArtifactsType, V1DockerfileType, V1FileType, V1GitType
from polyaxon.k8s import k8s_schemas

component = V1Component(
    run=V1Job(
        init=[
            # Directory from the default artifacts store (no connection given)
            V1Init(
                artifacts=V1ArtifactsType(dirs=["path/on/the/default/artifacts/store"])
            ),
            # Model artifact from the registry
            V1Init(
                model="object-detection:v12"
            ),
            # Directory from a GCS connection, with a customized init container
            V1Init(
                connection="gcs-large-datasets",
                artifacts=V1ArtifactsType(dirs=["data"]),
                container=k8s_schemas.V1Container(
                    resources=k8s_schemas.V1ResourceRequirements(
                        requests={"memory": "256Mi", "cpu": "500m"}
                    ),
                )
            ),
            # Two files from an S3 connection, initialized under a custom path
            V1Init(
                path="/s3-path",
                connection="s3-datasets",
                artifacts=V1ArtifactsType(files=["data1", "path/to/data2"])
            ),
            # Git repo cloned from its default branch
            V1Init(
                connection="repo1",
            ),
            # Git repo cloned from a specific branch
            V1Init(
                connection="repo2",
                git=V1GitType(revision="branch2")
            ),
            # Generated dockerfile
            V1Init(
                dockerfile=V1DockerfileType(
                    image="test",
                    run=["pip install package1"],
                    env={"KEY1": "en_US.UTF-8", "KEY2": 2},
                )
            ),
            # Generated executable file (uses the `file` argument)
            V1Init(
                file=V1FileType(
                    name="script.sh",
                    content="echo test",
                    chmod="+x",
                )
            ),
            # Custom init container
            V1Init(
                container=k8s_schemas.V1Container(
                    name="myapp-container",
                    image="busybox:1.28",
                    command=["sh", "-c", "echo custom init container"]
                )
            ),
        ],
        container=k8s_schemas.V1Container(...)
    )
)
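
The one-handler-per-entry rule stated for both the YAML and Python usage can be sketched in plain Python. This is an illustration, not Polyaxon's validation code: the function `pick_handler` is hypothetical, and the choice of which fields count as handlers (`HANDLER_FIELDS`) is an assumption inferred from the examples, where `connection`, `path`, and `container` are combined freely with a single handler.

```python
from typing import Any, Dict, Optional

# Assumption: these fields count as built-in handlers; `connection`, `path`,
# and `container` only qualify or customize an entry.
HANDLER_FIELDS = ("artifacts", "git", "dockerfile", "file", "model")


def pick_handler(entry: Dict[str, Any]) -> Optional[str]:
    """Return the single handler field set on an init entry, or None.

    Raises ValueError when more than one handler field is set.
    """
    used = [name for name in HANDLER_FIELDS if entry.get(name) is not None]
    if len(used) > 1:
        raise ValueError(f"init entry uses several handlers: {', '.join(used)}")
    return used[0] if used else None


# One handler plus a connection and a customized container is accepted.
print(pick_handler({
    "connection": "gcs-large-datasets",
    "artifacts": {"dirs": ["data"]},
    "container": {"resources": {"requests": {"memory": "256Mi"}}},
}))

# Two handlers in the same entry are rejected.
try:
    pick_handler({"git": {"revision": "branch2"}, "model": "object-detection:v12"})
except ValueError as err:
    print(err)
```

Entries that set only `connection` or only `container` return `None` in this sketch, since they do not name a handler field explicitly.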

Understanding init section

In both the YAML and Python examples, we are telling Polyaxon to initialize:

  • A directory path/on/the/default/artifacts/store from the default artifactsStore, because we invoked the artifacts handler without specifying a connection.
  • Version v12 of the model object-detection from the model registry (the YAML example uses the placeholder modelName:version). The fully qualified name org_name/model_name:version is also accepted.
  • A directory data from a GCS connection named gcs-large-datasets; we also customized the built-in init container with a resources section.
  • Two files, data1 and path/to/data2, from an S3 connection named s3-datasets, initialized under /s3-path instead of the default path that Polyaxon uses.
  • A repo configured under the connection name repo1 will be cloned from the default branch.
  • A repo configured under the connection name repo2 will be cloned from the branch name branch2.
  • A dockerfile will be generated with the specification that was provided.
  • A file will be generated with the provided content and made executable (chmod +x).
  • A custom container will finally run our own custom code, in this case an echo command.