V1SparkReplica

polyaxon.polyflow.run.spark.replica.V1SparkReplica(replicas=None, environment=None, init=None, sidecars=None, container=None)

A Spark replica is the specification of a Spark executor or driver.

YAML usage

executor/driver:
  replicas:
  environment:
  init:
  sidecars:
  container:
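The YAML skeleton above lists the only five keys a replica accepts. The sketch below treats a replica as a plain Python dict and flags any key outside those five fields, which catches typos or keys borrowed from other run kinds. The helper `unknown_replica_keys` is hypothetical and is not part of the Polyaxon API or its own validation.

```python
# Hypothetical helper, not part of the Polyaxon API.
# The allowed keys mirror the V1SparkReplica fields documented here.
ALLOWED_REPLICA_KEYS = {"replicas", "environment", "init", "sidecars", "container"}


def unknown_replica_keys(replica: dict) -> list:
    """Return, sorted, the keys of a replica dict that are not V1SparkReplica fields."""
    return sorted(key for key in replica if key not in ALLOWED_REPLICA_KEYS)


executor = {
    "replicas": 2,
    "kind": "job",  # not a replica field
    "container": {"image": "busybox:1.28"},
}
print(unknown_replica_keys(executor))  # ['kind']
```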

Python usage

from polyaxon.polyflow import V1Environment, V1Init, V1SparkReplica
from polyaxon.k8s import k8s_schemas
replica = V1SparkReplica(
    replicas=2,
    environment=V1Environment(...),
    init=[V1Init(...)],
    sidecars=[k8s_schemas.V1Container(...)],
    container=k8s_schemas.V1Container(...),
)

Fields

replicas

The number of instances to run for this replica (executor/driver).

executor:
  replicas: 2
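Assuming each replica instance corresponds to one pod, the number of pods a Spark run requests is the sum of `replicas` across the driver and executor sections. The sketch below computes that sum over plain dicts and rejects non-positive counts; the function name `total_replicas` and the positivity rule are illustrative assumptions, not documented Polyaxon behavior.

```python
# Hypothetical helper: sums the `replicas` counts declared under driver/executor.
# Assumption: one pod per replica instance, and counts must be >= 1.
def total_replicas(spark_spec: dict) -> int:
    """Sum the `replicas` values declared under `driver` and `executor`."""
    total = 0
    for role in ("driver", "executor"):
        count = spark_spec.get(role, {}).get("replicas")
        if count is None:
            continue  # role absent, or no count declared for it
        if not isinstance(count, int) or isinstance(count, bool) or count < 1:
            raise ValueError(f"{role}.replicas must be a positive integer, got {count!r}")
        total += count
    return total


spec = {"driver": {"replicas": 1}, "executor": {"replicas": 2}}
print(total_replicas(spec))  # 3
```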

environment

Optional environment section; it provides a way to inject pod-related information, such as labels, annotations, and a node selector, into the replica (executor/driver).

driver:
  environment:
    labels:
      key1: "label1"
      key2: "label2"
    annotations:
      key1: "value1"
      key2: "value2"
    nodeSelector:
      node_label: node_value
    ...
  ...
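To make "pod-related information" concrete, the sketch below places the three keys from the example at their standard locations in a Kubernetes Pod manifest, using plain dicts: `labels` and `annotations` under `metadata`, and `nodeSelector` under `spec`. The Pod field locations are standard Kubernetes; the function `apply_environment` is a hypothetical illustration, not how Polyaxon actually builds its pods.

```python
# Hypothetical illustration: copy environment keys to their standard Pod locations.
def apply_environment(pod: dict, environment: dict) -> dict:
    metadata = pod.setdefault("metadata", {})
    spec = pod.setdefault("spec", {})
    # Labels and annotations are object metadata; matching keys are overwritten.
    metadata.setdefault("labels", {}).update(environment.get("labels", {}))
    metadata.setdefault("annotations", {}).update(environment.get("annotations", {}))
    # nodeSelector belongs to the Pod spec and constrains which nodes can run it.
    if "nodeSelector" in environment:
        spec["nodeSelector"] = dict(environment["nodeSelector"])
    return pod


environment = {
    "labels": {"key1": "label1", "key2": "label2"},
    "annotations": {"key1": "value1", "key2": "value2"},
    "nodeSelector": {"node_label": "node_value"},
}
pod = apply_environment({"metadata": {"name": "spark-driver"}}, environment)
print(pod["spec"]["nodeSelector"])  # {'node_label': 'node_value'}
```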

init

A list of init handlers and containers to resolve for the replica (executor/driver).

If you are referencing a connection, it must be configured. All referenced connections are checked for the following:
  • Whether they are accessible in the context of this run's project

  • Whether the user running the operation has access to those connections

executor:
  init:
    - artifacts:
        dirs: ["path/on/the/default/artifacts/store"]
    - connection: gcs-large-datasets
      artifacts:
        dirs: ["data"]
      container:
        resources:
          requests:
            memory: "256Mi"
            cpu: "500m"
    - container:
        name: myapp-container
        image: busybox:1.28
        command: ['sh', '-c', 'echo custom init container']
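The two connection checks listed above can be sketched as set lookups over the init entries. In this sketch, `project_connections` and `user_connections` are hypothetical inputs standing in for whatever Polyaxon resolves for the project and the user, and `check_init_connections` is not a Polyaxon function; only the `connection` key of each init entry comes from the example.

```python
# Hypothetical sketch of the two checks applied to connections referenced by init entries.
def check_init_connections(init: list, project_connections: set, user_connections: set) -> None:
    for entry in init:
        name = entry.get("connection")
        if name is None:
            continue  # no connection referenced, so there is nothing to check
        if name not in project_connections:
            raise PermissionError(f"connection {name!r} is not accessible in this project")
        if name not in user_connections:
            raise PermissionError(f"user cannot access connection {name!r}")


init = [
    {"artifacts": {"dirs": ["path/on/the/default/artifacts/store"]}},
    {"connection": "gcs-large-datasets", "artifacts": {"dirs": ["data"]}},
    {"container": {"name": "myapp-container", "image": "busybox:1.28"}},
]
# Passes: the only referenced connection is visible to both the project and the user.
check_init_connections(init, {"gcs-large-datasets"}, {"gcs-large-datasets"})
```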

sidecars

A list of containers that will run as sidecars alongside the main container.

driver:
  sidecars:
    - name: sidecar2
      image: busybox:1.28
      command: ['sh', '-c', 'echo sidecar2']
    - name: sidecar1
      image: busybox:1.28
      command: ['sh', '-c', 'echo sidecar1']
      resources:
        requests:
          memory: "128Mi"
          cpu: "500m"
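Sidecars run in the same pod as the main container, and Kubernetes requires every container name within a pod to be unique. The sketch below assembles a pod's container list from a main container and its sidecars and rejects duplicate names. The helper `pod_containers` and the main container name `main` are hypothetical, and listing the main container first is an assumption made for readability.

```python
# Hypothetical helper: combine the main container and sidecars into one pod container list.
def pod_containers(container: dict, sidecars: list) -> list:
    containers = [container, *sidecars]  # assumption: main container listed first
    seen = set()
    for c in containers:
        name = c["name"]
        if name in seen:
            raise ValueError(f"duplicate container name in pod: {name!r}")
        seen.add(name)
    return containers


main = {"name": "main", "image": "busybox:1.28"}  # hypothetical main container
sidecars = [
    {"name": "sidecar2", "image": "busybox:1.28", "command": ["sh", "-c", "echo sidecar2"]},
    {"name": "sidecar1", "image": "busybox:1.28", "command": ["sh", "-c", "echo sidecar1"]},
]
containers = pod_containers(main, sidecars)
print([c["name"] for c in containers])  # ['main', 'sidecar2', 'sidecar1']
```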

container

The main Kubernetes container that runs your training or data-processing logic for the replica (executor/driver).

executor:
  init:
    - connection: my-tf-code-repo
  container:
    image: tensorflow:2.1
    command: ["python", "/plx-context/artifacts/my-tf-code-repo/model.py"]
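In the example, the `command` points at `/plx-context/artifacts/my-tf-code-repo/model.py`, so the script sits under a directory named after the `my-tf-code-repo` connection. Assuming that layout holds for other connections (it is inferred from this single example only), a command for a different repository could be built as below; `context_artifact_path` is a hypothetical helper.

```python
import posixpath

# Root taken from the example command; the per-connection layout is an assumption.
PLX_ARTIFACTS_ROOT = "/plx-context/artifacts"


def context_artifact_path(connection: str, relative_path: str) -> str:
    """Build the in-container path of a file pulled in by an init connection."""
    return posixpath.join(PLX_ARTIFACTS_ROOT, connection, relative_path)


command = ["python", context_artifact_path("my-tf-code-repo", "model.py")]
print(command)  # ['python', '/plx-context/artifacts/my-tf-code-repo/model.py']
```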