Overview

Until recently, when we began exposing a more versatile runtime in Polyaxon, Kubernetes was a required layer for running and managing machine learning workloads with Polyaxon. It is a powerful orchestration tool that Polyaxon uses to deploy, scale, and manage users’ workloads.

Polyaxon operations

When a user submits a job or a service, Polyaxon creates a Kubernetes custom resource (defined by a CRD), which knows how to run and manage the underlying native workload on Kubernetes. Polyaxon’s custom resource makes sure that the job or service is running correctly and reports status transitions back to Polyaxon’s API.

The following diagrams show how a Polyaxon operation is constructed and submitted to Kubernetes:

  • Operation with job runtime
Operation
└──Job
   └──Pod
      └──Containers
  • Operation with service runtime
Operation
└──Service
   └──Deployment
      └──Pod
         └──Containers
  • Operation with TFJob runtime
Operation
└──TFJob
   └──Pods
      └──Containers

The same structure applies to PytorchJob, PaddleJob, MXNetJob, MPIJob, SparkJob, and DaskJob operations.
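
For orientation, an operation with the job runtime might be declared roughly as follows. This is a minimal sketch that assumes the Polyaxonfile `version: 1.1` specification format; the image and command are placeholders chosen for illustration, not values Polyaxon requires.

```yaml
# Minimal sketch of a Polyaxonfile with a job runtime (assumes spec version 1.1).
version: 1.1
kind: component
run:
  kind: job                # selects the job runtime: Operation -> Job -> Pod -> Containers
  container:
    image: busybox         # placeholder image
    command: ["sh", "-c", "echo The app is running!"]   # placeholder command
```

Polyaxon translates a declaration like this into the custom resource and native Kubernetes objects shown in the trees above, so the user never writes the Job or Pod manifest by hand.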

Pods

Pods are the smallest deployable units of computing that you can create and manage in Kubernetes. When using Polyaxon, pods are created automatically for each job or service.

Creating a Pod

Users can create a pod directly using the Kubernetes API, or by using a configuration file like the following:

apiVersion: v1
kind: Pod
metadata:
  name: busybox
spec:
  containers:
    - name: busybox
      image: busybox
      command: [ 'sh', '-c', 'echo The app is running! && sleep 3600' ]

To create the Pod shown above, run the following command:

$ kubectl apply -f pod.yaml

Executing a command in a Pod

To execute a command in a Pod, use the kubectl exec command. For example, to execute the command sh in the Pod busybox:

$ kubectl exec -it busybox -- sh

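Another way to execute a command in a Pod is to use the kubectl run command, which starts a new Pod. For example, to execute the command ls in a new Pod running the image busybox (if the busybox Pod created earlier still exists, delete it first or choose a different Pod name):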

$ kubectl run busybox --image=busybox --restart=Never -- ls

For pods managed by a Polyaxon workload, the user can use the Polyaxon CLI or UI to execute commands without having to learn about the underlying Kubernetes concepts.

Show logs for a Pod

To show the logs for a Pod, use the kubectl logs command. For example, to show the logs for the Pod busybox:

$ kubectl logs busybox

To show the logs and follow the output, use the -f flag. For example, to show the logs for the Pod busybox and follow the output:

$ kubectl logs -f busybox

For pods managed by a Polyaxon workload, the user can use the Polyaxon CLI or UI to stream logs without having to learn about the underlying Kubernetes concepts.

Notes

  • Pods are generally not created directly; they are created by workload resources. However, it’s useful to understand the Pod structure, and sometimes it is useful to create a Pod directly for debugging.
  • Pods are intended to be disposable and replaceable. Depending on the restart or failure policy defined by the workload, if a Pod gets deleted or fails for any reason, a new Pod will be created.
  • Users interacting with Polyaxon’s API will never need to create pods directly. It’s also important to note that pods created manually will not be controlled or managed by Polyaxon.
  • Pods managed by a Polyaxon workload will be automatically deleted when the workload has finished running.
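
At the Pod level, the relevant setting is restartPolicy, which controls whether the kubelet restarts a Pod's containers in place; replacing a whole Pod is the responsibility of the owning workload, such as a Job or a Deployment. The following is a minimal sketch of the earlier busybox Pod with an explicit restart policy. The Pod name is hypothetical.

```yaml
# Sketch: a standalone Pod whose containers are restarted only on failure.
apiVersion: v1
kind: Pod
metadata:
  name: busybox-restart        # hypothetical name, chosen to avoid clashing with "busybox"
spec:
  restartPolicy: OnFailure     # one of Always (default), OnFailure, Never
  containers:
    - name: busybox
      image: busybox
      command: ["sh", "-c", "echo The app is running! && sleep 3600"]
```

Because no controller owns this Pod, deleting it removes it for good; nothing will recreate it.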

Jobs

A job creates one or more pods and ensures that a specified number of them successfully terminate. Most machine learning workloads are executed as jobs, and Polyaxon provides a few job types that are supported by the platform. Jobs come with a few extra features, such as retries, backoff, and timeout, which are exposed by Polyaxon’s interface.

Creating a Job

A user can create a job directly using the Kubernetes API, or by using a configuration file like the following:

apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
        - name: pi
          image: perl
          command: [ "perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)" ]
      restartPolicy: Never
  backoffLimit: 4

To create the Job shown above, run the following command:

$ kubectl apply -f job.yaml
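
The retry and timeout features mentioned earlier correspond to standard fields of the Kubernetes Job spec. The sketch below is a variant of the pi Job that sets them explicitly; the Job name and the numeric values are illustrative assumptions, and how Polyaxon's own interface names these options is not shown here.

```yaml
# Sketch: a Job with explicit retry, timeout, and cleanup settings.
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-timeout            # hypothetical name
spec:
  backoffLimit: 4                  # retries before the Job is marked as failed
  activeDeadlineSeconds: 600       # timeout for the whole Job, in seconds (illustrative value)
  ttlSecondsAfterFinished: 300     # delete the finished Job and its pods after this delay (illustrative value)
  template:
    spec:
      restartPolicy: Never         # for Jobs, must be Never or OnFailure
      containers:
        - name: pi
          image: perl
          command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
```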

Notes

Users interacting with Polyaxon’s API will never need to create jobs directly. It’s also important to note that jobs created manually will not be controlled or managed by Polyaxon.

Services

A service is a named abstraction that groups pods and provides a single point of entry for accessing them. Services are used to expose notebooks, tensorboards, and other services.

Creating a Service

A user can create a service directly using the Kubernetes API, or by using a configuration file like the following:

apiVersion: v1
kind: Service
metadata:
  name: tensorboard
spec:
  selector:
    app: tensorboard
  type: ClusterIP
  ports:
    - protocol: TCP
      port: 6006
      targetPort: 6006

To create the Service shown above, run the following command:

$ kubectl apply -f service.yaml

Deployments

A service needs pods to route traffic to, so a deployment is usually created alongside it. A deployment is a higher-level concept that manages a set of pods: it is responsible for creating and updating them, and for managing the replica sets it creates to do so.

Creating a Deployment

A user can create a deployment directly using the Kubernetes API, or by using a configuration file like the following:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tensorboard
spec:
  selector:
    matchLabels:
      app: tensorboard
  template:
    metadata:
      labels:
        app: tensorboard
    spec:
      containers:
        - name: tensorboard
          image: tensorflow/tensorflow:1.13.1-py3
          command: [ "tensorboard" ]
          args: [...]

Exposing a Service

Depending on the service type, a service can be exposed in different ways. In the case of Service type ClusterIP, the service is only accessible from within the cluster, but can be port-forwarded to be accessed locally.

To port-forward the service tensorboard to the local machine, run the following command:

$ kubectl port-forward service/tensorboard 6006:6006
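
As a contrast to ClusterIP, the following is a minimal sketch of the same service declared with type NodePort, which opens a port on every cluster node. The service name and the nodePort value are assumptions made for illustration.

```yaml
# Sketch: exposing tensorboard on a port of every node instead of only inside the cluster.
apiVersion: v1
kind: Service
metadata:
  name: tensorboard-nodeport   # hypothetical name
spec:
  selector:
    app: tensorboard           # must match the labels on the Deployment's pod template
  type: NodePort
  ports:
    - protocol: TCP
      port: 6006
      targetPort: 6006
      nodePort: 30006          # illustrative; must fall in the cluster's node port range (30000-32767 by default)
```

With this type, the service is reachable at any node's IP address on the chosen nodePort, without port-forwarding.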

Notes

Users interacting with Polyaxon’s API will never need to create services directly. It’s also important to note that services created manually will not be controlled or managed by Polyaxon.

Services managed by Polyaxon have a much simpler configuration, and they are created automatically when a user creates a notebook or a tensorboard. They are also exposed via Polyaxon’s gateway, which means that users can access them via Polyaxon’s UI or CLI and do not have to learn about the underlying Kubernetes concepts. Users of one of our commercial offerings also benefit from a complete authentication and authorization layer for all exposed services.
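
To give a sense of the simpler configuration, a Polyaxon-managed tensorboard service might be declared roughly as follows. This is a minimal sketch that assumes the Polyaxonfile `version: 1.1` format with the service runtime; the log directory and the command-line flags are illustrative assumptions.

```yaml
# Sketch: a service runtime that replaces the hand-written Service and Deployment manifests.
version: 1.1
kind: component
run:
  kind: service
  ports: [6006]                # port exposed through Polyaxon's gateway
  container:
    image: tensorflow/tensorflow:1.13.1-py3
    # /tmp/logs is a hypothetical log directory used only for illustration
    command: ["tensorboard", "--logdir=/tmp/logs", "--port=6006"]
```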

Closing thoughts

Although Polyaxon provides a simple interface and configuration for scheduling and running workloads, it does not prevent users from interacting with the underlying Kubernetes API. Our specification provides full access to customize the underlying Kubernetes resources via: