You can use one or more volumes to access data directly in your machine learning experiments and jobs.
Define or create a PVC
You need to define or create a PVC in the same namespace as Polyaxon CE or Polyaxon Agent.
In the namespace where you are deploying Polyaxon, e.g. polyaxon, create the PVC using kubectl:
kubectl create -f data-pvc.yaml -n polyaxon
Tip: Please visit the Kubernetes documentation to learn about persistent volumes.
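The data-pvc.yaml file passed to kubectl is not shown on this page. A minimal sketch of what it might contain follows, using the standard Kubernetes PersistentVolumeClaim manifest. The claim name is chosen to match the volumeClaim value used in the connection examples on this page; the access mode and the storage size are assumptions and depend on your cluster and storage class.

```yaml
# data-pvc.yaml (illustrative sketch, adjust to your cluster)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: polyaxon-pvc-data      # must match the volumeClaim referenced by the connection
spec:
  accessModes:
    - ReadWriteMany            # assumption: several pods mount the data at once; needs a storage class that supports it
  resources:
    requests:
      storage: 10Gi            # assumption: size this to your dataset
```

If your storage class only supports single-node access, ReadWriteOnce is the usual alternative, with the limitation that pods on different nodes cannot mount the claim at the same time.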
Now you can use this PVC to mount data into your experiments and jobs by declaring it as a connection in your Polyaxon deployment config:
connections:
  - name: dataset1
    kind: volume_claim
    schema:
      mountPath: /plx-data
      volumeClaim: polyaxon-pvc-data
To mount the data with the read-only option:
connections:
  - name: dataset1
    kind: volume_claim
    schema:
      mountPath: /plx-data
      volumeClaim: polyaxon-pvc-data
      readOnly: true
If you want to access multiple datasets:
connections:
  - name: dataset1
    kind: volume_claim
    schema:
      mountPath: /plx-dataset1
      volumeClaim: polyaxon-pvc-data1
      readOnly: true
  - name: dataset2
    kind: volume_claim
    schema:
      mountPath: /plx-dataset2
      volumeClaim: polyaxon-pvc-data2
      readOnly: true
Update/Install Polyaxon CE or Polyaxon Agent deployment
Once the connections are declared in your deployment config, deploy or upgrade your Polyaxon CE or Polyaxon Agent deployment so that it has access to the data on the PVC.
Access the dataset in your experiments/jobs
To expose the connection to one of the containers in your jobs or services, reference it by name:
run:
  kind: job
  connections: [dataset1]
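Or, to request several connections at once (s3-dataset1 here stands for another connection defined in your deployment):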
run:
  kind: job
  connections: [dataset1, s3-dataset1]
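To make the effect of requesting a connection concrete, here is a minimal sketch of a complete Polyaxonfile that requests dataset1 and lists the mounted directory. It assumes the connection was declared with mountPath /plx-data, as in the first example on this page; the container image and command are purely illustrative and not part of the original guide.

```yaml
version: 1.1
kind: component
run:
  kind: job
  connections: [dataset1]        # connection name declared in the deployment config
  container:
    image: busybox               # assumption: any image with a shell utility works here
    command: ["ls", "-la", "/plx-data"]   # /plx-data is the mountPath of dataset1
```

If the connection was declared with readOnly: true, the container can read from the mount path but attempts to write to it are expected to fail.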
Use the initializer to load the dataset
To use the artifacts initializer to load the dataset:
run:
  kind: job
  init:
    - artifacts: {dirs: [...], files: [...]}
      connection: "dataset1"