You can use one or more NFS servers to access data directly in your machine learning experiments and jobs.

Overview

This guide shows how to use an NFS server to mount data to your jobs and experiments.

This guide uses the click-to-deploy single-node file server on Google Cloud Platform to create a ZFS file server running on a single Google Compute Engine instance, but the same principle applies to an NFS server running on any platform.

Create a Single Node Filer

Using the click-to-deploy single-node file server, create a filer named polyaxon-nfs, keep the default value data, and check enable NFS sharing. You can set the storage size to 50GB, for example.

Create a folder for hosting your data

Use ssh to connect to the filer and create a folder named plx-data for your data under /data:

gcloud --project "polyaxon-test" compute ssh --ssh-flag=-L3000:localhost:3000 --zone=us-central1-b polyaxon-nfs-vm
cd /data
mkdir -m 777 plx-data

Get the IP address of the filer

gcloud --project "polyaxon-test" compute instances describe polyaxon-nfs-vm --zone=us-central1-b --format='value(networkInterfaces[0].networkIP)'

Replace the project name and zone with the values for your own deployment.
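
If you want to reuse the address in later steps, the lookup above can be wrapped in a small function. This is a minimal sketch: the function name get_filer_ip and the PROJECT and ZONE variables are hypothetical, and their defaults simply mirror the example values used in this guide.

```shell
# Hypothetical helper: print the internal IP of the filer VM.
# PROJECT and ZONE are assumed variable names; their defaults mirror the
# example values in this guide and should be overridden for your setup.
get_filer_ip() {
  gcloud --project "${PROJECT:-polyaxon-test}" compute instances describe polyaxon-nfs-vm \
    --zone="${ZONE:-us-central1-b}" \
    --format='value(networkInterfaces[0].networkIP)'
}

# Example usage:
#   NFS_IP="$(get_filer_ip)"
#   echo "$NFS_IP"
```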

Create a PVC with the correct IP address

Create data-pvc.yaml containing the following PV and PVC definitions:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: polyaxon-pv-data
spec:
  capacity:
    storage: 45Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.138.0.3  # Use the right IP
    path: "/data/plx-data"
  claimRef:
    namespace: polyaxon
    name: polyaxon-pvc-data
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: polyaxon-pvc-data
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 45Gi
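
Rather than editing the server address by hand, you could substitute it with sed. The sketch below assumes a hypothetical helper named set_nfs_server; it rewrites only the server: line and prints the result to standard output, so the original file is left untouched. Note that it also drops the trailing comment on that line.

```shell
# Hypothetical helper: print the manifest with the `server:` value replaced.
# $1 is the NFS server IP, $2 is the path to the manifest file.
set_nfs_server() {
  ip="$1"
  file="$2"
  # Keep the indentation and the `server:` key, replace the rest of the line.
  sed "s|^\([[:space:]]*server:\).*|\1 ${ip}|" "$file"
}

# Example usage (the output file name is an assumption):
#   set_nfs_server "10.0.0.5" data-pvc.yaml > data-pvc.resolved.yaml
```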

Use kubectl to create the PVC based on the NFS server

In the same namespace where you are deploying Polyaxon, e.g. polyaxon, create the PVC using kubectl:

kubectl create -f data-pvc.yaml -n polyaxon
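
After creating the resources, it can be useful to confirm that the claim has bound to the volume before pointing Polyaxon at it. The following is a minimal sketch: the function name pvc_is_bound and the NAMESPACE variable are hypothetical, and the namespace defaults to the polyaxon example used above.

```shell
# Hypothetical helper: succeed only when the claim reports phase "Bound".
# NAMESPACE is an assumed variable name; it defaults to the example namespace.
pvc_is_bound() {
  phase="$(kubectl get pvc polyaxon-pvc-data -n "${NAMESPACE:-polyaxon}" \
    -o jsonpath='{.status.phase}')"
  [ "$phase" = "Bound" ]
}

# Example usage:
#   if pvc_is_bound; then echo "PVC is bound"; else echo "PVC is not bound yet"; fi
```

If the claim stays in the Pending phase, kubectl describe pvc polyaxon-pvc-data -n polyaxon usually shows the reason in its events.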

Use the PVC as an artifacts store in Polyaxon

To use the PVC with Polyaxon, follow the guide on artifacts on Persistent Volume Claim.