You can use one or more NFS servers to access data directly from your machine learning experiments and jobs.
Overview
This guide shows how to use an NFS server to mount data to your jobs and experiments.
This guide uses the click-to-deploy single-node file server on Google Cloud Platform to create a ZFS file server running on a single Google Compute Engine instance, but the same principle applies to an NFS server running on any platform.
Create a Single Node Filer
Using the click-to-deploy single-node file server, create a filer named polyaxon-nfs, keep the default value (data), and check Enable NFS sharing. Set the storage size to fit your data, for example 50GB.
Create a folder for hosting your data
Use SSH to connect to the filer VM and create a folder named plx-data under /data:
gcloud --project "polyaxon-test" compute ssh --ssh-flag=-L3000:localhost:3000 --zone=us-central1-b polyaxon-nfs-vm
cd /data
mkdir -m 777 plx-data
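mkdir -m 777 sets the mode explicitly (it is not reduced by the umask), so every user can read and write plx-data. Presumably this is so that jobs writing over NFS under arbitrary user IDs are not rejected. To double-check the result on the filer VM, here is a minimal sketch, assuming Python 3 is installed there (this guide does not say so); ls -ld /data/plx-data shows the same information.

```python
import os
import stat


def has_mode_777(path: str) -> bool:
    """Return True if path is a directory whose permission bits are exactly 0o777."""
    st = os.stat(path)
    # S_IMODE drops the file-type bits and keeps the permission bits
    # (including setuid/setgid/sticky, which must all be unset to match 0o777).
    return stat.S_ISDIR(st.st_mode) and stat.S_IMODE(st.st_mode) == 0o777


def describe(path: str) -> str:
    """Return an ls-style mode string, e.g. 'drwxrwxrwx' after mkdir -m 777."""
    return stat.filemode(os.stat(path).st_mode)
```

For example, print(describe("/data/plx-data")) and has_mode_777("/data/plx-data") on the filer should report drwxrwxrwx and True.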
Get the IP address of the filer
gcloud --project "polyaxon-test" compute instances describe polyaxon-nfs-vm --zone=us-central1-b --format='value(networkInterfaces[0].networkIP)'
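Replace the project name (polyaxon-test) and zone (us-central1-b) in these commands with the values for your own deployment. The command prints the filer's internal IP address, which the PV definition needs.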
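If you script this step, it helps to validate the output before pasting it into a manifest. The sketch below runs the same gcloud command shown above and checks that the result is a private IPv4 address; the helper names get_filer_ip and parse_filer_ip are hypothetical, and the private-address check assumes your VPC uses the usual private ranges.

```python
import ipaddress
import subprocess


def parse_filer_ip(raw: str) -> ipaddress.IPv4Address:
    """Validate gcloud output as an internal IPv4 address (hypothetical helper)."""
    ip = ipaddress.ip_address(raw.strip())  # raises ValueError on malformed input
    if not isinstance(ip, ipaddress.IPv4Address):
        raise ValueError(f"expected an IPv4 address, got {ip}")
    # networkInterfaces[0].networkIP is the internal VPC address, which is
    # normally in a private range; a public address suggests the wrong field.
    if not ip.is_private:
        raise ValueError(f"{ip} is not a private address")
    return ip


def get_filer_ip(project: str, zone: str, instance: str = "polyaxon-nfs-vm") -> ipaddress.IPv4Address:
    """Run the gcloud describe command from this guide and return the filer's IP."""
    out = subprocess.run(
        [
            "gcloud", "--project", project,
            "compute", "instances", "describe", instance,
            f"--zone={zone}",
            # No shell here, so the format expression needs no extra quoting.
            "--format=value(networkInterfaces[0].networkIP)",
        ],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_filer_ip(out)
```

For example, get_filer_ip("polyaxon-test", "us-central1-b") returns the address to use as the server value in the PV.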
Create a PV and PVC with the correct IP address
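Create data-pvc.yaml containing the following PersistentVolume (PV) and PersistentVolumeClaim (PVC) definitions, replacing the server value with the IP address of your filer: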
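apiVersion: v1
kind: PersistentVolume
metadata:
  name: polyaxon-pv-data
spec:
  capacity:
    storage: 45Gi
  accessModes:
    - ReadWriteMany
  storageClassName: ""  # No storage class: this PV is bound statically
  nfs:
    server: 10.138.0.3  # Replace with your filer's internal IP
    path: "/data/plx-data"
  claimRef:
    namespace: polyaxon
    name: polyaxon-pvc-data
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: polyaxon-pvc-data
spec:
  # Must be set explicitly to "", otherwise a cluster's default StorageClass
  # is applied and the claim will not bind to the NFS PV above
  storageClassName: ""
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 45Gi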
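Static binding only succeeds when the PV and PVC agree: the PV's claimRef must name the PVC and its namespace, the storage classes must match, the PVC's access modes must be offered by the PV, and the requested size must fit the PV's capacity. In particular, if the PVC omits storageClassName, a cluster with a default StorageClass (GKE has one) may assign that class, and the claim will not bind to this PV. The sketch below checks these conditions on manifests represented as plain dicts (for example as loaded by yaml.safe_load_all if you have PyYAML); binding_problems and parse_quantity are hypothetical helpers, and the quantity parser is simplified (no exponents or milli units).

```python
from typing import List

# Binary (Ki, Mi, ...) and decimal (k, M, ...) suffixes used in Kubernetes quantities.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
}


def parse_quantity(q: str) -> int:
    """Convert a quantity such as '45Gi' or '50G' to bytes (simplified)."""
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return int(float(q[: -len(suffix)]) * _SUFFIXES[suffix])
    return int(q)


def binding_problems(pv: dict, pvc: dict, pvc_namespace: str) -> List[str]:
    """Return reasons why pvc (created in pvc_namespace) would not bind to pv."""
    problems = []
    ref = pv["spec"].get("claimRef")
    if ref and (ref.get("name"), ref.get("namespace")) != (pvc["metadata"]["name"], pvc_namespace):
        problems.append("PV claimRef does not point at this PVC and namespace")
    pv_class = pv["spec"].get("storageClassName", "")
    pvc_class = pvc["spec"].get("storageClassName")
    if pvc_class is None:
        problems.append("PVC has no storageClassName; a default StorageClass may be applied")
    elif pvc_class != pv_class:
        problems.append(f"storageClassName mismatch: PV {pv_class!r}, PVC {pvc_class!r}")
    if not set(pvc["spec"]["accessModes"]) <= set(pv["spec"]["accessModes"]):
        problems.append("PVC requests access modes the PV does not offer")
    requested = parse_quantity(pvc["spec"]["resources"]["requests"]["storage"])
    capacity = parse_quantity(pv["spec"]["capacity"]["storage"])
    if requested > capacity:
        problems.append("PVC requests more storage than the PV provides")
    return problems
```

An empty list means none of these checks found a reason for the claim to stay Pending.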
Use kubectl to create the PV and PVC backed by the NFS server
In the namespace where you are deploying Polyaxon, e.g. polyaxon, create the resources using kubectl. The namespace must match the claimRef namespace in the PV:
kubectl create -f data-pvc.yaml -n polyaxon
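Once created, the claim should report the Bound status (kubectl get pvc polyaxon-pvc-data -n polyaxon shows it in the STATUS column). If you want to check this from a script, the minimal sketch below reads the PVC as JSON via kubectl and confirms it is bound to the expected PV; fetch_pvc and is_bound_to are hypothetical helper names.

```python
import json
import subprocess


def fetch_pvc(name: str, namespace: str) -> dict:
    """Return the PVC object as a dict, using kubectl's JSON output."""
    out = subprocess.run(
        ["kubectl", "get", "pvc", name, "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)


def is_bound_to(pvc: dict, pv_name: str) -> bool:
    """True if the claim's phase is Bound and spec.volumeName names pv_name."""
    phase = pvc.get("status", {}).get("phase")
    volume = pvc.get("spec", {}).get("volumeName")
    return phase == "Bound" and volume == pv_name
```

For example, is_bound_to(fetch_pvc("polyaxon-pvc-data", "polyaxon"), "polyaxon-pv-data") should return True; a claim stuck in Pending usually points to a mismatch between the PV and PVC definitions.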
Use the PVC as an artifacts store in Polyaxon
To use the PVC with Polyaxon, follow the guide on artifacts on Persistent Volume Claim.