You can use one or more S3 buckets to access data directly in your machine learning experiments and jobs.

Create an S3 bucket

You should create an S3 bucket (e.g. plx-storage).

You need to expose the information required to connect to the blob storage. The standard way is to expose these keys:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY

And optionally these keys:

  • AWS_ENDPOINT_URL
  • AWS_SECURITY_TOKEN
  • AWS_REGION
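
As an illustration of how code running in a job might consume these keys, the following is a minimal sketch that maps them to the keyword arguments accepted by `boto3.client`. The helper name `s3_client_kwargs` is hypothetical, and treating `AWS_SECURITY_TOKEN` as the session token is an assumption of this sketch, not something this guide specifies.

```python
import os

# Environment keys from the lists above, paired with boto3 client arguments.
# Assumption: AWS_SECURITY_TOKEN carries the temporary session token.
ENV_TO_BOTO3 = {
    "AWS_ACCESS_KEY_ID": "aws_access_key_id",
    "AWS_SECRET_ACCESS_KEY": "aws_secret_access_key",
    "AWS_SECURITY_TOKEN": "aws_session_token",
    "AWS_REGION": "region_name",
    "AWS_ENDPOINT_URL": "endpoint_url",
}
REQUIRED = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def s3_client_kwargs(env=None):
    """Build boto3 client kwargs from the connection's environment keys."""
    env = os.environ if env is None else env
    # Fail early if one of the two standard keys is absent or empty.
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise KeyError("missing required keys: " + ", ".join(missing))
    # Optional keys are included only when they are set.
    return {arg: env[key] for key, arg in ENV_TO_BOTO3.items() if env.get(key)}
```

With boto3 installed, the result could be passed along as `boto3.client("s3", **s3_client_kwargs())`.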

Create a secret or a config map for storing these keys

We recommend using a secret to store your access keys:

kubectl create secret -n polyaxon generic s3-secret --from-literal=AWS_ACCESS_KEY_ID=key-id --from-literal=AWS_SECRET_ACCESS_KEY=hash-key
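
If you prefer to keep the secret in a manifest file rather than create it imperatively, the sketch below builds an equivalent Secret object in Python. The function name `build_secret_manifest` is hypothetical; the only Kubernetes behavior relied on is that values under `data` in a Secret are base64-encoded.

```python
import base64
import json


def build_secret_manifest(name, namespace, values):
    """Return a Kubernetes Secret manifest with base64-encoded data values."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in values.items()
        },
    }


# Same placeholder values as the kubectl command above.
manifest = build_secret_manifest(
    "s3-secret",
    "polyaxon",
    {"AWS_ACCESS_KEY_ID": "key-id", "AWS_SECRET_ACCESS_KEY": "hash-key"},
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`.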

Use the secret name and secret key in your data persistence definition

connections:
- name: s3-dataset1
  kind: s3
  schema:
    bucket: "s3://bucket/"
  secret:
    name: "s3-secret"

If you want to access multiple datasets using the same secret:

connections:
- name: s3-dataset1
  kind: s3
  schema:
    bucket: "s3://bucket/path1"
  secret:
    name: "s3-secret"
- name: s3-dataset2
  kind: s3
  schema:
    bucket: "s3://bucket/path2"
  secret:
    name: "s3-secret"
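
When many paths share one secret, writing the entries by hand gets repetitive. The sketch below generates one connection entry per path, each with a distinct name. The helper `build_s3_connections` and the `s3-dataset<N>` naming scheme are illustrative assumptions, not part of Polyaxon.

```python
import json


def build_s3_connections(bucket, paths, secret_name, prefix="s3-dataset"):
    """Return one connection entry per path, all sharing the same secret."""
    return [
        {
            "name": f"{prefix}{index}",
            "kind": "s3",
            "schema": {"bucket": f"s3://{bucket}/{path.strip('/')}"},
            "secret": {"name": secret_name},
        }
        for index, path in enumerate(paths, start=1)
    ]


connections = build_s3_connections("bucket", ["path1", "path2"], "s3-secret")
print(json.dumps({"connections": connections}, indent=2))
```

The resulting structure mirrors the `connections` section above and can be converted to YAML with a library of your choice.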

Update/Install Polyaxon deployment

You can deploy/upgrade your Polyaxon CE or Polyaxon Agent deployment with access to data on S3.

Access to the dataset in your experiments/jobs

To expose the connection secret to one of the containers in your jobs or services:

run:
  kind: job
  connections: [s3-dataset1]

Or, to request more than one connection:

run:
  kind: job
  connections: [s3-dataset1, azure-dataset1]

Use the initializer to load the dataset

To use the artifacts initializer to load the dataset:

run:
  kind: job
  init:
   - artifacts: {dirs: [...], files: [...]}
     connection: "s3-dataset1"

Use Polyaxon to access the dataset

This is optional; you can use any language or logic to interact with S3 buckets.
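
For instance, a job could talk to S3 directly with boto3. The following is a minimal sketch, assuming boto3 is installed in the container and the connection's keys are present in the environment. Both helper names are hypothetical.

```python
import os
from urllib.parse import urlparse


def split_s3_url(url):
    """Split 's3://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URL: {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def list_keys(url):
    """List object keys under an S3 URL using boto3 (must be installed)."""
    import boto3  # imported lazily so split_s3_url works without boto3

    bucket, prefix = split_s3_url(url)
    # boto3 reads AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the
    # environment; the endpoint is overridden only if AWS_ENDPOINT_URL is set.
    client = boto3.client("s3", endpoint_url=os.environ.get("AWS_ENDPOINT_URL"))
    paginator = client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```

Calling `list_keys("s3://bucket/path1")` inside a job that requests the connection would return the keys stored under that prefix.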

Polyaxon has some built-in logic that you can leverage if you want.

To use that logic, install Polyaxon with the S3 extra:

pip install "polyaxon[s3]"

The service exposes the following methods:

from polyaxon.connections.aws.s3 import S3Service

store = S3Service(...)

store.delete()
store.ls()
store.upload_file()
store.upload_dir()
store.download_file()
store.download_dir()