You can use one or more blob containers on Azure Storage to access data directly in your machine learning experiments and jobs.

Create an Azure Storage account

Create a storage account (e.g. plxstorage; Azure account names allow only lowercase letters and digits) and a blob container (e.g. data).

You need to expose the information required to connect to the blob storage. The standard way is to expose these keys:

  • AZURE_ACCOUNT_NAME
  • AZURE_ACCOUNT_KEY
  • AZURE_CONNECTION_STRING
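
As a rough illustration of how these keys relate, the sketch below assembles a connection string in Azure's standard `Key=Value;...` layout from an account name and key. The function name `build_connection_string` is hypothetical, and the values are the placeholder ones used on this page. Which of the keys a given setup needs is not something this sketch decides.

```python
def build_connection_string(account_name: str, account_key: str,
                            endpoint_suffix: str = "core.windows.net") -> str:
    """Assemble an Azure Storage connection string from an account name and key.

    Hypothetical helper for illustration; the resulting string is the kind of
    value that would be stored under AZURE_CONNECTION_STRING.
    """
    parts = {
        "DefaultEndpointsProtocol": "https",
        "AccountName": account_name,
        "AccountKey": account_key,
        "EndpointSuffix": endpoint_suffix,
    }
    # Fields are joined with semicolons, each as Key=Value.
    return ";".join(f"{key}={value}" for key, value in parts.items())


connection_string = build_connection_string("account", "hash-key")
print(connection_string)
```

In other words, the connection string bundles the same account name and key that the first two variables carry separately.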

Create a secret or a config map for storing these keys

We recommend using a secret to store your access information:

kubectl create secret -n polyaxon generic az-secret --from-literal=AZURE_ACCOUNT_NAME=account --from-literal=AZURE_ACCOUNT_KEY=hash-key
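
If you would rather keep the secret as a manifest than create it imperatively, the following sketch builds a roughly equivalent `Secret` object in Python. It relies on the Kubernetes convention that values under `data` are base64-encoded. The helper name `secret_manifest` is hypothetical, and the literals are the placeholder values from the command above.

```python
import base64
import json


def secret_manifest(name: str, namespace: str, literals: dict) -> dict:
    """Build a Kubernetes Secret manifest with base64-encoded values."""
    encoded = {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in literals.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": encoded,
    }


manifest = secret_manifest(
    "az-secret",
    "polyaxon",
    {"AZURE_ACCOUNT_NAME": "account", "AZURE_ACCOUNT_KEY": "hash-key"},
)
# kubectl accepts JSON manifests, so this output can be saved to a file
# and applied with `kubectl apply -f <file>`.
print(json.dumps(manifest, indent=2))
```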

Use the secret to add a connection

connections:
- name: azure-dataset1
  kind: wasb
  schema:
    bucket: "wasbs://<container>@<account>.blob.core.windows.net/"
  secret:
    name: "az-secret"
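
The `bucket` value follows the `wasbs://<container>@<account>.blob.core.windows.net/<path>` layout. As a small sanity check before adding a connection, a sketch like the one below splits such a URL into its parts. The function name `parse_wasbs_url` and the names `mycontainer` and `myaccount` are hypothetical.

```python
from urllib.parse import urlparse


def parse_wasbs_url(url: str) -> tuple:
    """Split a wasb(s) URL into (container, account, path)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("wasb", "wasbs"):
        raise ValueError(f"Expected a wasb:// or wasbs:// URL, got: {url}")
    if not parsed.username or not parsed.hostname:
        raise ValueError(f"Expected <container>@<account>.<suffix> in: {url}")
    container = parsed.username               # part before the "@"
    account = parsed.hostname.split(".")[0]   # first label of the host
    path = parsed.path.lstrip("/")            # optional path inside the container
    return container, account, path


print(parse_wasbs_url("wasbs://mycontainer@myaccount.blob.core.windows.net/train/images"))
```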

If you want to access multiple datasets using the same secret:

connections:
- name: azure-dataset1
  kind: wasb
  schema:
    bucket: "wasbs://<container1>@<account>.blob.core.windows.net/"
  secret:
    name: "az-secret"
- name: azure-dataset2
  kind: wasb
  schema:
    bucket: "wasbs://<container2>@<account>.blob.core.windows.net/"
  secret:
    name: "az-secret"

Update/Install Polyaxon CE or Polyaxon Agent deployment

You can now deploy or upgrade your Polyaxon CE or Polyaxon Agent deployment so that it has access to the data on Azure.

Access the dataset in your experiments/jobs

To expose the connection secret to one of the containers in your jobs or services:

run:
  kind: job
  connections: [azure-dataset1]

Or, to request it together with other connections:

run:
  kind: job
  connections: [azure-dataset1, s3-dataset1]

Use the initializer to load the dataset

To use the artifacts initializer to load the dataset:

run:
  kind: job
  init:
   - artifacts: {dirs: [...], files: [...]}
     connection: "azure-dataset1"

Access the dataset programmatically

This is optional; you can use any language or logic to interact with Azure Blob Storage containers.

For instance, you can install the Azure Blob Storage Python SDK, and it will be configured automatically when you request the Azure Blob Storage connection.
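
A minimal sketch of what this can look like inside a job, assuming the keys from the secret are visible to the container as environment variables under the same names. The helper names `resolve_azure_credentials` and `make_blob_service_client` are hypothetical; `BlobServiceClient` comes from the `azure-storage-blob` package and is imported lazily so the first helper works without it.

```python
import os


def resolve_azure_credentials(env=None) -> dict:
    """Pick connection settings from the environment (assumed variable names)."""
    env = os.environ if env is None else env
    connection_string = env.get("AZURE_CONNECTION_STRING")
    if connection_string:
        # A connection string already bundles the account name and key.
        return {"connection_string": connection_string}
    name = env.get("AZURE_ACCOUNT_NAME")
    key = env.get("AZURE_ACCOUNT_KEY")
    if name and key:
        return {
            "account_url": f"https://{name}.blob.core.windows.net",
            "credential": key,
        }
    raise KeyError("No Azure credentials found in the environment")


def make_blob_service_client(env=None):
    """Create a BlobServiceClient; requires `pip install azure-storage-blob`."""
    from azure.storage.blob import BlobServiceClient

    settings = resolve_azure_credentials(env)
    if "connection_string" in settings:
        return BlobServiceClient.from_connection_string(settings["connection_string"])
    return BlobServiceClient(
        account_url=settings["account_url"],
        credential=settings["credential"],
    )
```

From there, `make_blob_service_client().get_container_client("<container>")` gives a client whose `list_blobs()` method iterates over the blobs in that container.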

You can also use Polyaxon’s fs library to get a fully resolved adlfs instance:

pip install "polyaxon[azure]"

Creating a sync instance of the adlfs client:

from polyaxon.fs import get_fs_from_name

...
fs = get_fs_from_name("azure-dataset1")  # You can pass additional kwargs to the function
...
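
Once you have the filesystem object, it can be used like any other fsspec filesystem. The sketch below is one way to peek at a dataset. It only relies on the standard fsspec methods `find` and `open`; the helper name `preview_dataset` and the `root` path are hypothetical.

```python
def preview_dataset(fs, root: str, max_files: int = 5, num_bytes: int = 64) -> dict:
    """Return {path: first bytes} for up to `max_files` files under `root`.

    `fs` is expected to be an fsspec-style filesystem, such as the object
    returned by get_fs_from_name("azure-dataset1").
    """
    previews = {}
    # find() walks the tree recursively and returns file paths.
    for path in sorted(fs.find(root))[:max_files]:
        with fs.open(path, "rb") as handle:
            previews[path] = handle.read(num_bytes)
    return previews
```

For example, `preview_dataset(fs, "<container>/train")` would return the first bytes of up to five files under that prefix.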

Creating an async instance of the adlfs client:

from polyaxon.fs import get_fs_from_name

...
# You can pass additional kwargs to the function
fs = get_fs_from_name("azure-dataset1", asynchronous=True)
...
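
With an async instance, reads can be issued concurrently. The sketch below assumes the returned object follows the fsspec async convention, where coroutine versions of the methods carry a leading underscore (for example `_cat_file`). That is an assumption about the returned instance, not something this page confirms, and the helper name `read_many` is hypothetical.

```python
import asyncio


async def read_many(fs, paths: list) -> dict:
    """Fetch several files concurrently and return {path: bytes}.

    Assumes `fs` exposes the fsspec-style coroutine `_cat_file(path)`.
    """
    contents = await asyncio.gather(*(fs._cat_file(path) for path in paths))
    return dict(zip(paths, contents))
```

Inside a running event loop you would call it as `await read_many(fs, ["<container>/a.csv", "<container>/b.csv"])`.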