Polyaxon allows users to achieve up to 10x speedups in data preprocessing and to train models at scale using RAPIDS.

Note: Users should also look at CuPy, a NumPy-compatible array library accelerated by CUDA.
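As a quick illustration, here is a minimal sketch of CuPy standing in for NumPy on the GPU; the array values are arbitrary and purely for demonstration:

import cupy as cp

# Allocate an array on the GPU and compute a row-wise sum using the NumPy-style API
x = cp.arange(6, dtype=cp.float32).reshape(2, 3)
row_sums = x.sum(axis=1)

# Move the result back to host memory as a NumPy array
print(cp.asnumpy(row_sums))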

Requirements

To use Polyaxon and RAPIDS to accelerate model training, there are a few requirements:

  • Check the OS and CUDA version requirements.
  • Use NVIDIA P100 or later generation GPUs.

Docker images

Polyaxon schedules containerized workloads, which makes creating Docker images compatible with RAPIDS straightforward.

Specifying requirements via conda

name: Rapids
channels:
- rapidsai
- nvidia
- conda-forge
dependencies:
- rapids=0.X
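
Assuming the spec above is saved as environment.yml (a file name chosen here for illustration), the environment can be created with:

conda env create -f environment.yml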

Or using the conda command directly:

conda create -n rapids-0.19 -c rapidsai -c nvidia -c conda-forge \
    rapids-blazing=0.19 python=3.7 cudatoolkit=10.2

Using the RAPIDS base Docker image

image: rapidsai/rapidsai:0.19-cuda10.2-runtime-ubuntu18.04-py3.7

...

After building and pushing your custom images to a Docker registry, you can run jobs, experiments, or notebooks with the RAPIDS suite of libraries.
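For example, a minimal sketch of a Polyaxon component spec along these lines could schedule a job on the RAPIDS image; the component name and the train.py entrypoint are hypothetical placeholders for your own code:

version: 1.1
kind: component
name: rapids-job   # hypothetical component name
run:
  kind: job
  container:
    image: rapidsai/rapidsai:0.19-cuda10.2-runtime-ubuntu18.04-py3.7
    # Replace with the entrypoint of your own training script
    command: ["python", "-u", "train.py"]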

Using RAPIDS

  • For data manipulation, users can leverage cuDF, a drop-in replacement for pandas for manipulating DataFrames (see the sketch after this list).
  • For feature engineering, NVTabular, which sits on top of RAPIDS, offers high-level abstractions for building features and recommender systems.
  • For ML algorithms, RAPIDS offers cuML, a GPU-accelerated library that mirrors scikit-learn’s API (see the sketch after this list).
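
The following is a minimal sketch of what this looks like in practice; the column names and the choice of KMeans are arbitrary examples, not a prescribed workflow:

import cudf
from cuml.cluster import KMeans

# Build a GPU DataFrame with the familiar pandas-style API
df = cudf.DataFrame({
    "feature_a": [1.0, 2.0, 3.0, 10.0, 11.0, 12.0],
    "feature_b": [1.0, 1.5, 2.0, 10.5, 11.5, 12.5],
})

# Fit a GPU-accelerated clustering model with a scikit-learn-like interface
model = KMeans(n_clusters=2)
model.fit(df)

print(model.labels_)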

By using the RAPIDS libraries, Polyaxon’s users can easily scale their data processing and model development with very few changes to their code.