Polyaxon allows users to achieve up to 10x speedups in data preprocessing and to train models at scale using RAPIDS.
Note: Users should also look at CuPy, a NumPy-compatible array library accelerated by CUDA.
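For instance, a minimal sketch of the kind of drop-in usage CuPy enables (array sizes here are arbitrary):

import cupy as cp

# Arrays are allocated on the GPU; the API mirrors numpy
x = cp.random.rand(1_000_000)
y = cp.random.rand(1_000_000)

# Element-wise math and reductions execute on the GPU
z = cp.sqrt(x ** 2 + y ** 2)
print(float(z.mean()))  # bring the scalar result back to the host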
Requirements
To use Polyaxon and RAPIDS to accelerate model training, there are a few requirements:
- Check the OS and CUDA version requirements.
- Use NVIDIA P100 or later generation GPUs.
Docker images
Polyaxon schedules containerized workloads, which makes it simple to create Docker images compatible with RAPIDS.
Specifying requirements via conda
name: Rapids
channels:
  - rapidsai
  - nvidia
  - conda-forge
dependencies:
  - rapids=0.19
  - python=3.7
  - cudatoolkit=10.2
Or via the conda command directly:
conda create -n rapids-0.19 -c rapidsai -c nvidia -c conda-forge \
rapids-blazing=0.19 python=3.7 cudatoolkit=10.2
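As a rough sketch, such a conda spec could be baked into a custom image along these lines (the base image and the environment.yml file name are assumptions for illustration, not an official recipe):

# Sketch only: build a RAPIDS-enabled image from the conda spec above
FROM continuumio/miniconda3

# Copy the conda environment file (assumed to be saved as environment.yml)
COPY environment.yml /tmp/environment.yml
RUN conda env create -f /tmp/environment.yml && conda clean -afy

# Put the "Rapids" environment first on the PATH so it is used by default
ENV PATH=/opt/conda/envs/Rapids/bin:$PATH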
Using the RAPIDS base Docker image
image: rapidsai/rapidsai:0.19-cuda10.2-runtime-ubuntu18.04-py3.7
...
After building and pushing your custom image to a Docker registry, you can run jobs, experiments, or notebooks with the RAPIDS suite of libraries.
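For illustration, a minimal polyaxonfile sketch that runs a job with such a custom image (the image name and command are placeholders):

version: 1.1
kind: component
run:
  kind: job
  container:
    # Placeholder: replace with the custom RAPIDS image you pushed
    image: my-registry/my-rapids-image:latest
    command: ["python", "-u", "train.py"]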
Using RAPIDS
- For data manipulation, users can leverage cuDF, a drop-in replacement for pandas for manipulating DataFrames.
- For tabular workflows, NVTabular, which sits atop RAPIDS, offers high-level abstractions for feature engineering and building recommenders.
- For ML algorithms, RAPIDS offers cuML, a GPU-accelerated library that mirrors scikit-learn's algorithms and API (see the sketch after this list).
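As a rough sketch of how little the code changes (the file name and column names are illustrative):

import cudf
from cuml.linear_model import LinearRegression

# cudf.read_csv mirrors pandas.read_csv but loads data into GPU memory
df = cudf.read_csv("data.csv")
df = df.dropna()

X = df[["feature_1", "feature_2"]]  # illustrative feature columns
y = df["target"]                    # illustrative label column

# cuML estimators follow scikit-learn's fit/predict convention
model = LinearRegression()
model.fit(X, y)
predictions = model.predict(X)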
By using the RAPIDS libraries, Polyaxon users can easily scale their data processing and model development with very few changes to their code.