Polyaxon allows to schedule Scikit-learn experiments, and supports tracking metrics, outputs, and models.

With Polyaxon you can:

  • log hyperparameters for every run
  • see learning curves for metrics during training
  • see hardware consumption and stdout/stderr output during training
  • log images, charts, and other assets
  • log git commit information
  • log env information
  • log model

Tracking API

Polyaxon provides a tracking API to track experiment and report metrics, artifacts, logs, and results to the Polyaxon dashboard.

You can use the tracking API to create a custom tracking experience with Scikit-learn.

Setup

pip install polyaxon

Initialize your script with Polyaxon

This is an optional step if you need to perform some manual tracking or to track some information before passing the callback.

from polyaxon import tracking

tracking.init(...)

Polyaxon callbacks

Polyaxon provides callbacks to report metrics automatically for classifiers and regressors:

from polyaxon.tracking.contrib.scikit import log_classifier, log_regressor

# Regressor
log_regressor(regressor, X_test, y_test)

# Classier
log_classifier(classifier, X_test, y_test)

Manual logging

If you want to have more control and use Polyaxon to log metrics in your custom Scikit-learn scripts:

from polyaxon import tracking

# Log your metrics
tracking.log_metrics(metric1=value1, metric2=value2, ...)

Example

Example classifier

In this example we will go through the process of logging a classifier information and logging a pickled model.

This example can be used with the offline mode POLYAXON_OFFLINE=true and it does not require a Polyaxon API to run locally.

To see how this can be turned to a declarative approach to be submitted to a Polyaxon cluster, please check this example

import pickle
from sklearn.datasets import load_digits
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

from polyaxon.tracking.contrib.scikit import log_classifier
from polyaxon import tracking

parameters = {
    'n_estimators': 120,
    'learning_rate': 0.12,
    'min_samples_split': 3,
    'min_samples_leaf': 2,
}
gbc = GradientBoostingClassifier(**parameters)

X, y = load_digits()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=28743)

# Polyaxon
tracking.init(name="classifier", is_offline=True)
tracking.log_inputs(**parameters)
tracking.log_data_ref(content=X_train, name='x_train', is_input=True)
tracking.log_data_ref(content=y_train, name='y_train', is_input=True)
tracking.log_data_ref(content=X_test, name='x_test', is_input=True)
tracking.log_data_ref(content=y_test, name='y_test', is_input=True)

gbc.fit(X_train, y_train)

# Polyaxon
log_classifier(gbc, X_test, y_test)

# Logging the model as pickle
with tempfile.TemporaryDirectory() as d:
    model_path = os.path.join(d, "model.pkl")
    with open(model_path, "wb") as out:
        pickle.dump(gbc, out)
    tracking.log_model(model_path, name="model", framework="scikit-learn", versioned=False)

# End
tracking.end()

Note: the versioned was removed in version >v1.17 and is the default behavior.

Example regressor

In this example we will go through the process of logging a regressor information and logging a joblib model.

This example can be used with the offline mode POLYAXON_OFFLINE=true and it does not require a Polyaxon API to run locally.

To see how this can be turned to a declarative approach to be submitted to a Polyaxon cluster, please check this example

import os

import joblib
import tempfile
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

from polyaxon.tracking.contrib.scikit import log_regressor
from polyaxon import tracking

parameters = {
    'n_estimators': 70,
    'max_depth': 7,
    'min_samples_split': 3,
}
rfr = RandomForestRegressor(**parameters)

X, y = load_boston(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=28743)

# Polyaxon
tracking.init(name="regressor")
tracking.log_inputs(**parameters)
tracking.log_data_ref(content=X_train, name='x_train', is_input=True)
tracking.log_data_ref(content=y_train, name='y_train', is_input=True)
tracking.log_data_ref(content=X_test, name='x_test', is_input=True)
tracking.log_data_ref(content=y_test, name='y_test', is_input=True)

rfr.fit(X_train, y_train)

# Polyaxon
log_regressor(rfr, X_test, y_test)

# Logging the model as joblib
with tempfile.TemporaryDirectory() as d:
    model_path = os.path.join(d, "model.joblib")
    joblib.dump(rfr, model_path)
    tracking.log_model(model_path, name="model", framework="scikit-learn", versioned=False)

Note: the versioned was removed in version >v1.17 and is the default behavior.