Polyaxon lets you schedule XGBoost experiments and supports tracking metrics, outputs, and models.

With Polyaxon you can:

  • log hyperparameters for every run
  • see learning curves for losses and metrics during training
  • see hardware consumption and stdout/stderr output during training
  • log images, charts, and other assets
  • log git commit information
  • log environment information
  • log models
  • ...

Tracking API

Polyaxon provides a tracking API to track experiments and report metrics, artifacts, logs, and results to the Polyaxon dashboard.

You can use the tracking API to create a custom tracking experience with XGBoost.

Setup

To use Polyaxon tracking with XGBoost, you need to install the Polyaxon client:

pip install polyaxon

Initialize your script with Polyaxon

This step is optional. Use it if you need to perform some manual tracking or to track some information before passing the callback.

from polyaxon import tracking

tracking.init(...)

XGBoost callback

Polyaxon provides an XGBoost callback. You can use this callback with your experiment to report metrics and other charts automatically:

import xgboost as xgb

from polyaxon import tracking
from polyaxon.tracking.contrib.xgboost import polyaxon_callback

# ...
tracking.init()
# ...
# params and data are defined in the elided parts of the script
xgb.train(params, data, callbacks=[polyaxon_callback(log_importance=True)])

Customizing the callback

By default the callback uses the currently initialized run, but you can pass a different run if you need more control:

import xgboost as xgb

from polyaxon.tracking import Run
from polyaxon.tracking.contrib.xgboost import polyaxon_callback

run = Run(...)

# params and data are defined elsewhere in the script
xgb.train(params, data, callbacks=[polyaxon_callback(run=run, log_importance=True)])

Manual logging

If you want more control, you can use Polyaxon to log metrics manually in your custom XGBoost scripts:

  • log metrics
tracking.log_metrics(metric1=value1, metric2=value2, ...)
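
As a rough sketch of manual logging, the helper below walks the evaluation history that XGBoost fills in during training (the `evals_result` dictionary) and emits one set of metrics per boosting round. The helper name `log_evals_result` and its `log_fn` parameter are illustrative and not part of Polyaxon. In a tracked script you would pass `tracking.log_metrics` as `log_fn`, assuming it accepts a `step` keyword alongside the metric values.

```python
from typing import Callable, Dict, List


def log_evals_result(
    evals_result: Dict[str, Dict[str, List[float]]],
    log_fn: Callable[..., None],
) -> int:
    """Emit one log_fn(step=..., **metrics) call per boosting round.

    evals_result has the shape XGBoost produces:
    {"train": {"rmse": [...]}, "eval": {"rmse": [...]}}.
    Returns the number of rounds logged.
    """
    # Flatten to {"train_rmse": [...], "eval_rmse": [...]}.
    series = {
        f"{dataset}_{metric}": values
        for dataset, metrics in evals_result.items()
        for metric, values in metrics.items()
    }
    # Only log rounds for which every series has a value.
    rounds = min((len(values) for values in series.values()), default=0)
    for step in range(rounds):
        log_fn(step=step, **{name: values[step] for name, values in series.items()})
    return rounds


# Stand-in logger so the sketch runs without a Polyaxon installation;
# replace it with tracking.log_metrics in a tracked script (assumption).
logged = []
history = {"train": {"rmse": [0.9, 0.7]}, "eval": {"rmse": [1.0, 0.8]}}
n_rounds = log_evals_result(history, lambda step, **m: logged.append((step, m)))
```

Keeping the logging function as a parameter makes the helper easy to exercise locally before wiring it to a real run.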

Example

In this example we will go through the process of logging an XGBoost model using Polyaxon's callback.

This example can be run in offline mode by setting POLYAXON_OFFLINE=true, in which case it does not require a Polyaxon API to run locally.
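
One way to request offline mode from inside the script, rather than from the shell, is to set the variable before Polyaxon is imported. This is a minimal sketch: whether Polyaxon reads the variable at import time or at `tracking.init()` is not stated here, so setting it first is simply the conservative choice.

```python
import os

# Request offline mode for this process only; a value that is already
# exported in the environment takes precedence over this default.
os.environ.setdefault("POLYAXON_OFFLINE", "true")

offline = os.environ["POLYAXON_OFFLINE"].lower() == "true"
print(f"Polyaxon offline mode requested: {offline}")
```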

To see how this can be turned into a declarative approach and submitted to a Polyaxon cluster, please check this example

import argparse
import logging

import pandas as pd
import xgboost as xgb

# Polyaxon
from polyaxon import tracking
from polyaxon.tracking.contrib.xgboost import polyaxon_callback

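# Note: load_boston was removed in scikit-learn 1.2, so this example
# requires an older scikit-learn release.
from sklearn.datasets import load_boston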
from sklearn.model_selection import train_test_split

logger = logging.getLogger()

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--max_depth',
        type=int,
        default=5
    )
    parser.add_argument(
        '--eta',
        type=float,
        default=0.5
    )
    parser.add_argument(
        '--gamma',
        type=float,
        default=0.1
    )
    parser.add_argument(
        '--subsample',
        type=float,
        default=1.0
    )
    parser.add_argument(
        '--lambda',
        type=int,
        default=1,
        dest='lambda_',
    )
    parser.add_argument(
        '--alpha',
        type=float,
        default=0.35
    )
    parser.add_argument(
        '--objective',
        type=str,
        default='reg:squarederror'
    )
    parser.add_argument(
        '--cross_validate',
        action='store_true',  # type=bool would treat any non-empty string as True
    )

    args = parser.parse_args()

    params = {
        'max_depth': args.max_depth,
        'eta': args.eta,
        'gamma': args.gamma,
        'subsample': args.subsample,
        'lambda': args.lambda_,
        'alpha': args.alpha,
        'objective': args.objective,
        'eval_metric': ['mae', 'rmse']
    }

    # Polyaxon
    tracking.init()

    boston = load_boston()
    data = pd.DataFrame(boston.data)
    data.columns = boston.feature_names
    data['PRICE'] = boston.target
    X, y = data.iloc[:, :-1], data.iloc[:, -1]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1012)

    # Polyaxon
    tracking.log_data_ref(content=X_train, name='x_train')
    tracking.log_data_ref(content=y_train, name='y_train')
    tracking.log_data_ref(content=X_test, name='x_test')
    tracking.log_data_ref(content=y_test, name='y_test')

    dtrain = xgb.DMatrix(X_train, label=y_train)
    dtest = xgb.DMatrix(X_test, label=y_test)

    if args.cross_validate:
        xgb.cv(params, dtrain, num_boost_round=20, nfold=7,
               callbacks=[polyaxon_callback(log_importance=True)])
    else:
        xgb.train(params, dtrain, 20, [(dtest, 'eval'), (dtrain, 'train')],
                  callbacks=[polyaxon_callback(log_importance=True)])
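
A note on boolean switches such as --cross_validate: with argparse, `type=bool` calls `bool()` on the raw string, so any non-empty value (including "False") parses as True. The sketch below contrasts that with a `store_true` action. The `--naive` flag is a hypothetical name used only to show the pitfall.

```python
import argparse

parser = argparse.ArgumentParser()
# type=bool calls bool() on the raw string, so any non-empty value is truthy.
parser.add_argument("--naive", type=bool, default=False)
# store_true makes the option a real switch: absent -> False, present -> True.
parser.add_argument("--cross_validate", action="store_true")

# Simulated command lines, passed explicitly instead of reading sys.argv.
args = parser.parse_args(["--naive", "False", "--cross_validate"])
defaults = parser.parse_args([])

print(args.naive, args.cross_validate)
print(defaults.naive, defaults.cross_validate)
```

With the switch form, cross-validation is enabled by passing `--cross_validate` with no value and stays disabled otherwise.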