OCDocker.OCScore.XGBoost.XGBoostOptimizer module

Module to run the Extreme Gradient Boosting (XGBoost) algorithm.

It is imported as:

import OCDocker.OCScore.XGBoost.OCxgboost as OCxgboost

class OCDocker.OCScore.XGBoost.XGBoostOptimizer.XGBoostOptimizer(X_train, y_train, X_test, y_test, X_validation=None, y_validation=None, storage='sqlite:///pre_xgboost.db', params=None, early_stopping_rounds=20, use_gpu=False, random_state=42, verbose=False)

Bases: object

Class to optimize XGBoost hyperparameters using Optuna.

Parameters:

X_train (np.ndarray | pd.DataFrame | pd.Series) – The training dataset.
y_train (np.ndarray | pd.DataFrame | pd.Series) – The training labels.
X_test (np.ndarray | pd.DataFrame | pd.Series) – The test dataset.
y_test (np.ndarray | pd.DataFrame | pd.Series) – The test labels.
X_validation (np.ndarray | pd.DataFrame | pd.Series, optional) – The validation dataset. Default is None.
y_validation (np.ndarray | pd.DataFrame | pd.Series, optional) – The validation labels. Default is None.
storage (str, optional) – The storage path/URL for the Optuna study. Default is “sqlite:///pre_xgboost.db”.
params (dict, optional) – The hyperparameters for the XGBoost model. Default is None (treated as an empty dictionary).
early_stopping_rounds (int, optional) – The number of early stopping rounds for the XGBoost model. Default is 20.
use_gpu (bool, optional) – Whether to use the GPU for training the XGBoost model. Default is False.
random_state (int, optional) – The random state for reproducibility. Default is 42.
verbose (bool, optional) – Whether to print the training logs. Default is False.
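
The constructor takes three separate partitions (train, test and, optionally, validation). A minimal sketch of preparing them with NumPy follows; the helper `three_way_split`, the split fractions and the synthetic data are illustrative assumptions and are not part of OCDocker. Only the keyword names and defaults are taken from the signature above.

```python
import numpy as np


def three_way_split(X, y, test_frac=0.2, val_frac=0.1, random_state=42):
    """Shuffle the row indices once, then cut them into train/test/validation."""
    rng = np.random.default_rng(random_state)
    order = rng.permutation(len(X))
    n_test = int(round(len(X) * test_frac))
    n_val = int(round(len(X) * val_frac))
    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]
    return (
        (X[train_idx], y[train_idx]),
        (X[test_idx], y[test_idx]),
        (X[val_idx], y[val_idx]),
    )


# Synthetic data, purely illustrative
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(100, 5))
y = (X[:, 0] > 0).astype(int)

(X_train, y_train), (X_test, y_test), (X_val, y_val) = three_way_split(X, y)

# Keyword arguments mirroring the documented signature and defaults
optimizer_kwargs = dict(
    X_train=X_train, y_train=y_train,
    X_test=X_test, y_test=y_test,
    X_validation=X_val, y_validation=y_val,
    storage="sqlite:///pre_xgboost.db",
    early_stopping_rounds=20,
    use_gpu=False,
    random_state=42,
    verbose=False,
)

# With OCDocker installed, these would be passed as:
# optimizer = XGBoostOptimizer(**optimizer_kwargs)
```

Leaving out `X_validation` and `y_validation` keeps their default of None.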

__init__(X_train, y_train, X_test, y_test, X_validation=None, y_validation=None, storage='sqlite:///pre_xgboost.db', params=None, early_stopping_rounds=20, use_gpu=False, random_state=42, verbose=False)

Initializes the XGBoostOptimizer with training data and configuration.

Parameters:

X_train (np.ndarray | pd.DataFrame | pd.Series) – The training dataset.
y_train (np.ndarray | pd.DataFrame | pd.Series) – The training labels.
X_test (np.ndarray | pd.DataFrame | pd.Series) – The test dataset.
y_test (np.ndarray | pd.DataFrame | pd.Series) – The test labels.
X_validation (np.ndarray | pd.DataFrame | pd.Series, optional) – The validation dataset. Default is None.
y_validation (np.ndarray | pd.DataFrame | pd.Series, optional) – The validation labels. Default is None.
storage (str, optional) – The storage path/URL for the Optuna study. Default is “sqlite:///pre_xgboost.db”.
params (dict, optional) – The hyperparameters for the XGBoost model. Default is None (treated as an empty dictionary).
early_stopping_rounds (int, optional) – The number of early stopping rounds for the XGBoost model. Default is 20.
use_gpu (bool, optional) – Whether to use the GPU for training the XGBoost model. Default is False.
random_state (int, optional) – The random state for reproducibility. Default is 42.
verbose (bool, optional) – Whether to print the training logs. Default is False.

Return type:

None

objective(trial)

The objective function for Optuna optimization to tune XGBoost hyperparameters.

Parameters:

trial (optuna.trial._trial.Trial) – A single trial object which suggests hyperparameters.

Returns:

The AUC of the model as a result of the suggested hyperparameters. If the validation dataset is provided, returns a tuple of AUC and RMSE.

Return type:

float | tuple[float, float]
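
For reference, the two metrics named above can be computed as in the following sketch. It is a plain-Python illustration, not the OCDocker implementation: the function names are hypothetical, and the source does not state which partition each metric is computed on, so `objective_value` only mirrors the documented return shape (a single float, or an AUC/RMSE tuple when a second metric is available).

```python
import math


def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as half a win."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def objective_value(auc, rmse_value=None):
    """Assumed return shape: a float alone, or (AUC, RMSE) as a tuple."""
    return auc if rmse_value is None else (auc, rmse_value)


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(labels, scores)  # 3 of the 4 positive/negative pairs are ordered correctly
single = objective_value(auc)
paired = objective_value(auc, rmse(labels, scores))
```

When a tuple is returned, Optuna treats the study as multi-objective, so the study has to be created with one direction per returned value.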

optimize(direction='minimize', n_trials=1000, n_jobs=1, study_name='XGBoost pre-optimization', load_if_exists=True)

Optimizes XGBoost hyperparameters using Optuna.

Parameters:

direction (str, optional) – The direction of the optimization. Default is “minimize”.
n_trials (int, optional) – The number of trials for Optuna optimization. Default is 1000.
n_jobs (int, optional) – The number of jobs to run in parallel. Default is 1.
study_name (str, optional) – The name of the study. Default is “XGBoost pre-optimization”.
load_if_exists (bool, optional) – Whether to load the study if it exists. Default is True.

Returns:

The Optuna study object, which holds the trial results, including the best hyperparameters and the best score.

Return type:

optuna.study.Study
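
The parameters above correspond closely to arguments of `optuna.create_study` and `Study.optimize`. The sketch below shows one way the mapping could look; it is an assumption about the general pattern, not the OCDocker source, and the helper names `study_kwargs` and `run_study` are hypothetical. Optuna is imported lazily so that the argument-building helper can be inspected without it installed.

```python
def study_kwargs(direction="minimize", storage=None,
                 study_name="XGBoost pre-optimization", load_if_exists=True):
    """Build keyword arguments for optuna.create_study.

    A single string selects a single-objective study (`direction`);
    a list or tuple selects a multi-objective study (`directions`).
    """
    kwargs = {
        "storage": storage,
        "study_name": study_name,
        "load_if_exists": load_if_exists,
    }
    if isinstance(direction, str):
        kwargs["direction"] = direction
    else:
        kwargs["directions"] = list(direction)
    return kwargs


def run_study(objective, n_trials=1000, n_jobs=1, **create_kwargs):
    """Create (or reload) a study and run the trials. Requires optuna."""
    import optuna  # lazy import: only needed when a study is actually run

    study = optuna.create_study(**study_kwargs(**create_kwargs))
    study.optimize(objective, n_trials=n_trials, n_jobs=n_jobs)
    return study


single_objective = study_kwargs("maximize", storage="sqlite:///pre_xgboost.db")
multi_objective = study_kwargs(["maximize", "minimize"])
```

Since a higher AUC is better, a caller optimizing AUC alone would presumably pass direction='maximize' rather than rely on the documented default of 'minimize'.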