OCDocker.OCScore.XGBoost.XGBoostOptimizer module

Module to tune the hyperparameters of the Extreme Gradient Boosting (XGBoost) algorithm using Optuna.

It is imported as:

from OCDocker.OCScore.XGBoost.XGBoostOptimizer import XGBoostOptimizer

class OCDocker.OCScore.XGBoost.XGBoostOptimizer.XGBoostOptimizer(X_train, y_train, X_test, y_test, X_validation=None, y_validation=None, storage='sqlite:///pre_xgboost.db', params=None, early_stopping_rounds=20, use_gpu=False, random_state=42, verbose=False)

Bases: object

Class to optimize XGBoost hyperparameters using Optuna.

Parameters:
  • X_train (np.ndarray | pd.DataFrame | pd.Series) – The training dataset.

  • y_train (np.ndarray | pd.DataFrame | pd.Series) – The training labels.

  • X_test (np.ndarray | pd.DataFrame | pd.Series) – The test dataset.

  • y_test (np.ndarray | pd.DataFrame | pd.Series) – The test labels.

  • X_validation (np.ndarray | pd.DataFrame | pd.Series, optional) – The validation dataset. Default is None.

  • y_validation (np.ndarray | pd.DataFrame | pd.Series, optional) – The validation labels. Default is None.

  • storage (str, optional) – The storage path/URL for the Optuna study. Default is “sqlite:///pre_xgboost.db”.

  • params (dict, optional) – The hyperparameters for the XGBoost model. Default is None (treated as an empty dictionary).

  • early_stopping_rounds (int, optional) – The number of early stopping rounds for the XGBoost model. Default is 20.

  • use_gpu (bool, optional) – Whether to use the GPU for training the XGBoost model. Default is False.

  • random_state (int, optional) – The random state for reproducibility. Default is 42.

  • verbose (bool, optional) – Whether to print the training logs. Default is False.
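
The params, use_gpu and random_state arguments eventually have to become an XGBoost parameter dictionary. The sketch below shows one plausible way to merge them; the helper build_xgb_params is hypothetical, and the keys it sets ("tree_method", "device", "seed") follow the native XGBoost 2.x parameter names, not anything confirmed about how OCDocker builds its parameters internally.

```python
from typing import Any, Optional


def build_xgb_params(
    params: Optional[dict[str, Any]] = None,
    use_gpu: bool = False,
    random_state: int = 42,
) -> dict[str, Any]:
    """Hypothetical helper: merge user hyperparameters with device/seed defaults."""
    # None is treated as an empty dictionary; copy so the caller's dict is untouched.
    user_params = dict(params or {})
    base: dict[str, Any] = {
        "tree_method": "hist",
        # XGBoost >= 2.0 selects the GPU through the "device" parameter.
        "device": "cuda" if use_gpu else "cpu",
        # Native XGBoost name for the random seed.
        "seed": random_state,
    }
    # User-supplied values take precedence over the defaults above.
    base.update(user_params)
    return base
```

For example, build_xgb_params({"max_depth": 6}, use_gpu=True) yields a dictionary with "device" set to "cuda", "seed" set to 42 and the user's "max_depth" preserved.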

__init__(X_train, y_train, X_test, y_test, X_validation=None, y_validation=None, storage='sqlite:///pre_xgboost.db', params=None, early_stopping_rounds=20, use_gpu=False, random_state=42, verbose=False)

Initializes the XGBoostOptimizer with training data and configuration.

Parameters:
  • X_train (np.ndarray | pd.DataFrame | pd.Series) – The training dataset.

  • y_train (np.ndarray | pd.DataFrame | pd.Series) – The training labels.

  • X_test (np.ndarray | pd.DataFrame | pd.Series) – The test dataset.

  • y_test (np.ndarray | pd.DataFrame | pd.Series) – The test labels.

  • X_validation (np.ndarray | pd.DataFrame | pd.Series, optional) – The validation dataset. Default is None.

  • y_validation (np.ndarray | pd.DataFrame | pd.Series, optional) – The validation labels. Default is None.

  • storage (str, optional) – The storage path/URL for the Optuna study. Default is “sqlite:///pre_xgboost.db”.

  • params (dict, optional) – The hyperparameters for the XGBoost model. Default is None (treated as an empty dictionary).

  • early_stopping_rounds (int, optional) – The number of early stopping rounds for the XGBoost model. Default is 20.

  • use_gpu (bool, optional) – Whether to use the GPU for training the XGBoost model. Default is False.

  • random_state (int, optional) – The random state for reproducibility. Default is 42.

  • verbose (bool, optional) – Whether to print the training logs. Default is False.

Return type:

None
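
early_stopping_rounds sets the patience of XGBoost's early stopping: training continues only while the validation metric improves at least once every early_stopping_rounds rounds. The sketch below replays that rule over a precomputed list of per-round validation losses (lower is better); early_stopping_point is a hypothetical illustration, not part of OCDocker or XGBoost.

```python
from typing import Sequence


def early_stopping_point(
    val_losses: Sequence[float], early_stopping_rounds: int = 20
) -> tuple[int, int]:
    """Return (best_iteration, stop_iteration) for a lower-is-better metric."""
    if not val_losses:
        raise ValueError("val_losses must not be empty")
    best_iter = 0
    best_loss = val_losses[0]
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            # Strict improvement: remember this round as the new best.
            best_iter, best_loss = i, loss
        elif i - best_iter >= early_stopping_rounds:
            # No improvement for early_stopping_rounds consecutive rounds: stop.
            return best_iter, i
    # All rounds were used before the patience ran out.
    return best_iter, len(val_losses) - 1
```

With a patience of 2, the losses [5, 4, 3, 3, 3, 3] stop at round 4 with round 2 as the best iteration, because rounds 3 and 4 bring no strict improvement.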

objective(trial)

The objective function for Optuna optimization to tune XGBoost hyperparameters.

Parameters:

trial (optuna.trial.Trial) – The Optuna trial used to suggest hyperparameters.

Returns:

The AUC of the model trained with the suggested hyperparameters. If a validation dataset was provided, a tuple of (AUC, RMSE) is returned instead.

Return type:

float | tuple[float, float]
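
The return convention above can be sketched with the standard definitions of the two metrics: ROC AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative (ties count one half), and RMSE as the square root of the mean squared error. The helper names are hypothetical, and computing AUC on the test split and RMSE on the validation split is an assumption made for illustration; this page does not state which split each metric uses.

```python
import math
from typing import Optional, Sequence, Union


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC via pairwise ranking (Mann-Whitney form); ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def objective_value(
    y_test: Sequence[int],
    test_pred: Sequence[float],
    y_val: Optional[Sequence[float]] = None,
    val_pred: Optional[Sequence[float]] = None,
) -> Union[float, tuple[float, float]]:
    """Hypothetical: AUC alone, or (AUC, RMSE) when validation data is given.

    Assumption: AUC on the test split, RMSE on the validation split.
    """
    auc = roc_auc(y_test, test_pred)
    if y_val is None or val_pred is None:
        return auc
    return auc, rmse(y_val, val_pred)
```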

optimize(direction='minimize', n_trials=1000, n_jobs=1, study_name='XGBoost pre-optimization', load_if_exists=True)

Optimizes XGBoost hyperparameters using Optuna.

Parameters:
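  • direction (str, optional) – The direction of the optimization, either “minimize” or “maximize”. Default is “minimize”.

  • n_trials (int, optional) – The number of trials for Optuna optimization. Default is 1000.

  • n_jobs (int, optional) – The number of jobs to run in parallel. Default is 1.

  • study_name (str, optional) – The name of the study. Default is “XGBoost pre-optimization”.

  • load_if_exists (bool, optional) – Whether to load the study if it already exists in the storage. Default is True.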

Returns:

  • optuna.study.Study – The Optuna study object. For a single-objective study, the best hyperparameters (dict) and the best score (float) are available as its best_params and best_value attributes.

Return type:

optuna.study.Study
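
The direction and n_trials arguments control how trials are compared and how many are run. The toy loop below illustrates that contract with plain random search and returns the best hyperparameters and the best value, mirroring study.best_params and study.best_value. It is a hypothetical stand-in, not Optuna: Optuna's default sampler for single-objective studies is TPE, and the search space and objective here are invented for the example.

```python
import math
import random
from typing import Any, Callable


def random_search(
    objective: Callable[[dict[str, Any]], float],
    space: dict[str, tuple[float, float]],
    direction: str = "minimize",
    n_trials: int = 1000,
    seed: int = 42,
) -> tuple[dict[str, Any], float]:
    """Toy stand-in for study.optimize(): returns (best_params, best_value)."""
    if direction not in ("minimize", "maximize"):
        raise ValueError("direction must be 'minimize' or 'maximize'")
    rng = random.Random(seed)
    best_params: dict[str, Any] = {}
    best_value = math.inf if direction == "minimize" else -math.inf
    for _ in range(n_trials):
        # Sample each hyperparameter uniformly from its (low, high) range.
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        value = objective(params)
        improved = value < best_value if direction == "minimize" else value > best_value
        if improved:
            best_params, best_value = params, value
    return best_params, best_value
```

Because AUC is a higher-is-better metric, an objective that returns AUC only moves toward better models when direction="maximize" is passed; a lower-is-better metric such as RMSE matches the default "minimize".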