OCDocker.OCScore.XGBoost.XGBoostOptimizer module
Module to run the Extreme Gradient Boosting (XGBoost) algorithm.
It is imported as:
import OCDocker.OCScore.XGBoost.XGBoostOptimizer as XGBoostOptimizer
- class OCDocker.OCScore.XGBoost.XGBoostOptimizer.XGBoostOptimizer(X_train, y_train, X_test, y_test, X_validation=None, y_validation=None, storage='sqlite:///pre_xgboost.db', params=None, early_stopping_rounds=20, use_gpu=False, random_state=42, verbose=False)
Bases: object
Class to optimize XGBoost hyperparameters using Optuna.
- Parameters:
X_train (np.ndarray | pd.DataFrame | pd.Series) – The training dataset.
y_train (np.ndarray | pd.DataFrame | pd.Series) – The training labels.
X_test (np.ndarray | pd.DataFrame | pd.Series) – The test dataset.
y_test (np.ndarray | pd.DataFrame | pd.Series) – The test labels.
X_validation (np.ndarray | pd.DataFrame | pd.Series, optional) – The validation dataset. Default is None.
y_validation (np.ndarray | pd.DataFrame | pd.Series, optional) – The validation labels. Default is None.
storage (str, optional) – The storage path/URL for the Optuna study. Default is “sqlite:///pre_xgboost.db”.
params (dict, optional) – The hyperparameters for the XGBoost model. Default is None (treated as an empty dictionary).
early_stopping_rounds (int, optional) – The number of early stopping rounds for the XGBoost model. Default is 20.
use_gpu (bool, optional) – Whether to use the GPU for training the XGBoost model. Default is False.
random_state (int, optional) – The random state for reproducibility. Default is 42.
verbose (bool, optional) – Whether to print the training logs. Default is False.
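The constructor expects the data already divided into train, test and (optionally) validation parts. The following is a minimal sketch of preparing those keyword arguments, not part of OCDocker: the helper `split_for_optimizer`, its split fractions and the toy data are hypothetical, and plain lists are used only to keep the example dependency-free.

```python
import random


def split_for_optimizer(features, labels, test_frac=0.2, val_frac=0.1, seed=42):
    """Hypothetical helper: shuffle once, then cut into train/test/validation."""
    order = list(range(len(features)))
    random.Random(seed).shuffle(order)
    n_test = round(len(order) * test_frac)
    n_val = round(len(order) * val_frac)
    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]

    def take(seq, idx):
        return [seq[i] for i in idx]

    # Keys mirror the keyword names in the documented signature.
    return {
        "X_train": take(features, train_idx),
        "y_train": take(labels, train_idx),
        "X_test": take(features, test_idx),
        "y_test": take(labels, test_idx),
        "X_validation": take(features, val_idx),
        "y_validation": take(labels, val_idx),
    }


# Toy data (made up for illustration): 50 rows, 2 features, binary labels.
features = [[float(i), float(i % 3)] for i in range(50)]
labels = [i % 2 for i in range(50)]

kwargs = split_for_optimizer(features, labels)
# Remaining arguments, set to the documented defaults.
kwargs.update(
    storage="sqlite:///pre_xgboost.db",
    params=None,
    early_stopping_rounds=20,
    use_gpu=False,
    random_state=42,
    verbose=False,
)
```

With OCDocker installed, the dictionary could presumably be passed as `XGBoostOptimizer(**kwargs)`. Since the documented types are np.ndarray, pd.DataFrame and pd.Series, the lists would first need converting (for example with `numpy.asarray`).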
- __init__(X_train, y_train, X_test, y_test, X_validation=None, y_validation=None, storage='sqlite:///pre_xgboost.db', params=None, early_stopping_rounds=20, use_gpu=False, random_state=42, verbose=False)
Initializes the XGBoostOptimizer with training data and configuration.
- Parameters:
X_train (np.ndarray | pd.DataFrame | pd.Series) – The training dataset.
y_train (np.ndarray | pd.DataFrame | pd.Series) – The training labels.
X_test (np.ndarray | pd.DataFrame | pd.Series) – The test dataset.
y_test (np.ndarray | pd.DataFrame | pd.Series) – The test labels.
X_validation (np.ndarray | pd.DataFrame | pd.Series, optional) – The validation dataset. Default is None.
y_validation (np.ndarray | pd.DataFrame | pd.Series, optional) – The validation labels. Default is None.
params (dict, optional) – The hyperparameters for the XGBoost model. Default is None (treated as an empty dictionary).
early_stopping_rounds (int, optional) – The number of early stopping rounds for the XGBoost model. Default is 20.
use_gpu (bool, optional) – Whether to use the GPU for training the XGBoost model. Default is False.
random_state (int, optional) – The random state for reproducibility. Default is 42.
verbose (bool, optional) – Whether to print the training logs. Default is False.
storage (str, optional) – The storage path/URL for the Optuna study. Default is “sqlite:///pre_xgboost.db”.
- Return type:
None
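To illustrate what early_stopping_rounds controls, here is a minimal pure-Python sketch of the usual early-stopping rule: training stops once the evaluation score has gone that many consecutive rounds without improving. The function name and the score list are hypothetical; this mimics the general idea rather than reproducing XGBoost's internal code.

```python
def best_round_with_early_stopping(scores, early_stopping_rounds=20, maximize=False):
    """Return (best_round, last_round_run) for a sequence of per-round scores."""
    best_idx, best = None, None
    for i, score in enumerate(scores):
        if best is None:
            improved = True
        elif maximize:
            improved = score > best
        else:
            improved = score < best
        if improved:
            best_idx, best = i, score
        elif i - best_idx >= early_stopping_rounds:
            # No improvement for `early_stopping_rounds` rounds: stop here.
            return best_idx, i
    return best_idx, len(scores) - 1


# Made-up validation errors per boosting round (lower is better).
errors = [0.9, 0.8, 0.7, 0.71, 0.72, 0.73, 0.6]

stopped_early = best_round_with_early_stopping(errors, early_stopping_rounds=3)
ran_to_end = best_round_with_early_stopping(errors, early_stopping_rounds=20)
```

With a patience of 3 the loop stops at round 5 and never reaches the better score in round 6. A larger value is more tolerant of plateaus at the cost of extra training rounds.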
- objective(trial)
The objective function for Optuna optimization to tune XGBoost hyperparameters.
- Parameters:
trial (optuna.trial._trial.Trial) – A single trial object which suggests hyperparameters.
- Returns:
The AUC of the model trained with the suggested hyperparameters. If a validation dataset is provided, a tuple of (AUC, RMSE) is returned instead.
- Return type:
float | tuple[float, float]
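The two return shapes can be sketched with dependency-free implementations of both metrics. This is an illustration under stated assumptions, not the module's code: the function names are hypothetical, and which dataset each metric is computed on (here AUC on the test set and RMSE on the validation set) is a guess, since the documentation does not say.

```python
import math


def auc_score(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def objective_result(y_test, p_test, y_val=None, p_val=None):
    """Return AUC alone, or (AUC, RMSE) when validation data is supplied."""
    auc = auc_score(y_test, p_test)
    if y_val is None or p_val is None:
        return auc
    # Assumption: RMSE is measured on the validation set.
    return auc, rmse(y_val, p_val)


single = objective_result([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
pair = objective_result(
    [0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8],
    y_val=[1.0, 2.0, 3.0], p_val=[1.0, 2.0, 5.0],
)
```

Note that in Optuna an objective returning more than one value is a multi-objective study, and such a study needs one optimization direction per returned value.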
- optimize(direction='minimize', n_trials=1000, n_jobs=1, study_name='XGBoost pre-optimization', load_if_exists=True)
Optimizes XGBoost hyperparameters using Optuna.
- Parameters:
direction (str, optional) – The direction of the optimization. Default is “minimize”.
n_trials (int, optional) – The number of trials for Optuna optimization. Default is 1000.
n_jobs (int, optional) – The number of jobs to run in parallel. Default is 1.
study_name (str, optional) – The name of the study. Default is “XGBoost pre-optimization”.
load_if_exists (bool, optional) – Whether to load the study if it exists. Default is True.
- Returns:
The Optuna study object, from which the best hyperparameters (dict) and the best AUC score (float) can be read.
- Return type:
optuna.study.Study
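The roles of direction and n_trials can be shown with a toy random search in plain Python. This is only a conceptual sketch: Optuna uses more sophisticated samplers and persists trials in the configured storage, and the names `random_search`, `toy_objective` and `space` are hypothetical.

```python
import random


def random_search(objective, space, direction="minimize", n_trials=100, seed=42):
    """Sample `n_trials` parameter sets and keep the best one for `direction`."""
    if direction not in ("minimize", "maximize"):
        raise ValueError("direction must be 'minimize' or 'maximize'")
    rng = random.Random(seed)
    best_params, best_value = None, None
    for _ in range(n_trials):
        # Draw every parameter uniformly from its (low, high) range.
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        value = objective(params)
        if best_value is None:
            better = True
        elif direction == "minimize":
            better = value < best_value
        else:
            better = value > best_value
        if better:
            best_params, best_value = params, value
    return best_params, best_value


def toy_objective(params):
    # Smallest at x = 2; largest at the edges of the range [0, 4].
    return (params["x"] - 2.0) ** 2


space = {"x": (0.0, 4.0)}
low_params, low_value = random_search(toy_objective, space, "minimize", n_trials=200)
high_params, high_value = random_search(toy_objective, space, "maximize", n_trials=200)
```

The same objective produces opposite "best" results depending on direction, so the value passed to optimize has to match whether the objective's score is better when lower or when higher.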