OCDocker.OCScore.Analysis.SHAP.Data module

Data loading and preparation for SHAP analysis.

Usage:

from OCDocker.OCScore.Analysis.SHAP.Data import load_and_prepare_data

class OCDocker.OCScore.Analysis.SHAP.Data.DataHandles(X_train, X_val, X_test, y_val, feature_names)

Bases: object

Data container for SHAP analysis datasets.

Parameters:
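  • X_train (DataFrame) – Training feature matrix.

  • X_val (DataFrame) – Validation feature matrix.

  • X_test (DataFrame) – Test feature matrix.

  • y_val (ndarray) – Validation target values.

  • feature_names (List[str]) – List of feature column names.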

X_train

Training feature matrix.

Type:

pd.DataFrame

X_val

Validation feature matrix.

Type:

pd.DataFrame

X_test

Test feature matrix.

Type:

pd.DataFrame

y_val

Validation target values.

Type:

np.ndarray

feature_names

List of feature column names.

Type:

List[str]
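
The container described above can be illustrated with a minimal sketch. DataHandlesSketch is a hypothetical stand-in that only mirrors the five documented fields; it is not the OCDocker class. The data is synthetic and the column names are made up. check_handles encodes two assumptions that this page does not state: that every split has exactly the columns listed in feature_names, and that y_val has one entry per row of X_val.

```python
from dataclasses import dataclass
from typing import List

import numpy as np
import pandas as pd


@dataclass
class DataHandlesSketch:
    """Hypothetical stand-in mirroring the documented DataHandles fields."""

    X_train: pd.DataFrame
    X_val: pd.DataFrame
    X_test: pd.DataFrame
    y_val: np.ndarray
    feature_names: List[str]


def check_handles(handles) -> None:
    """Raise ValueError if the splits disagree with feature_names or y_val.

    Both checks are assumptions about how the container is meant to be
    used, not behavior confirmed by the OCDocker documentation.
    """
    for name in ("X_train", "X_val", "X_test"):
        columns = list(getattr(handles, name).columns)
        if columns != list(handles.feature_names):
            raise ValueError(f"{name} columns do not match feature_names")
    if len(handles.y_val) != len(handles.X_val):
        raise ValueError("y_val length does not match the number of X_val rows")


rng = np.random.default_rng(42)
feature_names = ["feat_a", "feat_b", "feat_c"]  # made-up column names


def make_frame(n_rows: int) -> pd.DataFrame:
    """Build a synthetic feature matrix with the made-up columns."""
    values = rng.normal(size=(n_rows, len(feature_names)))
    return pd.DataFrame(values, columns=feature_names)


handles = DataHandlesSketch(
    X_train=make_frame(8),
    X_val=make_frame(4),
    X_test=make_frame(4),
    y_val=rng.normal(size=4),
    feature_names=feature_names,
)
check_handles(handles)  # passes silently for a consistent container
```

Because check_handles only reads attributes, it should also accept any object that exposes the same five fields.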

OCDocker.OCScore.Analysis.SHAP.Data.load_and_prepare_data(df_path, base_models_folder, study_number, use_pca=False, use_pdb_train=True, random_seed=42)

Load and prepare datasets for SHAP analysis.

Parameters:
  • df_path (str) – Path to the main dataframe file.

  • base_models_folder (str) – Base path to the models folder.

  • study_number (int) – Study number identifier.

  • use_pca (bool, optional) – Whether to use PCA-transformed features. Default is False.

  • use_pdb_train (bool, optional) – Whether to use PDBbind training data. Default is True.

  • random_seed (int, optional) – Random seed for reproducibility. Default is 42.

Returns:

Container with train/val/test feature matrices, validation targets, and feature names.

Return type:

DataHandles
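
The call pattern documented above can be sketched as follows. load_and_summarize is a hypothetical helper, not part of OCDocker. It takes the loader as an argument so the snippet runs without the package installed, passes the three optional arguments with their documented defaults, and reads only the five documented DataHandles attributes.

```python
from typing import Any, Callable, Dict


def load_and_summarize(
    loader: Callable[..., Any],
    df_path: str,
    base_models_folder: str,
    study_number: int,
) -> Dict[str, Any]:
    """Call a loader with the documented signature and summarize the result.

    Hypothetical helper: `loader` is expected to behave like
    load_and_prepare_data and return an object with the documented
    DataHandles attributes.
    """
    handles = loader(
        df_path,
        base_models_folder,
        study_number,
        use_pca=False,       # documented default
        use_pdb_train=True,  # documented default
        random_seed=42,      # documented default
    )
    return {
        "n_train": len(handles.X_train),
        "n_val": len(handles.X_val),
        "n_test": len(handles.X_test),
        "n_features": len(handles.feature_names),
        # Assumption: one validation target per validation row.
        "val_targets_aligned": len(handles.y_val) == len(handles.X_val),
    }
```

With OCDocker installed, the loader would be the function imported in the Usage line at the top of this page, for example load_and_summarize(load_and_prepare_data, "data/scores.csv", "models", 1). The two paths and the study number in that call are placeholders, not values taken from the project.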