OCDocker.OCScore.Analysis.SHAP.Data module
Data loading and preparation for SHAP analysis.
Usage:
from OCDocker.OCScore.Analysis.SHAP.Data import load_and_prepare_data
- class OCDocker.OCScore.Analysis.SHAP.Data.DataHandles(X_train, X_val, X_test, y_val, feature_names)
Bases: object
Data container for SHAP analysis datasets.
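- Parameters:
X_train (DataFrame) – Training feature matrix.
X_val (DataFrame) – Validation feature matrix.
X_test (DataFrame) – Test feature matrix.
y_val (ndarray) – Validation target values.
feature_names (List[str]) – List of feature column names.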
- X_train
Training feature matrix.
- Type:
pd.DataFrame
- X_val
Validation feature matrix.
- Type:
pd.DataFrame
- X_test
Test feature matrix.
- Type:
pd.DataFrame
- y_val
Validation target values.
- Type:
np.ndarray
- feature_names
List of feature column names.
- Type:
List[str]
- X_train: DataFrame
- X_val: DataFrame
- X_test: DataFrame
- y_val: ndarray
- feature_names: List[str]
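The field layout above can be illustrated with a minimal sketch of an equivalent container. It assumes `DataHandles` behaves like a plain dataclass (the per-field annotations suggest this, but the implementation is not shown here). The class name `DataHandlesSketch` and the toy data are hypothetical and only show how the five fields relate.

```python
from dataclasses import dataclass
from typing import List

import numpy as np
import pandas as pd


# Hypothetical stand-in mirroring the documented fields of DataHandles;
# this is not the OCDocker implementation.
@dataclass
class DataHandlesSketch:
    X_train: pd.DataFrame     # training feature matrix
    X_val: pd.DataFrame       # validation feature matrix
    X_test: pd.DataFrame      # test feature matrix
    y_val: np.ndarray         # validation targets, one per row of X_val
    feature_names: List[str]  # column names shared by all three matrices


# Toy data: 10 rows with three features, sliced into train/val/test.
feature_names = ["f1", "f2", "f3"]
rng = np.random.default_rng(42)
full = pd.DataFrame(rng.normal(size=(10, 3)), columns=feature_names)

handles = DataHandlesSketch(
    X_train=full.iloc[:6],
    X_val=full.iloc[6:8],
    X_test=full.iloc[8:],
    y_val=rng.normal(size=2),
    feature_names=feature_names,
)
print(handles.X_train.shape, handles.X_val.shape, handles.X_test.shape)
# prints (6, 3) (2, 3) (2, 3)
```

Note that only the validation split carries targets (`y_val`); the training and test matrices are stored as features only.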
- OCDocker.OCScore.Analysis.SHAP.Data.load_and_prepare_data(df_path, base_models_folder, study_number, use_pca=False, use_pdb_train=True, random_seed=42)
Load and prepare datasets for SHAP analysis.
- Parameters:
df_path (str) – Path to the main dataframe file.
base_models_folder (str) – Base path to the models folder.
study_number (int) – Study number identifier.
use_pca (bool, optional) – Whether to use PCA-transformed features. Default is False.
use_pdb_train (bool, optional) – Whether to use PDBbind training data. Default is True.
random_seed (int, optional) – Random seed for reproducibility. Default is 42.
- Returns:
Container with train/val/test feature matrices, validation targets, and feature names.
- Return type:
DataHandles
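Because the returned container bundles three feature matrices, validation targets and a shared list of feature names, a quick consistency check before running SHAP can catch misaligned data early. The helper below is a hypothetical sketch, not part of OCDocker; it relies only on the five attributes documented for `DataHandles` and is exercised here with a duck-typed `SimpleNamespace` stand-in rather than a real `load_and_prepare_data()` result.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd


def check_data_handles(handles) -> None:
    """Raise ValueError if a DataHandles-like object is internally inconsistent.

    Hypothetical helper: it only reads X_train, X_val, X_test, y_val and
    feature_names, the attributes documented for DataHandles.
    """
    expected = list(handles.feature_names)
    # Every feature matrix should expose exactly the documented feature columns.
    for name in ("X_train", "X_val", "X_test"):
        columns = list(getattr(handles, name).columns)
        if columns != expected:
            raise ValueError(f"{name} columns {columns} do not match feature_names")
    # Validation targets should align row-for-row with the validation matrix.
    if len(handles.y_val) != len(handles.X_val):
        raise ValueError(
            f"y_val has {len(handles.y_val)} values but X_val has {len(handles.X_val)} rows"
        )


# Duck-typed stand-in for a load_and_prepare_data() result.
cols = ["a", "b"]
frame = pd.DataFrame(np.zeros((4, 2)), columns=cols)
ok = SimpleNamespace(
    X_train=frame,
    X_val=frame.iloc[:2],
    X_test=frame.iloc[2:],
    y_val=np.zeros(2),
    feature_names=cols,
)
check_data_handles(ok)  # passes silently
```

With the real module, the same check would be applied to the object returned by `load_and_prepare_data`, using whatever paths and study number match your setup.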