OCDocker.OCScore.Analysis.SHAP package

Module contents

Re-export SHAP public API for convenience.

Usage:

import OCDocker.OCScore.Analysis.SHAP as ocshap

Modules

  • Cli: Command-line entry point for SHAP runs.

  • Data: Data loading and preparation helpers.

  • Explain: SHAP computation helpers.

  • Model: Neural network builder for SHAP runs.

  • Plots: SHAP visualization utilities.

  • Runner: End-to-end SHAP workflow runner.

  • Studies: Optuna study selection helpers.

OCDocker.OCScore.Analysis.SHAP.run_shap_analysis(studies, df_path, base_models_folder, study_number, out_dir, background_size=None, eval_size=None, explainer='deep', stratify_by=None, seed=0, save_csv=True)

Run complete SHAP analysis workflow.

Parameters:
  • studies (StudyHandles) – Handles to Optuna studies for selecting best model parameters.

  • df_path (str) – Path to the main dataframe file.

  • base_models_folder (str) – Base path to the models folder.

  • study_number (int) – Study number identifier.

  • out_dir (str) – Output directory for SHAP results.

  • background_size (Optional[int], optional) – Number of samples to use for SHAP background. If None, uses all training data. Default is None.

  • eval_size (Optional[int], optional) – Number of samples to evaluate SHAP values for. If None, uses all test data. Default is None.

  • explainer (str, optional) – SHAP explainer type: “deep” or “kernel”. Default is “deep”.

  • stratify_by (Optional[List[str]], optional) – Column names to stratify sampling by. Default is None.

  • seed (int, optional) – Random seed for reproducibility. Default is 0.

  • save_csv (bool, optional) – Whether to save SHAP values as CSV file. Default is True.

Returns:

Container with paths to all generated output files.

Return type:

OutputPaths
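A minimal sketch of a call, assuming OCDocker is installed and the Optuna studies already exist. The study names, storage URL, file paths, and sample sizes below are hypothetical placeholders; only the keyword names come from the signature above.

```python
# Hypothetical study names and storage URL; replace with your own.
STUDY_KWARGS = {
    "ao_study_name": "AE_optimization",
    "nn_study_name": "NN_optimization",
    "seed_study_name": "Seed_optimization",
    "mask_study_name": "Mask_optimization",
    "storage": "sqlite:///ocscore_studies.db",
}

# Hypothetical paths and sizes; the keyword names mirror run_shap_analysis.
RUN_KWARGS = {
    "df_path": "data/ocscore_dataframe.csv",
    "base_models_folder": "models",
    "study_number": 1,
    "out_dir": "shap_output",
    "background_size": 200,  # subsample the background set
    "eval_size": 500,        # subsample the evaluation set
    "explainer": "deep",     # or "kernel"
    "seed": 0,
    "save_csv": True,
}


def main():
    # Imported lazily so this file can be read without OCDocker installed.
    import OCDocker.OCScore.Analysis.SHAP as ocshap

    studies = ocshap.StudyHandles(**STUDY_KWARGS)
    paths = ocshap.run_shap_analysis(studies=studies, **RUN_KWARGS)

    print("Feature importance plot:", paths.feature_importance_png)
    print("Beeswarm plot:", paths.beeswarm_png)
    print("SHAP values (.npy):", paths.shap_values_npy)
    if paths.shap_values_csv is not None:  # None when save_csv=False
        print("SHAP values (.csv):", paths.shap_values_csv)
```

Call main() from an environment where OCDocker is importable and the referenced files exist.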

class OCDocker.OCScore.Analysis.SHAP.OutputPaths(out_dir, feature_importance_png, beeswarm_png, shap_values_npy, shap_values_csv=None)

Bases: object

Container for SHAP analysis output file paths.

Parameters:
  • out_dir (str)

  • feature_importance_png (str)

  • beeswarm_png (str)

  • shap_values_npy (str)

  • shap_values_csv (str | None)

out_dir

Base output directory.

Type:

str

feature_importance_png

Path to feature importance bar plot PNG file.

Type:

str

beeswarm_png

Path to SHAP beeswarm plot PNG file.

Type:

str

shap_values_npy

Path to SHAP values NumPy array file.

Type:

str

shap_values_csv

Path to SHAP values CSV file. None if CSV was not saved. Default is None.

Type:

Optional[str], optional
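Because shap_values_csv may be None, code that consumes an OutputPaths instance should skip unset fields. The sketch below uses a stand-in dataclass with the same documented fields (named OutputPathsSketch here so that it runs without OCDocker) and a hypothetical helper, missing_outputs, that reports which expected files are absent from disk. It assumes the real class is also a dataclass; the file names in the demo are invented.

```python
import os
import tempfile
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class OutputPathsSketch:
    """Stand-in mirroring the documented OutputPaths fields (not the real class)."""
    out_dir: str
    feature_importance_png: str
    beeswarm_png: str
    shap_values_npy: str
    shap_values_csv: Optional[str] = None


def missing_outputs(paths) -> List[str]:
    """Return names of file fields whose path is set but does not exist."""
    missing = []
    for f in fields(paths):
        value = getattr(paths, f.name)
        if f.name == "out_dir" or value is None:
            continue  # out_dir is a directory; None means "not saved"
        if not os.path.isfile(value):
            missing.append(f.name)
    return missing


# Demo with a temporary directory and hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    demo = OutputPathsSketch(
        out_dir=tmp,
        feature_importance_png=os.path.join(tmp, "feature_importance.png"),
        beeswarm_png=os.path.join(tmp, "beeswarm.png"),
        shap_values_npy=os.path.join(tmp, "shap_values.npy"),
    )
    # Create only one of the three expected files.
    open(demo.beeswarm_png, "wb").close()
    demo_missing = missing_outputs(demo)
    print(demo_missing)
```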

class OCDocker.OCScore.Analysis.SHAP.StudyHandles(ao_study_name, nn_study_name, seed_study_name, mask_study_name, storage)

Bases: object

Container for Optuna study handles and storage information.

Parameters:
  • ao_study_name (str)

  • nn_study_name (str)

  • seed_study_name (str)

  • mask_study_name (str)

  • storage (str)

ao_study_name

Name of the autoencoder optimization study.

Type:

str

nn_study_name

Name of the neural network optimization study.

Type:

str

seed_study_name

Name of the random seed optimization study.

Type:

str

mask_study_name

Name of the feature mask optimization study.

Type:

str

storage

Storage path/URL for Optuna studies.

Type:

str

class OCDocker.OCScore.Analysis.SHAP.BestSelections(autoencoder_params, nn_params, seed, mask)

Bases: object

Container for best parameters selected from Optuna studies.

Parameters:
  • autoencoder_params (Dict[str, int | float | str | bool])

  • nn_params (Dict[str, int | float | str | bool])

  • seed (int)

  • mask (ndarray)

autoencoder_params

Best autoencoder parameters.

Type:

Dict[str, Union[int, float, str, bool]]

nn_params

Best neural network parameters.

Type:

Dict[str, Union[int, float, str, bool]]

seed

Best random seed.

Type:

int

mask

Best feature mask as a binary array.

Type:

np.ndarray
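As an illustration of how a binary mask relates to feature names, the sketch below keeps the names whose mask entry is nonzero. Treating 1 as "keep" is an assumption made here; the source only states that the mask is a binary array. The helper selected_features and the feature names are hypothetical.

```python
from typing import List, Sequence


def selected_features(feature_names: Sequence[str], mask: Sequence[int]) -> List[str]:
    """Return the feature names whose mask entry is truthy (assumed to mean 'keep')."""
    if len(feature_names) != len(mask):
        raise ValueError("mask and feature_names must have the same length")
    return [name for name, keep in zip(feature_names, mask) if keep]


# Hypothetical feature names and mask.
names = ["vina_score", "smina_score", "plants_score", "mol_weight"]
mask = [1, 0, 1, 1]
kept = selected_features(names, mask)
print(kept)
```

The same function accepts a one-dimensional NumPy array as the mask, since it only iterates over the entries and tests their truthiness.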

OCDocker.OCScore.Analysis.SHAP.select_best_from_studies(handles)

Select best parameters from multiple Optuna optimization studies.

Parameters:

handles (StudyHandles) – Container with study names and storage information.

Returns:

Container with best parameters from all studies (autoencoder, neural network, seed, mask).

Return type:

BestSelections

class OCDocker.OCScore.Analysis.SHAP.DataHandles(X_train, X_val, X_test, y_val, feature_names)

Bases: object

Data container for SHAP analysis datasets.

Parameters:
  • X_train (DataFrame)

  • X_val (DataFrame)

  • X_test (DataFrame)

  • y_val (ndarray)

  • feature_names (List[str])

X_train

Training feature matrix.

Type:

pd.DataFrame

X_val

Validation feature matrix.

Type:

pd.DataFrame

X_test

Test feature matrix.

Type:

pd.DataFrame

y_val

Validation target values.

Type:

np.ndarray

feature_names

List of feature column names.

Type:

List[str]

OCDocker.OCScore.Analysis.SHAP.load_and_prepare_data(df_path, base_models_folder, study_number, use_pca=False, use_pdb_train=True, random_seed=42)

Load and prepare datasets for SHAP analysis.

Parameters:
  • df_path (str) – Path to the main dataframe file.

  • base_models_folder (str) – Base path to the models folder.

  • study_number (int) – Study number identifier.

  • use_pca (bool, optional) – Whether to use PCA-transformed features. Default is False.

  • use_pdb_train (bool, optional) – Whether to use PDBbind training data. Default is True.

  • random_seed (int, optional) – Random seed for reproducibility. Default is 42.

Returns:

Container with train/val/test feature matrices, validation targets, and feature names.

Return type:

DataHandles

OCDocker.OCScore.Analysis.SHAP.build_neural_net(input_dim, autoencoder_params, nn_params, seed, mask=None, use_gpu=None, verbose=False)

Build and configure a neural network for SHAP analysis.

Parameters:
  • input_dim (int) – Number of input features.

  • autoencoder_params (Dict[str, Union[int, float, str, bool]]) – Parameters for the autoencoder component.

  • nn_params (Dict[str, Union[int, float, str, bool]]) – Parameters for the neural network component.

  • seed (int) – Random seed for reproducibility.

  • mask (Optional[list[int] | list[bool]], optional) – Feature mask to apply. Default is None.

  • use_gpu (Optional[bool], optional) – Whether to use GPU. If None, auto-detects CUDA availability. Default is None.

  • verbose (bool, optional) – Whether to print verbose output. Default is False.

Returns:

Configured neural network in evaluation mode.

Return type:

NeuralNet
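The lower-level functions of this package can be chained by hand when more control is needed than run_shap_analysis offers. The sketch below is one plausible composition, not a description of what run_shap_analysis does internally. It assumes that the training split serves as background and the test split as evaluation data (matching the defaults described for run_shap_analysis), that input_dim equals the number of feature names, and that the mask can be passed as a plain list; all three are assumptions. The helper name manual_shap is hypothetical, and the package is passed in as an argument so that the function can be exercised with a stub.

```python
def manual_shap(ocshap, handles, df_path, base_models_folder, study_number,
                explainer="deep", background_size=None, eval_size=None, seed=0):
    """Chain the documented helpers; `ocshap` is the imported SHAP package."""
    # Best hyperparameters, seed, and feature mask from the Optuna studies.
    best = ocshap.select_best_from_studies(handles)

    # Train/val/test feature matrices and feature names.
    data = ocshap.load_and_prepare_data(df_path, base_models_folder, study_number)

    # Network in evaluation mode, built from the selected parameters.
    neural = ocshap.build_neural_net(
        input_dim=len(data.feature_names),  # assumption: one input per feature
        autoencoder_params=best.autoencoder_params,
        nn_params=best.nn_params,
        seed=best.seed,
        mask=list(best.mask),  # assumption: a plain list is accepted
    )

    # SHAP values of shape (n_samples, n_features).
    return ocshap.compute_shap_values(
        neural,
        X_background=data.X_train,
        X_eval=data.X_test,
        explainer=explainer,
        background_size=background_size,
        eval_size=eval_size,
        rng_seed=seed,
    )
```

With OCDocker installed, the call would look like manual_shap(ocshap, studies, df_path, base_models_folder, study_number), where ocshap is the package imported as shown under Usage.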

OCDocker.OCScore.Analysis.SHAP.compute_shap_values(neural, X_background, X_eval, explainer='deep', background_size=None, eval_size=None, stratify_by=None, rng_seed=0)

Compute SHAP values for a neural network model using either the Deep or the Kernel explainer.

Parameters:
  • neural (object) – Object with a .NN attribute that is a PyTorch neural network model.

  • X_background (pd.DataFrame) – Background dataset for SHAP. Should contain the same features as X_eval.

  • X_eval (pd.DataFrame) – Evaluation dataset for which to compute SHAP values.

  • explainer (str) – Type of SHAP explainer to use: “deep” or “kernel”. Default is “deep”.

  • background_size (int, optional) – Number of samples to draw from X_background. If None, use all. Default is None.

  • eval_size (int, optional) – Number of samples to draw from X_eval. If None, use all. Default is None.

  • stratify_by (list of str, optional) – Column names to stratify sampling by. If None or empty, no stratification is done. Default is None.

  • rng_seed (int) – Random seed for reproducibility. Default is 0.
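To make the interplay of the size, stratification, and seed parameters concrete, here is a sketch of seeded, proportionally stratified subsampling. It is an illustration of the general technique, not OCDocker's implementation: rows are plain dicts rather than DataFrame rows, the helper name stratified_sample is hypothetical, and the proportional allocation with largest-remainder rounding is a choice made for this example.

```python
import random
from collections import defaultdict


def stratified_sample(rows, size, stratify_by=None, seed=0):
    """Draw `size` rows, keeping group proportions when `stratify_by` is given."""
    rows = list(rows)
    if size is None or size >= len(rows):
        return rows  # None (or an oversized request) means "use all"
    rng = random.Random(seed)
    if not stratify_by:
        return rng.sample(rows, size)  # plain random subsample

    # Group rows by the tuple of values in the stratification columns.
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[col] for col in stratify_by)].append(row)

    # Proportional quota per group, rounded with the largest-remainder method
    # so that the quotas sum to exactly `size`.
    exact = {key: size * len(g) / len(rows) for key, g in groups.items()}
    quota = {key: int(value) for key, value in exact.items()}
    leftover = size - sum(quota.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - quota[k], reverse=True)
    for key in by_remainder[:leftover]:
        quota[key] += 1

    sample = []
    for key, g in groups.items():
        sample.extend(rng.sample(g, quota[key]))
    return sample


# Toy data: 90 rows of class "a" and 10 rows of class "b" (hypothetical column).
rows = [{"id": i, "cls": "a"} for i in range(90)]
rows += [{"id": i, "cls": "b"} for i in range(90, 100)]
sample = stratified_sample(rows, 10, stratify_by=["cls"], seed=0)
print(sum(r["cls"] == "b" for r in sample), "of", len(sample), "rows are class b")
```

Fixing the seed makes the draw repeatable, which is the role the rng_seed parameter is documented to play.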

Returns:

SHAP values as a 2D array of shape (n_samples, n_features).

Return type:

np.ndarray
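A common way to summarize a (n_samples, n_features) array of SHAP values is to rank features by their mean absolute value. The sketch below does this with plain Python so that it runs without NumPy; it accepts any sequence of rows, for example the returned array converted with .tolist(). The helper name rank_features and the toy data are hypothetical, and nothing here is claimed about how the package's own plots are computed.

```python
from typing import List, Sequence, Tuple


def rank_features(shap_values: Sequence[Sequence[float]],
                  feature_names: Sequence[str]) -> List[Tuple[str, float]]:
    """Rank features by mean absolute SHAP value, largest first."""
    n_samples = len(shap_values)
    if n_samples == 0:
        raise ValueError("shap_values is empty")
    n_features = len(feature_names)

    # Accumulate |value| per feature column.
    totals = [0.0] * n_features
    for row in shap_values:
        if len(row) != n_features:
            raise ValueError("row width does not match feature_names")
        for j, value in enumerate(row):
            totals[j] += abs(float(value))

    means = [total / n_samples for total in totals]
    return sorted(zip(feature_names, means), key=lambda item: item[1], reverse=True)


# Toy values: 3 samples, 2 hypothetical features.
toy = [[0.5, -2.0], [-0.5, 1.0], [0.5, -3.0]]
ranking = rank_features(toy, ["feat_a", "feat_b"])
print(ranking)
```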