OCDocker.OCScore.Analysis package
Submodules
- OCDocker.OCScore.Analysis.Correlation package
- OCDocker.OCScore.Analysis.FeatureImportance module
- OCDocker.OCScore.Analysis.Metrics package
- OCDocker.OCScore.Analysis.NNUtils package
- OCDocker.OCScore.Analysis.PerformanceEvaluation package
- OCDocker.OCScore.Analysis.Plotting package
- Modules
- plot_combined_metric_scatter()
- plot_boxplots()
- plot_barplots()
- plot_scatterplot()
- plot_bar_with_significance()
- plot_heatmap()
- plot_normality_and_variance_diagnostics()
- plot_pca_importance_barplot()
- plot_pca_importance_histogram()
- save_pca_importance_groups()
- save_pca_importance_bins()
- set_color_mapping()
- Submodules
- OCDocker.OCScore.Analysis.RankingMetrics
- OCDocker.OCScore.Analysis.SHAP package
- Submodules
- OCDocker.OCScore.Analysis.SHAP.Cli module
- OCDocker.OCScore.Analysis.SHAP.Data module
- OCDocker.OCScore.Analysis.SHAP.Explain module
- OCDocker.OCScore.Analysis.SHAP.Model module
- OCDocker.OCScore.Analysis.SHAP.Plots module
- OCDocker.OCScore.Analysis.SHAP.Runner module
- OCDocker.OCScore.Analysis.SHAP.Studies module
- Module contents
- Submodules
- OCDocker.OCScore.Analysis.Impact package
- OCDocker.OCScore.Analysis.StatTests package
- OCDocker.OCScore.Analysis.StudyProcessing package
Module contents
Unified exports for the OCScore Analysis package.
Usage:
import OCDocker.OCScore.Analysis as ocanalysis
Modules
Correlation: Correlation analysis helpers.
FeatureImportance: SHAP-style feature importance utilities.
Impact: Feature impact summaries and plots.
Metrics: Metric computation helpers.
NNUtils: Neural network helper utilities.
PerformanceEvaluation: Performance evaluation workflows.
Plotting: Plotting helpers for analyses.
RankingMetrics: Ranking metrics and tables.
SHAP: SHAP analysis workflows.
StatTests: Statistical test helpers.
StudyProcessing: Study parsing and aggregation utilities.
- OCDocker.OCScore.Analysis.run_shap_analysis(studies, df_path, base_models_folder, study_number, out_dir, background_size=None, eval_size=None, explainer='deep', stratify_by=None, seed=0, save_csv=True)
Run complete SHAP analysis workflow.
- Parameters:
studies (StudyHandles) – Handles to the Optuna studies used to select the best model parameters.
df_path (str) – Path to the main dataframe file.
base_models_folder (str) – Base path to the models folder.
study_number (int) – Study number identifier.
out_dir (str) – Output directory for SHAP results.
background_size (Optional[int], optional) – Number of samples to use as the SHAP background set. If None, all training data are used. Default is None.
eval_size (Optional[int], optional) – Number of samples for which SHAP values are computed. If None, all test data are used. Default is None.
explainer (str, optional) – SHAP explainer type: “deep” or “kernel”. Default is “deep”.
stratify_by (Optional[List[str]], optional) – Column names to stratify sampling by. Default is None.
seed (int, optional) – Random seed for reproducibility. Default is 0.
save_csv (bool, optional) – Whether to also save the SHAP values as a CSV file. Default is True.
- Returns:
Container with paths to all generated output files.
- Return type:
OutputPaths
- class OCDocker.OCScore.Analysis.OutputPaths(out_dir, feature_importance_png, beeswarm_png, shap_values_npy, shap_values_csv=None)
Bases:
object
Container for SHAP analysis output file paths.
- Parameters:
out_dir (str) –
feature_importance_png (str) –
beeswarm_png (str) –
shap_values_npy (str) –
shap_values_csv (str | None) –
- out_dir
Base output directory.
- Type:
str
- feature_importance_png
Path to feature importance bar plot PNG file.
- Type:
str
- beeswarm_png
Path to SHAP beeswarm plot PNG file.
- Type:
str
- shap_values_npy
Path to SHAP values NumPy array file.
- Type:
str
- shap_values_csv
Path to the SHAP values CSV file, or None if the CSV was not saved. Default is None.
- Type:
Optional[str], optional
- out_dir: str
- feature_importance_png: str
- beeswarm_png: str
- shap_values_npy: str
- shap_values_csv: str | None = None
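The attribute listing above (four required string paths plus an optional CSV path that defaults to None) has the shape of a Python dataclass. The sketch below is a hypothetical stand-in, assuming a plain `@dataclass`; `OutputPathsSketch`, `make_paths` and the file names are illustrative assumptions, not the package's own definitions or the names `run_shap_analysis` actually writes.

```python
from dataclasses import dataclass
from typing import Optional
import os


@dataclass
class OutputPathsSketch:
    """Hypothetical mirror of OutputPaths; field names follow the documented attributes."""
    out_dir: str
    feature_importance_png: str
    beeswarm_png: str
    shap_values_npy: str
    shap_values_csv: Optional[str] = None  # stays None when no CSV is written


def make_paths(out_dir: str, save_csv: bool = True) -> OutputPathsSketch:
    # File names are assumptions chosen for illustration only.
    return OutputPathsSketch(
        out_dir=out_dir,
        feature_importance_png=os.path.join(out_dir, "feature_importance.png"),
        beeswarm_png=os.path.join(out_dir, "beeswarm.png"),
        shap_values_npy=os.path.join(out_dir, "shap_values.npy"),
        shap_values_csv=os.path.join(out_dir, "shap_values.csv") if save_csv else None,
    )
```

Keeping `shap_values_csv` last with a default of None mirrors the documented signature, since dataclass fields with defaults must follow those without.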
- class OCDocker.OCScore.Analysis.StudyHandles(ao_study_name, nn_study_name, seed_study_name, mask_study_name, storage)
Bases:
object
Container for Optuna study handles and storage information.
- Parameters:
ao_study_name (str) –
nn_study_name (str) –
seed_study_name (str) –
mask_study_name (str) –
storage (str) –
- ao_study_name
Name of the autoencoder optimization study.
- Type:
str
- nn_study_name
Name of the neural network optimization study.
- Type:
str
- seed_study_name
Name of the random seed optimization study.
- Type:
str
- mask_study_name
Name of the feature mask optimization study.
- Type:
str
- storage
Storage path/URL for Optuna studies.
- Type:
str
- ao_study_name: str
- nn_study_name: str
- seed_study_name: str
- mask_study_name: str
- storage: str
- class OCDocker.OCScore.Analysis.BestSelections(autoencoder_params, nn_params, seed, mask)
Bases:
object
Container for best parameters selected from Optuna studies.
- Parameters:
autoencoder_params (Dict[str, int | float | str | bool]) –
nn_params (Dict[str, int | float | str | bool]) –
seed (int) –
mask (ndarray) –
- autoencoder_params
Best autoencoder parameters.
- Type:
Dict[str, Union[int, float, str, bool]]
- nn_params
Best neural network parameters.
- Type:
Dict[str, Union[int, float, str, bool]]
- seed
Best random seed.
- Type:
int
- mask
Best feature mask as a binary array.
- Type:
np.ndarray
- autoencoder_params: Dict[str, int | float | str | bool]
- nn_params: Dict[str, int | float | str | bool]
- seed: int
- mask: ndarray
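BestSelections stores the feature mask as a binary NumPy array, and build_neural_net accepts a mask argument, but how the package applies the mask internally is not documented here. The sketch below assumes the mask is aligned one-to-one with the feature columns (1 = keep, 0 = drop) and shows one way to use it to subset a feature DataFrame; `apply_feature_mask` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def apply_feature_mask(X: pd.DataFrame, mask: np.ndarray) -> pd.DataFrame:
    """Keep only the columns whose mask entry is 1/True (hypothetical helper)."""
    keep = np.asarray(mask).astype(bool)
    # Assumption: one mask entry per column, in column order.
    if keep.shape != (X.shape[1],):
        raise ValueError(f"mask has shape {keep.shape}, expected ({X.shape[1]},)")
    return X.loc[:, keep]
```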
- OCDocker.OCScore.Analysis.select_best_from_studies(handles)
Select best parameters from multiple Optuna optimization studies.
- Parameters:
handles (StudyHandles) – Container with study names and storage information.
- Returns:
Container with best parameters from all studies (autoencoder, neural network, seed, mask).
- Return type:
BestSelections
- class OCDocker.OCScore.Analysis.DataHandles(X_train, X_val, X_test, y_val, feature_names)
Bases:
object
Data container for SHAP analysis datasets.
- Parameters:
X_train (DataFrame) –
X_val (DataFrame) –
X_test (DataFrame) –
y_val (ndarray) –
feature_names (List[str]) –
- X_train
Training feature matrix.
- Type:
pd.DataFrame
- X_val
Validation feature matrix.
- Type:
pd.DataFrame
- X_test
Test feature matrix.
- Type:
pd.DataFrame
- y_val
Validation target values.
- Type:
np.ndarray
- feature_names
List of feature column names.
- Type:
List[str]
- X_train: DataFrame
- X_val: DataFrame
- X_test: DataFrame
- y_val: ndarray
- feature_names: List[str]
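The DataHandles fields imply two alignment invariants: each feature matrix should carry the columns listed in feature_names, and y_val should have one target per X_val row. The sketch below is a hypothetical mirror (`DataHandlesSketch`) that checks both on construction; the documented class lists only the fields, so these checks are an assumption about how the data is expected to line up, not documented behavior.

```python
from dataclasses import dataclass
from typing import List

import numpy as np
import pandas as pd


@dataclass
class DataHandlesSketch:
    """Hypothetical mirror of DataHandles with added consistency checks."""
    X_train: pd.DataFrame
    X_val: pd.DataFrame
    X_test: pd.DataFrame
    y_val: np.ndarray
    feature_names: List[str]

    def __post_init__(self) -> None:
        # Assumed invariant: every split exposes exactly feature_names, in order.
        for name in ("X_train", "X_val", "X_test"):
            if list(getattr(self, name).columns) != list(self.feature_names):
                raise ValueError(f"{name} columns do not match feature_names")
        # Assumed invariant: one validation target per validation row.
        if len(self.y_val) != len(self.X_val):
            raise ValueError("y_val length must equal the number of X_val rows")
```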
- OCDocker.OCScore.Analysis.load_and_prepare_data(df_path, base_models_folder, study_number, use_pca=False, use_pdb_train=True, random_seed=42)
Load and prepare datasets for SHAP analysis.
- Parameters:
df_path (str) – Path to the main dataframe file.
base_models_folder (str) – Base path to the models folder.
study_number (int) – Study number identifier.
use_pca (bool, optional) – Whether to use PCA-transformed features. Default is False.
use_pdb_train (bool, optional) – Whether to use PDBbind training data. Default is True.
random_seed (int, optional) – Random seed for reproducibility. Default is 42.
- Returns:
Container with train/val/test feature matrices, validation targets, and feature names.
- Return type:
DataHandles
- OCDocker.OCScore.Analysis.build_neural_net(input_dim, autoencoder_params, nn_params, seed, mask=None, use_gpu=None, verbose=False)
Build and configure a neural network for SHAP analysis.
- Parameters:
input_dim (int) – Number of input features.
autoencoder_params (Dict[str, Union[int, float, str, bool]]) – Parameters for the autoencoder component.
nn_params (Dict[str, Union[int, float, str, bool]]) – Parameters for the neural network component.
seed (int) – Random seed for reproducibility.
mask (Optional[list[int] | list[bool]], optional) – Feature mask to apply. Default is None.
use_gpu (Optional[bool], optional) – Whether to use GPU. If None, auto-detects CUDA availability. Default is None.
verbose (bool, optional) – Whether to print verbose output. Default is False.
- Returns:
Configured neural network in evaluation mode.
- Return type:
- OCDocker.OCScore.Analysis.compute_shap_values(neural, X_background, X_eval, explainer='deep', background_size=None, eval_size=None, stratify_by=None, rng_seed=0)
Compute SHAP values for a neural network model using a SHAP Deep or Kernel explainer.
- Parameters:
neural (object) – Object with a .NN attribute that is a PyTorch neural network model.
X_background (pd.DataFrame) – Background dataset for SHAP. Should contain the same features as X_eval.
X_eval (pd.DataFrame) – Evaluation dataset for which to compute SHAP values.
explainer (str) – Type of SHAP explainer to use: “deep” or “kernel”. Default is “deep”.
background_size (int, optional) – Number of samples to draw from X_background. If None, use all. Default is None.
eval_size (int, optional) – Number of samples to draw from X_eval. If None, use all. Default is None.
stratify_by (list of str, optional) – Column names to stratify sampling by. If None or empty, no stratification is done. Default is None.
rng_seed (int) – Random seed for reproducibility. Default is 0.
- Returns:
SHAP values as a 2D array of shape (n_samples, n_features).
- Return type:
np.ndarray
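The background_size, eval_size, stratify_by and rng_seed parameters describe seeded, optionally stratified subsampling of the background and evaluation sets. The sketch below shows one way to do this with pandas; it assumes proportional sampling within each stratum, which may differ from what compute_shap_values actually does, and `sample_rows` is a hypothetical helper.

```python
from typing import List, Optional

import pandas as pd


def sample_rows(
    df: pd.DataFrame,
    size: Optional[int],
    stratify_by: Optional[List[str]] = None,
    rng_seed: int = 0,
) -> pd.DataFrame:
    """Draw about `size` rows, proportionally per stratum when stratify_by is given."""
    # None (or a size at least as large as the data) means "use everything".
    if size is None or size >= len(df):
        return df
    if not stratify_by:
        return df.sample(n=size, random_state=rng_seed)
    frac = size / len(df)
    # The same fraction is drawn from each group, so group proportions are preserved;
    # per-group rounding means the total can differ slightly from `size`.
    return df.groupby(list(stratify_by)).sample(frac=frac, random_state=rng_seed)
```

Passing an integer seed makes repeated calls return the same rows, which is what the rng_seed parameter is for.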
- OCDocker.OCScore.Analysis.build_impact_overview(chi_df, contingency_dict, metric, presence_level=1, beneficial_custom=None, tau=0.05)
Build an impact table reporting, for each feature, the NBS, direction, strength and test statistics.
- Parameters:
chi_df (pd.DataFrame) – DataFrame with chi-square outcomes, requires at least columns [‘Feature’, “Cramér’s V”, ‘Chi2 Statistic’, ‘p-value’].
contingency_dict (dict[str, pd.DataFrame]) – Mapping from feature -> contingency table with rows as presence (0/1) and columns as ordered categories (strings).
metric (str) – Metric name used to identify beneficial categories (‘AUC’ or ‘RMSE’).
presence_level (Union[int, str], optional) – Row key considered as presence (default: 1). If not found, falls back.
beneficial_custom (Optional[Iterable[str]], optional) – Explicit set of beneficial categories to use instead of defaults.
tau (float, optional) – Tolerance for classifying a feature's direction as neutral when its Normalized Binding Score is below tau (default: 0.05).
- Returns:
Sorted DataFrame with columns: ‘Feature’, ‘NBS’, ‘Direction’, ‘Strength’, ‘Chi2’, ‘p-value’, ‘CramersV’, ‘FavoredCategory’, ‘HurtCategory’, ‘Normalized Binding Score’, ‘NegLog10P’.
- Return type:
pd.DataFrame
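chi_df is expected to carry a chi-square statistic, Cramér's V and a p-value per feature, and the output adds a NegLog10P column. The sketch below shows the standard formulas on a presence-by-category contingency table: the Pearson chi-square statistic without continuity correction, Cramér's V as sqrt(chi2 / (n * (min(r, c) - 1))), and -log10(p). Two assumptions: that NegLog10P is -log10 of the p-value (as its name suggests), and that the package's own chi_df may have been produced differently (for example with a Yates correction on 2x2 tables). `chi2_and_cramers_v` and `neg_log10_p` are hypothetical helpers.

```python
import math
from typing import Tuple

import numpy as np
import pandas as pd


def chi2_and_cramers_v(table: pd.DataFrame) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) and Cramér's V for a contingency table."""
    observed = table.to_numpy(dtype=float)
    n = observed.sum()
    # Expected counts under independence: row_total * col_total / n.
    # Assumes no all-zero rows or columns (those would give zero expected counts).
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = float(((observed - expected) ** 2 / expected).sum())
    k = min(observed.shape) - 1
    v = math.sqrt(chi2 / (n * k)) if k > 0 else float("nan")
    return chi2, v


def neg_log10_p(p: float, floor: float = 1e-300) -> float:
    # Clip p to avoid log10(0); the floor value is an arbitrary assumption.
    return -math.log10(max(p, floor))
```

For a 2x2 table with counts [[10, 20], [20, 10]], every expected count is 15, so chi2 = 4 * 25 / 15 = 20/3 and V = sqrt((20/3) / 60) = 1/3.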
- OCDocker.OCScore.Analysis.plot_impact_arrows_inline_labels(impact_df, title, outpath=None, tau=0.05, thresholds=(0.1, 0.2, 0.35), xpad=0.025, height_per_feature=0.42, max_height=28.0, font_size=10)
Render an arrow plot with inline feature labels based on NBS.
- Parameters:
impact_df (pd.DataFrame) – DataFrame with columns [‘Feature’, ‘NBS’] and, optionally, ‘Direction’.
title (str) – Plot title.
outpath (Optional[str], optional) – Output image path. If None, the figure is not saved to disk.
tau (float, optional) – Neutrality threshold on original NBS scale (default: 0.05).
thresholds (Sequence[float], optional) – Thresholds on the scaled Normalized Binding Score that set marker strength (default: 0.10, 0.20, 0.35).
xpad (float, optional) – Horizontal text offset relative to marker (default: 0.025).
height_per_feature (float, optional) – Figure height contribution per feature (default: 0.42).
max_height (float, optional) – Maximum figure height (default: 28.0).
font_size (int, optional) – Font size for labels (default: 10).
- Return type:
None
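The thresholds, height_per_feature and max_height parameters suggest two simple rules: a marker strength level obtained by comparing the scaled score against the thresholds, and a figure height that grows with the number of features up to a cap. The sketch below implements both under those assumptions; the exact rules used by plot_impact_arrows_inline_labels may differ, and `strength_level` and `figure_height` are hypothetical helpers.

```python
from bisect import bisect_right
from typing import Sequence


def strength_level(scaled_nbs: float, thresholds: Sequence[float] = (0.10, 0.20, 0.35)) -> int:
    """Count how many thresholds |scaled_nbs| meets or exceeds (0 .. len(thresholds))."""
    # Assumption: strength depends on magnitude only, so the sign is dropped.
    return bisect_right(sorted(thresholds), abs(scaled_nbs))


def figure_height(n_features: int, height_per_feature: float = 0.42, max_height: float = 28.0) -> float:
    # Assumption: height grows linearly per feature and is capped at max_height.
    return min(max_height, n_features * height_per_feature)
```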
- OCDocker.OCScore.Analysis.get_neutral_features(impact_df, tau=0.05)
Return a sorted list of features classified as neutral, either by their Direction or because their Normalized Binding Score is below tau.
- Parameters:
impact_df (pd.DataFrame) – DataFrame returned by build_impact_overview, with ‘NBS’ and ‘Direction’.
tau (float, optional) – Neutrality threshold on original NBS scale (default: 0.05).
- Returns:
Sorted list of neutral feature names.
- Return type:
list[str]
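The selection rule above can be sketched as a boolean filter over the impact table. This is a hypothetical re-implementation, not the package's code, and it rests on three assumptions: the tau test is applied to the absolute value of the NBS column, the neutral Direction label is the string "Neutral", and "sorted" means alphabetical order of feature names.

```python
from typing import List

import pandas as pd


def neutral_features(impact_df: pd.DataFrame, tau: float = 0.05, neutral_label: str = "Neutral") -> List[str]:
    """Hypothetical sketch: a feature is neutral if |NBS| < tau or its Direction is neutral."""
    is_neutral = impact_df["NBS"].abs() < tau
    if "Direction" in impact_df.columns:
        # The label text is an assumption about build_impact_overview's output.
        is_neutral = is_neutral | (impact_df["Direction"] == neutral_label)
    return sorted(impact_df.loc[is_neutral, "Feature"].tolist())
```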