OCDocker.OCScore.Analysis.FeatureImportance module

SHAP utilities for reproducible computation (no I/O, no plots).

Usage:

from OCDocker.OCScore.Analysis.FeatureImportance import compute_shap_values

Public API:
  • build_stratified_background
  • make_explainer
  • compute_shap_values
  • shap_importance_table

OCDocker.OCScore.Analysis.FeatureImportance.build_stratified_background(X, meta, strata_cols, per_stratum=50, seed=0)[source]

Build a stratified background set by sampling up to per_stratum rows from each combination of values in strata_cols (e.g., ["target", "active"]). This preserves class/target balance in the background while bounding its size.

Parameters:
  • X (Union[np.ndarray, pd.DataFrame]) – Feature matrix.

  • meta (pd.DataFrame) – Metadata DataFrame with stratification columns.

  • strata_cols (Sequence[str]) – Column names to stratify by.

  • per_stratum (int, optional) – Number of samples per stratum. Default is 50.

  • seed (int, optional) – Random seed. Default is 0.

Returns:

Background array with shape (n_bg, n_features).

Return type:

np.ndarray
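The sampling described above can be sketched with pandas and numpy. The helper below is an illustrative re-implementation of the documented behavior, not the OCDocker source: it groups rows by the stratification columns and draws up to per_stratum rows from each group without replacement.

```python
import numpy as np
import pandas as pd

def build_stratified_background(X, meta, strata_cols, per_stratum=50, seed=0):
    """Sample up to per_stratum rows per stratum (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    meta = meta.reset_index(drop=True)  # align index with row positions in X
    idx = []
    for _, group in meta.groupby(list(strata_cols)):
        rows = group.index.to_numpy()
        take = min(per_stratum, len(rows))
        idx.extend(rng.choice(rows, size=take, replace=False))
    # Sort so the background keeps the original row order.
    return X[np.sort(np.asarray(idx))]

# Usage: 10 rows, 2 features, stratified by ("target", "active").
X = np.arange(20).reshape(10, 2)
meta = pd.DataFrame({"target": ["a"] * 5 + ["b"] * 5, "active": [0, 1] * 5})
bg = build_stratified_background(X, meta, ["target", "active"], per_stratum=2)
```

With per_stratum=2 and four strata, the background here has shape (8, 2), regardless of how unbalanced the strata are beyond that cap.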

OCDocker.OCScore.Analysis.FeatureImportance.compute_shap_values(explainer, X_eval, task='binary', nsamples='auto', class_index=1)[source]

Compute SHAP values for the evaluation set.

Parameters:
  • explainer (Any) – SHAP explainer object.

  • X_eval (Union[np.ndarray, pd.DataFrame]) – Evaluation dataset.

  • task (str, optional) – Task type. Default is “binary”.

  • nsamples (Optional[Union[int, str]], optional) – Number of samples for KernelExplainer. Ignored by Tree/Deep explainers when not applicable. Default is “auto”.

  • class_index (int, optional) – For binary classification with explainers returning per-class arrays (list), select this class index. Default is 1.

Returns:

Dictionary with keys:
  • "shap_values": (n_samples, n_features) array
  • "base_values": (n_samples,) array or scalar

Return type:

Dict[str, np.ndarray]
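The class_index handling above matters because SHAP explainers disagree on output shape: older explainers return a Python list with one (n_samples, n_features) array per class, while others return a single 2-D array or a 3-D (n_samples, n_features, n_classes) array. A hedged sketch of the normalization step (an assumption about the internals, not the actual OCDocker code):

```python
import numpy as np

def select_class_shap(raw_shap, class_index=1):
    """Normalize per-class SHAP output to one (n_samples, n_features) array."""
    if isinstance(raw_shap, list):
        # List of per-class arrays: pick the requested class.
        return np.asarray(raw_shap[class_index])
    arr = np.asarray(raw_shap)
    if arr.ndim == 3:
        # (n_samples, n_features, n_classes): slice out the class axis.
        return arr[:, :, class_index]
    return arr  # already a single 2-D array
```

Either way, downstream code (e.g., shap_importance_table) can rely on a plain 2-D array.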

OCDocker.OCScore.Analysis.FeatureImportance.make_explainer(model, background, method='auto', link=None, predict_fn=None)[source]

Create a SHAP Explainer for the given model and background.

Parameters:
  • model (Any) – The model to explain. Can be tree model, PyTorch/TensorFlow model, or any model.

  • background (np.ndarray) – Background dataset for SHAP explainer.

  • method (str, optional) – Method to use: “auto” (TreeExplainer if tree model; DeepExplainer if torch/TF; else KernelExplainer), “tree”, “deep”, or “kernel”. Default is “auto”.

  • link (Optional[str], optional) – Optional link function (e.g., “logit”) for KernelExplainer. Default is None.

  • predict_fn (Optional[Callable], optional) – Override prediction function (expects shape (n, n_classes) or (n,)). Default is None.

Returns:

Tuple of (explainer, predict_proba_index). predict_proba_index = 1 is commonly used for binary classification when the explainer returns per-class SHAP values (lists).

Return type:

Tuple[Any, int]
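The "auto" dispatch described above can be approximated by inspecting the model's module path. The helper below is a simplified sketch of that heuristic (the real implementation may inspect the model differently); it only resolves the method name and leaves explainer construction to shap.

```python
def resolve_method(model, method="auto"):
    """Pick a SHAP explainer family for a model (illustrative heuristic)."""
    if method != "auto":
        return method
    mod = type(model).__module__
    # Tree ensembles from common libraries -> TreeExplainer.
    if mod.startswith(("sklearn.ensemble", "sklearn.tree",
                       "xgboost", "lightgbm", "catboost")):
        return "tree"
    # Neural-network frameworks -> DeepExplainer.
    if mod.startswith(("torch", "tensorflow", "keras")):
        return "deep"
    # Anything else falls back to the model-agnostic KernelExplainer.
    return "kernel"
```

An explicit method string bypasses the heuristic, which is useful when the module-path check misclassifies a wrapped model.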

OCDocker.OCScore.Analysis.FeatureImportance.shap_importance_table(shap_values, feature_names=None, k=None)[source]

Compute mean absolute SHAP values per feature and return a ranked table.

Parameters:
  • shap_values (np.ndarray) – SHAP values array of shape (n_samples, n_features).

  • feature_names (Optional[Sequence[str]], optional) – Names of features. If None, generates names like “f0”, “f1”, etc. Default is None.

  • k (Optional[int], optional) – If given, keep only the top-k ranked features. Default is None (return all features).

Returns:

DataFrame with columns: [“feature”, “mean_abs_shap”, “rank”]

Return type:

pd.DataFrame
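The ranking itself is a one-liner over the absolute SHAP matrix. The sketch below re-implements the documented behavior for illustration (the "fN" fallback names and the top-k truncation follow the parameter descriptions above):

```python
import numpy as np
import pandas as pd

def shap_importance_table(shap_values, feature_names=None, k=None):
    """Rank features by mean |SHAP| (illustrative sketch)."""
    shap_values = np.asarray(shap_values)
    n_features = shap_values.shape[1]
    if feature_names is None:
        feature_names = [f"f{i}" for i in range(n_features)]
    mean_abs = np.abs(shap_values).mean(axis=0)
    table = (pd.DataFrame({"feature": feature_names, "mean_abs_shap": mean_abs})
             .sort_values("mean_abs_shap", ascending=False)
             .reset_index(drop=True))
    table["rank"] = np.arange(1, len(table) + 1)
    return table.head(k) if k is not None else table
```

For shap_values = [[1, -2], [-1, 2]] the mean absolute values are [1, 2], so "f1" is ranked first.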