OCDocker.OCScore.Analysis.FeatureImportance module¶
SHAP utilities for reproducible computation (no I/O, no plots).
Usage:
from OCDocker.OCScore.Analysis.FeatureImportance import compute_shap_values
Public API:
- build_stratified_background
- make_explainer
- compute_shap_values
- shap_importance_table
- OCDocker.OCScore.Analysis.FeatureImportance.build_stratified_background(X, meta, strata_cols, per_stratum=50, seed=0)[source]¶
Build a stratified background set by sampling up to per_stratum rows from each combination in strata_cols (e.g., [“target”, “active”]). Preserves class/target balance in the background while bounding its size.
- Parameters:
X (Union[np.ndarray, pd.DataFrame]) – Feature matrix.
meta (pd.DataFrame) – Metadata DataFrame with stratification columns.
strata_cols (Sequence[str]) – Column names to stratify by.
per_stratum (int, optional) – Number of samples per stratum. Default is 50.
seed (int, optional) – Random seed. Default is 0.
- Returns:
Background array with shape (n_bg, n_features).
- Return type:
np.ndarray
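The stratified sampling described above can be sketched as follows. This is a minimal illustration of the assumed logic (group by the strata columns, draw up to per_stratum rows per group), not the actual source; `stratified_background_sketch` is a hypothetical name, and the sketch assumes meta's index is positional (0..n-1) and aligned with the rows of X.

```python
import numpy as np
import pandas as pd

def stratified_background_sketch(X, meta, strata_cols, per_stratum=50, seed=0):
    """Sample up to per_stratum rows from each strata combination (sketch)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    rows = []
    for _, group in meta.groupby(list(strata_cols)):
        idx = group.index.to_numpy()  # positional row indices of this stratum
        take = min(per_stratum, len(idx))
        rows.append(rng.choice(idx, size=take, replace=False))
    return X[np.concatenate(rows)]

# four rows, one per (target, active) combination
meta = pd.DataFrame({"target": ["a", "a", "b", "b"], "active": [0, 1, 0, 1]})
X = np.arange(8).reshape(4, 2)
bg = stratified_background_sketch(X, meta, ["target", "active"], per_stratum=1)
# bg has shape (4, 2): one sampled row per stratum
```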
- OCDocker.OCScore.Analysis.FeatureImportance.compute_shap_values(explainer, X_eval, task='binary', nsamples='auto', class_index=1)[source]¶
Compute SHAP values for the evaluation set.
- Parameters:
explainer (Any) – SHAP explainer object.
X_eval (Union[np.ndarray, pd.DataFrame]) – Evaluation dataset.
task (str, optional) – Task type. Default is “binary”.
nsamples (Optional[Union[int, str]], optional) – Number of samples for KernelExplainer. Ignored by Tree/Deep explainers when not applicable. Default is “auto”.
class_index (int, optional) – For binary classification with explainers returning per-class arrays (list), select this class index. Default is 1.
- Returns:
Dictionary with keys:
- “shap_values”: (n_samples, n_features) array
- “base_values”: (n_samples,) or scalar
- Return type:
Dict[str, np.ndarray]
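The per-class selection mentioned for class_index can be sketched as below. This is an assumed normalization step (older SHAP explainers return a Python list with one array per class for binary classifiers); `normalize_shap_output` and the stub data are hypothetical, for illustration only.

```python
import numpy as np

def normalize_shap_output(raw, expected_value, class_index=1):
    """Reduce explainer output to a single (n_samples, n_features) array (sketch)."""
    if isinstance(raw, list):  # per-class SHAP values: pick one class
        shap_values = np.asarray(raw[class_index])
        base = np.asarray(expected_value)
        base_values = base[class_index] if base.ndim else base
    else:  # already a single array
        shap_values = np.asarray(raw)
        base_values = expected_value
    return {"shap_values": shap_values, "base_values": base_values}

# stub mimicking a per-class explainer result: 3 samples, 2 features, 2 classes
raw = [np.zeros((3, 2)), np.ones((3, 2))]
out = normalize_shap_output(raw, np.array([0.4, 0.6]), class_index=1)
# out["shap_values"] is the class-1 array; out["base_values"] is 0.6
```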
- OCDocker.OCScore.Analysis.FeatureImportance.make_explainer(model, background, method='auto', link=None, predict_fn=None)[source]¶
Create a SHAP Explainer for the given model and background.
- Parameters:
model (Any) – The model to explain. Can be tree model, PyTorch/TensorFlow model, or any model.
background (np.ndarray) – Background dataset for SHAP explainer.
method (str, optional) – Method to use: “auto” (TreeExplainer if tree model; DeepExplainer if torch/TF; else KernelExplainer), “tree”, “deep”, or “kernel”. Default is “auto”.
link (Optional[str], optional) – Optional link function (e.g., “logit”) for KernelExplainer. Default is None.
predict_fn (Optional[Callable], optional) – Override prediction function (expects shape (n, n_classes) or (n,)). Default is None.
- Returns:
Tuple of (explainer, predict_proba_index). predict_proba_index = 1 is commonly used for binary classification when the explainer returns per-class SHAP values (lists).
- Return type:
Tuple[Any, int]
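The “auto” dispatch above (tree model → TreeExplainer, torch/TF → DeepExplainer, else KernelExplainer) can be illustrated with a module-name heuristic. This is a hypothetical sketch; the real make_explainer may inspect models differently, and `pick_method_sketch` is not part of the module.

```python
def pick_method_sketch(model):
    """Guess the SHAP explainer family from the model's module (sketch)."""
    mod = type(model).__module__
    if mod.startswith(("xgboost", "lightgbm", "sklearn.ensemble", "sklearn.tree")):
        return "tree"    # TreeExplainer
    if mod.startswith(("torch", "tensorflow", "keras")):
        return "deep"    # DeepExplainer
    return "kernel"      # fallback: KernelExplainer

# fake model classes standing in for real libraries
class FakeBooster:
    pass

FakeBooster.__module__ = "xgboost.sklearn"
# pick_method_sketch(FakeBooster()) -> "tree"; plain objects fall back to "kernel"
```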
- OCDocker.OCScore.Analysis.FeatureImportance.shap_importance_table(shap_values, feature_names=None, k=None)[source]¶
Compute mean absolute SHAP values per feature and return a ranked table.
- Parameters:
shap_values (np.ndarray) – SHAP values array of shape (n_samples, n_features).
feature_names (Optional[Sequence[str]], optional) – Names of features. If None, generates names like “f0”, “f1”, etc. Default is None.
k (Optional[int], optional) – If given, keep only the top-k features by mean absolute SHAP value; if None, return all features. Default is None.
- Returns:
DataFrame with columns: [“feature”, “mean_abs_shap”, “rank”]
- Return type:
pd.DataFrame
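The ranking performed by shap_importance_table can be sketched as follows: mean |SHAP| per feature, sorted descending, with a 1-based rank column. This is an assumed reimplementation for illustration (`importance_table_sketch` is a hypothetical name).

```python
import numpy as np
import pandas as pd

def importance_table_sketch(shap_values, feature_names=None, k=None):
    """Rank features by mean absolute SHAP value (sketch)."""
    sv = np.asarray(shap_values)
    mean_abs = np.abs(sv).mean(axis=0)  # per-feature mean |SHAP|
    if feature_names is None:
        feature_names = [f"f{i}" for i in range(sv.shape[1])]
    df = pd.DataFrame({"feature": list(feature_names), "mean_abs_shap": mean_abs})
    df = df.sort_values("mean_abs_shap", ascending=False).reset_index(drop=True)
    df["rank"] = df.index + 1
    return df.head(k) if k is not None else df

sv = np.array([[0.1, -0.5],
               [0.3,  0.5]])
table = importance_table_sketch(sv, ["feat_a", "feat_b"])
# feat_b has mean |SHAP| 0.5 > feat_a's 0.2, so feat_b ranks first
```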