OCDocker.OCScore.Analysis.FeatureImportance module¶
SHAP utilities for reproducible computation (no I/O, no plots).
Usage:
from OCDocker.OCScore.Analysis.FeatureImportance import compute_shap_values
Public API:
- build_stratified_background
- make_explainer
- compute_shap_values
- shap_importance_table
- OCDocker.OCScore.Analysis.FeatureImportance.build_stratified_background(X, meta, strata_cols, per_stratum=50, seed=0)[source]¶
Build a stratified background set by sampling up to per_stratum rows from each combination in strata_cols (e.g., [“target”, “active”]). Preserves class/target balance in the background while bounding its size.
- Parameters:
X (Union[np.ndarray, pd.DataFrame]) – Feature matrix.
meta (pd.DataFrame) – Metadata DataFrame with stratification columns.
strata_cols (Sequence[str]) – Column names to stratify by.
per_stratum (int, optional) – Number of samples per stratum. Default is 50.
seed (int, optional) – Random seed. Default is 0.
- Returns:
Background array with shape (n_bg, n_features).
- Return type:
np.ndarray
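The stratified sampling described above can be sketched as follows. This is a minimal illustration of the assumed logic (group by the strata columns, draw up to per_stratum rows per group), not the actual source; `stratified_background_sketch` is a hypothetical name, and the sketch assumes meta's index is positional (0..n-1) and aligned with the rows of X.

```python
import numpy as np
import pandas as pd

def stratified_background_sketch(X, meta, strata_cols, per_stratum=50, seed=0):
    """Sample up to per_stratum rows from each strata combination (sketch)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    rows = []
    for _, group in meta.groupby(list(strata_cols)):
        idx = group.index.to_numpy()  # positional row indices of this stratum
        take = min(per_stratum, len(idx))
        rows.append(rng.choice(idx, size=take, replace=False))
    return X[np.concatenate(rows)]

# four rows, one per (target, active) combination
meta = pd.DataFrame({"target": ["a", "a", "b", "b"], "active": [0, 1, 0, 1]})
X = np.arange(8).reshape(4, 2)
bg = stratified_background_sketch(X, meta, ["target", "active"], per_stratum=1)
# bg has shape (4, 2): one sampled row per stratum
```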
- OCDocker.OCScore.Analysis.FeatureImportance.compute_shap_values(explainer, X_eval, task='binary', nsamples='auto', class_index=1)[source]¶
Compute SHAP values for the evaluation set.
- Parameters:
explainer (Any) – SHAP explainer object.
X_eval (Union[np.ndarray, pd.DataFrame]) – Evaluation dataset.
task (str, optional) – Task type. Default is “binary”.
nsamples (Optional[Union[int, str]], optional) – Number of samples for KernelExplainer. Ignored by Tree/Deep explainers when not applicable. Default is “auto”.
class_index (int, optional) – For binary classification with explainers returning per-class arrays (list), select this class index. Default is 1.
- Returns:
Dictionary with keys:
- “shap_values”: (n_samples, n_features) array
- “base_values”: (n_samples,) or scalar
- Return type:
Dict[str, np.ndarray]
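The per-class selection mentioned for class_index can be sketched as below. This is an assumed normalization step (older SHAP explainers return a Python list with one array per class for binary classifiers); `normalize_shap_output` and the stub data are hypothetical, for illustration only.

```python
import numpy as np

def normalize_shap_output(raw, expected_value, class_index=1):
    """Reduce explainer output to a single (n_samples, n_features) array (sketch)."""
    if isinstance(raw, list):  # per-class SHAP values: pick one class
        shap_values = np.asarray(raw[class_index])
        base = np.asarray(expected_value)
        base_values = base[class_index] if base.ndim else base
    else:  # already a single array
        shap_values = np.asarray(raw)
        base_values = expected_value
    return {"shap_values": shap_values, "base_values": base_values}

# stub mimicking a per-class explainer result: 3 samples, 2 features, 2 classes
raw = [np.zeros((3, 2)), np.ones((3, 2))]
out = normalize_shap_output(raw, np.array([0.4, 0.6]), class_index=1)
# out["shap_values"] is the class-1 array; out["base_values"] is 0.6
```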
- OCDocker.OCScore.Analysis.FeatureImportance.make_explainer(model, background, method='auto', link=None, predict_fn=None)[source]¶
Create a SHAP Explainer for the given model and background.
- Parameters:
model (Any) – The model to explain. Can be tree model, PyTorch/TensorFlow model, or any model.
background (np.ndarray) – Background dataset for SHAP explainer.
method (str, optional) – Method to use: “auto” (TreeExplainer if tree model; DeepExplainer if torch/TF; else KernelExplainer), “tree”, “deep”, or “kernel”. Default is “auto”.
link (Optional[str], optional) – Optional link function (e.g., “logit”) for KernelExplainer. Default is None.
predict_fn (Optional[Callable], optional) – Override prediction function (expects shape (n, n_classes) or (n,)). Default is None.
- Returns:
Tuple of (explainer, predict_proba_index). predict_proba_index = 1 is commonly used for binary classification when the explainer returns per-class SHAP values (lists).
- Return type:
Tuple[Any, int]
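The “auto” dispatch above (tree model → TreeExplainer, torch/TF → DeepExplainer, else KernelExplainer) can be illustrated with a module-name heuristic. This is a hypothetical sketch; the real make_explainer may inspect models differently, and `pick_method_sketch` is not part of the module.

```python
def pick_method_sketch(model):
    """Guess the SHAP explainer family from the model's module (sketch)."""
    mod = type(model).__module__
    if mod.startswith(("xgboost", "lightgbm", "sklearn.ensemble", "sklearn.tree")):
        return "tree"    # TreeExplainer
    if mod.startswith(("torch", "tensorflow", "keras")):
        return "deep"    # DeepExplainer
    return "kernel"      # fallback: KernelExplainer

# fake model classes standing in for real libraries
class FakeBooster:
    pass

FakeBooster.__module__ = "xgboost.sklearn"
# pick_method_sketch(FakeBooster()) -> "tree"; plain objects fall back to "kernel"
```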
- OCDocker.OCScore.Analysis.FeatureImportance.shap_importance_table(shap_values, feature_names=None, k=None)[source]¶
Compute mean absolute SHAP values per feature and return a ranked table.
- Parameters:
shap_values (np.ndarray) – SHAP values array of shape (n_samples, n_features).
feature_names (Optional[Sequence[str]], optional) – Names of features. If None, generates names like “f0”, “f1”, etc. Default is None.
k (Optional[int], optional) – If given, keep only the top-k features by mean absolute SHAP value; if None, return all features. Default is None.
- Returns:
DataFrame with columns: [“feature”, “mean_abs_shap”, “rank”]
- Return type:
pd.DataFrame
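The ranking performed by shap_importance_table can be sketched as follows: mean |SHAP| per feature, sorted descending, with a 1-based rank column. This is an assumed reimplementation for illustration (`importance_table_sketch` is a hypothetical name).

```python
import numpy as np
import pandas as pd

def importance_table_sketch(shap_values, feature_names=None, k=None):
    """Rank features by mean absolute SHAP value (sketch)."""
    sv = np.asarray(shap_values)
    mean_abs = np.abs(sv).mean(axis=0)  # per-feature mean |SHAP|
    if feature_names is None:
        feature_names = [f"f{i}" for i in range(sv.shape[1])]
    df = pd.DataFrame({"feature": list(feature_names), "mean_abs_shap": mean_abs})
    df = df.sort_values("mean_abs_shap", ascending=False).reset_index(drop=True)
    df["rank"] = df.index + 1
    return df.head(k) if k is not None else df

sv = np.array([[0.1, -0.5],
               [0.3,  0.5]])
table = importance_table_sketch(sv, ["feat_a", "feat_b"])
# feat_b has mean |SHAP| 0.5 > feat_a's 0.2, so feat_b ranks first
```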