OCDocker.OCScore.Analysis.RankingMetrics
Ranking metrics and tables for Test2-style analyses (no CLI, no I/O).
Usage:
import OCDocker.OCScore.Analysis.RankingMetrics as ocrank
This module consolidates ROC, PR, and EF-ROC metrics with bootstrap confidence intervals and provides tabular outputs consistent with the existing analysis style.
Public API (metrics/tables):
- roc_auc_per_target
- pr_auc_per_target
- efroc_per_target
- roc_auc_pooled
- pr_auc_pooled
- efroc_pooled
- build_test2_tables
- build_summary_table
- class OCDocker.OCScore.Analysis.RankingMetrics.BootstrapCI(point, low, high)[source]
Bases: object
Bootstrap confidence interval dataclass.
- Parameters:
point (float) – The point estimate of the metric.
low (float) – The lower bound of the confidence interval.
high (float) – The upper bound of the confidence interval.
- point
The point estimate (mean/median) of the metric.
- Type:
float
- low
The lower bound of the confidence interval (e.g., 2.5th percentile).
- Type:
float
- high
The upper bound of the confidence interval (e.g., 97.5th percentile).
- Type:
float
- point: float
- low: float
- high: float
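The percentile-bootstrap construction behind BootstrapCI can be sketched as a stand-alone example. Note that the percentile_ci helper (and the dataclass redefinition) below are hypothetical illustrations, not part of the module:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class BootstrapCI:
    point: float
    low: float
    high: float

def percentile_ci(values, n_boot=500, seed=0, alpha=0.05):
    # Percentile bootstrap CI for the sample mean (illustrative).
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    boots = np.array([
        rng.choice(values, size=values.size, replace=True).mean()
        for _ in range(n_boot)
    ])
    return BootstrapCI(point=float(values.mean()),
                       low=float(np.quantile(boots, alpha / 2)),
                       high=float(np.quantile(boots, 1 - alpha / 2)))

ci = percentile_ci([0.71, 0.74, 0.69, 0.77, 0.72])
```

The same resample-and-take-percentiles loop applies to any of the metrics below; only the statistic computed per resample changes.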
- OCDocker.OCScore.Analysis.RankingMetrics.roc_auc_per_target(df, target_col, label_col, score_cols, n_boot=500, seed=0, positive_label=None, auto_flip=True)[source]
Compute ROC AUC with 95% CI per target for each score model.
- Parameters:
df (pd.DataFrame) – DataFrame with the data.
target_col (str) – Column name for the target.
label_col (str) – Column name for the labels.
score_cols (Sequence[str]) – Column names for the score models.
n_boot (int) – Number of bootstrap iterations.
seed (int) – Random seed for reproducibility.
positive_label (Optional[Union[str, int]]) – Label to treat as positive (1). If None, infers from data.
auto_flip (bool) – Whether to flip the scores if the ROC AUC is less than 0.5.
- Returns:
DataFrame with columns: [“target”, “model”, “roc_auc”, “ci_low”, “ci_high”, “n_pos”, “n_neg”].
- Return type:
pd.DataFrame
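As a stand-alone illustration of the per-target computation (not the module's implementation), ROC AUC can be derived from the Mann-Whitney rank statistic and grouped by target; the rank_auc helper below is hypothetical and assumes distinct scores:

```python
import numpy as np
import pandas as pd

def rank_auc(labels, scores):
    # ROC AUC from the Mann-Whitney U statistic; assumes distinct scores (no tie handling).
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    order = scores.argsort()
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    u = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

df = pd.DataFrame({
    "target": ["t1"] * 4 + ["t2"] * 4,
    "active": [1, 1, 0, 0, 1, 0, 0, 0],
    "score":  [0.9, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3, 0.2],
})
per_target = pd.DataFrame([
    {"target": t, "roc_auc": rank_auc(g["active"], g["score"])}
    for t, g in df.groupby("target")
])
```

With auto_flip enabled, a model whose per-group AUC falls below 0.5 would have its scores negated before reporting.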
- OCDocker.OCScore.Analysis.RankingMetrics.pr_auc_per_target(df, target_col, label_col, score_cols, n_boot=500, seed=0, positive_label=None, auto_flip=True)[source]
Compute PR AUC (Average Precision) with 95% CI per target for each score model.
- Parameters:
df (pd.DataFrame) – DataFrame with the data.
target_col (str) – Column name for the target.
label_col (str) – Column name for the labels.
score_cols (Sequence[str]) – Column names for the score models.
n_boot (int) – Number of bootstrap iterations.
seed (int) – Random seed for reproducibility.
positive_label (Optional[Union[str, int]]) – Label to treat as positive (1). If None, infers from data.
auto_flip (bool) – Whether to flip the scores if the ranking appears inverted.
- Returns:
DataFrame with columns: [“target”, “model”, “pr_auc”, “ci_low”, “ci_high”, “n_pos”, “n_neg”].
- Return type:
pd.DataFrame
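A minimal sketch of the underlying Average Precision computation (illustrative only; average_precision here is a hypothetical helper, not the module's implementation):

```python
import numpy as np

def average_precision(labels, scores):
    # AP = mean of precision@k over the ranks k at which a positive is retrieved.
    labels = np.asarray(labels)
    order = np.argsort(scores)[::-1]      # rank by descending score
    hits = labels[order] == 1
    precision_at_k = np.cumsum(hits) / np.arange(1, len(labels) + 1)
    return precision_at_k[hits].mean()

labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
ap = average_precision(labels, scores)
```

Here the positives sit at ranks 1 and 3, so AP averages precision@1 = 1 and precision@3 = 2/3.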
- OCDocker.OCScore.Analysis.RankingMetrics.efroc_per_target(df, target_col, label_col, score_cols, epsilons=(0.01, 0.05, 0.1), n_boot=500, seed=0, positive_label=None, auto_flip=True)[source]
Compute EF-ROC per target for each score model, with bootstrap CIs.
- Parameters:
df (pd.DataFrame) – DataFrame with the data.
target_col (str) – Column name for the target.
label_col (str) – Column name for the labels.
score_cols (Sequence[str]) – Column names for the score models.
epsilons (Sequence[float]) – List of epsilon (FPR) values to evaluate.
n_boot (int) – Number of bootstrap iterations.
seed (int) – Random seed for reproducibility.
positive_label (Optional[Union[str, int]]) – Label to treat as positive (1). If None, infers from data.
auto_flip (bool) – Whether to flip the scores if the ranking appears inverted (e.g., ROC AUC below 0.5).
- Returns:
DataFrame with columns: [“target”, “model”, “epsilon”, “ef_roc”, “ci_low”, “ci_high”, “tpr_at_epsilon”, “n_pos”, “n_neg”].
- Return type:
pd.DataFrame
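EF-ROC at a false-positive-rate budget epsilon is commonly defined as TPR(epsilon) / epsilon on the empirical ROC curve; a stand-alone sketch under that assumption (the ef_roc helper is hypothetical, not the module's implementation):

```python
import numpy as np

def ef_roc(labels, scores, epsilon):
    # EF-ROC(epsilon) = TPR(epsilon) / epsilon on the empirical ROC curve.
    labels = np.asarray(labels)
    order = np.argsort(scores)[::-1]
    y = labels[order]
    tpr = np.cumsum(y == 1) / max(int((y == 1).sum()), 1)
    fpr = np.cumsum(y == 0) / max(int((y == 0).sum()), 1)
    within = fpr <= epsilon                 # ROC points inside the FPR budget
    tpr_at_eps = float(tpr[within].max()) if within.any() else 0.0
    return tpr_at_eps / epsilon, tpr_at_eps

labels = np.array([1, 1] + [0] * 8)         # two actives ranked on top
scores = np.linspace(1.0, 0.1, 10)
ef, tpr_at_eps = ef_roc(labels, scores, epsilon=0.1)
```

A random ranking gives EF-ROC around 1, so values well above 1 indicate early enrichment.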
- OCDocker.OCScore.Analysis.RankingMetrics.roc_auc_pooled(df, label_col, score_cols, n_boot=500, seed=0, positive_label=None, auto_flip=True)[source]
Compute pooled ROC AUC with 95% CI for each score model.
- Parameters:
df (pd.DataFrame) – DataFrame with the data.
label_col (str) – Column name for the labels.
score_cols (Sequence[str]) – Column names for the score models.
n_boot (int) – Number of bootstrap iterations.
seed (int) – Random seed for reproducibility.
positive_label (Optional[Union[str, int]]) – Label to treat as positive (1). If None, infers from data.
auto_flip (bool) – Whether to flip the scores if the ROC AUC is less than 0.5.
- Returns:
DataFrame with columns: [“model”, “roc_auc”, “ci_low”, “ci_high”, “n_pos”, “n_neg”].
- Return type:
pd.DataFrame
- OCDocker.OCScore.Analysis.RankingMetrics.pr_auc_pooled(df, label_col, score_cols, n_boot=500, seed=0, positive_label=None, auto_flip=True)[source]
Compute pooled PR AUC (Average Precision) with 95% CI for each score model.
- Parameters:
df (pd.DataFrame) – DataFrame with the data.
label_col (str) – Column name for the labels.
score_cols (Sequence[str]) – Column names for the score models.
n_boot (int) – Number of bootstrap iterations.
seed (int) – Random seed for reproducibility.
positive_label (Optional[Union[str, int]]) – Label to treat as positive (1). If None, infers from data.
auto_flip (bool) – Whether to flip the scores if the ranking appears inverted.
- Returns:
DataFrame with columns: [“model”, “pr_auc”, “ci_low”, “ci_high”, “n_pos”, “n_neg”].
- Return type:
pd.DataFrame
- OCDocker.OCScore.Analysis.RankingMetrics.efroc_pooled(df, label_col, score_cols, epsilons=(0.01, 0.05, 0.1), n_boot=500, seed=0, positive_label=None, auto_flip=True)[source]
Compute pooled EF-ROC for each score model with bootstrap CIs.
- Parameters:
df (pd.DataFrame) – DataFrame with the data.
label_col (str) – Column name for the labels.
score_cols (Sequence[str]) – Column names for the score models.
epsilons (Sequence[float]) – List of epsilon (FPR) values to evaluate.
n_boot (int) – Number of bootstrap iterations.
seed (int) – Random seed for reproducibility.
positive_label (Optional[Union[str, int]]) – Label to treat as positive (1). If None, infers from data.
auto_flip (bool) – Whether to flip the scores if the ranking appears inverted (e.g., ROC AUC below 0.5).
- Returns:
DataFrame with columns: [“model”, “epsilon”, “ef_roc”, “ci_low”, “ci_high”, “tpr_at_epsilon”, “n_pos”, “n_neg”].
- Return type:
pd.DataFrame
- OCDocker.OCScore.Analysis.RankingMetrics.build_test2_tables(df, models, target_col='target', label_col='active', positive_label=None, n_boot=500, seed=0, epsilons=(0.01, 0.05, 0.1), auto_flip=True)[source]
Convenience wrapper to compute all tables at once.
- Parameters:
df (pd.DataFrame) – DataFrame with the data.
models (Sequence[str]) – Column names for the score models.
target_col (str) – Column name for the target.
label_col (str) – Column name for the labels.
positive_label (Optional[Union[str, int]]) – Label to treat as positive (1). If None, infers from data.
n_boot (int) – Number of bootstrap iterations.
seed (int) – Random seed for reproducibility.
epsilons (Sequence[float]) – List of epsilon (FPR) values to evaluate.
auto_flip (bool) – Whether to flip the scores if the ROC AUC is less than 0.5.
- Returns:
Mapping with the following keys:
- “roc_auc_per_target”: ROC AUC per target with bootstrap CIs.
- “pr_auc_per_target”: PR AUC per target with bootstrap CIs.
- “efroc_per_target”: EF-ROC table per target across epsilons.
- “roc_auc_pooled”: Pooled ROC AUC across all targets with CIs.
- “pr_auc_pooled”: Pooled PR AUC across all targets with CIs.
- “efroc_pooled”: Pooled EF-ROC across epsilons with CIs.
- “summary”: Compact table combining pooled ROC/PR with counts.
- Return type:
dict[str, pd.DataFrame]
- OCDocker.OCScore.Analysis.RankingMetrics.build_summary_table(summary_targets, summary_pooled, models, eps=(1, 5, 10, 20, 30), include_pr_auc=False, pr_summary_targets=None, pr_summary_pooled=None)[source]
Create a presentation table combining median EF-ROC across targets and pooled EF-ROC at given epsilons.
Optionally include PR-AUC (median and pooled).
- Parameters:
summary_targets (pd.DataFrame) – Per-target EF-ROC summary (as produced by efroc_per_target).
summary_pooled (pd.DataFrame) – Pooled EF-ROC summary (as produced by efroc_pooled).
models (Sequence[str]) – Column names for the score models.
eps (Sequence[int]) – Epsilon (FPR) values to evaluate, expressed as percentages.
include_pr_auc (bool) – Whether to include PR-AUC columns.
pr_summary_targets (Optional[pd.DataFrame]) – Per-target PR-AUC summary.
pr_summary_pooled (Optional[pd.DataFrame]) – Pooled PR-AUC summary.
- Returns:
DataFrame with the summary table.
- Return type:
pd.DataFrame
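The median-across-targets plus pooled combination that build_summary_table performs can be illustrated with plain pandas (hypothetical toy data; column names mirror the efroc_* outputs):

```python
import pandas as pd

# Hypothetical per-target and pooled EF-ROC results for one model.
per_target = pd.DataFrame({
    "target": ["t1", "t1", "t2", "t2"],
    "model": ["m", "m", "m", "m"],
    "epsilon": [0.01, 0.05, 0.01, 0.05],
    "ef_roc": [20.0, 10.0, 30.0, 14.0],
})
pooled = pd.DataFrame({
    "model": ["m", "m"],
    "epsilon": [0.01, 0.05],
    "ef_roc": [24.0, 12.0],
})

# Median EF-ROC across targets per (model, epsilon), then join the pooled values.
med = (per_target.groupby(["model", "epsilon"], as_index=False)["ef_roc"]
       .median().rename(columns={"ef_roc": "ef_roc_median"}))
summary = med.merge(pooled.rename(columns={"ef_roc": "ef_roc_pooled"}),
                    on=["model", "epsilon"])
```

Each row of the resulting table pairs the robust per-target median with the pooled estimate at the same epsilon, one row per (model, epsilon).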