OCDocker.OCScore.Analysis.RankingMetrics

Ranking metrics and tables for Test2-style analyses (no CLI, no I/O).

Usage:

import OCDocker.OCScore.Analysis.RankingMetrics as ocrank

This module consolidates ROC, PR, and EF-ROC metrics with bootstrap confidence intervals and provides tabular outputs consistent with the existing analysis style.

Public API (metrics/tables):

  • roc_auc_per_target

  • pr_auc_per_target

  • efroc_per_target

  • roc_auc_pooled

  • pr_auc_pooled

  • efroc_pooled

  • build_test2_tables

  • build_summary_table

class OCDocker.OCScore.Analysis.RankingMetrics.BootstrapCI(point, low, high)[source]

Bases: object

Bootstrap confidence interval dataclass.

Parameters:
  • point (float) –

  • low (float) –

  • high (float) –

point

The point estimate (mean/median) of the metric.

Type:

float

low

The lower bound of the confidence interval (e.g., 2.5th percentile).

Type:

float

high

The upper bound of the confidence interval (e.g., 97.5th percentile).

Type:

float

point: float
low: float
high: float
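The percentile-bootstrap interval that BootstrapCI carries can be sketched as follows. The helper rank_auc and the resampling loop are illustrative assumptions, not the module's actual implementation (which may differ in tie handling, percentile choice, and how degenerate resamples are treated):

```python
import numpy as np

def rank_auc(labels: np.ndarray, scores: np.ndarray) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked
    correctly; ties count half (Mann-Whitney identity)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))

def bootstrap_ci(labels, scores, n_boot=500, seed=0, alpha=0.05):
    """Point estimate plus percentile bootstrap CI, mirroring the
    BootstrapCI(point, low, high) fields."""
    rng = np.random.default_rng(seed)
    point = rank_auc(labels, scores)
    stats = []
    n = len(labels)
    while len(stats) < n_boot:
        idx = rng.integers(0, n, size=n)
        yb = labels[idx]
        if yb.min() == yb.max():  # resample lost one class; redraw
            continue
        stats.append(rank_auc(yb, scores[idx]))
    low, high = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return point, float(low), float(high)
```

With seed fixed, the same (point, low, high) triple is reproduced on every call, matching the seed parameter documented for the metric functions below.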
OCDocker.OCScore.Analysis.RankingMetrics.roc_auc_per_target(df, target_col, label_col, score_cols, n_boot=500, seed=0, positive_label=None, auto_flip=True)[source]

Compute ROC AUC with 95% CI per target for each score model.

Parameters:
  • df (pd.DataFrame) – DataFrame with the data.

  • target_col (str) – Column name for the target.

  • label_col (str) – Column name for the labels.

  • score_cols (Sequence[str]) – Column names for the score models.

  • n_boot (int) – Number of bootstrap iterations.

  • seed (int) – Random seed for reproducibility.

  • positive_label (Optional[Union[str, int]]) – Label to treat as positive (1). If None, it is inferred from the data.

  • auto_flip (bool) – Whether to flip the scores if the ROC AUC is less than 0.5.

Returns:

DataFrame with columns: [“target”, “model”, “roc_auc”, “ci_low”, “ci_high”, “n_pos”, “n_neg”].

Return type:

pd.DataFrame
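The per-target computation can be approximated with a plain pandas loop. The column names follow the documented return schema; rank_auc is a hypothetical stand-in for the module's AUC routine:

```python
import numpy as np
import pandas as pd

def rank_auc(labels: np.ndarray, scores: np.ndarray) -> float:
    # ROC AUC via correctly ranked (positive, negative) pairs; ties count half.
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))

# Toy data: two targets, one score model.
df = pd.DataFrame({
    "target":  ["T1"] * 4 + ["T2"] * 4,
    "active":  [1, 1, 0, 0, 1, 0, 1, 0],
    "score_a": [0.9, 0.8, 0.2, 0.1, 0.3, 0.7, 0.6, 0.2],
})

rows = []
for tgt, sub in df.groupby("target"):
    for model in ["score_a"]:
        y = sub["active"].to_numpy()
        rows.append({
            "target": tgt,
            "model": model,
            "roc_auc": rank_auc(y, sub[model].to_numpy()),
            "n_pos": int(y.sum()),
            "n_neg": int((y == 0).sum()),
        })
per_target = pd.DataFrame(rows)
```

The real function additionally attaches the ci_low/ci_high bootstrap bounds per (target, model) row.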

OCDocker.OCScore.Analysis.RankingMetrics.pr_auc_per_target(df, target_col, label_col, score_cols, n_boot=500, seed=0, positive_label=None, auto_flip=True)[source]

Compute PR AUC (Average Precision) with 95% CI per target for each score model.

Parameters:
  • df (pd.DataFrame) – DataFrame with the data.

  • target_col (str) – Column name for the target.

  • label_col (str) – Column name for the labels.

  • score_cols (Sequence[str]) – Column names for the score models.

  • n_boot (int) – Number of bootstrap iterations.

  • seed (int) – Random seed for reproducibility.

  • positive_label (Optional[Union[str, int]]) – Label to treat as positive (1). If None, it is inferred from the data.

  • auto_flip (bool) – Whether to flip the scores if the PR AUC is less than 0.5.

Returns:

DataFrame with columns: [“target”, “model”, “pr_auc”, “ci_low”, “ci_high”, “n_pos”, “n_neg”].

Return type:

pd.DataFrame

OCDocker.OCScore.Analysis.RankingMetrics.efroc_per_target(df, target_col, label_col, score_cols, epsilons=(0.01, 0.05, 0.1), n_boot=500, seed=0, positive_label=None, auto_flip=True)[source]

Compute EF-ROC per target for each score model, with bootstrap CIs.

Parameters:
  • df (pd.DataFrame) – DataFrame with the data.

  • target_col (str) – Column name for the target.

  • label_col (str) – Column name for the labels.

  • score_cols (Sequence[str]) – Column names for the score models.

  • epsilons (Sequence[float]) – List of epsilon (FPR) values to evaluate.

  • n_boot (int) – Number of bootstrap iterations.

  • seed (int) – Random seed for reproducibility.

  • positive_label (Optional[Union[str, int]]) – Label to treat as positive (1). If None, it is inferred from the data.

  • auto_flip (bool) – Whether to flip the scores when the ranking direction appears inverted (ROC AUC below 0.5).

Returns:

DataFrame with columns: [“target”, “model”, “epsilon”, “ef_roc”, “ci_low”, “ci_high”, “tpr_at_epsilon”, “n_pos”, “n_neg”].

Return type:

pd.DataFrame
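EF-ROC is not defined on this page; a common convention is ROC enrichment, ROCE(ε) = TPR(ε) / ε, which would explain the tpr_at_epsilon column in the return schema. A sketch under that assumption (the module's exact convention may differ):

```python
import numpy as np

def tpr_at_fpr(labels: np.ndarray, scores: np.ndarray, eps: float) -> float:
    """Interpolated true-positive rate where the ROC curve crosses FPR = eps."""
    order = np.argsort(-scores)  # rank by decreasing score
    y = labels[order]
    tpr = np.cumsum(y) / y.sum()
    fpr = np.cumsum(1 - y) / (len(y) - y.sum())
    return float(np.interp(eps, np.concatenate(([0.0], fpr)),
                                np.concatenate(([0.0], tpr))))

def efroc(labels: np.ndarray, scores: np.ndarray, eps: float):
    """ROC enrichment at eps: how far above the chance diagonal
    (TPR = FPR) the curve sits, i.e. TPR(eps) / eps."""
    t = tpr_at_fpr(labels, scores, eps)
    return t / eps, t
```

A random ranker gives EF-ROC ≈ 1 at every ε, so values well above 1 at small ε indicate early enrichment of actives.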

OCDocker.OCScore.Analysis.RankingMetrics.roc_auc_pooled(df, label_col, score_cols, n_boot=500, seed=0, positive_label=None, auto_flip=True)[source]

Compute pooled ROC AUC with 95% CI for each score model.

Parameters:
  • df (pd.DataFrame) – DataFrame with the data.

  • label_col (str) – Column name for the labels.

  • score_cols (Sequence[str]) – Column names for the score models.

  • n_boot (int) – Number of bootstrap iterations.

  • seed (int) – Random seed for reproducibility.

  • positive_label (Optional[Union[str, int]]) – Label to treat as positive (1). If None, it is inferred from the data.

  • auto_flip (bool) – Whether to flip the scores if the ROC AUC is less than 0.5.

Returns:

DataFrame with columns: [“model”, “roc_auc”, “ci_low”, “ci_high”, “n_pos”, “n_neg”].

Return type:

pd.DataFrame

OCDocker.OCScore.Analysis.RankingMetrics.pr_auc_pooled(df, label_col, score_cols, n_boot=500, seed=0, positive_label=None, auto_flip=True)[source]

Compute pooled PR AUC (Average Precision) with 95% CI for each score model.

Parameters:
  • df (pd.DataFrame) – DataFrame with the data.

  • label_col (str) – Column name for the labels.

  • score_cols (Sequence[str]) – Column names for the score models.

  • n_boot (int) – Number of bootstrap iterations.

  • seed (int) – Random seed for reproducibility.

  • positive_label (Optional[Union[str, int]]) – Label to treat as positive (1). If None, it is inferred from the data.

  • auto_flip (bool) – Whether to flip the scores if the PR AUC is less than 0.5.

Returns:

DataFrame with columns: [“model”, “pr_auc”, “ci_low”, “ci_high”, “n_pos”, “n_neg”].

Return type:

pd.DataFrame

OCDocker.OCScore.Analysis.RankingMetrics.efroc_pooled(df, label_col, score_cols, epsilons=(0.01, 0.05, 0.1), n_boot=500, seed=0, positive_label=None, auto_flip=True)[source]

Compute pooled EF-ROC for each score model with bootstrap CIs.

Parameters:
  • df (pd.DataFrame) – DataFrame with the data.

  • label_col (str) – Column name for the labels.

  • score_cols (Sequence[str]) – Column names for the score models.

  • epsilons (Sequence[float]) – List of epsilon (FPR) values to evaluate.

  • n_boot (int) – Number of bootstrap iterations.

  • seed (int) – Random seed for reproducibility.

  • positive_label (Optional[Union[str, int]]) – Label to treat as positive (1). If None, it is inferred from the data.

  • auto_flip (bool) – Whether to flip the scores when the ranking direction appears inverted (ROC AUC below 0.5).

Returns:

DataFrame with columns: [“model”, “epsilon”, “ef_roc”, “ci_low”, “ci_high”, “tpr_at_epsilon”, “n_pos”, “n_neg”].

Return type:

pd.DataFrame

OCDocker.OCScore.Analysis.RankingMetrics.build_test2_tables(df, models, target_col='target', label_col='active', positive_label=None, n_boot=500, seed=0, epsilons=(0.01, 0.05, 0.1), auto_flip=True)[source]

Convenience wrapper to compute all tables at once.

Parameters:
  • df (pd.DataFrame) – DataFrame with the data.

  • models (Sequence[str]) – Column names for the score models.

  • target_col (str) – Column name for the target.

  • label_col (str) – Column name for the labels.

  • positive_label (Optional[Union[str, int]]) – Label to treat as positive (1). If None, it is inferred from the data.

  • n_boot (int) – Number of bootstrap iterations.

  • seed (int) – Random seed for reproducibility.

  • epsilons (Sequence[float]) – List of epsilon (FPR) values to evaluate.

  • auto_flip (bool) – Whether to flip the scores if the ROC AUC is less than 0.5.

Returns:

Mapping with the following keys:

  • “roc_auc_per_target”: ROC AUC per target with bootstrap CIs.

  • “pr_auc_per_target”: PR AUC per target with bootstrap CIs.

  • “efroc_per_target”: EF-ROC table per target across epsilons.

  • “roc_auc_pooled”: Pooled ROC AUC across all targets with CIs.

  • “pr_auc_pooled”: Pooled PR AUC across all targets with CIs.

  • “efroc_pooled”: Pooled EF-ROC across epsilons with CIs.

  • “summary”: Compact table combining pooled ROC/PR with counts.

Return type:

dict[str, pandas.DataFrame]

OCDocker.OCScore.Analysis.RankingMetrics.build_summary_table(summary_targets, summary_pooled, models, eps=(1, 5, 10, 20, 30), include_pr_auc=False, pr_summary_targets=None, pr_summary_pooled=None)[source]

Create a presentation table combining median EF-ROC across targets and pooled EF-ROC at given epsilons.

Optionally include PR-AUC (median and pooled).

Parameters:
  • summary_targets (pd.DataFrame) – Per-target EF-ROC summary (e.g. the output of efroc_per_target).

  • summary_pooled (pd.DataFrame) – Pooled EF-ROC summary (e.g. the output of efroc_pooled).

  • models (Sequence[str]) – Column names for the score models.

  • eps (Sequence[int]) – List of epsilon (FPR) values to evaluate, expressed as percentages (the default (1, 5, 10, 20, 30) corresponds to 1–30% FPR).

  • include_pr_auc (bool) – Whether to include the PR-AUC.

  • pr_summary_targets (Optional[pd.DataFrame]) – Per-target PR-AUC summary; used when include_pr_auc is True.

  • pr_summary_pooled (Optional[pd.DataFrame]) – Pooled PR-AUC summary; used when include_pr_auc is True.

Returns:

DataFrame with the summary table.

Return type:

pd.DataFrame
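The combination this function performs can be illustrated with toy tables. The column names and the median-then-merge logic below are assumptions based on the description above, not the actual implementation:

```python
import pandas as pd

# Hypothetical per-target and pooled EF-ROC tables at a single epsilon.
per_target = pd.DataFrame({
    "target": ["T1", "T1", "T2", "T2"],
    "model":  ["m1", "m2", "m1", "m2"],
    "ef_roc": [8.0, 4.0, 6.0, 2.0],
})
pooled = pd.DataFrame({"model": ["m1", "m2"], "ef_roc_pooled": [7.0, 3.0]})

# Median across targets per model, then placed side by side with the
# pooled value to form one presentation row per model.
median = (per_target.groupby("model")["ef_roc"]
          .median().rename("ef_roc_median").reset_index())
summary = median.merge(pooled, on="model")
```

In the real table this pattern repeats once per epsilon in eps, plus optional PR-AUC columns when include_pr_auc is True.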