OCDocker.OCScore.DNN.future.datasets module
Datasets and samplers for the future DNN pipeline.
- class OCDocker.OCScore.DNN.future.datasets.EnergyDataset(*args, **kwargs)
Bases: Dataset
Dataset for regression targets (energy labels).
- Parameters:
features (np.ndarray) – Input features.
energies (np.ndarray) – Regression targets (e.g., energies).
mask (np.ndarray | None, optional) – Feature mask for single-branch inputs.
Notes
Returns (features, energy) where energy has shape (1,).
Examples
>>> import numpy as np
>>> from OCDocker.OCScore.DNN.future.datasets import EnergyDataset
>>> features = np.random.rand(100, 20)  # 100 samples, 20 features each
>>> energies = np.random.rand(100)  # 100 energy labels
>>> mask = np.random.randint(0, 2, size=(100, 20))  # Random binary mask
>>> dataset = EnergyDataset(features, energies, mask)
>>> sample_features, sample_energy = dataset[0]
>>> print(sample_features.shape)  # torch.Size([20])
>>> print(sample_energy.shape)  # torch.Size([1])
- __getitem__(idx)
Return a dataset sample.
- Parameters:
idx (int) – Sample index.
- Returns:
Features and energy target tensors.
- Return type:
tuple
- class OCDocker.OCScore.DNN.future.datasets.TargetRankingDataset(*args, **kwargs)
Bases: Dataset
Dataset for ranking with per-target grouping.
- Parameters:
features (np.ndarray) – Input features.
labels (np.ndarray) – Binary labels (1 for active, 0 for decoy).
target_ids (Sequence[str]) – Target identifiers per sample (used for grouping).
mask (np.ndarray | None, optional) – Feature mask for single-branch inputs.
Notes
Returns (features, label, target_id) where target_id is an integer index. Integer indices are assigned in order of first appearance in target_ids, so the mapping is stable for a given input order.
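The first-appearance encoding described in the note above can be illustrated with a minimal pure-Python sketch. The helper names `encode_targets` and `group_by_target` are hypothetical and are not part of OCDocker; the actual class may build its mapping differently. The second helper shows how the encoded ids could be grouped into the `dict[int, list[int]]` shape that TargetBatchSampler accepts.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def encode_targets(target_ids: Sequence[str]) -> Tuple[List[int], Dict[str, int]]:
    """Hypothetical helper: map each target string to an integer index,
    numbering targets in the order in which they first appear."""
    index_of: Dict[str, int] = {}
    encoded: List[int] = []
    for tid in target_ids:
        if tid not in index_of:
            # A new target gets the next unused integer.
            index_of[tid] = len(index_of)
        encoded.append(index_of[tid])
    return encoded, index_of


def group_by_target(encoded: Sequence[int]) -> Dict[int, List[int]]:
    """Hypothetical helper: collect sample indices under their integer target id."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for sample_idx, target in enumerate(encoded):
        groups[target].append(sample_idx)
    return dict(groups)


# Made-up target names, used only for illustration.
target_ids = ["1abc", "2xyz", "1abc", "3def", "2xyz"]
encoded, index_of = encode_targets(target_ids)
target_to_indices = group_by_target(encoded)

print(encoded)            # [0, 1, 0, 2, 1]
print(target_to_indices)  # {0: [0, 2], 1: [1, 4], 2: [3]}
```

Because the numbering depends only on the order of `target_ids`, the same input sequence always yields the same integer ids, which is the sense in which the mapping is stable.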
- __getitem__(idx)
Return a dataset sample.
- Parameters:
idx (int) – Sample index.
- Returns:
Features, label, and target id.
- Return type:
tuple
- __init__(features, labels, target_ids, mask=None)
Initialize target ranking dataset.
- Parameters:
features (np.ndarray) – Input features.
labels (np.ndarray) – Binary labels.
target_ids (Sequence[str]) – Target identifiers.
mask (np.ndarray | None, optional) – Feature mask, by default None.
- Return type:
None
- class OCDocker.OCScore.DNN.future.datasets.TargetBatchSampler(*args, **kwargs)
Bases: List[int]
Sampler that yields batches grouped by target.
- Parameters:
target_to_indices (dict[int, list[int]]) – Mapping from target id to list of indices.
batch_size (int | None, optional) – If provided, limits batch size per target. If None, uses full target.
shuffle (bool, optional) – Shuffle target order each epoch. Default True.
split_target_batches (bool, optional) – If True, split each target into multiple batches of size batch_size. If False, sample a single batch per target. Default False.
Notes
This sampler groups indices by target id to preserve per-target ranking structure during training.
Examples
>>> from OCDocker.OCScore.DNN.future.datasets import TargetBatchSampler
>>> target_to_indices = {
...     0: [0, 1, 2, 3],
...     1: [4, 5, 6],
...     2: [7, 8]
... }
>>> sampler = TargetBatchSampler(target_to_indices, batch_size=2, shuffle=False,
...                              split_target_batches=True)
>>> for batch in sampler:
...     print(batch)
[0, 1]
[2, 3]
[4, 5]
[6]
[7, 8]
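To make the interaction of batch_size, shuffle, and split_target_batches easier to follow, here is a standalone sketch of one way such batching could work. It is not the OCDocker implementation. The function name `iter_target_batches` and the `seed` parameter are hypothetical. Drawing the single batch at random without replacement when split_target_batches is False is also an assumption, because the documentation does not say how that batch is chosen.

```python
import random
from typing import Dict, Iterator, List, Optional


def iter_target_batches(
    target_to_indices: Dict[int, List[int]],
    batch_size: Optional[int] = None,
    shuffle: bool = True,
    split_target_batches: bool = False,
    seed: Optional[int] = None,  # hypothetical parameter, for reproducibility only
) -> Iterator[List[int]]:
    """Yield lists of sample indices, each list drawn from a single target."""
    rng = random.Random(seed)
    targets = list(target_to_indices)
    if shuffle:
        # Only the order in which targets are visited is shuffled.
        rng.shuffle(targets)
    for target in targets:
        indices = list(target_to_indices[target])
        if batch_size is None or len(indices) <= batch_size:
            # No limit, or the target already fits: use the full target.
            yield indices
        elif split_target_batches:
            # Cover the whole target with consecutive chunks of batch_size.
            for start in range(0, len(indices), batch_size):
                yield indices[start:start + batch_size]
        else:
            # Assumption: one random subset of batch_size indices per target.
            yield rng.sample(indices, batch_size)


example = {0: [0, 1, 2, 3], 1: [4, 5, 6], 2: [7, 8]}
split_batches = list(
    iter_target_batches(example, batch_size=2, shuffle=False, split_target_batches=True)
)
print(split_batches)  # [[0, 1], [2, 3], [4, 5], [6], [7, 8]]

single_batches = list(
    iter_target_batches(example, batch_size=2, shuffle=False, seed=0)
)
print(len(single_batches))  # 3
```

With split_target_batches=True the sketch reproduces the batches shown in the documented example. With the default False it yields exactly one batch per target, so large targets contribute only batch_size samples per epoch.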
- __init__(target_to_indices, batch_size=None, shuffle=True, split_target_batches=False)
Initialize target batch sampler.
- Parameters:
target_to_indices (dict[int, list[int]]) – Mapping from target id to indices.
batch_size (int | None, optional) – Maximum batch size per target, by default None.
shuffle (bool, optional) – Shuffle targets each epoch, by default True.
split_target_batches (bool, optional) – Split targets into multiple batches, by default False.
- Return type:
None