OCDocker.OCScore.DNN.future.datasets module

Datasets and samplers for the future DNN pipeline.

class OCDocker.OCScore.DNN.future.datasets.EnergyDataset(features, energies, mask=None)

Bases: Dataset

Dataset for regression targets (energy labels).

Parameters:

features (np.ndarray) – Input features.
energies (np.ndarray) – Regression targets (e.g., energies).
mask (np.ndarray | None, optional) – Feature mask for single-branch inputs.

Notes

Each item is a (features, energy) tuple of tensors, where energy has shape (1,).

Examples

>>> import numpy as np
>>> from OCDocker.OCScore.DNN.future.datasets import EnergyDataset
>>> features = np.random.rand(100, 20)  # 100 samples, 20 features each
>>> energies = np.random.rand(100)      # 100 energy labels
>>> mask = np.random.randint(0, 2, size=(100, 20))  # Random binary mask
>>> dataset = EnergyDataset(features, energies, mask)
>>> sample_features, sample_energy = dataset[0]
>>> print(sample_features.shape)
torch.Size([20])
>>> print(sample_energy.shape)
torch.Size([1])

__getitem__(idx)

Return a dataset sample.

Parameters: idx (int) – Sample index.
Returns: Features and energy target tensors.
Return type: tuple

__init__(features, energies, mask=None)

Initialize energy dataset.

Parameters:

features (np.ndarray) – Input features.
energies (np.ndarray) – Energy targets.
mask (np.ndarray | None, optional) – Feature mask, by default None.

Return type:

None

__len__()

Return dataset length.

Returns: Number of samples.
Return type: int

class OCDocker.OCScore.DNN.future.datasets.TargetRankingDataset(features, labels, target_ids, mask=None)

Bases: Dataset

Dataset for ranking with per-target grouping.

Parameters:

features (np.ndarray) – Input features.
labels (np.ndarray) – Binary labels (1 for active, 0 for decoy).
target_ids (Sequence[str]) – Target identifiers per sample (used for grouping).
mask (np.ndarray | None, optional) – Feature mask for single-branch inputs.

Notes

Each item is a (features, label, target_id) tuple, where target_id is an integer index. Integer ids are assigned in order of first appearance in target_ids, so the same input sequence always produces the same mapping.
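The first-appearance id assignment described in the note can be sketched in plain Python. This is an illustration under stated assumptions, not the module's code: `encode_targets` and `group_indices` are hypothetical helper names, and the target strings are made up. The second helper shows one way to derive a `dict[int, list[int]]` grouping of the kind that TargetBatchSampler accepts as target_to_indices.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def encode_targets(target_ids: Sequence[str]) -> Tuple[List[int], Dict[str, int]]:
    """Hypothetical helper: map string ids to integers by first appearance."""
    mapping: Dict[str, int] = {}
    encoded: List[int] = []
    for name in target_ids:
        if name not in mapping:
            # A new target gets the next unused integer.
            mapping[name] = len(mapping)
        encoded.append(mapping[name])
    return encoded, mapping


def group_indices(encoded: Sequence[int]) -> Dict[int, List[int]]:
    """Hypothetical helper: collect sample indices under each integer target id."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for idx, target in enumerate(encoded):
        groups[target].append(idx)
    return dict(groups)


# Made-up target names, for illustration only.
encoded, mapping = encode_targets(["abl1", "egfr", "abl1", "cdk2", "egfr"])
print(encoded)                 # [0, 1, 0, 2, 1]
print(mapping)                 # {'abl1': 0, 'egfr': 1, 'cdk2': 2}
print(group_indices(encoded))  # {0: [0, 2], 1: [1, 4], 2: [3]}
```

Because ids depend only on the order of target_ids, reordering the samples changes which integer a target receives; the grouping itself is unaffected.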

__getitem__(idx)

Return a dataset sample.

Parameters: idx (int) – Sample index.
Returns: Features, label, and target id.
Return type: tuple

__init__(features, labels, target_ids, mask=None)

Initialize target ranking dataset.

Parameters:

features (np.ndarray) – Input features.
labels (np.ndarray) – Binary labels.
target_ids (Sequence[str]) – Target identifiers.
mask (np.ndarray | None, optional) – Feature mask, by default None.

Return type:

None

__len__()

Return dataset length.

Returns: Number of samples.
Return type: int

class OCDocker.OCScore.DNN.future.datasets.TargetBatchSampler(target_to_indices, batch_size=None, shuffle=True, split_target_batches=False)

Bases: List[int]

Sampler that yields batches grouped by target.

Parameters:

target_to_indices (dict[int, list[int]]) – Mapping from target id to list of indices.
batch_size (int | None, optional) – Maximum number of indices per batch for a target. If None, each batch contains all indices of the target.
shuffle (bool, optional) – Shuffle target order each epoch. Default True.
split_target_batches (bool, optional) – If True, split each target into multiple batches of size batch_size. If False, sample a single batch per target. Default False.

Notes

This sampler groups indices by target id to preserve per-target ranking structure during training.

Examples

>>> from OCDocker.OCScore.DNN.future.datasets import TargetBatchSampler
>>> target_to_indices = {
...     0: [0, 1, 2, 3],
...     1: [4, 5, 6],
...     2: [7, 8]
... }
>>> sampler = TargetBatchSampler(target_to_indices, batch_size=2, shuffle=False,
...                              split_target_batches=True)
>>> for batch in sampler:
...     print(batch)
[0, 1]
[2, 3]
[4, 5]
[6]
[7, 8]
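The batching behaviour shown in the example can be approximated with a short standalone generator. This is a sketch under stated assumptions, not the sampler's implementation: `iter_target_batches` and its `rng` argument are hypothetical, and the branch for split_target_batches=False with a batch_size smaller than the target assumes a random subset is drawn, which the documentation does not spell out.

```python
import random
from typing import Dict, Iterator, List, Optional


def iter_target_batches(
    target_to_indices: Dict[int, List[int]],
    batch_size: Optional[int] = None,
    shuffle: bool = True,
    split_target_batches: bool = False,
    rng: Optional[random.Random] = None,  # hypothetical argument, for reproducibility
) -> Iterator[List[int]]:
    """Hypothetical sketch: yield index batches that never mix targets."""
    rng = rng or random.Random()
    targets = list(target_to_indices)
    if shuffle:
        # Only the order of targets is shuffled, not the indices within one.
        rng.shuffle(targets)
    for target in targets:
        indices = list(target_to_indices[target])
        if batch_size is None or batch_size >= len(indices):
            # The whole target fits in one batch.
            yield indices
        elif split_target_batches:
            # Consecutive chunks; the last chunk may be shorter.
            for start in range(0, len(indices), batch_size):
                yield indices[start:start + batch_size]
        else:
            # Assumption: one random subset of batch_size indices per target.
            yield rng.sample(indices, batch_size)


example = {0: [0, 1, 2, 3], 1: [4, 5, 6], 2: [7, 8]}
batches = list(
    iter_target_batches(example, batch_size=2, shuffle=False, split_target_batches=True)
)
print(batches)  # [[0, 1], [2, 3], [4, 5], [6], [7, 8]]
```

With the same arguments as the example above, the sketch yields the same five batches, including the short final batch `[6]` for target 1.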

__init__(target_to_indices, batch_size=None, shuffle=True, split_target_batches=False)

Initialize target batch sampler.

Parameters:

target_to_indices (dict[int, list[int]]) – Mapping from target id to indices.
batch_size (int | None, optional) – Maximum batch size per target, by default None.
shuffle (bool, optional) – Shuffle targets each epoch, by default True.
split_target_batches (bool, optional) – Split targets into multiple batches, by default False.

Return type:

None

__iter__()

Yield batches grouped by target.

Yields: list[int] – Indices for a batch.

__len__()

Return number of batches.

Returns: Number of batches per epoch.
Return type: int
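One plausible way to compute the number of batches per epoch from the documented parameters is sketched below. `count_batches` is a hypothetical function, not part of the module; it assumes every target has at least one index, that splitting produces ceil(n / batch_size) batches per target, and that every other configuration produces exactly one batch per target.

```python
import math
from typing import Dict, List, Optional


def count_batches(
    target_to_indices: Dict[int, List[int]],
    batch_size: Optional[int] = None,
    split_target_batches: bool = False,
) -> int:
    """Hypothetical sketch: batches per epoch, assuming non-empty targets."""
    if batch_size is None or not split_target_batches:
        # One batch per target.
        return len(target_to_indices)
    # Each target contributes ceil(n / batch_size) chunks.
    return sum(
        math.ceil(len(indices) / batch_size)
        for indices in target_to_indices.values()
    )


example = {0: [0, 1, 2, 3], 1: [4, 5, 6], 2: [7, 8]}
print(count_batches(example, batch_size=2, split_target_batches=True))  # 5
print(count_batches(example, batch_size=2))                             # 3
print(count_batches(example))                                           # 3
```

For the three targets used in the class example, splitting with batch_size=2 gives 2 + 2 + 1 = 5 batches, which matches the five lists printed there.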