OCDocker.OCScore.Dimensionality.future.datasets module

Datasets for the future Autoencoder pipeline.

class OCDocker.OCScore.Dimensionality.future.datasets.AutoencoderDataset(*args, **kwargs)

Bases: Dataset

Dataset for autoencoder training with optional energy targets.

Parameters:
  • features (np.ndarray) – Input feature matrix.

  • energies (np.ndarray | None, optional) – Energy labels for the supervised head. If None, all samples are treated as unlabeled.

  • feature_mask (np.ndarray | None, optional) – Element-wise mask applied to the input features.

Notes

The dataset always returns a triplet:
  • features: torch.Tensor of shape (F,)

  • energies: torch.Tensor of shape (1,) (filled with 0.0 if missing)

  • energy_mask: torch.Tensor bool indicating if the energy label is valid
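The triplet contract can be illustrated with a small stand-in class. This is a sketch, not the OCDocker implementation: the class name TripletSketchDataset is hypothetical, and two behaviors are assumptions that the documentation does not state. First, the feature mask is applied by element-wise multiplication. Second, every sample counts as labeled whenever energies is provided.

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class TripletSketchDataset(Dataset):
    """Hypothetical stand-in that mimics the documented triplet contract."""

    def __init__(self, features, energies=None, feature_mask=None):
        x = torch.as_tensor(np.asarray(features), dtype=torch.float32)
        if feature_mask is not None:
            # Assumption: the mask is applied by element-wise multiplication.
            mask = torch.as_tensor(np.asarray(feature_mask), dtype=torch.float32)
            x = x * mask
        self.features = x
        n = x.shape[0]
        if energies is None:
            # No labels: fill with 0.0 and mark every sample as unlabeled.
            self.energies = torch.zeros(n, 1)
            self.energy_mask = torch.zeros(n, dtype=torch.bool)
        else:
            # Assumption: every provided energy is a valid label.
            e = torch.as_tensor(np.asarray(energies), dtype=torch.float32)
            self.energies = e.reshape(n, 1)
            self.energy_mask = torch.ones(n, dtype=torch.bool)

    def __len__(self):
        return self.features.shape[0]

    def __getitem__(self, idx):
        # Always a triplet: features (F,), energies (1,), energy_mask (scalar bool).
        return self.features[idx], self.energies[idx], self.energy_mask[idx]


# Unlabeled usage: energies omitted, so each energy is 0.0 and each mask is False.
unlabeled = TripletSketchDataset(np.ones((4, 3)))
x0, e0, m0 = unlabeled[0]
```

Returning a placeholder energy together with a validity flag keeps every sample the same shape, so labeled and unlabeled samples can share a batch.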

Examples

>>> import numpy as np
>>> from OCDocker.OCScore.Dimensionality.future.datasets import AutoencoderDataset
>>> features = np.random.rand(100, 20)  # 100 samples, 20 features each
>>> energies = np.random.rand(100)      # 100 energy labels
>>> feature_mask = np.random.randint(0, 2, size=(100, 20))  # Random binary mask
>>> dataset = AutoencoderDataset(features, energies, feature_mask)
>>> sample_features, sample_energy, sample_mask = dataset[0]
>>> print(sample_features.shape)  # torch.Size([20])
>>> print(sample_energy.shape)    # torch.Size([1])
>>> print(sample_mask)            # tensor(True) or tensor(False)

__getitem__(idx)

Return a dataset sample.

Parameters:

idx (int) – Sample index.

Returns:

Features, energies, and energy mask tensors.

Return type:

tuple[torch.Tensor, torch.Tensor, torch.Tensor]
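One way to consume the returned triplet is to batch it with a DataLoader and use the energy mask to restrict a supervised loss to labeled samples. The sketch below is an illustration only: it uses a TensorDataset with hand-written tensors as a stand-in for AutoencoderDataset, and the masked_mse helper and the constant predictions are hypothetical, not part of OCDocker.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data with the documented per-sample shapes: (F,), (1,), scalar bool.
features = torch.zeros(4, 3)
energies = torch.tensor([[1.0], [0.0], [3.0], [0.0]])
energy_mask = torch.tensor([True, False, True, False])
dataset = TensorDataset(features, energies, energy_mask)

# Default collation stacks each element of the triplet along a new batch axis.
loader = DataLoader(dataset, batch_size=4, shuffle=False)
batch_features, batch_energies, batch_mask = next(iter(loader))

# Hypothetical predictions from a supervised head, shape (B, 1).
predictions = torch.full((4, 1), 2.0)


def masked_mse(pred, target, mask):
    """Mean squared error over labeled samples only; 0.0 if none are labeled."""
    if not mask.any():
        return pred.new_zeros(())
    diff = pred[mask] - target[mask]
    return (diff ** 2).mean()


loss = masked_mse(predictions, batch_energies, batch_mask)
```

Only the two labeled samples contribute to the loss here; the 0.0 placeholders of the unlabeled samples are ignored because their mask entries are False.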

__init__(features, energies=None, feature_mask=None)

Initialize the dataset.

Parameters:
  • features (np.ndarray) – Input feature matrix.

  • energies (np.ndarray | None, optional) – Energy targets, by default None.

  • feature_mask (np.ndarray | None, optional) – Feature mask, by default None.

Return type:

None

__len__()

Return dataset length.

Returns:

Number of samples.

Return type:

int