OCDocker.OCScore.Dimensionality.future.datasets module
Datasets for the future Autoencoder pipeline.
- class OCDocker.OCScore.Dimensionality.future.datasets.AutoencoderDataset(*args, **kwargs)
Bases: Dataset
Dataset for autoencoder training with optional energy targets.
- Parameters:
features (np.ndarray) – Input feature matrix.
energies (np.ndarray | None, optional) – Energy labels for supervised head. If None, all samples are unlabeled.
feature_mask (np.ndarray | None, optional) – Feature mask (element-wise) applied to inputs.
Notes
The dataset always returns a triplet:
- features: torch.Tensor of shape (F,)
- energies: torch.Tensor of shape (1,) (filled with 0.0 if missing)
- energy_mask: boolean torch.Tensor indicating whether the energy label is valid
Examples
>>> import numpy as np
>>> from OCDocker.OCScore.Dimensionality.future.datasets import AutoencoderDataset
>>> features = np.random.rand(100, 20)  # 100 samples, 20 features each
>>> energies = np.random.rand(100)  # 100 energy labels
>>> feature_mask = np.random.randint(0, 2, size=(100, 20))  # Random binary mask
>>> dataset = AutoencoderDataset(features, energies, feature_mask)
>>> sample_features, sample_energy, sample_mask = dataset[0]
>>> print(sample_features.shape)  # torch.Size([20])
>>> print(sample_energy.shape)  # torch.Size([1])
>>> print(sample_mask)  # tensor(True) or tensor(False)
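The triplet contract described in the Notes can be illustrated with a small stand-in class. This is a minimal sketch, not OCDocker's implementation: the class name `TripletSketchDataset` is hypothetical, and it assumes that passing `energies=None` yields zero-filled energies with a `False` mask for every sample, as the parameter description suggests. The feature mask is left out for brevity.

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class TripletSketchDataset(Dataset):
    """Hypothetical stand-in mimicking the documented
    (features, energies, energy_mask) triplet."""

    def __init__(self, features, energies=None):
        self.features = torch.as_tensor(features, dtype=torch.float32)
        n = self.features.shape[0]
        if energies is None:
            # Assumption: no labels at all, so use 0.0 placeholders
            # and mark every sample as unlabeled.
            self.energies = torch.zeros(n, 1)
            self.energy_mask = torch.zeros(n, dtype=torch.bool)
        else:
            # Assumption: every provided label is treated as valid.
            self.energies = torch.as_tensor(
                energies, dtype=torch.float32
            ).reshape(n, 1)
            self.energy_mask = torch.ones(n, dtype=torch.bool)

    def __len__(self):
        return self.features.shape[0]

    def __getitem__(self, idx):
        # Always a triplet, whether or not labels were supplied.
        return self.features[idx], self.energies[idx], self.energy_mask[idx]


unlabeled = TripletSketchDataset(np.random.rand(5, 20))
x, e, m = unlabeled[0]
```

Returning a placeholder energy together with a mask keeps every sample the same shape, so labeled and unlabeled samples can be batched together.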
- __getitem__(idx)
Return a dataset sample.
- Parameters:
idx (int) – Sample index.
- Returns:
Features, energies, and energy mask tensors.
- Return type:
tuple[torch.Tensor, torch.Tensor, torch.Tensor]
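One way a training step might consume this triplet is to restrict the supervised energy loss to samples whose mask is true, so the 0.0 placeholders never contribute. The sketch below is an assumption about usage, not something this module documents: `masked_energy_mse` is a hypothetical helper, and the tensors stand in for a batch of energies of shape (B, 1) and masks of shape (B,).

```python
import torch


def masked_energy_mse(pred, target, energy_mask):
    """Hypothetical helper: mean squared error over labeled samples only.

    Returns a zero loss when the batch contains no labeled samples.
    """
    mask = energy_mask.reshape(-1).bool()
    if not mask.any():
        # Multiply by 0.0 so the result stays attached to pred's graph.
        return pred.sum() * 0.0
    diff = pred.reshape(-1)[mask] - target.reshape(-1)[mask]
    return (diff ** 2).mean()


# Toy batch: the second sample is unlabeled, so its target is a placeholder.
pred = torch.tensor([[1.0], [2.0], [3.0]])
target = torch.tensor([[0.0], [0.0], [1.0]])
energy_mask = torch.tensor([True, False, True])

loss = masked_energy_mse(pred, target, energy_mask)
```

Only the first and third samples enter the average here; the unlabeled second sample is ignored regardless of its prediction.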
- __init__(features, energies=None, feature_mask=None)
Initialize dataset.
- Parameters:
features (np.ndarray) – Input feature matrix.
energies (np.ndarray | None, optional) – Energy targets, by default None.
feature_mask (np.ndarray | None, optional) – Feature mask, by default None.
- Return type:
None