OCDocker.OCScore.Dimensionality.future.AutoencoderOptimizer module
Module for optimizing the future Autoencoder pipeline.
Import it as:
from OCDocker.OCScore.Dimensionality.future.AutoencoderOptimizer import AutoencoderOptimizer
- class OCDocker.OCScore.Dimensionality.future.AutoencoderOptimizer.AutoencoderOptimizer(X_train, X_test, X_validation=None, encoding_dims=(16, 256), storage='sqlite:///autoencoder.db', models_folder='./models/Autoencoder/', random_seed=42, use_gpu=True, verbose=False, y_train=None, y_test=None, y_validation=None, X_unlabeled=None, future_config=None)
Bases: object
Future Autoencoder optimizer (denoising + multi-task).
- Parameters:
X_train (np.ndarray | pd.DataFrame | pd.Series) – Training features.
X_test (np.ndarray | pd.DataFrame | pd.Series) – Test features.
X_validation (np.ndarray | pd.DataFrame | pd.Series | None, optional) – Validation features. Default None.
encoding_dims (tuple, optional) – Min/max latent dimensions. Default (16, 256).
storage (str, optional) – Optuna storage string. Default “sqlite:///autoencoder.db”.
models_folder (str, optional) – Folder to save models. Default “./models/Autoencoder/”.
random_seed (int, optional) – Random seed. Default 42.
use_gpu (bool, optional) – Use GPU if available. Default True.
verbose (bool, optional) – Verbose mode. Default False.
y_train (np.ndarray | pd.Series | None, optional) – Energy labels for training. Default None.
y_test (np.ndarray | pd.Series | None, optional) – Energy labels for testing. Default None.
y_validation (np.ndarray | pd.Series | None, optional) – Energy labels for validation. Default None.
X_unlabeled (np.ndarray | pd.DataFrame | None, optional) – Extra unlabeled data for reconstruction. Default None.
future_config (dict | None, optional) – Configuration overrides for the future pipeline. Default None.
Notes
The configuration supports two training stages:
- stage1: denoising reconstruction + optional energy supervision (default enabled).
- stage2: optional fine-tuning stage with alternate weights/noise settings.
Data Flow
Features: X_train is used for training; X_validation (if provided) is used for validation, otherwise X_test is used as the evaluation split.
Energy labels: y_train/y_validation/y_test are optional. If labels are not provided, the energy head is disabled and only reconstruction is optimized.
Extra unlabeled data: X_unlabeled is concatenated to X_train for reconstruction-only learning (no energy labels are expected for it).
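The split and label rules above can be sketched in a few lines of NumPy. This is an illustrative sketch, not OCDocker's code: the helper name `resolve_splits` and the keys of the returned dictionary are hypothetical, and the sketch assumes that the energy head is switched on only when training labels are present.

```python
import numpy as np


def resolve_splits(X_train, X_test, X_validation=None,
                   y_train=None, y_test=None, y_validation=None,
                   X_unlabeled=None):
    """Hypothetical helper mirroring the data-flow rules described above."""
    X_train = np.asarray(X_train, dtype=np.float32)

    # Evaluation split: prefer the validation set, fall back to the test set.
    if X_validation is not None:
        X_eval, y_eval = np.asarray(X_validation, dtype=np.float32), y_validation
    else:
        X_eval, y_eval = np.asarray(X_test, dtype=np.float32), y_test

    # Assumption: the energy head is enabled only when training labels exist.
    use_energy = y_train is not None

    # Reconstruction pool: training rows plus any extra unlabeled rows.
    X_recon = X_train
    if X_unlabeled is not None:
        X_recon = np.concatenate(
            [X_train, np.asarray(X_unlabeled, dtype=np.float32)], axis=0
        )

    return {
        "X_recon": X_recon,
        "X_eval": X_eval,
        "y_eval": y_eval,
        "use_energy": use_energy,
    }


rng = np.random.default_rng(0)
splits = resolve_splits(
    X_train=rng.normal(size=(8, 4)),
    X_test=rng.normal(size=(3, 4)),
    X_unlabeled=rng.normal(size=(5, 4)),
)
print(splits["X_recon"].shape, splits["X_eval"].shape, splits["use_energy"])
```

Because no validation set and no labels are passed here, the test split becomes the evaluation split and the sketch reports reconstruction-only training.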
Configuration
The future_config dict is merged into the defaults using the keys below:
- model
  - encoder_hidden_sizes (list[int]) – Hidden sizes for the encoder (excluding latent).
  - latent_dim (int) – Latent embedding dimension.
  - decoder_sizes (list[int] | None) – Decoder sizes; if None, a mirrored decoder is built.
  - activation (str) – Activation for encoder/decoder hidden layers.
  - latent_activation (str) – Activation applied to latent embeddings.
  - decoder_output_activation (str) – Activation for the decoder output layer.
  - dropout, latent_dropout (float) – Dropout probabilities.
  - norm (str) – Normalization type (“batch”, “layer”, “none”).
  - use_vae (bool) – Enable VAE reparameterization and KL term.
  - energy_head_sizes (list[int] | None) – Hidden sizes for energy head (None disables).
- stage1 / stage2
  - enabled (bool) – Whether to run the stage.
  - epochs, batch_size (int) – Training schedule and batch size.
  - lr, weight_decay (float) – Optimizer hyperparameters.
  - clip_grad (float) – Gradient clipping max-norm (0 disables).
  - recon_loss (str) – Reconstruction loss type (“mse”, “rmse”, “mae”, “huber”).
  - energy_loss (str) – Energy loss type (“mse”, “rmse”, “mae”, “huber”).
  - huber_delta (float) – Delta parameter for Huber/SmoothL1.
  - lambda_recon, lambda_energy (float) – Weights for reconstruction and energy losses.
  - lambda_l2 (float) – L2 penalty weight on latent embeddings.
  - lambda_contractive (float) – Contractive penalty weight (Jacobian norm).
  - beta_vae (float) – KL weight when use_vae is True.
  - noise_type (str) – Noise type (“mask”, “gaussian”, “swap”, “mask+gaussian”, “none”).
  - mask_prob, gaussian_std, swap_prob (float) – Noise parameters.
  - ramp_epochs_energy, ramp_epochs_recon (int) – Epochs for ramping loss weights.
  - ramp_type (str) – Ramp schedule (“linear” or “sigmoid”).
  - early_stopping_patience (int) – Stop after this many epochs without improvement.
  - mixed_precision (bool) – Enable AMP when running on CUDA.
- optimization
  - loss_balancing (str) – “fixed”, “uncertainty”, or “gradnorm”.
  - gradnorm_alpha (float) – GradNorm alpha (only used when loss_balancing=“gradnorm”).
  - objective_metric (str) – Metric key to optimize (e.g., “val_combined_loss”).
  - search_vae (bool) – If True, Optuna can toggle VAE usage and beta.
- checkpoint
  - save_best (bool) – Save best model checkpoint.
  - save_encoder (bool) – Save encoder-only checkpoint.
- data
  - use_energy_head (bool) – If False, energy head is disabled regardless of labels.
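As an illustration of how a partial future_config could override defaults, the sketch below merges a nested override dict into a defaults dict. It assumes the merge is recursive (section by section); the helper `merge_config` is hypothetical and the default values shown are placeholders chosen for the example, not OCDocker's actual defaults.

```python
import copy


def merge_config(defaults, overrides):
    """Recursively merge `overrides` into a copy of `defaults` (hypothetical helper)."""
    merged = copy.deepcopy(defaults)
    for key, value in (overrides or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested section: merge key by key instead of replacing it wholesale.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Placeholder defaults for illustration only; these are NOT OCDocker's defaults.
placeholder_defaults = {
    "model": {"latent_dim": 64, "norm": "batch", "use_vae": False},
    "stage1": {"enabled": True, "noise_type": "mask", "mask_prob": 0.1},
    "stage2": {"enabled": False},
}

# Overrides use the section and key names listed above.
future_config = {
    "model": {"norm": "layer"},
    "stage1": {"noise_type": "gaussian", "gaussian_std": 0.05},
}

config = merge_config(placeholder_defaults, future_config)
print(config["model"])
print(config["stage1"])
```

Under this assumption, only the keys named in future_config change; every other key in the same section keeps its default value.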
Example
>>> trainer = AutoencoderOptimizer(X_train, X_test, X_validation, verbose=True)
>>> study = trainer.optimize(n_trials=10)
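To make the denoising options (noise_type, mask_prob, gaussian_std, swap_prob) more concrete, here is a minimal NumPy sketch of common definitions of these corruptions. The function `corrupt` is hypothetical and its semantics are assumptions: masking zeroes entries, Gaussian noise is additive, and swapping replaces an entry with the same feature from a randomly chosen row. The library's own implementation may differ.

```python
import numpy as np


def corrupt(x, noise_type="mask", mask_prob=0.1, gaussian_std=0.05,
            swap_prob=0.1, rng=None):
    """Return a corrupted copy of `x` (rows are samples, columns are features)."""
    rng = rng if rng is not None else np.random.default_rng()
    x = np.asarray(x, dtype=np.float32)
    if noise_type == "none":
        return x.copy()
    if noise_type not in {"mask", "gaussian", "swap", "mask+gaussian"}:
        raise ValueError(f"unknown noise_type: {noise_type!r}")

    out = x.copy()
    if noise_type == "swap":
        # Replace selected entries with the same feature taken from a random row.
        donor_rows = rng.integers(0, x.shape[0], size=x.shape)
        cols = np.broadcast_to(np.arange(x.shape[1]), x.shape)
        swap = rng.random(x.shape) < swap_prob
        out[swap] = x[donor_rows, cols][swap]
        return out
    if "gaussian" in noise_type:
        # Additive zero-mean Gaussian noise.
        out += rng.normal(0.0, gaussian_std, size=x.shape).astype(np.float32)
    if "mask" in noise_type:
        # Zero out each entry independently with probability mask_prob.
        out[rng.random(x.shape) < mask_prob] = 0.0
    return out


rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4)).astype(np.float32)
noisy = corrupt(x, noise_type="mask+gaussian", rng=rng)
print(noisy.shape)
```

In a denoising setup the model receives the corrupted array as input while the reconstruction loss is computed against the clean `x`.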
- __init__(X_train, X_test, X_validation=None, encoding_dims=(16, 256), storage='sqlite:///autoencoder.db', models_folder='./models/Autoencoder/', random_seed=42, use_gpu=True, verbose=False, y_train=None, y_test=None, y_validation=None, X_unlabeled=None, future_config=None)
Initialize the future autoencoder optimizer.
- Parameters:
X_train (np.ndarray | pd.DataFrame | pd.Series) – Training features.
X_test (np.ndarray | pd.DataFrame | pd.Series) – Test features.
X_validation (np.ndarray | pd.DataFrame | pd.Series | None, optional) – Validation features, by default None.
encoding_dims (tuple, optional) – Latent dimension bounds, by default (16, 256).
storage (str, optional) – Optuna storage string, by default “sqlite:///autoencoder.db”.
models_folder (str, optional) – Folder to save checkpoints, by default “./models/Autoencoder/”.
random_seed (int, optional) – Random seed, by default 42.
use_gpu (bool, optional) – Use GPU if available, by default True.
verbose (bool, optional) – Verbose mode, by default False.
y_train (np.ndarray | pd.Series | None, optional) – Training energy labels, by default None.
y_test (np.ndarray | pd.Series | None, optional) – Test energy labels, by default None.
y_validation (np.ndarray | pd.Series | None, optional) – Validation energy labels, by default None.
X_unlabeled (np.ndarray | pd.DataFrame | None, optional) – Extra unlabeled features, by default None.
future_config (dict | None, optional) – Configuration overrides, by default None.
- Return type:
None
- objective(trial)
Objective function for Optuna optimization.
- Parameters:
trial (optuna.Trial) – Optuna trial instance.
- Returns:
Objective value.
- Return type:
float
- optimize(direction='minimize', n_trials=10, study_name='AE_Future_Optimization', load_if_exists=True, sampler=optuna.samplers.TPESampler, n_jobs=1)
Optimize the autoencoder using Optuna.
- Parameters:
direction (str, optional) – Optimization direction. Default “minimize”.
n_trials (int, optional) – Number of trials. Default 10.
study_name (str, optional) – Study name. Default “AE_Future_Optimization”.
load_if_exists (bool, optional) – Load existing study if present. Default True.
sampler (optuna.samplers.BaseSampler, optional) – Optuna sampler. Default TPESampler().
n_jobs (int, optional) – Number of parallel jobs. Default 1.
- Returns:
Optuna study object.
- Return type:
optuna.study.Study