OCDocker.OCScore.Dimensionality.AutoencoderOptimizer module

Module to perform the optimization of the Autoencoder.

It is imported as:

from OCDocker.OCScore.Dimensionality.AutoencoderOptimizer import AutoencoderOptimizer

class OCDocker.OCScore.Dimensionality.AutoencoderOptimizer.AutoencoderDataset(*args, **kwargs)[source]

Bases: Dataset

Dataset class for the Autoencoder. It backs the DataLoader objects used to train and test the Autoencoder.

Parameters:

features (torch.Tensor) – The features to be used in the Autoencoder. It should be a torch.Tensor of shape (n_samples, n_features).

__getitem__(idx)[source]

Returns the features and the target for the given index. The DataLoader calls it to fetch individual samples from the dataset.

Parameters:

idx (int) – The index of the sample to be returned.

Returns:

The features and the target for the given index.

Return type:

tuple

__init__(features)[source]

Constructor for the AutoencoderDataset class.

Parameters:

features (torch.Tensor) – The features to be used in the Autoencoder. It should be a torch.Tensor of shape (n_samples, n_features).

Return type:

None

__len__()[source]

Returns the length of the dataset. The DataLoader uses it to know how many samples are in the dataset.

Returns:

The number of samples in the dataset.

Return type:

int
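
The __len__/__getitem__ contract described above can be sketched without PyTorch. ToyAutoencoderDataset is a hypothetical stand-in backed by plain lists, not the class documented here, and returning the input row as its own target is an assumption based on how autoencoders are usually trained; the source only says that "the features and the target" are returned.

```python
from typing import List, Sequence, Tuple


class ToyAutoencoderDataset:
    """Hypothetical stand-in for AutoencoderDataset backed by plain lists."""

    def __init__(self, features: Sequence[Sequence[float]]) -> None:
        # One row per sample, one column per feature: (n_samples, n_features).
        self.features = [list(row) for row in features]

    def __len__(self) -> int:
        # Number of samples; a loader uses this to build its index range.
        return len(self.features)

    def __getitem__(self, idx: int) -> Tuple[List[float], List[float]]:
        # Assumption: the reconstruction target is the input row itself.
        row = self.features[idx]
        return row, row


dataset = ToyAutoencoderDataset([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
sample, target = dataset[1]
```

In the real class the features are a torch.Tensor and the base class is torch.utils.data.Dataset, so indexing would return tensors rather than lists.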

class OCDocker.OCScore.Dimensionality.AutoencoderOptimizer.Autoencoder(*args, **kwargs)[source]

Bases: Module

Autoencoder class. It defines the Autoencoder model as a subclass of nn.Module.

Parameters:
  • input_size (int) – The size of the input. It should be a positive integer.

  • encoding_dim (list) – The encoding sizes, as a list of integers.

  • encoder_activation_fn (list[tuple[type[nn.Module], dict[str, Any]]]) – The activation functions to be used in the encoder. It should be a list of tuples, where each tuple holds an activation function class and a dict of its parameters.

  • decoder_activation_fn (list[tuple[type[nn.Module], dict[str, Any]]]) – The activation functions to be used in the decoder. It should be a list of tuples, where each tuple holds an activation function class and a dict of its parameters.

  • decoding_dim (list) – The decoding sizes, as a list of integers.

  • device (torch.device, optional) – The device to be used. Default is torch.device("cpu").

__init__(input_size, encoding_dim, encoder_activation_fn, decoder_activation_fn, decoding_dim, device=torch.device("cpu"))[source]

Constructor for the Autoencoder class.

Parameters:
  • input_size (int) – The size of the input. It should be a positive integer.

  • encoding_dim (list) – The encoding sizes, as a list of integers.

  • encoder_activation_fn (list[tuple[type[nn.Module], dict[str, Any]]]) – The activation functions to be used in the encoder. It should be a list of tuples, where each tuple holds an activation function class and a dict of its parameters.

  • decoder_activation_fn (list[tuple[type[nn.Module], dict[str, Any]]]) – The activation functions to be used in the decoder. It should be a list of tuples, where each tuple holds an activation function class and a dict of its parameters.

  • decoding_dim (list) – The decoding sizes, as a list of integers.

  • device (torch.device, optional) – The device to be used. Default is torch.device("cpu").

Return type:

None
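
The (class, parameters) tuples used for encoder_activation_fn and decoder_activation_fn can be illustrated with a dependency-free sketch. ReLU and LeakyReLU below are plain-Python stand-ins for classes such as nn.ReLU and nn.LeakyReLU, and plan_layers is a hypothetical helper. That consecutive sizes are chained into (in, out) layer shapes, with one activation per layer, is an assumption about how the constructor consumes these lists.

```python
from typing import Any, Dict, List, Tuple, Type


class ReLU:
    """Plain-Python stand-in for an activation class such as nn.ReLU."""


class LeakyReLU:
    """Plain-Python stand-in for an activation class that takes a keyword argument."""

    def __init__(self, negative_slope: float = 0.01) -> None:
        self.negative_slope = negative_slope


ActivationSpec = Tuple[Type[Any], Dict[str, Any]]


def plan_layers(in_size: int, dims: List[int], activations: List[ActivationSpec]):
    """Pair consecutive sizes into (in, out) shapes and build one activation per layer."""
    if len(dims) != len(activations):
        raise ValueError("expected one activation spec per layer")
    sizes = [in_size] + dims
    shapes = list(zip(sizes[:-1], sizes[1:]))
    # Each spec is (class, kwargs); the class is only instantiated here.
    instances = [cls(**kwargs) for cls, kwargs in activations]
    return shapes, instances


encoder_shapes, encoder_acts = plan_layers(
    64, [32, 16], [(ReLU, {}), (LeakyReLU, {"negative_slope": 0.1})]
)
# Assumption: the decoder starts from the last encoding size.
decoder_shapes, decoder_acts = plan_layers(16, [32, 64], [(ReLU, {}), (ReLU, {})])
```

Passing the class and its parameters separately, rather than a ready-made instance, lets each layer receive its own freshly constructed activation.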

forward(x)[source]

Forward pass of the Autoencoder. The input is passed through the encoder and then through the decoder.

Parameters:

x (torch.Tensor) – The input to be passed through the Autoencoder. It should be a torch.Tensor of shape (n_samples, n_features).

Returns:

The output of the Autoencoder, as a torch.Tensor of shape (n_samples, n_features).

Return type:

torch.Tensor
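
The shape contract of forward, (n_samples, n_features) in and the same shape out, can be shown with a toy linear encoder and decoder written over plain lists. This is a sketch only: the weights are made up, and biases and activation functions are left out, so it does not reproduce the real model's layers.

```python
from typing import List

Matrix = List[List[float]]


def linear(batch: Matrix, weights: Matrix) -> Matrix:
    """Multiply a (n_samples, n_in) batch by a (n_in, n_out) weight matrix."""
    return [
        [sum(x * w for x, w in zip(row, col)) for col in zip(*weights)]
        for row in batch
    ]


def forward(batch: Matrix, enc_w: Matrix, dec_w: Matrix) -> Matrix:
    """Encode to the bottleneck, then decode back to the input width."""
    return linear(linear(batch, enc_w), dec_w)


enc_w = [[0.5], [0.5], [0.5], [0.5]]  # 4 features -> 1 latent value
dec_w = [[1.0, 1.0, 1.0, 1.0]]        # 1 latent value -> 4 features
x = [[1.0, 1.0, 1.0, 1.0], [2.0, 0.0, 2.0, 0.0]]
out = forward(x, enc_w, dec_w)
```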

get_decoder()[source]

Get the decoder of the Autoencoder.

Returns:

The decoder of the Autoencoder.

Return type:

nn.Module

get_decoder_topology()[source]

Get the topology of the decoder, that is, its layers.

Returns:

The topology of the decoder.

Return type:

list

get_encoder()[source]

Get the encoder of the Autoencoder.

Returns:

The encoder of the Autoencoder.

Return type:

nn.Module

get_encoder_topology()[source]

Get the topology of the encoder, that is, its layers.

Returns:

The topology of the encoder.

Return type:

list

class OCDocker.OCScore.Dimensionality.AutoencoderOptimizer.AutoencoderOptimizer(X_train, X_test, X_validation=None, encoding_dims=(16, 256), storage='sqlite:///autoencoder.db', models_folder='./models/Autoencoder/', random_seed=42, use_gpu=True, verbose=False)[source]

Bases: object

AutoencoderOptimizer class. It is used to optimize the Autoencoder using Optuna.

Parameters:
  • X_train (Union[np.ndarray, pd.DataFrame, pd.Series]) – The training data, as a numpy array, pandas DataFrame or pandas Series.

  • X_test (Union[np.ndarray, pd.DataFrame, pd.Series]) – The testing data, as a numpy array, pandas DataFrame or pandas Series.

  • X_validation (Union[None, np.ndarray, pd.DataFrame, pd.Series], optional) – The validation data, as a numpy array, pandas DataFrame or pandas Series. Default is None.

  • encoding_dims (tuple, optional) – The dimensions of the encoding, as a tuple of two integers. Default is (16, 256).

  • storage (str, optional) – The storage string for the study. Default is "sqlite:///autoencoder.db".

  • models_folder (str, optional) – The folder where the models will be saved. Default is "./models/Autoencoder/".

  • random_seed (int, optional) – The random seed to be used. It should be a positive integer. Default is 42.

  • use_gpu (bool, optional) – If True, the Autoencoder will use the GPU. Default is True.

  • verbose (bool, optional) – If True, the training and testing information is printed. Default is False.

device: torch.device

__init__(X_train, X_test, X_validation=None, encoding_dims=(16, 256), storage='sqlite:///autoencoder.db', models_folder='./models/Autoencoder/', random_seed=42, use_gpu=True, verbose=False)[source]

Constructor for the AutoencoderOptimizer class.

Parameters:
  • X_train (Union[np.ndarray, pd.DataFrame, pd.Series]) – The training data, as a numpy array, pandas DataFrame or pandas Series.

  • X_test (Union[np.ndarray, pd.DataFrame, pd.Series]) – The testing data, as a numpy array, pandas DataFrame or pandas Series.

  • X_validation (Union[None, np.ndarray, pd.DataFrame, pd.Series], optional) – The validation data, as a numpy array, pandas DataFrame or pandas Series. Default is None.

  • encoding_dims (tuple, optional) – The dimensions of the encoding, as a tuple of two integers. Default is (16, 256).

  • storage (str, optional) – The storage string for the study. Default is "sqlite:///autoencoder.db".

  • models_folder (str, optional) – The folder where the models will be saved. Default is "./models/Autoencoder/".

  • random_seed (int, optional) – The random seed to be used. It should be a positive integer. Default is 42.

  • use_gpu (bool, optional) – If True, the Autoencoder will use the GPU. Default is True.

  • verbose (bool, optional) – If True, the training and testing information is printed. Default is False.

Return type:

None

X_train: torch.Tensor
train_loader: torch.utils.data.DataLoader | None
X_test: torch.Tensor
test_loader: torch.utils.data.DataLoader | None
X_validation: torch.Tensor | None
validation_loader: torch.utils.data.DataLoader | None

evaluate_autoencoder(model, criterion, loader=None)[source]

Evaluate the Autoencoder and return its RMSE.

Parameters:
  • model (nn.Module) – The Autoencoder model to be evaluated. It should be an nn.Module.

  • criterion (nn.Module) – The loss function to be used. It should be an nn.Module.

  • loader (Union[None, DataLoader], optional) – The DataLoader that supplies the evaluation data. Default is None.

Returns:

The RMSE of the Autoencoder.

Return type:

float
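
As a reminder of what the returned value measures, the sketch below computes an RMSE over several batches with plain Python. rmse_over_batches is a hypothetical helper; it pools the squared errors of every element before taking the square root, which is an assumption, since the source does not say whether evaluate_autoencoder pools errors this way or averages per-batch losses.

```python
import math
from typing import Iterable, List, Tuple

Batch = Tuple[List[List[float]], List[List[float]]]


def rmse_over_batches(batches: Iterable[Batch]) -> float:
    """Root mean squared error over every element in every batch."""
    squared_error = 0.0
    count = 0
    for reconstructed, target in batches:
        for rec_row, tgt_row in zip(reconstructed, target):
            for rec, tgt in zip(rec_row, tgt_row):
                squared_error += (rec - tgt) ** 2
                count += 1
    if count == 0:
        raise ValueError("no elements to evaluate")
    return math.sqrt(squared_error / count)


batches = [
    ([[1.0, 2.0]], [[1.0, 4.0]]),  # squared errors: 0 and 4
    ([[0.0, 0.0]], [[2.0, 2.0]]),  # squared errors: 4 and 4
]
score = rmse_over_batches(batches)  # sqrt(12 / 4)
```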

objective(trial)[source]

Objective function for the Optuna optimization. Optuna calls it once per trial.

Parameters:

trial (optuna.Trial) – The Optuna trial to be evaluated. It should be an optuna.Trial.

Returns:

The RMSE of the Autoencoder for the given trial.

Return type:

float

optimize(direction='maximize', n_trials=10, study_name='NN_Optimization', load_if_exists=True, sampler=optuna.samplers.TPESampler, n_jobs=1)[source]

Optimize the Autoencoder by running an Optuna study.

Parameters:
  • direction (str, optional) – The direction of the optimization, either "minimize" or "maximize". Default is "maximize".

  • n_trials (int, optional) – The number of trials to run. It should be a positive integer. Default is 10.

  • study_name (str, optional) – The name of the study. Default is "NN_Optimization".

  • load_if_exists (bool, optional) – If True, the study will be loaded if it exists. Default is True.

  • sampler (optuna.samplers.BaseSampler, optional) – The sampler to be used in the study. It should be an optuna.samplers.BaseSampler. Default is TPESampler().

  • n_jobs (int, optional) – The number of jobs to be used. It should be a positive integer. Default is 1.

Returns:

The Optuna study.

Return type:

optuna.study.Study
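
The effect of direction can be made concrete with a small random-search loop. This is a stand-in written with the standard library, not Optuna's TPE sampler and not this class's implementation; random_search, toy_rmse and the lr hyperparameter are all hypothetical. With an error metric such as RMSE, "minimize" selects the trial with the lowest value and "maximize" the one with the highest.

```python
import random
from typing import Callable, Dict, List


def random_search(objective: Callable[[float], float], direction: str,
                  n_trials: int, seed: int) -> Dict[str, float]:
    """Random-search stand-in for a study loop; not Optuna's TPE sampler."""
    if direction not in ("minimize", "maximize"):
        raise ValueError("direction must be 'minimize' or 'maximize'")
    rng = random.Random(seed)
    trials: List[Dict[str, float]] = []
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(-4, -1)  # hypothetical hyperparameter
        trials.append({"lr": lr, "value": objective(lr)})
    pick = min if direction == "minimize" else max
    return pick(trials, key=lambda t: t["value"])


def toy_rmse(lr: float) -> float:
    # Hypothetical error surface with its minimum at lr = 1e-2.
    return abs(lr - 1e-2)


# Same seed, so both calls see the same trials and differ only in the pick.
best_low = random_search(toy_rmse, "minimize", n_trials=10, seed=42)
best_high = random_search(toy_rmse, "maximize", n_trials=10, seed=42)
```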

set_random_seed()[source]

Set the random seed for the Autoencoder.

Return type:

None
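
What seeding buys is repeatability: the same seed yields the same sequence of random draws. The sketch below shows this with Python's own generator only. seed_python_rng is a hypothetical helper; the real method presumably also seeds the other libraries in use (an assumption, as the source does not list them).

```python
import random


def seed_python_rng(seed: int) -> None:
    """Seed Python's global generator so later draws are repeatable."""
    random.seed(seed)


seed_python_rng(42)
first_run = [random.random() for _ in range(3)]

seed_python_rng(42)
second_run = [random.random() for _ in range(3)]
```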

train_autoencoder(model, optimizer, criterion, clip_grad, epochs, trial)[source]

Train the Autoencoder and return its best validation and training RMSE.

Parameters:
  • model (nn.Module) – The Autoencoder model to be trained. It should be an nn.Module.

  • optimizer (optim.Optimizer) – The optimizer to be used. It should be an optim.Optimizer.

  • criterion (nn.Module) – The loss function to be used. It should be an nn.Module.

  • clip_grad (float) – The gradient clipping value.

  • epochs (int) – The number of training epochs. It should be a positive integer.

  • trial (optuna.Trial) – The Optuna trial that this training run belongs to. It should be an optuna.Trial.

Returns:

The best validation and training RMSE.

Return type:

tuple
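
A training loop with gradient clipping and best-score tracking can be sketched on a one-weight model, x_hat = w * x, using only the standard library. Everything here is illustrative: the train and rmse helpers are hypothetical, the tuple order (validation, training) is assumed from the wording of the return description, and the source does not state whether clip_grad is applied by norm or by value (for a single parameter the two coincide).

```python
import math
from typing import List, Tuple


def rmse(w: float, xs: List[float]) -> float:
    """Reconstruction RMSE of the one-weight model x_hat = w * x."""
    return math.sqrt(sum((w * x - x) ** 2 for x in xs) / len(xs))


def train(xs_train: List[float], xs_val: List[float], lr: float,
          clip_grad: float, epochs: int) -> Tuple[float, float]:
    """Gradient descent with clipping; returns (best_val_rmse, best_train_rmse)."""
    w = 0.0
    best_val, best_train = math.inf, math.inf
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - x) * x for x in xs_train) / len(xs_train)
        # Clip the gradient magnitude to at most clip_grad.
        if abs(grad) > clip_grad:
            grad = math.copysign(clip_grad, grad)
        w -= lr * grad
        val = rmse(w, xs_val)
        if val < best_val:
            # Keep the training RMSE from the epoch with the best validation RMSE.
            best_val, best_train = val, rmse(w, xs_train)
    return best_val, best_train


best_val, best_train = train(
    [1.0, 2.0, 3.0], [1.5, 2.5], lr=0.05, clip_grad=1.0, epochs=200
)
```

Clipping caps the step size while the weight is far from its optimum, which keeps early updates from overshooting.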