OCDocker.OCScore.Optimization.Transformer module

Module with a helper for optimizing the parameters of the Transformer model using Optuna.

It is imported as:

import OCDocker.OCScore.Optimization.Transformer as octrans

OCDocker.OCScore.Optimization.Transformer.optimize_Transformer(df_path, storage_id, base_models_folder, data=None, storage='sqlite:///Transformer_optimization.db', use_pdb_train=True, no_scores=False, only_scores=False, use_PCA=False, pca_type=95, pca_model='', run_Trans_optimization=True, num_processes_Trans=4, total_trials_Trans=1000, random_seed=42, load_if_exists=True, use_gpu=True, parallel_backend='joblib', verbose=False)

Function to optimize the Transformer model using Optuna.

Parameters:
  • df_path (str) – The path to the dataset file.

  • storage_id (int) – The storage ID of the dataset.

  • base_models_folder (str) – The base models folder.

  • data (dict, optional) – The data dictionary. Default is None (treated as empty dict). If not empty, the data dictionary will be used instead of loading the data. This is useful for multiprocessing to avoid loading the data multiple times.

  • storage (str, optional) – The storage string for the database. The default is “sqlite:///Transformer_optimization.db”.

  • use_pdb_train (bool, optional) – Whether to use the PDB train dataset. The default is True.

  • no_scores (bool, optional) – Whether to use the no scores dataset. The default is False.

  • only_scores (bool, optional) – Whether to use the only scores dataset. The default is False.

  • use_PCA (bool, optional) – Whether to use PCA. The default is False.

  • pca_type (int, optional) – The PCA type to use. The default is 95.

  • pca_model (Union[str, PCA], optional) – The PCA model to use. Default is “”.

  • run_Trans_optimization (bool, optional) – Whether to run the Transformer optimization. The default is True.

  • num_processes_Trans (int, optional) – The number of processes to use. The default is 4.

  • total_trials_Trans (int, optional) – The total number of trials to run. The default is 1000.

  • random_seed (int, optional) – The random seed to use. The default is 42.

  • load_if_exists (bool, optional) – Whether to load the study if it already exists. The default is True.

  • use_gpu (bool, optional) – Whether to use the GPU. The default is True.

  • parallel_backend (str, optional) – The parallel backend to use. Options are “joblib” and “multiprocessing”. The default is “joblib”. [ATTENTION] The multiprocessing backend showed serious bugs during testing of this library; using joblib is strongly recommended.

  • verbose (bool, optional) – Whether to print verbose output. The default is False.

Raises:

ValueError – If the parallel backend is invalid.

Return type:

None
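
A call might look like the sketch below. The paths and the storage_id value are placeholders rather than values taken from OCDocker, and RUN_OPTIMIZATION is a flag introduced here for illustration. The import is guarded so the snippet only starts an optimization when OCDocker is installed and the flag is switched on.

```python
# Keyword arguments mirroring the documented signature.
# The first three values are hypothetical placeholders: replace them
# with your own dataset path, storage ID, and models folder.
call_kwargs = {
    "df_path": "data/ocscore_dataset.csv",   # placeholder path
    "storage_id": 1,                          # placeholder ID
    "base_models_folder": "models/base",      # placeholder folder
    "storage": "sqlite:///Transformer_optimization.db",  # documented default
    "use_PCA": False,
    "num_processes_Trans": 4,
    "total_trials_Trans": 1000,
    "random_seed": 42,
    "load_if_exists": True,
    "use_gpu": True,
    "parallel_backend": "joblib",  # recommended over "multiprocessing"
    "verbose": False,
}

# Illustrative safety switch (not part of OCDocker): set to True to run.
RUN_OPTIMIZATION = False

try:
    import OCDocker.OCScore.Optimization.Transformer as octrans
except ImportError:
    octrans = None  # OCDocker is not installed in this environment

if RUN_OPTIMIZATION and octrans is not None:
    # Returns None; according to the storage parameter, results go to
    # the database named by the storage string.
    octrans.optimize_Transformer(**call_kwargs)
```

Passing everything by keyword keeps the call readable given the long signature, and makes it harder to mix up the similarly named boolean flags.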

OCDocker.OCScore.Optimization.Transformer.optimize(df_path, storage_id, base_models_folder, data=None, storage='sqlite:///Transformer_optimization.db', use_pdb_train=True, no_scores=False, only_scores=False, use_PCA=False, pca_type=95, pca_model='', run_Trans_optimization=True, num_processes_Trans=4, total_trials_Trans=1000, random_seed=42, load_if_exists=True, use_gpu=True, parallel_backend='joblib', verbose=False)

Function to optimize the Transformer model using Optuna.

Parameters:
  • df_path (str) – The path to the dataset file.

  • storage_id (int) – The storage ID of the dataset.

  • base_models_folder (str) – The base models folder.

  • data (dict, optional) – The data dictionary. Default is None (treated as empty dict). If not empty, the data dictionary will be used instead of loading the data. This is useful for multiprocessing to avoid loading the data multiple times.

  • storage (str, optional) – The storage string for the database. The default is “sqlite:///Transformer_optimization.db”.

  • use_pdb_train (bool, optional) – Whether to use the PDB train dataset. The default is True.

  • no_scores (bool, optional) – Whether to use the no scores dataset. The default is False.

  • only_scores (bool, optional) – Whether to use the only scores dataset. The default is False.

  • use_PCA (bool, optional) – Whether to use PCA. The default is False.

  • pca_type (int, optional) – The PCA type to use. The default is 95.

  • pca_model (Union[str, PCA], optional) – The PCA model to use. Default is “”.

  • run_Trans_optimization (bool, optional) – Whether to run the Transformer optimization. The default is True.

  • num_processes_Trans (int, optional) – The number of processes to use. The default is 4.

  • total_trials_Trans (int, optional) – The total number of trials to run. The default is 1000.

  • random_seed (int, optional) – The random seed to use. The default is 42.

  • load_if_exists (bool, optional) – Whether to load the study if it already exists. The default is True.

  • use_gpu (bool, optional) – Whether to use the GPU. The default is True.

  • parallel_backend (str, optional) – The parallel backend to use. Options are “joblib” and “multiprocessing”. The default is “joblib”. [ATTENTION] The multiprocessing backend showed serious bugs during testing of this library; using joblib is strongly recommended.

  • verbose (bool, optional) – Whether to print verbose output. The default is False.

Raises:

ValueError – If the parallel backend is invalid.

Return type:

None
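
The documented handling of data and parallel_backend can be illustrated with a stand-alone sketch. The helpers check_backend and resolve_data, the loader callback, and the dictionary keys are hypothetical names introduced here. They are not part of OCDocker and only mimic the behaviour described in the parameter list: None is treated as an empty dict, a non-empty dict is reused instead of reloading, and an unknown backend raises ValueError.

```python
VALID_BACKENDS = ("joblib", "multiprocessing")  # options listed in the docs


def check_backend(parallel_backend="joblib"):
    """Return the backend name, or raise ValueError if it is not listed."""
    if parallel_backend not in VALID_BACKENDS:
        raise ValueError(
            f"Invalid parallel backend {parallel_backend!r}; "
            f"expected one of {VALID_BACKENDS}"
        )
    return parallel_backend


def resolve_data(df_path, data=None, loader=None):
    """Reuse a non-empty data dict; otherwise load from df_path via loader."""
    data = data or {}  # None is treated as an empty dict
    if data:
        return data  # already loaded: skip the expensive load
    return loader(df_path)


# Fake loader that records each call, standing in for real dataset loading.
calls = []


def fake_loader(path):
    calls.append(path)
    return {"X_train": [[0.0, 1.0]], "y_train": [0.5]}  # made-up contents


# Load once, then pass the same dict to later calls (e.g. one per worker).
shared = resolve_data("dataset.csv", loader=fake_loader)
again = resolve_data("dataset.csv", data=shared, loader=fake_loader)
```

In this sketch the second call returns the very same dictionary and the loader runs only once, which is the saving the data parameter is described as providing.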