OCDocker.OCScore.Scoring module
Functions for scoring and prediction with OCDocker scoring functions.
Usage:
import OCDocker.OCScore.Scoring as ocscoring
- OCDocker.OCScore.Scoring.get_score(model_path, data=None, pca_model=None, mask=None, score_columns_list=['SMINA', 'VINA', 'ODDT', 'PLANTS'], scaler='standard', scaler_path=None, invert_conditionally=True, normalize=True, no_scores=False, only_scores=False, columns_to_skip_pca=None, serialization_method='auto', use_gpu=True, enforce_reference_column_order=True)
Get scores by loading a model and applying the same preprocessing pipeline.
This function loads a trained model and applies it to input data following the same preprocessing pipeline used during training. The data can be provided as a DataFrame or read from a database.
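A call might look like the minimal sketch below. The wrapper name `score_from_csv` is hypothetical and not part of OCDocker; it only forwards keyword arguments that appear in the signature above, and it assumes a model (and optionally a scaler and PCA model) has already been saved to disk.

```python
def score_from_csv(model_path, csv_path, scaler_path=None, pca_model=None):
    """Hypothetical helper: score the rows of a CSV file with a saved model."""
    # Imported lazily so that defining this helper does not require OCDocker.
    import OCDocker.OCScore.Scoring as ocscoring

    return ocscoring.get_score(
        model_path,
        data=csv_path,            # a pandas DataFrame is also accepted
        pca_model=pca_model,      # None means no PCA transformation
        scaler_path=scaler_path,  # reuse the scaler saved at training time, if any
    )
```

Passing `scaler_path` rather than relying on `scaler` is likely the safer choice for inference, since the documentation states that a saved scaler carries the scaling parameters used during training.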
- Parameters:
model_path (str) – Path to the saved model file.
data (pd.DataFrame | str, optional) – Input data: a pandas DataFrame with the features, a string path to a CSV file, or None to read from the database (requires DB setup). Default is None.
pca_model (str | PCA, optional) – Path to the PCA model file or a PCA model object. If provided, PCA transformation will be applied. If None, no PCA is used. Default is None.
mask (list | np.ndarray, optional) – Feature mask array of 0s and 1s to filter features before prediction. Length should match the number of features after preprocessing. 1 means keep the feature, 0 means remove it. Default is None (no masking applied).
score_columns_list (list[str], optional) – List of score column prefixes to identify score columns. Default is [“SMINA”, “VINA”, “ODDT”, “PLANTS”].
scaler (str, optional) – Scaler to use for normalization if scaler_path is not provided. Options are “standard” or “minmax”. If scaler_path is provided, this is ignored. Default is “standard”.
scaler_path (str, optional) – Path to a saved scaler file (saved with joblib/pickle). If provided, the saved scaler will be loaded and used instead of creating a new one. This ensures the same scaling parameters from training are applied. Default is None.
invert_conditionally (bool, optional) – Whether to invert values conditionally (for VINA, SMINA, PLANTS columns). Default is True.
normalize (bool, optional) – Whether to normalize the data. Default is True.
no_scores (bool, optional) – If True, remove score columns from the data. Default is False.
only_scores (bool, optional) – If True, keep only score columns and metadata. Default is False.
columns_to_skip_pca (list[str], optional) – List of columns to skip during PCA transformation. If None, defaults to metadata columns: [“receptor”, “ligand”, “name”, “type”, “db”]. Default is None.
serialization_method (str, optional) – Serialization method used to save the model. Documented options are “joblib” and “pickle”; the signature default is “auto”.
enforce_reference_column_order (bool, optional) – If True, reorder columns to match the reference_column_order from the config file before any preprocessing (scaling/PCA/masking). This is critical to keep feature alignment consistent with training. Default is True.
use_gpu (bool, optional) – Whether to use the GPU. Default is True.
- Returns:
Predicted scores. Returns a DataFrame if input was a DataFrame (preserving metadata columns), otherwise returns a numpy array.
- Return type:
pd.DataFrame | np.ndarray
- Raises:
FileNotFoundError – If the model file or PCA model file is not found.
ValueError – If data is None and database is not available, or if invalid parameters are provided.
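The semantics of `score_columns_list` and `mask` described above can be illustrated in plain Python. This is a sketch of the documented behaviour, not OCDocker's implementation: the helper names and the example column names are hypothetical, and it assumes score columns are recognised by a simple prefix match.

```python
def split_score_columns(columns, prefixes=("SMINA", "VINA", "ODDT", "PLANTS")):
    """Split column names into (score, other) using a prefix match (assumed)."""
    prefixes = tuple(prefixes)
    score = [c for c in columns if c.startswith(prefixes)]
    other = [c for c in columns if not c.startswith(prefixes)]
    return score, other


def apply_mask(features, mask):
    """Keep features whose mask entry is 1 and drop those whose entry is 0."""
    if len(features) != len(mask):
        raise ValueError("mask length must match the number of features")
    return [f for f, keep in zip(features, mask) if keep == 1]


# Hypothetical column names, for illustration only.
columns = ["SMINA_affinity", "VINA_score", "PLANTS_total", "feature_a", "feature_b"]

score_cols, other_cols = split_score_columns(columns)
kept = apply_mask(columns, [1, 0, 1, 1, 0])
```

Here `score_cols` holds the three prefixed columns, which is the set that `no_scores=True` would remove and `only_scores=True` would keep (alongside metadata), while `kept` shows the effect of a 0/1 mask on a feature list of the same length.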