Manual
Overview
This manual describes the main workflows, inputs and outputs, configuration, and data layout for OCDocker. For the full API reference, see the OCDocker package page and its subpackage pages (for example, OCDocker.Docking package and OCDocker.OCScore package).
Core workflow
1. Configure external tools and databases in `OCDocker.cfg` (or set `OCDOCKER_CONFIG`).
2. Prepare a receptor, a ligand, and a binding box.
3. Dock with `vs` (single engine) or `pipeline` (multi-engine).
4. Optionally rescore, analyze, and store metadata.
Key concepts
- Receptor: a protein structure (usually `.pdb`) represented by `OCDocker.Receptor`.
- Ligand: a small molecule (`.smi`, `.sdf`, `.mol2`, or `.pdbqt`) represented by `OCDocker.Ligand`.
- Box: a PDB file defining the search space via `REMARK` lines with center and dimensions.
- Preparation:
  - Vina/Smina use MGLTools to prepare `.pdbqt` files (falling back to OpenBabel when enabled).
  - PLANTS uses SPORES to prepare `.mol2` files.
- Rescoring: Vina/Smina/PLANTS scoring functions and optional ODDT models.
Inputs
Receptor
- Preferred: `.pdb`.
- Optional: pre-prepared receptor files (`.pdbqt` for Vina/Smina, `.mol2` for PLANTS).
Ligand
- The CLI accepts `.smi`, `.sdf`, `.mol2`, and `.pdbqt` files.
- Ligands are prepared automatically when needed.
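Before launching a batch, it can help to filter inputs by extension. The sketch below assumes that the extension alone decides whether a ligand file is accepted and compares it case-insensitively (the case handling is an assumption). `LIGAND_SUFFIXES`, `is_supported_ligand`, and `partition_ligands` are hypothetical names, not part of OCDocker.

```python
from pathlib import Path

# Ligand formats the CLI accepts, per the list above.
# Hypothetical constant; comparison is case-insensitive here (an assumption).
LIGAND_SUFFIXES = {".smi", ".sdf", ".mol2", ".pdbqt"}


def is_supported_ligand(path):
    """Return True if the file extension is one of the accepted ligand formats."""
    return Path(path).suffix.lower() in LIGAND_SUFFIXES


def partition_ligands(paths):
    """Split paths into (accepted, rejected) lists, preserving input order."""
    accepted, rejected = [], []
    for p in paths:
        (accepted if is_supported_ligand(p) else rejected).append(p)
    return accepted, rejected
```

Rejected files can then be reported up front instead of failing one docking run at a time.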
Box file format
The docking box is a PDB file with `REMARK` lines for center and dimensions. A minimal example:
```text
HEADER CORNERS OF BOX
REMARK CENTER (X Y Z) 12.345 23.456 34.567
REMARK DIMENSIONS (X Y Z) 20.000 20.000 20.000
```
Boxes can be created programmatically via Ligand.create_box() or in preprocessing pipelines.
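The box format can be written and read back with a few lines of standard-library Python. This is a minimal sketch that assumes only the three-line layout shown above. Box files written by `Ligand.create_box()` may include extra lines or different spacing; the parser ignores lines that do not match and tolerates extra whitespace. `format_box` and `parse_box` are hypothetical helpers.

```python
import re

# Matches "REMARK CENTER (X Y Z) x y z" and "REMARK DIMENSIONS (X Y Z) x y z",
# tolerating extra whitespace between fields.
_BOX_REMARK = re.compile(
    r"^REMARK\s+(CENTER|DIMENSIONS)\s+\(X\s+Y\s+Z\)\s+(\S+)\s+(\S+)\s+(\S+)"
)


def format_box(center, dimensions):
    """Render a minimal box file in the layout shown above (hypothetical helper)."""
    lines = [
        "HEADER CORNERS OF BOX",
        "REMARK CENTER (X Y Z) {:.3f} {:.3f} {:.3f}".format(*center),
        "REMARK DIMENSIONS (X Y Z) {:.3f} {:.3f} {:.3f}".format(*dimensions),
    ]
    return "\n".join(lines) + "\n"


def parse_box(text):
    """Return {"center": (x, y, z), "dimensions": (x, y, z)} from box-file text."""
    box = {}
    for line in text.splitlines():
        match = _BOX_REMARK.match(line.strip())
        if match:
            box[match.group(1).lower()] = tuple(float(v) for v in match.group(2, 3, 4))
    missing = {"center", "dimensions"} - set(box)
    if missing:
        raise ValueError(f"box file lacks REMARK lines for: {', '.join(sorted(missing))}")
    return box


# Usage (placeholder file name):
# from pathlib import Path
# box = parse_box(Path("box0.pdb").read_text())
```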
Outputs
Single engine (`ocdocker vs`)
The CLI creates engine-specific folders next to the ligand directory (`vinaFiles`, `sminaFiles`, or `plantsFiles`) and writes:
- `conf_*.txt`: generated docking configuration
- `*.log`: docking logs
- `*.pdbqt` files or PLANTS output directories
- `prepare_receptor.log` and `prepare_ligand.log` for the preparation steps
- optional rescoring logs and pose splits
Pipeline (`ocdocker pipeline`)
Outputs are written under `--outdir` and typically include:
- `vinaFiles/`, `sminaFiles/`, `plantsFiles/` (engine runs)
- `prepared_receptor.*` and `prepared_ligand.*`
- `poses_mol2/` (poses converted to MOL2)
- `rmsd_matrix.csv` and `clustering_dendrogram.png`
- `cluster_assignments.csv` and `clustering_info.json` (when clustering succeeds)
- `representative.mol2` (selected pose)
- `summary.json` (rescoring summary)
- `oddt_rescoring/` (if ODDT rescoring is enabled)
Some files appear only when the corresponding step is enabled.
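To see which of these outputs a given run actually produced, a presence check over `--outdir` is often enough. This is a sketch that assumes the names listed above sit directly under the output directory. `summarize_outdir` is a hypothetical helper.

```python
from pathlib import Path

# Fixed names from the pipeline output list; which ones exist depends on
# the steps that were enabled, so the check only reports presence.
EXPECTED_OUTPUTS = [
    "vinaFiles", "sminaFiles", "plantsFiles",
    "poses_mol2", "rmsd_matrix.csv", "clustering_dendrogram.png",
    "cluster_assignments.csv", "clustering_info.json",
    "representative.mol2", "summary.json", "oddt_rescoring",
]


def summarize_outdir(outdir):
    """Map each expected pipeline output to True/False (hypothetical helper)."""
    root = Path(outdir)
    report = {name: (root / name).exists() for name in EXPECTED_OUTPUTS}
    # prepared_receptor.* / prepared_ligand.* have engine-dependent extensions.
    for stem in ("prepared_receptor", "prepared_ligand"):
        report[stem + ".*"] = any(root.glob(stem + ".*"))
    return report


# Usage (placeholder directory):
# for name, present in summarize_outdir("results").items():
#     print(f"{name:28s} {'yes' if present else 'no'}")
```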
Configuration
OCDocker reads configuration from `OCDocker.cfg` or `OCDocker.yml`. To create a starter file, use:
```shell
ocdocker init-config --conf OCDocker.cfg
# or:
ocdocker init-config --conf OCDocker.yml
```
Key sections in config files (see `OCDocker.cfg.example` / `OCDocker.yml.example`):
- Database: `DB_BACKEND`, `HOST`, `USER`, `PASSWORD`, `DATABASE`, `OPTIMIZEDB`, `PORT`
- SQLite: `DB_BACKEND=sqlite` and optional `SQLITE_PATH`
- External tools: `vina`, `smina`, `plants`, `spores`, `pythonsh`, `prepare_ligand`, `prepare_receptor`, `obabel`, `oddt`
- Engine defaults: `vina_*`, `smina_*`, `plants_*`
- Data directories: `ocdb` (datasets), `pca` (PCA models)
Environment variables
Common:
- `OCDOCKER_CONFIG`: path to `OCDocker.cfg`
- `OCDOCKER_DB_BACKEND` / `DB_BACKEND`: choose the backend (`postgresql`, `mysql`, `sqlite`)
- `OCDOCKER_SQLITE_PATH`: explicit SQLite file path
- `OCDOCKER_NO_AUTO_BOOTSTRAP`: disable the import-time bootstrap
- `OCDOCKER_TIMEOUT`: default timeout (seconds) for external tools

Advanced (subprocess debugging and runtime gates):
- `OCDOCKER_SUBPROCESS_TAIL_LINES`: number of log tail lines to include on errors
- `OCDOCKER_DEBUG_SUBPROCESS`: include the stdout tail and an environment snapshot in failure reports
- `OCDOCKER_RAISE_SUBPROCESS`: raise exceptions instead of returning error codes
- `OCDOCKER_SKIP_ODDT`: skip importing ODDT during bootstrap
- `OCDOCKER_ALLOW_SCRIPT_EXEC`: allow trusted in-process script execution
- `OCDOCKER_ALLOW_UNSAFE_DESERIALIZATION`: allow trusted pickle/joblib/torch deserialization
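Because bootstrap happens at import time, environment variables have to be set before the first `import OCDocker...` statement. The sketch below exports a few of the common variables using a hypothetical helper, `set_ocdocker_env`. It assumes `"1"` is an accepted value for on/off switches, by analogy with `OCDOCKER_ALLOW_SCRIPT_EXEC=1` in the CLI section.

```python
import os

VALID_BACKENDS = {"postgresql", "mysql", "sqlite"}


def set_ocdocker_env(config_path, backend="postgresql", sqlite_path=None,
                     timeout_s=None, skip_oddt=False):
    """Export OCDocker environment variables for the current process.

    Hypothetical helper: call it before importing OCDocker, because the
    bootstrap (unless OCDOCKER_NO_AUTO_BOOTSTRAP is set) runs at import time.
    """
    if backend not in VALID_BACKENDS:
        raise ValueError(f"unsupported backend {backend!r}; expected one of {sorted(VALID_BACKENDS)}")
    os.environ["OCDOCKER_CONFIG"] = str(config_path)
    os.environ["OCDOCKER_DB_BACKEND"] = backend
    if sqlite_path is not None:
        os.environ["OCDOCKER_SQLITE_PATH"] = str(sqlite_path)
    if timeout_s is not None:
        os.environ["OCDOCKER_TIMEOUT"] = str(int(timeout_s))  # seconds
    if skip_oddt:
        # Assumption: "1" turns the switch on, as with OCDOCKER_ALLOW_SCRIPT_EXEC=1.
        os.environ["OCDOCKER_SKIP_ODDT"] = "1"


# Usage (placeholder paths):
# set_ocdocker_env("OCDocker.cfg", backend="sqlite", sqlite_path="ocdocker.db", timeout_s=900)
# import OCDocker.Receptor as ocr  # bootstrap now sees the variables
```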
Memory collection strategy
Processing pipelines use shared helpers in `OCDocker.Processing.GarbageCollection`.
- For small workloads (<= 8 items), explicit GC is eager (`gc.collect()` on every item).
- For larger workloads, GC runs periodically (every 32 processed items) to reduce overhead.
- Both preprocessing and postprocessing still run a final `gc.collect()` at the end of the routine.
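The policy above can be sketched as a small decision function plus a processing loop. The names `should_collect` and `process_all`, and the constants, are hypothetical; the actual helpers in `OCDocker.Processing.GarbageCollection` may be structured differently. The loop also returns the number of explicit collections, which is useful only for checking the policy.

```python
import gc

SMALL_WORKLOAD_MAX = 8  # workloads up to this size collect after every item
GC_INTERVAL = 32        # larger workloads collect every 32 processed items


def should_collect(processed, total):
    """Return True when an explicit gc.collect() is due after `processed` items."""
    if total <= SMALL_WORKLOAD_MAX:
        return True  # eager mode for small workloads
    return processed % GC_INTERVAL == 0  # periodic mode for larger workloads


def process_all(items, handle):
    """Apply `handle` to every item using the GC policy described above.

    Returns (results, number_of_explicit_collections).
    """
    items = list(items)
    total = len(items)
    results = []
    collections = 0
    for processed, item in enumerate(items, start=1):
        results.append(handle(item))
        if should_collect(processed, total):
            gc.collect()
            collections += 1
    gc.collect()  # final collection at routine end
    collections += 1
    return results, collections
```

Collecting on every item keeps peak memory low when only a few large items are processed. The 32-item interval spreads the cost of a full collection over many items.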
CLI
Main commands (see `ocdocker <command> --help`):
- `vs`: single-engine docking with optional rescoring of all poses
- `pipeline`: multi-engine docking, RMSD clustering, representative selection, and rescoring
- `shap`: OCScore SHAP analysis
- `console`: interactive console with OCDocker pre-loaded
- `script`: run a Python script with OCDocker pre-loaded (requires `--allow-unsafe-exec` or `OCDOCKER_ALLOW_SCRIPT_EXEC=1`)
- `doctor`: diagnostics (binaries, dependencies, DB)
- `manifest`: generate a reproducibility manifest (versions, runtime, tooling)
- `init-config`: create a starter config file
- `version`: print the installed version

Global options:
- `--conf`, `--multiprocess`, `--no-multiprocess`, `--update-databases`
- `--output-level`, `--overwrite`, `--log-file`, `--no-stdout-log`
Programmatic manifest API:
```python
import OCDocker.Toolbox.Reproducibility as ocrepro

# Generate a manifest without the list of installed Python packages.
manifest = ocrepro.generate_reproducibility_manifest(include_python_packages=False)

# Write a manifest to a JSON file.
_ = ocrepro.write_reproducibility_manifest("reproducibility_manifest.json")
```
Trusted runtime helper
For trusted scripts that need the deserialization gate opened at runtime:
```python
from OCDocker.Toolbox.Security import allow_unsafe_runtime

# Allow trusted pickle/joblib/torch deserialization; keep script execution disabled.
allow_unsafe_runtime(deserialization=True, script_exec=False)
```
Examples
Single-engine docking with Vina, storing results in the database:
```shell
ocdocker vs \
  --engine vina \
  --receptor path/to/receptor.pdb \
  --ligand path/to/ligand.sdf \
  --box path/to/box0.pdb \
  --timeout 600 \
  --store-db
```
Multi-engine pipeline with engine and ODDT rescoring:
```shell
ocdocker pipeline \
  --receptor path/to/receptor.pdb \
  --ligand path/to/ligand.sdf \
  --box path/to/box0.pdb \
  --engines vina,smina,plants \
  --rescoring-engines vina,smina,oddt \
  --timeout 900 \
  --store-db
```
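To dock many ligands against the same receptor and box, the single-engine command can be driven from Python. This sketch uses only the flags shown in the `ocdocker vs` example and starts one CLI invocation per ligand. `build_vs_command` and `dock_all` are hypothetical helpers.

```python
import shlex
import subprocess


def build_vs_command(receptor, ligand, box, engine="vina", timeout=600, store_db=False):
    """Build an `ocdocker vs` argument list from the flags in the example above."""
    cmd = [
        "ocdocker", "vs",
        "--engine", engine,
        "--receptor", str(receptor),
        "--ligand", str(ligand),
        "--box", str(box),
        "--timeout", str(timeout),
    ]
    if store_db:
        cmd.append("--store-db")
    return cmd


def dock_all(receptor, ligands, box, **options):
    """Run one `ocdocker vs` call per ligand and return {ligand: exit code}."""
    codes = {}
    for ligand in ligands:
        cmd = build_vs_command(receptor, ligand, box, **options)
        print("running:", shlex.join(cmd))
        codes[str(ligand)] = subprocess.run(cmd, check=False).returncode
    return codes


# Usage (placeholder paths):
# from pathlib import Path
# dock_all("receptor.pdb", sorted(Path("ligands").glob("*.sdf")), "box0.pdb", store_db=True)
```

Passing an argument list instead of a shell string avoids quoting problems with paths that contain spaces.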
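Python API
Minimal example using Vina:
```python
import OCDocker.Receptor as ocr
import OCDocker.Ligand as ocl
import OCDocker.Docking.Vina as ocvina

# Load the receptor and ligand structures.
receptor = ocr.Receptor("receptor.pdb", name="receptor")
ligand = ocl.Ligand("ligand.sdf", name="ligand")

vina = ocvina.Vina(
    "conf_vina.txt",            # Vina config file to generate
    "box0.pdb",                 # docking box (REMARK center/dimensions)
    receptor,
    "prepared_receptor.pdbqt",  # prepared receptor output
    ligand,
    "prepared_ligand.pdbqt",    # prepared ligand output
    "vina.log",                 # docking log
    "vina_output.pdbqt",        # docked poses
    name="vina_run",
    overwrite_config=True,      # regenerate conf_vina.txt if it already exists
)

# Prepare the .pdbqt inputs, then run the docking.
vina.run_prepare_receptor()
vina.run_prepare_ligand()
vina.run_docking()
```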
Data layout for screening sets
OCDocker supports a simple folder layout for virtual screening datasets:
```text
receptor/
compounds/
  candidates/
    molecule_1/
    molecule_2/
  decoys/
    molecule_A/
    molecule_B/
  ligands/
    molecule_a/
    molecule_b/
```
- `receptor`: receptor structure (`.pdb`)
- `candidates`: unknown binders (typical screening set)
- `decoys`: negative controls for evaluation
- `ligands`: known actives (training/validation)
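The sketch below enumerates the molecule folders in this layout. It assumes `dataset_root` is the folder that directly contains `compounds/` and that each molecule has its own subfolder. `iter_molecules` is a hypothetical helper, and missing categories are skipped.

```python
from pathlib import Path

# Category folders under compounds/, as in the layout above.
CATEGORIES = ("candidates", "decoys", "ligands")


def iter_molecules(dataset_root):
    """Yield (category, molecule_dir) for every molecule folder under compounds/."""
    compounds = Path(dataset_root) / "compounds"
    for category in CATEGORIES:
        category_dir = compounds / category
        if not category_dir.is_dir():
            continue  # e.g. a dataset without decoys
        for molecule_dir in sorted(p for p in category_dir.iterdir() if p.is_dir()):
            yield category, molecule_dir


# Usage (placeholder path):
# for category, mol_dir in iter_molecules("datasets/target_x"):
#     print(category, mol_dir.name)
```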
Rescoring and OCScore
- Engine rescoring: Vina/Smina/PLANTS scoring functions are configured in `OCDocker.cfg`.
- ODDT rescoring: optional ML-based scoring via `OCDocker.Rescoring.ODDT`.
- OCScore: training, optimization, and analysis pipelines for consensus scoring. See the OCDocker.OCScore package and the examples under Examples.
Database and persistence
- The default backend is PostgreSQL; MySQL and SQLite are also supported.
- Use `--store-db` with CLI commands to store receptor/ligand descriptors and supported rescoring columns in the database.
- Database schemas are defined under the OCDocker.DB package and related model pages.
Diagnostics and troubleshooting
- Run `ocdocker doctor` to validate the configuration, binaries, Python dependencies, and DB access.
- If docking tools fail, confirm the tool paths in `OCDocker.cfg` and review the preparation logs (`prepare_receptor.log`, `prepare_ligand.log`).
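Before running `ocdocker doctor`, a standard-library check can show whether the executables are reachable at all. This sketch assumes you pass the same commands or absolute paths configured in `OCDocker.cfg`. `find_tools` and `missing_tools` are hypothetical helpers and do not replace the full diagnostics.

```python
import shutil


def find_tools(tools):
    """Return {label: resolved path or None} for each command (hypothetical helper).

    `tools` maps a label to a command name or an absolute path, e.g. the
    values configured for the vina/smina/obabel keys in OCDocker.cfg.
    """
    return {label: shutil.which(command) for label, command in tools.items()}


def missing_tools(tools):
    """List, in sorted order, the labels whose executables cannot be resolved."""
    return sorted(label for label, path in find_tools(tools).items() if path is None)


# Usage (command names are placeholders; use the values from your config):
# print(missing_tools({"vina": "vina", "smina": "smina", "obabel": "obabel"}))
```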