Regression-Based Active Learning for Accessible Acceleration of Ultra-Large Library Docking

Egor Marin*, Margarita Kovaleva, Maria Kadukova, Khalid Mustafin, Polina Khorn, Andrey Rogachev, Alexey Mishin, Albert Guskov, Valentin Borshchevskiy*

*Corresponding author for this work

Research output: Academic, peer review

6 Citations (Scopus)
56 Downloads (Pure)

Abstract

Structure-based drug discovery is a process for both hit finding and optimization that relies on a validated three-dimensional model of a target biomolecule, used to rationalize the structure–function relationship for this particular target. An ultralarge virtual screening approach has emerged recently for rapid discovery of high-affinity hit compounds, but it requires substantial computational resources. This study shows that active learning with simple linear regression models can accelerate virtual screening, retrieving up to 90% of the top 1% of the docking hit list after docking just 10% of the ligands. The results demonstrate that it is unnecessary to use complex models, such as deep learning approaches, to predict the imprecise results of ligand docking with a low sampling depth. Furthermore, we explore active learning meta-parameters and find that constant batch size models with a simple ensembling method provide the best ligand retrieval rate. Finally, our approach is validated on an ultralarge virtual screening data set, retrieving 70% of the top 0.05% of ligands after screening only 2% of the library. Altogether, this work provides a computationally accessible approach for accelerated virtual screening that can serve as a blueprint for the future design of low-compute agents for exploration of the chemical space via large-scale accelerated docking. With recent breakthroughs in protein structure prediction, this method can significantly increase accessibility for the academic community and aid in the rapid discovery of high-affinity hit compounds for various targets.
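
The loop the abstract describes (dock a batch, fit a linear regression on the scores so far, dock the ligands the model ranks best, repeat) might be sketched as follows. This is a minimal illustration and not the authors' implementation: the random features stand in for molecular descriptors, the `dock` callable is a hypothetical stand-in for a docking program, the purely greedy selection rule and the "lower score is better" convention are assumptions, and the ensembling mentioned in the abstract is left out.

```python
import numpy as np


def active_learning_screen(features, dock, n_rounds=10, batch_size=50, seed=0):
    """Dock n_rounds constant-size batches, choosing each batch with a linear model.

    `dock` is a hypothetical callable: ligand indices -> docking scores,
    where a lower score is assumed to mean a better ligand.
    """
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    X = np.hstack([features, np.ones((n, 1))])  # extra column acts as the intercept
    docked = np.zeros(n, dtype=bool)
    scores = np.full(n, np.nan)

    batch = rng.choice(n, size=batch_size, replace=False)  # first batch is random
    for round_no in range(n_rounds):
        scores[batch] = dock(batch)
        docked[batch] = True
        if round_no == n_rounds - 1:
            break
        # Ordinary least squares on everything docked so far.
        weights, *_ = np.linalg.lstsq(X[docked], scores[docked], rcond=None)
        predicted = X @ weights
        predicted[docked] = np.inf  # never pick an already docked ligand
        batch = np.argsort(predicted)[:batch_size]  # greedy: best predicted scores
    return docked, scores


def top_recall(true_scores, docked, top_frac=0.01):
    """Fraction of the true top `top_frac` of ligands that ended up docked."""
    k = max(1, int(top_frac * len(true_scores)))
    top = np.argsort(true_scores)[:k]
    return docked[top].mean()


# Synthetic demonstration: scores are a noisy linear function of the features.
rng = np.random.default_rng(42)
features = rng.normal(size=(5000, 16))
true_scores = features @ rng.normal(size=16) + rng.normal(scale=0.5, size=5000)

docked, _ = active_learning_screen(features, lambda idx: true_scores[idx])
print(f"docked {docked.mean():.0%} of the library, "
      f"recovered {top_recall(true_scores, docked):.0%} of the top 1%")
```

Because the synthetic scores are linear in the features by construction, the recall printed here says nothing about real docking data; the sketch only shows the mechanics of a constant-batch-size loop and of the retrieval metric (share of the true top fraction found within the docked subset).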
Original language: English
Article number: 3c01661
Pages (from-to): 2612–2623
Number of pages: 12
Journal: Journal of Chemical Information and Modeling
Volume: 64
Issue number: 7
Early online date: 29 Dec 2023
Status: Published - 8 Apr 2024
