Exploration of structure-activity relationships for the SARS-CoV-2 macrodomain from shape-based fragment linking and active learning
One of the known challenges in drug discovery is finding good initial hit compounds in the vast expanse of druglike chemical space. Traditional high-throughput screening (HTS) libraries cover only a small area of chemical space and are costly to run, while computational search methods require a trade-off between accuracy and speed and still struggle to assess large enough regions of chemical space. Recently, fragment screening has been adopted to find new hit matter. This month’s computational chemistry paper of the month utilises data from a recent X-ray crystallographic fragment screen of the SARS-CoV-2 nonstructural protein 3 macrodomain (Mac1), along with a shape-based computational screening methodology (FrankenROCS), to screen Enamine’s HTS and REAL libraries.
Starting from 214 crystallised fragments, FrankenROCS was used to pair the fragments and apply rapid overlay of chemical structures (ROCS) to screen the Enamine HTS library (2.1 million compounds) for hits with high shape similarity to the fragment pairs. Thousands of FrankenROCS results were manually curated down to 39 purchased compounds, which were soaked into Mac1 crystals, resulting in 10 resolved structures. To provide an even deeper search, FrankenROCS was combined with Thompson Sampling (TS), an active learning algorithm that searches reagent space, to search the Enamine REAL library. Enamine REAL is composed of building blocks (reagents) that can be linked by chemical reactions to create over 22 billion theoretical products; it would be infeasible to assess each of these by ROCS. TS exploits the building block architecture of Enamine REAL: it first explores the reagent space through iterative learning cycles, then narrows the search so that only the products most likely to be hits are screened, significantly reducing the computational effort. TS-FrankenROCS combined with automated filtering and clustering identified 36 molecules, 32 of which were synthesised and soaked into Mac1 crystals.
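The reagent-level search idea can be illustrated with a minimal Python sketch. This is a toy under stated assumptions and not the authors' implementation: the library is a hypothetical two-component reaction with integer reagent indices, `toy_score` stands in for an expensive shape-overlay score, and each reagent's belief is a simple Gaussian with an assumed fixed noise scale. The function names and parameters are invented for illustration.

```python
import random


def thompson_search(n_a, n_b, score, n_warmup=3, n_iter=200, noise=1.0, seed=0):
    """Toy Thompson Sampling over a two-component reagent library.

    score(i, j) is a hypothetical stand-in for an expensive evaluation of
    the product built from reagent i (component A) and reagent j (component B).
    Requires n_warmup >= 1 so every reagent has at least one observation.
    """
    rng = random.Random(seed)
    # stats[c][r] = [number of scored products, sum of scores] for reagent r
    stats = [[[0, 0.0] for _ in range(n_a)], [[0, 0.0] for _ in range(n_b)]]
    seen = {}  # (i, j) -> score, so no product is scored twice

    def evaluate(i, j):
        if (i, j) in seen:
            return
        s = score(i, j)
        seen[(i, j)] = s
        # Credit the product's score to both reagents that formed it.
        for comp, idx in ((0, i), (1, j)):
            stats[comp][idx][0] += 1
            stats[comp][idx][1] += s

    def sample_belief(comp, idx):
        # Assumed belief: Gaussian around the reagent's mean score,
        # narrowing as more of its products are scored.
        n, total = stats[comp][idx]
        return rng.gauss(total / n, noise / n ** 0.5)

    # Warm-up: pair every reagent with a few random partners.
    for i in range(n_a):
        for _ in range(n_warmup):
            evaluate(i, rng.randrange(n_b))
    for j in range(n_b):
        for _ in range(n_warmup):
            evaluate(rng.randrange(n_a), j)

    # Search: draw from each reagent's belief, combine the top draw
    # from each component, score that product, and update the beliefs.
    for _ in range(n_iter):
        i = max(range(n_a), key=lambda r: sample_belief(0, r))
        j = max(range(n_b), key=lambda r: sample_belief(1, r))
        evaluate(i, j)

    ranked = sorted(seen.items(), key=lambda kv: kv[1], reverse=True)
    return ranked, len(seen)


def toy_score(i, j):
    # Hypothetical additive score that peaks at reagents (7, 3).
    return -abs(i - 7) - abs(j - 3)


ranked, n_scored = thompson_search(50, 50, toy_score)
print(f"scored {n_scored} of {50 * 50} products; best so far: {ranked[0]}")
```

In this toy, at most 500 of the 2,500 possible products are ever scored, which is the point of searching at the reagent level. With such a crude belief model the search can settle on a good but not optimal reagent pair, so the sketch shows the mechanics only.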
The second half of the paper focuses on the follow-up of the hits. Although many of the hits contain carboxylic acids that bind to the Mac1 oxyanion site, the authors stress the importance of identifying several neutral compounds, since a carboxylic acid series from a previous campaign was found to be poorly permeable and unstable. The most potent neutral hit was chosen for SAR exploration, which ultimately yielded several low-micromolar lead-like analogues that exhibited good solubility, cellular permeability and mouse microsomal stability.
This work demonstrates the usefulness of FrankenROCS and TS for deep virtual screening of very large combinatorial libraries, coupled with traditional medicinal chemistry design, to find new lead-like series that can take projects in new directions. The publication of neutral SARS-CoV-2 Mac1 compounds and 137 Mac1 crystal structures also provides a wealth of data for those working in the area.
Correy, G.J. et al. Science Advances. 2025. 11(22). DOI: 10.1126/sciadv.ads7187