This is an unusual pick for the bucket list: an editorial. However, it's an editorial that contains much that echoes through the years as one over-hyped method for prediction is replaced by another. Maggiora notes the mismatch between the dimensionality of chemical space and that of the representations of it that we use in many statistical models. He also highlights the importance of activity cliffs. These are the large discontinuities in activity that we expect when we think about molecules fitting into active sites: it is easy to imagine how a small change in structure might take a molecule from binding tightly to not binding at all (because it is now too big, or because it places a hydrogen bond donor towards a donor on the protein, etc.). Activity cliffs undermine the similarity principle that much QSAR modeling relies upon, and they are often not well characterized in the activity data because inactive compounds tend not to be followed up experimentally. The fact that each set of descriptors provides a very different map of chemical space is also a problem: every molecule’s set of nearest neighbors can change when a different set of descriptors is used. The chastening conclusion is that “all QSAR models are flawed to some degree”; recognizing and dealing with this truth is one of the challenges for chemoinformatics.
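To make the idea of an activity cliff concrete, here is a minimal sketch in Python. It is not taken from the editorial: the feature sets, the pIC50 values, the compound names and both thresholds are invented for illustration, and a real analysis would use proper fingerprints from a cheminformatics toolkit rather than hand-written sets. It simply flags pairs of compounds that look similar (high Tanimoto similarity) yet differ greatly in activity.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(union)


def activity_cliffs(compounds, sim_threshold=0.7, activity_gap=2.0):
    """Return pairs that are similar in structure but far apart in activity.

    compounds maps a name to (feature set, activity on a log scale, e.g. pIC50).
    Both thresholds are arbitrary illustrative choices.
    """
    cliffs = []
    for (n1, (f1, a1)), (n2, (f2, a2)) in combinations(compounds.items(), 2):
        sim = tanimoto(f1, f2)
        gap = abs(a1 - a2)
        if sim >= sim_threshold and gap >= activity_gap:
            cliffs.append((n1, n2, sim, gap))
    return cliffs


# Hypothetical toy data: cpd_B is cpd_A plus one methyl group, but is far less active.
compounds = {
    "cpd_A": ({"aryl", "amide", "piperidine", "F"}, 8.1),
    "cpd_B": ({"aryl", "amide", "piperidine", "F", "methyl"}, 5.2),
    "cpd_C": ({"aryl", "sulfonamide", "morpholine"}, 6.0),
}

cliffs = activity_cliffs(compounds)
for name1, name2, sim, gap in cliffs:
    print(f"{name1} / {name2}: similarity {sim:.2f}, activity gap {gap:.1f}")
```

In this toy set only the cpd_A / cpd_B pair is flagged. Swapping in a different feature definition changes the similarity values, and with them which pairs count as neighbors, which is the descriptor-dependence problem described above.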
On outliers and activity cliffs – why QSAR often disappoints.
Maggiora, G. M. J. Chem. Inf. Model. 2006, 46, 1535.
#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe