As the abstract says, “The maximum achievable accuracy of in silico models depends on the quality of the experimental data.” The paper beautifully describes methods to examine a given dataset and so “defines a natural upper limit to the predictive performance possible.” This #BucketListPapers entry uses publicly available data from ChEMBL, and Figure 5 illustrates the spread in the data for pairs of measurements of a given molecule; the 2.5 log lines really show how much measurements can vary. An appreciation of this degree of variation is important when building ML models for drug discovery.
Kramer et al.: The Experimental Uncertainty of Heterogeneous Public Ki Data.
J. Med. Chem. 2012, 55, 5165–5173.
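The idea of comparing pairs of measurements for the same molecule can be sketched in a few lines of Python. This is only a minimal illustration, not the paper's actual procedure: the function names, the molecule IDs and the pKi values are all made up, and the estimate assumes independent measurements with equal variance, so that the variance of a pair difference is twice the variance of a single measurement.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def pairwise_differences(measurements):
    """Return differences between all pairs of measurements of the same molecule.

    `measurements` is an iterable of (molecule_id, pKi) tuples (hypothetical format).
    """
    by_molecule = defaultdict(list)
    for molecule_id, pki in measurements:
        by_molecule[molecule_id].append(pki)

    diffs = []
    for values in by_molecule.values():
        # Molecules with a single measurement contribute no pairs.
        for a, b in combinations(values, 2):
            diffs.append(a - b)
    return diffs


def estimated_sigma(diffs):
    """Estimate the per-measurement standard deviation from pair differences.

    Assumption: measurements are independent with equal variance,
    so Var(a - b) = 2 * sigma**2.
    """
    if not diffs:
        raise ValueError("need at least one pair of measurements")
    return sqrt(sum(d * d for d in diffs) / (2 * len(diffs)))


# Made-up pKi values, for illustration only.
measurements = [
    ("mol_A", 7.0), ("mol_A", 8.0),
    ("mol_B", 6.0), ("mol_B", 6.0),
    ("mol_C", 9.0),
]

diffs = pairwise_differences(measurements)
sigma = estimated_sigma(diffs)
print(f"pairs: {len(diffs)}, estimated sigma: {sigma:.2f} log units")
```

A model evaluated against data with this much noise cannot be expected to show a prediction error much below the estimated sigma, which is the intuition behind the "natural upper limit" quoted above.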
A great follow-up, by the same lead author, looks at the impact of uncertainty on Matched Molecular Pair Analysis.
Matched Molecular Pair Analysis: Significance and the Impact of Experimental Uncertainty.
J. Med. Chem. 2014, 57, 9, 3786–3802.
#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe