Inflation of correlation in the pursuit of drug-likeness. Kenny and Montanari.

Picture BLP 26 100

Kenny, P.W. & Montanari, C.A. J Comput Aided Mol Des (2013) 27: 1.

 

One of the themes of the bucket list is papers whose guidance helps the unwary avoid bad practice. In this example, Kenny and Montanari dissect a number of published studies that feature questionable data analysis. The most important issue is the effect of binning continuous data (as discussed briefly in the previous entry). The point is made very clearly in the graph below: when all of the data (left-hand side) are thrown into bins and the average (or another summary statistic) is used to represent each bin (right-hand side), a rather weak correlation can be made to look like a very strong one. Furthermore, because the uncertainty in a mean shrinks as the number of observations grows, a large dataset like the one shown can yield statistically significant differences between the bin means even when the data within the bins overlap heavily. A graphical examination of the data makes it clear that, statistically significant or not, such differences are of little use to drug discoverers, who are as likely to see molecules get worse as to see them get better if they follow the apparent trend in the right-hand plot. It is important to be very cautious about binning continuous data and to be aware of the variation in the data: the real data should be shown wherever possible, and standard deviations should be presented alongside the means so that the reader can make an informed judgement. These problems reach another level when the y-axis is also a categorical variable, and the authors evaluate a number of published reports that use problematic graphical representations of such data.
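For readers who want to see the effect for themselves, here is a minimal sketch using synthetic data (not the data from the paper). It assumes a weak linear trend buried in Gaussian noise, bins the x-values into five equal-width bins, and compares the correlation of the raw points with the correlation of the bin means. The function names, slope, noise level and bin count are all illustrative choices of mine, not anything taken from Kenny and Montanari.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=5000, slope=0.3, noise_sd=1.0, n_bins=5, seed=0):
    """Generate noisy data with a weak trend, then bin it (all values assumed)."""
    rng = random.Random(seed)
    x_max = 5.0
    xs = [rng.uniform(0.0, x_max) for _ in range(n)]
    ys = [slope * x + rng.gauss(0.0, noise_sd) for x in xs]

    # Assign each point to one of n_bins equal-width bins on the x-axis.
    width = x_max / n_bins
    bins = [[] for _ in range(n_bins)]
    for x, y in zip(xs, ys):
        bins[min(int(x / width), n_bins - 1)].append(y)

    centres = [(i + 0.5) * width for i in range(n_bins)]
    means = [statistics.fmean(b) for b in bins]
    sds = [statistics.stdev(b) for b in bins]
    sizes = [len(b) for b in bins]

    # Welch t-statistic for each pair of adjacent bins: large n makes small
    # differences in means look highly "significant".
    t_adjacent = [
        (means[i + 1] - means[i])
        / (sds[i] ** 2 / sizes[i] + sds[i + 1] ** 2 / sizes[i + 1]) ** 0.5
        for i in range(n_bins - 1)
    ]

    return {
        "r_raw": pearson(xs, ys),            # correlation of individual points
        "r_binned": pearson(centres, means),  # correlation of bin means only
        "bin_means": means,
        "bin_sds": sds,
        "t_adjacent": t_adjacent,
    }


result = simulate()
print("r (raw data):  ", round(result["r_raw"], 3))
print("r (bin means): ", round(result["r_binned"], 3))
print("bin means:     ", [round(m, 2) for m in result["bin_means"]])
print("bin SDs:       ", [round(s, 2) for s in result["bin_sds"]])
print("Welch t values:", [round(t, 1) for t in result["t_adjacent"]])
```

With settings like these, the correlation of the bin means should come out far higher than the correlation of the raw points, while the standard deviation within each bin stays several times larger than the step between neighbouring bin means. That gap is exactly what disappears from view when only the means are plotted.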

#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe