Machine Learning May Sometimes Simply Capture Literature Popularity Trends: A Case Study of Heterocyclic Suzuki−Miyaura Coupling
In this provocative study from the University of Illinois and Allchemy Inc., machine learning methods were tested for their ability to predict the optimal conditions for heterocyclic Suzuki-Miyaura couplings and compared with simply choosing the conditions most frequently used in the literature. The focus was specifically on solvent and base selection. The machine learning approaches included neural networks (feed-forward and graph-convolution), word embedding and positive-unlabeled learning. The authors make the point that synthetic chemistry outcomes depend on the materials available in each lab and on local idiosyncrasies such as the preferences of individual chemists and of the groups in which they work. Most techniques achieved satisfactory accuracy for base selection but performed poorly for solvent selection. For yield prediction, none of the machine learning methods performed significantly better than the baseline of popularity-weighted yields, or even than simply assuming that every yield is 77% (the overall average). Although some of the previously described methods for predicting reaction conditions offered small benefits, these were neither consistent nor marked compared with simply following popularity. Among the authors’ conclusions is “A way around this problem is to begin to augment the available literature data by systematic and standardized experiments in which reactions are repeated under multiple conditions such that meaningful conclusions about better vs worse ones can be learned.” As with much in the field of machine learning and AI in drug discovery, the gap is not in computational methods but in data of appropriate quality; investment elsewhere may prove to be displacement activity.
Bartosz A. Grzybowski et al.
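To make the idea of a popularity baseline concrete, here is a minimal sketch of how such a baseline might be computed. It is not the authors' code: the records, field names and helper functions are hypothetical and invented purely for illustration. It always predicts the most frequently reported base and solvent, and the mean training yield.

```python
from collections import Counter

def most_popular(records, key):
    """Return the most frequently reported value of `key` across records."""
    counts = Counter(r[key] for r in records)
    return counts.most_common(1)[0][0]

def top1_accuracy(records, key, prediction):
    """Fraction of records whose reported `key` matches a single fixed guess."""
    return sum(r[key] == prediction for r in records) / len(records)

def mean_yield(records):
    """Average reported yield, used as a constant yield prediction."""
    return sum(r["yield"] for r in records) / len(records)

# Hypothetical toy records (NOT data from the paper), one dict per reaction.
train = [
    {"base": "K2CO3", "solvent": "dioxane/water", "yield": 80.0},
    {"base": "K2CO3", "solvent": "DMF", "yield": 70.0},
    {"base": "Na2CO3", "solvent": "dioxane/water", "yield": 75.0},
    {"base": "K2CO3", "solvent": "toluene/water", "yield": 85.0},
]
test = [
    {"base": "K2CO3", "solvent": "DMF", "yield": 78.0},
    {"base": "Cs2CO3", "solvent": "dioxane/water", "yield": 72.0},
]

# "Follow popularity": always guess the most common training-set choice.
base_guess = most_popular(train, "base")
solvent_guess = most_popular(train, "solvent")
yield_guess = mean_yield(train)

base_acc = top1_accuracy(test, "base", base_guess)
solvent_acc = top1_accuracy(test, "solvent", solvent_guess)

# Mean absolute error of the constant-yield prediction on the test set.
yield_mae = sum(abs(r["yield"] - yield_guess) for r in test) / len(test)

print(base_guess, solvent_guess, yield_guess, base_acc, solvent_acc, yield_mae)
```

Any learned model would need to beat numbers like `base_acc`, `solvent_acc` and `yield_mae` on held-out reactions to justify its added complexity, which is the comparison the study reports as largely failing.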