Bucket List (2021-10-06)

MedChemica Bucket List

Accelerating the life sciences ecosystem


BucketListPapers 67/100: From HTS hit to candidate drug.

Ivacaftor discovery

Cystic fibrosis (CF) is a lethal genetic disease that affects approximately 70,000 patients worldwide. The story of the discovery of Ivacaftor starts with a specialised high-content screening campaign, which yielded a fairly average-looking hit. This modern discovery effort describes the complete work-up and optimisation of a hit into a candidate drug in a single paper, which is quite unusual, as many programs break the work down into several publications. Of note is the full and detailed characterisation of the hit compound (about 2 µM), which put the project in a strong position. At this point a hit with a relatively low molecular weight of 368, a cLogP of 2.9 and well-understood functional activity was very attractive. The initial SAR exploration is excellent: a handful of well-thought-out compounds showed the binding mode of the series, and which tautomeric form was key to binding – take note. The second exploration found a new hydrogen bond to explore and optimise, in the form of an indole. Take note again: the team fully characterised this compound and found excellent selectivity against a panel of protein targets, but poor solubility and sub-optimal pharmacokinetics. This again placed the project in a strong position. Next, to understand the poor solubility, the team proposed and modelled potential intramolecular hydrogen bonds and a planar structure, which were confirmed by single-crystal X-ray determination, the definitive method. Efforts to disrupt the intramolecular H-bonds and planarity found that tert-butyl groups could be added. This approach would normally yield a molecule that is high in lipophilicity and unlikely to have good solubility. However, work to find an isostere for the indole found that a phenol group could be used. A classic med-chem change is phenol to indole; this is the reverse. Phenols are not normally desirable, as they undergo secondary metabolism resulting in a short half-life, but flanked by a tert-butyl group this clearly does not occur, as PK was good for this combination of groups.

Ivacaftor – Discovery of N-(2,4-Di-tert-butyl-5-hydroxyphenyl)-4-oxo-1,4-dihydroquinoline-3-carboxamide (VX-770, Ivacaftor), a Potent and Orally Bioavailable CFTR Potentiator.

J. Med. Chem. 2014, 57, 23, 9776–9795.

#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe


BucketListPapers 66/100: The first case study – Celecoxib – introducing a metabolically labile group to control half-life.

Selecting drug discovery case studies for the BucketList

Here starts a set of drug discovery ‘case studies’: selected papers that describe the research that led to an approved drug reaching the market. If we selected these papers just on the basis of being an approved drug, we could probably generate another bucket list. We could also have chosen papers that are excellent examples of the current standard of scientific write-ups. However, the most useful papers are those where the research has solved a particular problem, and this is well described, with clear tables of data. We think these papers have the most educational benefit and serve as good references. There is, however, a bias towards some of the more recent ‘drugs to market’, simply because of the standard of the write-up, particularly those in J. Med. Chem. Lastly, if we came across a paper that has been made open access by the authors, we selected that over others.

We should note some classics from history, as honourable mentions.

Let us mention the first beta-blocker, propranolol, which led to a Nobel Prize.

Black JW, Crowther AF, Shanks RG, Smith LH, Dornhorst AC (May 1964). “A New Adrenergic”. Lancet. 1 (7342): 1080–1.

And Cimetidine, considered by many as the ‘first rationally designed’ drug.

Characterization And Development Of Cimetidine As A Histamine H2-Receptor Antagonist. Gastroenterology 74:339-347, 1978

And the Captopril story. Science, 1977, 196, 441-444.

On the subject of writing a quality med chem paper – please read:

Writing Your Next Medicinal Chemistry Article: Journal Bibliometrics and Guiding Principles for Industrial Authors

J. Med. Chem. 2020, 63, 14336−14356


Case Study: Celecoxib

Celecoxib SAR table

The inhibition of cyclooxygenase-2 requires a molecule with a five-membered ring substituted with 1,2-diphenyl groups. The series, discovered at Searle (later part of Pfizer), was unusually stable, having an unexpectedly long half-life in rodents. The solution was to introduce a metabolically labile group (4-methyl, compound 1i in Table 1). This is generally the reverse of what is required in typical drug hunting projects, with the exception of inhaled drugs, where a short half-life is often desired. The approach to solving the problems, and the late-stage in vivo profiling, are well described.

Synthesis and Biological Evaluation of the 1,5-Diarylpyrazole Class of Cyclooxygenase-2 Inhibitors:  Identification of 4-[5-(4-Methylphenyl)-3- (trifluoromethyl)-1H-pyrazol-1-yl]benzenesulfonamide (SC-58635, Celecoxib)

J. Med. Chem. 1997, 40, 9, 1347–1365

#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe


BucketListPapers 65/100: Another invaluable list for your med chem knapsack.

Bad proteins in preclinical

We send this reference out to our clients and contacts more than any other paper. It describes the rationale, strategies and methodologies for in vitro pharmacological profiling at four major pharmaceutical companies (AstraZeneca, GlaxoSmithKline, Novartis and Pfizer), and illustrates, with examples, their impact on the drug discovery process. For the first time, a list was disclosed of proteins for which small-molecule interactions have been linked to undesired side effects in patients. Such effects have led to the discontinuation of drugs, at obvious high cost. Knowledge of this list, coupled with early in vitro screening, is a must-have for medicinal chemistry.

Reducing safety-related drug attrition: the use of in vitro pharmacological profiling.

Nature Reviews Drug Discovery, 2012, 11, 909


BucketListPapers 64/100: Lipophilic candidate drugs tend to be discontinued.

Lipophilic and fate in dev

The first of these two papers, examining the physicochemical properties of drugs and their in vivo and clinical outcomes, caused a stir when first published. It resulted in one of the authors going ‘on tour’ around the conference circuit for about a year, as we recall. A simple study comparing the properties of successful phase 1, 2 and 3 drugs against those that were discontinued produced a simple conclusion. Other properties were also examined, and the authors later produced more detailed studies that were only disclosed at conferences. The work led, we believe, to the term ‘developability’ and to various methods of calculating a score.

The second paper performs a more rigorous statistical study of compound properties and in vivo outcomes. The work is quite detailed and the statistics are quite hard to follow, but importantly it considers the inter-connectivity of properties.

A Comparison of Physiochemical Property Profiles of Development and Marketed Oral Drugs

J. Med. Chem. 2003, 46, 7, 1250–1256

Relating Molecular Properties and in Vitro Assay Results to in Vivo Drug Disposition and Toxicity Outcomes

J. Med. Chem. 2012, 55, 14, 6455–6466.


BucketListPapers 63/100: Relating chemical properties to outcomes in early pre-clinical toxicology studies… and a missing conclusion?

Properties to tox

In early drug development, candidate compounds undergo testing in in vivo animal safety models. The results are usually written up in text documents and not broken up into data points that could be loaded into a database, for example. The drug discovery chemists behind this bucket list paper took these “raw” text documents and entered the results into spreadsheets to allow study against the chemical properties of the drug candidates, which is itself pretty heroic. A complication of any analysis involving in vivo data is that the dose or concentration of compound in blood will differ from study to study. For statistical rigour, the group chose the 10 µM total drug threshold, as there was a roughly equal number of “clean” and “toxic” outcomes for the compounds studied. From this came a ‘medicinal chemistry rule’: compounds with low ClogP (<3) and high TPSA (>75) are approximately 2.5 times more likely to be clean than toxic. However, the authors came to realise they had missed something in the analysis, and subsequently referred to it during conference talks. At the 1 µM threshold (Figure 3 above) the majority of the compounds were clean. So a highly potent compound with good bioavailability, which enables low dosing, is the best route to non-toxic results pre-clinically, irrespective of properties.
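The ClogP/TPSA rule quoted above can be written as a simple filter. This is a minimal sketch: the function name is hypothetical, the inputs are assumed to be precomputed ClogP and TPSA values, and only the quadrant described in the text is encoded.

```python
def in_favourable_quadrant(clogp: float, tpsa: float) -> bool:
    """True if a compound sits in the low-ClogP / high-TPSA quadrant.

    Thresholds follow the rule as quoted in the text (ClogP < 3, TPSA > 75).
    The rule is a statistical tendency, not a guarantee of a clean outcome.
    """
    return clogp < 3.0 and tpsa > 75.0


# Hypothetical compounds: (name, ClogP, TPSA)
compounds = [("cmpd-A", 2.1, 92.0), ("cmpd-B", 4.3, 55.0)]
for name, clogp, tpsa in compounds:
    label = "favourable quadrant" if in_favourable_quadrant(clogp, tpsa) else "outside quadrant"
    print(f"{name}: {label}")
```

As the text goes on to say, exposure matters more than the quadrant, so a flag like this is best read alongside the expected dose.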

Physiochemical drug properties associated with in vivo toxicological outcomes

Bioorg. Med. Chem. Lett. 2008, 18, 4872–4875.

#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe


BucketListPapers 61/100: How molecular shape modelling can aid drug discovery.

Molecular shape influence

This perspective provides a detailed description of the ways in which molecular shape is modelled and utilised to enhance drug discovery. The authors address how molecular shape modelling has impacted three questions essential in medicinal chemistry: “What is the essence of a molecule? What is it made of? What will it do?” Starting with virtual screening, the perspective details how programs like ROCS (rapid overlay of chemical structures) and SQW (SemiQuantitative reWrite) can be used to find new ligands for targets given a known ligand. These programs use atom-centred Gaussians, coloured by atom type, to represent molecules as functionalised volumes that can be overlaid and compared. The authors also highlight the importance of molecular shape in lead optimisation, specifically for the identification of bioisosteres that improve pharmacokinetic properties while maintaining or improving target potency. Furthermore, an extensive review of the use of molecular shape modelling in protein crystallography and ligand pose prediction is given, highlighting technologies and examples where both the molecular shape and torsional strain are optimised to provide realistic ligand poses. Molecular shape is also shown to be a useful metric in the design of diverse compound libraries, with algorithms developed to define the molecular shape space of interest and group compounds into diverse shape clusters. The authors also describe examples of how Gaussian-based molecular shape representations can aid in the design of protein-protein inhibitors, which are notoriously hard to design due to the flat nature of the protein interaction surfaces. Next, an alternative method for molecular shape description is introduced that describes the shape as a surface by placing the molecule in a grid of points in space and recording the minimal distances of the points to the molecular surface, a method that has been utilised for 3D QSAR modelling.
Finally, the authors present a comparison of several approximate shape methods that provide quick results for pre-filtering large datasets before more exhaustive calculations are performed.
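To make the atom-centred Gaussian idea concrete, here is a minimal sketch of a shape Tanimoto score between two sets of atom coordinates. It is an illustration under stated assumptions, not how ROCS or SQW are implemented: every atom gets the same illustrative Gaussian width, only pairwise (first-order) overlaps are summed, there is no atom-type colouring, and no overlay optimisation is performed, so the molecules are compared in whatever orientation they are given.

```python
import math

ALPHA = 1.0  # illustrative Gaussian width in 1/Angstrom^2, not a fitted value


def pair_overlap(a, b, alpha=ALPHA):
    """Overlap integral of two identical spherical Gaussians centred at a and b."""
    r2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return (math.pi / (2 * alpha)) ** 1.5 * math.exp(-alpha * r2 / 2)


def overlap(mol_a, mol_b):
    """First-order volume overlap: sum over all atom pairs."""
    return sum(pair_overlap(a, b) for a in mol_a for b in mol_b)


def shape_tanimoto(mol_a, mol_b):
    """Shape Tanimoto: O_AB / (O_AA + O_BB - O_AB)."""
    o_ab = overlap(mol_a, mol_b)
    return o_ab / (overlap(mol_a, mol_a) + overlap(mol_b, mol_b) - o_ab)


# Hypothetical three-atom "molecule" and a copy shifted by 1 Angstrom along x.
mol = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
shifted = [(x + 1.0, y, z) for x, y, z in mol]
print(shape_tanimoto(mol, mol), shape_tanimoto(mol, shifted))
```

A real overlay program would search over rotations and translations to maximise the overlap before reporting the score.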

In summary, understanding molecular shape can make an impact in many areas of drug discovery, and a variety of Gaussian- or surface-based modelling programs have been developed to aid the medicinal chemist. This review offers an extensive description of such approaches and examples of their use, and is a great read for those wishing to develop their understanding of the field.

Molecular Shape and Medicinal Chemistry: A Perspective J. Med. Chem. 2010, 53, 10, 3862–3886

#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe


BucketListPapers 60/100: The General Solubility Equation.

general solubility equation

Compound solubility is one of the key physicochemical properties that is essential to optimise for drug formulation and systemic absorption. While there are several kinetic and thermodynamic solubility assays that can be incorporated into the compound optimisation cascade, it is also useful to employ theoretical calculations for very high-throughput predictions of aqueous solubility. The general solubility equation (GSE), first introduced in the 1980s and later optimised in the early 2000s, uses only 2 parameters (the Celsius melting point and octanol-water partition coefficient) and has been a standard within the pharmaceutical industry. This validation paper from 2001 compares the use of the GSE with a Monte Carlo simulation method on 150 compounds and determines that, on average, the GSE provides solubility predictions closer to experimental values. Considering the speed at which the GSE can be applied compared to Monte Carlo simulations, this highlights the usefulness of the simple equation for guiding compound design.
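For reference, the GSE as commonly quoted is log S = 0.5 − 0.01 (MP − 25) − log Kow, where S is the molar aqueous solubility, MP the melting point in °C and Kow the octanol-water partition coefficient. A minimal sketch follows; the function name is ours, and the convention of dropping the melting-point term for compounds that are liquid at 25 °C is the usual one, stated here as an assumption.

```python
def gse_log_s(melting_point_c: float, log_kow: float) -> float:
    """Estimate log10 of molar aqueous solubility with the general solubility equation."""
    # Compounds melting at or below 25 degC are treated as liquids: no lattice penalty.
    mp_term = max(melting_point_c - 25.0, 0.0)
    return 0.5 - 0.01 * mp_term - log_kow


# Hypothetical compound: melting point 125 degC, log Kow 2.0
print(gse_log_s(125.0, 2.0))
```

The form of the equation makes the design levers explicit: each log unit of lipophilicity, or each 100 °C of melting point, costs roughly one log unit of predicted solubility.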

Prediction of Drug Solubility by the General Solubility Equation (GSE).

J. Chem. Inf. Comput. Sci. 2001, 41, 2, 354–357

#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe


BucketListPapers 59/100: How can computational chemists make real impact in drug discovery?

Fig2 WhatWorksWhatDoesnt

In this report from 2006, Martin assesses how computational chemists can positively impact drug hunting projects by forming close collaborations with medicinal chemists. The report starts by summarising the computational calculations that were most popular with Abbott medicinal chemists at the time, and shows data indicating how computationally cataloguing structural alerts as SMARTS reduced the number of flagged compounds within Abbott’s compound library over time. Furthermore, several examples are identified where models with low predictivity were still deemed useful. For example, log P predictors were shown to have low accuracy but were considered “good enough” by medicinal chemists to predict relative log P values within a series. The report also highlights examples where computational chemists offered useful insight that aided decision-making, even when the data or models were inaccurate. One example was the observation by a computational chemist that the available data for modelling a compound series had a narrow log P range, which led to the prioritisation of more polar compounds that were subsequently optimised into a lead.

In summary, this report is still an important read for computational and medicinal chemists who work together today, serving as a reminder that drug discovery projects benefit when knowledge and insight are shared between the two fields.

What Works and What Does Not: Lessons From Experience in a Pharmaceutical Company

QSAR & Combinatorial Science, 2006, 25, 1192-1200.

#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe


BucketListPapers 58/100: Time for some Skepticism!

Fig. 3 Healthy skepticism

“Although the development of computational models to aid drug discovery has become an integral part of pharmaceutical research, the application of these models often fails to produce the expected impact on productivity.” The first line of the abstract of this paper introduces the subject and problem succinctly. This important paper delves into the reasons for this lack of success and covers some critical ground on assessing models and incorporating them into med chem workflows in a timely manner. The paper is on the heavy side for statistics, but the charts in Fig. 3 should prove useful for assessing models. Following this paper, a blog entry from Pat Walters was published online, which is well worth a read too.

Healthy skepticism: assessing realistic model performance

Drug Discov. Today, 2009, 14, 420-427


BucketListPapers 57/100: Are we as smart and consistent as we think we are?

how chemists choose compounds

These two papers should really make you think. First, in 2004 Pharmacia submitted lists of compounds to their medicinal chemists to select from as part of a compound acquisition initiative. By sending multiple lists (rather than one big list) it was possible to look at the consistency of the chemists in their selection choices. While the authors expected differences between chemists, they did not expect an individual chemist to be inconsistent from one list to the next. This is food for thought. Then, in 2012, Novartis sent multiple lists of compounds to their chemists, and in addition asked on what criteria they had chosen the compounds, e.g. lipophilicity, size, diversity, novelty. The expectation was that calculations and filters would be applied to reduce the lists to smaller sets to review by eye. From the returned selections it was possible to look at the spread of these properties. The actual selections did not reflect the criteria the chemists said they had used. Of most concern, nearly all chemists said they used novelty as a selection criterion, but only 2 of the 19 actually selected novel compounds. Overall, both studies showed considerable bias in compound selection: again, considerable food for thought.

Assessment of the Consistency of Medicinal Chemists in Reviewing Sets of Compounds. J. Med. Chem. 2004, 47, 20, 4891–4896 

Inside the Mind of a Medicinal Chemist: The Role of Human Bias in Compound Prioritization during Drug Discovery. PLoS ONE 7(11): e48476.


BucketListPapers 56/100: How have acids and bases fared in drug discovery?

acids and base in drugs solubility

“It is fairly common for drugs to be classified as weak acids or bases or perhaps more accurately as acids, bases, neutral, or zwitterionic.” Often the acidic or basic group is a key part of the pharmacophore, and as such tends not to be optimised by fine-tuning the pKa. This very useful review is a comprehensive study of the effects of acidic and basic groups in compounds. Tables 1 to 4 should be printed out by any compound-designing chemist and carried around as a reference. They summarise the effect of ionisation on ADMET properties, drawn from several dozen papers. The selected plot above (Figure 5) shows the clear effect of having an ionisable group on aqueous solubility. However, having read the rest of the paper you will be left with the view that having a neutral compound as a drug is the best outcome, given that both lipophilic acids and bases tend to have some kind of ADMET issue.

Acidic and Basic Drugs in Medicinal Chemistry: A Perspective J. Med. Chem. 2014, 57, 23, 9701–9717

#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe


BucketListPapers 55/100: Methyl, Ethyl, Futile – We have all said it…

We distinctly remember when this paper came out, and it was not long before the phrase entered the common lexicon of medicinal chemists. The study came not many years after the Rule-of-Five paper, and within our discipline the noughties became a decade of looking at compounds and defining further guidelines (some becoming unhelpful “rules”). This is a must-read, as it discusses the key principle of finding the lipophilicity ‘sweet spot’: where binding affinity and absorption are sufficient, but lipophilicity is not so high that metabolism and safety concerns arise. The reason for the “methyl, ethyl, futile” phrase is simply that it is too easy to increase the lipophilicity of a compound series and see “potency” improve, leading to a false sense of progress on a project. Later on came the concept of efficiency in drug design: getting the most out of each atom and lipophilic group. Read the paper and it will improve your thinking in compound design.
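One widely used way to put a number on this trade-off is lipophilic ligand efficiency, LLE = pIC50 − log P. The metric is not named in the paper discussed here; it is brought in as an illustration, and the compound values below are made up. The sketch shows how a "methyl, ethyl" style change can raise potency while lowering efficiency.

```python
def lipophilic_ligand_efficiency(pic50: float, log_p: float) -> float:
    """LLE = pIC50 - logP: potency gained per unit of lipophilicity spent."""
    return pic50 - log_p


# Hypothetical matched pair: adding a lipophilic group buys 0.3 log units of
# potency but costs 0.5 log units of lipophilicity.
parent = lipophilic_ligand_efficiency(pic50=7.0, log_p=3.0)
analogue = lipophilic_ligand_efficiency(pic50=7.3, log_p=3.5)
print(f"parent LLE {parent:.1f}, analogue LLE {analogue:.1f}")
```

Here the analogue looks more potent yet is less efficient, which is exactly the false sense of progress the phrase warns about.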

Lipophilicity in PK design: methyl, ethyl, futile.   J Comput Aided Mol Des. 2001 Mar;15(3):273-86. 

#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe


BucketListPapers 54/100: The biggest screening libraries ever made: DELs

The last post concerned fragment libraries, built around the philosophy that “small fragments can represent massive libraries”. At the other end of the scale are DNA-encoded combinatorial libraries (DELs). DELs represent the technological offspring of combinatorial chemistry and molecular biology, with a little classical protein biochemistry for good measure. With split-pool synthesis to make vast libraries, and the sequence of chemistry steps encoded in a DNA sequence attached to each compound, huge libraries can be made and potent ligands identified. Chromatography with the protein target as ‘bait’ to fish out the most potent compounds, followed by PCR amplification and sequencing of the DNA tag, establishes the identity of the best binders. If you can do affinity chromatography with your protein target, DELs represent the other extreme approach to lead generation. The first paper is a pure classic: Brenner and Lerner’s PNAS publication contains the essence of the technique in highly readable form. It contains the brilliant line “we recently, in principle, solved the synthetic procedure for peptides”. Actually industrialising the method took another 17 years. There have been a huge number of synthetic chemistry devils to outwit in making the method scalable, and Morgan et al’s 2009 paper shows one version of the production and screening of billion-compound libraries, and the identification of inhibitors, reduced to practice.
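The arithmetic behind "billion-compound libraries" and the idea of reading a compound's synthetic history from its tag can be sketched in a few lines. Everything here is a toy: the building-block names, the three-base codons and the helper functions are hypothetical, and real DEL tags include primers, error tolerance and ligation chemistry that are ignored.

```python
from itertools import product
from math import prod

CODON_LENGTH = 3  # hypothetical number of DNA bases used to tag one building block


def library_size(cycles):
    """Split-pool library size: the product of building-block counts per cycle."""
    return prod(len(blocks) for blocks in cycles)


def make_codebook(cycles):
    """Assign each building block in each cycle its own DNA codon."""
    codons = ("".join(bases) for bases in product("ACGT", repeat=CODON_LENGTH))
    return [{block: next(codons) for block in blocks} for blocks in cycles]


def encode(compound, codebook):
    """Concatenate the codons for the building blocks used, in cycle order."""
    return "".join(book[block] for block, book in zip(compound, codebook))


def decode(tag, codebook):
    """Recover the building blocks from a sequenced tag."""
    pieces = [tag[i:i + CODON_LENGTH] for i in range(0, len(tag), CODON_LENGTH)]
    return tuple(
        next(block for block, codon in book.items() if codon == piece)
        for piece, book in zip(pieces, codebook)
    )


# Toy three-cycle library with hypothetical building-block names.
cycles = [["A1", "A2", "A3"], ["B1", "B2"], ["C1", "C2", "C3", "C4"]]
codebook = make_codebook(cycles)
tag = encode(("A2", "B1", "C4"), codebook)
print(library_size(cycles), tag, decode(tag, codebook))
```

Three cycles of 1,000 building blocks each would give 1,000 × 1,000 × 1,000 = 10^9 encoded compounds, which is how split-pool synthesis reaches the billion scale.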

“Encoded combinatorial chemistry”, Brenner and Lerner,  Proc. Natl. Acad. Sci (1992), 89, 5381-5383

“Design, synthesis and selection of DNA-encoded small-molecule libraries”, Morgan et al, Nature Chem. Biol. (2009), 5, 647–654

#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe



BucketListPapers 53/100: The rise and rise of Fragment Based Drug Discovery (FBDD)

1YSG SAR by NMR ligands

FBDD is now an established method of drug discovery, having resulted in drugs delivered to patients and multiple compounds in clinical trials. For groups without access to a compound collection, or where the belief is that the target belongs to a class with few known ligands, FBDD is a logical choice. The key requirement is that you can access structural information to drive synthesis to make the small, weak ligands more potent. FBDD has also provided a framework for people to think about what constitutes a good ligand, via the debate around ligand efficiency, and about how to improve potency.
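Ligand efficiency is usually defined as the binding free energy per heavy atom, LE = −ΔG / N_heavy, which at room temperature is roughly 1.37 × pKd / N_heavy in kcal/mol per heavy atom. A minimal sketch, with hypothetical compounds and the temperature as an assumed default:

```python
import math

R_KCAL_PER_MOL_K = 0.0019872  # gas constant in kcal/(mol*K)


def ligand_efficiency(p_kd: float, heavy_atoms: int, temperature_k: float = 298.15) -> float:
    """Binding free energy per heavy atom (kcal/mol), from pKd (or pIC50 as a proxy)."""
    delta_g = -math.log(10) * R_KCAL_PER_MOL_K * temperature_k * p_kd
    return -delta_g / heavy_atoms


# Hypothetical comparison: a weak fragment versus a larger, more potent hit.
fragment = ligand_efficiency(p_kd=3.0, heavy_atoms=12)   # ~1 mM binder
hts_hit = ligand_efficiency(p_kd=6.0, heavy_atoms=35)    # ~1 uM binder
print(f"fragment LE {fragment:.2f}, HTS hit LE {hts_hit:.2f}")
```

With these made-up numbers the millimolar fragment is the more efficient binder per atom, which is the argument for starting small and growing with structural guidance.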

The first paper to read is Hajduk, Fesik et al’s “SAR by NMR”

“Discovery of Potent Nonpeptide Inhibitors of Stromelysin Using SAR by NMR”  J. Am. Chem. Soc., (1997), 119, 5818–5827

And then to follow up:

Murray & Rees,  Nature Chemistry (2009), 1,187–192

Congreve, et al, J. Med. Chem. (2008), 51, 3661–3680

And finally:

“Twenty years on: the impact of fragments on drug discovery”

Erlanson,  Fesik, Hubbard, Jahnke & Jhoti, Nature Reviews Drug Discovery (2016), 15 , 605-619


#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe


BucketListPapers 52/100: Structural alerts for Mutagenicity – a must for every compound designer.

We feature here, as the bucket list paper, one of the first papers published in this key area, by John Ashby, but in fact there are a number of vital publications. Observations and testing results in the critical Ames assay were taken by the likes of Ashby and Tennant to derive and categorise a set of structural alerts for DNA reactivity that can identify potentially mutagenic compounds. There is some danger in doing this, as Ashby states in his paper: “It is obviously dangerous to simplify so complex an issue as chemical-structure/biological-activity relationships in chemical carcinogenicity and mutagenicity.” Nonetheless, it is important to know which chemical groups frequently cause this type of toxicity, to ensure correct screening and due process; avoiding them altogether is best.

We include a few other papers that work further to increase knowledge and develop computer models to predict tox. The tables in these papers should be printed out and stuck on the wall above your desk!


“Fundamental structural alerts to potential carcinogenicity or non-carcinogenicity” Ashby Environ. Mutagen. (1985),7, 919-921


This paper uses corresponding Ames test data (2401 mutagens and 1936 non-mutagens) to construct new criteria and alerts. SMARTS string representations of the specific toxicophores are available in the Supplementary Information:


“Derivation and Validation of Toxicophores for Mutagenicity Prediction” Kazius, McGuire, Bursi J. Med. Chem. (2005), 48, 312-320


And in vivo rat studies

“Structure−Activity Relationship Analysis of Rat Mammary Carcinogens” Cunningham, Moss, Lype, Qian, Qamar & Cunningham Chem. Res. Toxicol. (2008), 21, 10, 1970-1982


#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe


BucketListPapers 51/100: Neural Networks – back where it began

Our previous entry talked about the current theme of using deep neural networks; however, it’s worth remembering that the field has been here before. For a really clear and thoughtful exposition of the use of artificial neural networks see Salt and Livingstone’s 1992 paper, which in 6 clear pages covers the essentials of the technique, examples of how ANNs can fit different functions, many of the issues, and two case studies. For an even more succinct and prescient view, Ichikawa’s 1990 paper is a great read, particularly for the phrase:


“the difficulty of the convergence is not caused by the structure of the network but the quantity of the information included in the given data”.


Salt, Yildiz, Livingstone and Tinsley, “The use of artificial neural networks in QSAR.” Pestic. Sci. (1992) 36, 161–170

Aoyama, Suzuki and Ichikawa, “Neural Networks applied to Structure-Activity Relationships” J. Med. Chem. (1990), 33, 905–908


#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe


BucketListPapers 50/100: Going Deep – it was bound to happen – Deep Neural Nets (DNN) in compound prediction.


As a streaker flashed across the stage at the 1974 Oscars, the forever cheerful and charming co-host David Niven turned back to the audience and said, “Well ladies and gentlemen, that was almost bound to happen…” Given the long history of efforts to predict properties of virtual molecules, and the interest in Neural Nets in the 90s and then in Random Forest, “it was bound to happen” that Deep Neural Nets (DNN) would be applied to chemical data sets. Even less surprising was that Bob Sheridan would be one of the first to publish.

The importance of encoding molecules in the right form (descriptors) rings true in these publications, as does the reliance on the quality (not quantity) of data. Equally, pay attention to the amount of gain DNN provides over previous methods: we still have a way to travel.


The great volume of DNN papers currently being submitted led us to select several papers – enjoy them all.


“Deep Neural Nets as a Method for Quantitative Structure–Activity Relationships”

Ma, Sheridan, Liaw, Dahl & Svetnik J. Chem. Inf. Model. 2015, 55, 2, 263-274


“DeepTox: Toxicity Prediction using Deep Learning”

Mayr, Klambauer, Unterthiner  & Hochreiter Frontiers in Env. Sci, 2016, 3, 2 – 15


“PotentialNet for Molecular Property Prediction”

Feinberg, Sur, Wu, Husic, Mai,  Li,  Sun,  Yang, Ramsundar & Pande ACS Cent. Sci. 2018, 4, 1520−1530


A word of caution: think about the errors in any prediction. Frequently a new virtual compound requiring a prediction is out-of-domain, even for these new DNN models.

The Relative Importance of Domain Applicability Metrics for Estimating Prediction Errors in QSAR Varies with Training Set Diversity

Sheridan, J. Chem. Inf. Model. 2015, 55, 6, 1098-1107

#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe


BucketListPapers 49/100: First exploration of Random Forests in SAR modelling.

Artificial intelligence (AI) in life science is everywhere at the moment, but those of us who have been around the block a while know that many of the machine learning (ML) techniques have already been explored and used for some time. This paper was the first exploration of Random Forest, RF (an ensemble of decision trees), modelling applied to drug discovery datasets to predict properties. If you have no idea about ML in drug discovery, this paper is a good entry point, as the authors make a good stab at explaining how RF works and where it is applicable.

As the authors point out, there is “no free lunch” in molecular modelling: one technique does not work for all situations, datasets and compound types. However, since 2002 RF has been shown to be pretty good a lot of the time. It has a great advantage, as this work shows, in that it can be used “off-the-shelf” with its default settings. Recent ML work (see the Deep Learning papers in BucketListPapers 50/100) shows that the encoding of the molecules (descriptors) is important, as is the quality of the dataset submitted. Looking back with modern experience, we find it remarkable that good models in this work were produced with just a few hundred compound measurements. The final reason, and why we selected this paper, is the rigour and quality of the process, both in performing the work and in comparison to other techniques and write-ups.


Random Forest: A Classification and Regression Tool for Compound Classification and QSAR Modeling

Svetnik, Liaw, Tong, Culberson, Sheridan, and Feuston J. Chem. Inf. Comput. Sci. 2003, 43, 1947-1958

#BucketListPapers #DrugDiscovery #MedicinalChemistry



BucketListPapers 48/100: How do chemists actually improve molecules?

The basis of structure-activity relationships (SAR) is identifying a well-defined chemical difference between two molecules and examining the difference in activity and properties. Over time compound designers build experience in the form of a mental “bag of tricks” for designing a new molecule with the desired properties. If these tricks did not work there would be no such thing as the art of medicinal chemistry; we might as well make random compounds. Designing a new compound inherently involves mentally “changing” atoms into other atoms, even if that is as simple as changing hydrogen into fluorine.

Given this, what are all the combinations of atoms, or groups of atoms, that could be changed in a molecule? That would be a very high number (90 billion), but a sensible place to start is to examine what chemists have made in known drug molecules. Given that these molecules have made it to patients, and so have low, if not no, toxicity, this gives us an idea of “acceptable” groups. This early work by Sheridan presents the first results of such a study, and perhaps produced the first large-scale database of chemical transformations. The paper discusses the techniques and challenges involved in finding the chemical groups, principally by finding and using the maximum common substructure (MCSS) to identify what we now call matched pairs. Interestingly, the most common “transformations” are still the changes that chemists most frequently make to molecules (see Figs 5 and 6). This paper certainly inspired us ‘back in the day’ to explore and further develop Matched Molecular Pair Analysis.

The Most Common Chemical Replacements in Drug-Like Compounds

Robert P. Sheridan J. Chem. Inf. Comput. Sci. 2002, 42, 1, 103-108

#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe


BucketListPapers 47/100: Confirmation of conformation

This is a great personal favourite because it illustrates a clear link between two worlds that I enjoy working in: quantum chemistry and crystal structures. Both are rich sources of information about drug-like molecules. A particular challenge that both face is whether they are relevant to the behaviour of molecules in solution. In this paper, the question of whether the two at least agree with one another is addressed, and the answer is pleasingly positive, as you can see in the figure, in which the curve is the energy variation with dihedral angle (computed at the RHF/STO-3G level) and the columns are the frequency with which each dihedral range is observed in crystal structures. This evolved towards the MOGUL tool from the Cambridge Crystallographic Data Centre. A follow-up (DOI: 10.1039/c2ce25585e) probed the extent to which the solid state influences the observed torsional preferences in crystal structures and found this to be an infrequent concern. For those interested in understanding the conformational preferences of molecules, the approaches presented here are a great starting point. Presumably, if the preferences hold in the gas phase (in the quantum calculations) and the solid state (in crystal structures), there is a high likelihood of a similar preference prevailing in solution.
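One simple way to put a computed torsional energy profile and a crystal-structure histogram on the same scale is to convert the energies into Boltzmann populations. This is a sketch of that idea rather than the procedure used in the paper: the bin centres, energies and counts below are made-up values, and crystal frequencies need not follow a room-temperature Boltzmann distribution, so the comparison is qualitative.

```python
import math

R_KJ_PER_MOL_K = 0.0083145  # gas constant in kJ/(mol*K)


def boltzmann_populations(energies_kj_mol, temperature_k=298.15):
    """Fractional populations for states with the given relative energies."""
    lowest = min(energies_kj_mol)
    weights = [
        math.exp(-(energy - lowest) / (R_KJ_PER_MOL_K * temperature_k))
        for energy in energies_kj_mol
    ]
    total = sum(weights)
    return [weight / total for weight in weights]


# Hypothetical torsion scan and crystal-structure histogram (made-up values).
bin_centres_deg = [0, 60, 120, 180]
energies = [12.0, 0.0, 10.0, 2.0]     # relative energies in kJ/mol
crystal_counts = [3, 140, 7, 50]      # observations per dihedral bin

predicted = boltzmann_populations(energies)
observed = [count / sum(crystal_counts) for count in crystal_counts]
for angle, pred, obs in zip(bin_centres_deg, predicted, observed):
    print(f"{angle:>4} deg  predicted {pred:.2f}  observed {obs:.2f}")
```

Agreement in the rank order of the bins is the kind of qualitative match the paper reports between energy minima and populated torsion ranges.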


Comparison of conformer distributions in the crystalline state with conformational energies calculated by ab initio techniques. Allen, F. H.; Harris, S. E.; Taylor, R.  J. Comput.-Aided. Mol. Design 1996, 10, 247-254.


#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe


BucketListPapers 46/100: A window into a time gone by and some lessons still worth learning.

One of the joys of compiling and reporting on the bucket list is that we are often reading and describing papers that we did not select ourselves.  That is certainly true of this one which is from 1993 – and it shows.  It is also a revealing insight into how a lot of the progress and overhyping of artificial intelligence and computers in chemistry has come about. Some brilliant folk in computer science who had lived through and driven many of the important developments in artificial intelligence were looking for a scientific problem to apply it to. They stumbled on structure elucidation from mass spectrometry. It is hard to be excited at this remove about the particular application but it is clear that they made a big noise about this application even though they also describe Carl Djerassi’s rather unimpressed response to the program.  However, the general rules suggested by the authors are of pretty general use and interest to those developing scientific software of all kinds:


Lesson 1. The efficiency of the generator is extremely important. It is particularly important that constraints can be applied effectively.

Lesson 2. The use of depth-first search, which provides a stream of candidates, is generally better (in an interactive program) than breadth-first search, in which no candidates emerge for examination until all are generated.

Lesson 3. Planning is in general not simply a nice additional feature but is essential for the solution of difficult problems.

Lesson 4. Every effort to make the program uniform and flexible will be rewarded.

Lesson 5. An interactive user interface is not merely a nicety but is essential.

Lesson 6. An interesting extension of the plan-generate-test paradigm could improve its power: search and generation might be combined into a single problem solver.

Lesson 7. Choice of programming language is becoming less of an issue.

Lesson 8. Providing assistance to problem solvers is a more realistic goal than doing their jobs for them.

Lesson 9. Record keeping is an important adjunct to problem solving.

Lesson 10. In order to use a program intelligently, a user needs to understand the program’s scope and limits.

Lesson 11. The context in which problem solving proceeds is essential information for interpreting the solutions.

Lesson 12. DENDRAL employs uniformity of representation in two senses: (a) in the knowledge used to manipulate chemical structures, and (b) in the data structures used to describe chemical structures and constraints.
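Lessons 1 and 2 can be illustrated with a toy sketch. This is not DENDRAL's generator: the three-letter alphabet and the `allowed` constraint are invented purely for illustration. It only shows why a depth-first generator can hand over its first candidates immediately, while a breadth-first one has nothing complete to show until the final level has been built, and how applying a constraint to partial candidates prunes whole branches early.

```python
from itertools import islice

ALPHABET = "CNO"  # hypothetical building blocks, not real chemistry


def allowed(partial):
    """Toy constraint, checked on partial candidates so bad branches die early."""
    return "OO" not in partial and "NN" not in partial


def depth_first(length, prefix=""):
    """Yield complete candidates one at a time, as soon as each is finished."""
    if len(prefix) == length:
        yield prefix
        return
    for symbol in ALPHABET:
        candidate = prefix + symbol
        if allowed(candidate):
            yield from depth_first(length, candidate)


def breadth_first(length):
    """Extend every partial candidate level by level; results appear only at the end."""
    frontier = [""]
    for _ in range(length):
        frontier = [p + s for p in frontier for s in ALPHABET if allowed(p + s)]
    return frontier


# The depth-first stream can be inspected after only a few steps of work,
# even when the full candidate list would be very long.
first_three = list(islice(depth_first(12), 3))
print(first_three)
```

Both functions enumerate the same candidates; the difference is when the user gets to see the first one, which is what matters in an interactive program.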


DENDRAL: a case study of the first expert system for scientific hypothesis formation. Lindsay, R. K.; Buchanan, B. G.; Feigenbaum, E. A.; Lederberg, J. Artificial Intelligence. 1993, 61, 209-261.


#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe


BucketListPapers 45/100: Scrambling to find a better model

We have discussed before (bucket list #26 and #27) the risk of cherry picking from a large pool of descriptors. This paper presents one way to check whether your model building has benefited from this effect: y-scrambling. Here, the set of descriptors calculated for the set of molecules is retained but the values of the property (y) that you are trying to model are scrambled – each row of descriptors no longer corresponds to its own molecule's property value, but the numerical set that the model is built from remains the same. Repeating the scrambling many times gives an estimate of the model statistics that any credible model must improve upon. A comparison with models built instead using random descriptors shows that these achieve better r2 statistics than the y-scrambled models, likely because real descriptors include some that correlate with one another. They divide models into three regimes:

r2(model) > r2(random descriptors) – probably a good model with physical link between descriptors and the property being modelled.

r2(model) < r2(y-scrambling) – unlikely to be a meaningful model

r2(model) > r2(y-scrambling) BUT r2(model) < r2(random descriptors) – a possible suggestion that there is a link between the physical description of the molecules captured by the descriptors and the property being modelled, BUT the model is not as good as can be achieved with random descriptors.
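The checks described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' procedure: the data are synthetic, the "model" is ordinary least squares on the k descriptors most correlated with y (a stand-in for any descriptor-selection step), and the mean r2 of each reference distribution is used as the comparison threshold. All function names are hypothetical. The important point is that descriptor selection is repeated inside every scrambled or random trial, so the reference values include the cherry-picking effect.

```python
import numpy as np


def select_and_fit_r2(X, y, k=3):
    """Pick the k descriptors most correlated with y, fit OLS, return training r2."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    chosen = np.argsort(-np.abs(corr))[:k]
    A = np.column_stack([np.ones(len(y)), X[:, chosen]])  # intercept + chosen descriptors
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    return 1.0 - float(residuals @ residuals) / float(yc @ yc)


def y_scrambled_r2(X, y, k=3, n_trials=100, seed=0):
    """r2 values obtained after permuting y; the descriptors are left untouched."""
    rng = np.random.default_rng(seed)
    return np.array([select_and_fit_r2(X, rng.permutation(y), k) for _ in range(n_trials)])


def random_descriptor_r2(X, y, k=3, n_trials=100, seed=1):
    """r2 values obtained with a same-sized pool of random descriptors; y is left untouched."""
    rng = np.random.default_rng(seed)
    return np.array([select_and_fit_r2(rng.normal(size=X.shape), y, k) for _ in range(n_trials)])


def regime(r2_model, r2_scrambled, r2_random):
    """Place a model in one of the three regimes (mean of each reference set: an assumption)."""
    if r2_model > r2_random.mean():
        return "probably meaningful"
    if r2_model < r2_scrambled.mean():
        return "unlikely to be meaningful"
    return "intermediate"


# Synthetic example: 60 "molecules", 20 descriptors, only the first two matter.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 20))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.3, size=60)

r2_model = select_and_fit_r2(X, y)
r2_scr = y_scrambled_r2(X, y)
r2_rand = random_descriptor_r2(X, y)
print(f"model {r2_model:.2f}  y-scrambled {r2_scr.mean():.2f}  random {r2_rand.mean():.2f}")
print(regime(r2_model, r2_scr, r2_rand))
```

Because the synthetic descriptors here are independent of one another, the two reference distributions should come out similar; the gap between them described in the paper arises with real, intercorrelated descriptors.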


y-Randomization and Its Variants in QSPR/QSAR. Rücker, C.; Rücker, G.; Meringer, M. J. Chem. Inf. Model. 2007, 47, 2345-2357.


#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe


BucketListPapers 44/100: Reflections and dreams of the future

This retrospective by one of the great names in chemoinformatics (and beyond) provides an encouraging overview of the many advances in the field over the previous 40 years or so that have been of great impact and value. Notable examples include the creation of vast databases of chemical information and tools to exploit them. The perspectives for the future are astonishing because at almost any point in the history of the discipline, similar targets could have been highlighted.  These include: 1) better structural representations and tools for abstracting chemical data, 2) better ways to link between structure and real world effects, 3) predicting chemical reactions/reactivity, 4) helping humans to elucidate chemical structures, 5) elaborating and elucidating biological networks, 6) toxicity prediction. Well worth a read for the optimistic review of achievements and for motivation when selecting new research directions.

Some solved and unsolved problems of chemoinformatics.

Gasteiger SAR QSAR in Environ. Res. 2014, 25, 443-455.

#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe


BucketListPapers 43/100: Why is my QSAR not working?

This is an unusual pick for the bucket list – an editorial. However, it's an editorial that contains much that echoes through the years as one over-hyped method for prediction is replaced by another. Maggiora notes the mismatch between the dimensionality of chemical space and that of the representations of it that we use in many statistical models. He also highlights the importance of activity cliffs. These are the large discontinuities in activity that we expect when thinking about molecules fitting into active sites, where it is easy to imagine how a small change in structure might take a molecule from binding tightly to not binding at all (because it is now too big, or places a hydrogen bond donor towards a donor on the protein, etc.). These activity cliffs undermine the similarity principle that much QSAR modeling relies upon and are often not well characterized in the activity data – inactive compounds tend not to be followed up experimentally. The fact that each set of descriptors provides a very different map of chemical space is also a problem – every molecule’s nearest neighbor set can change when different sets of descriptors are used. The chastening conclusion is that “all QSAR models are flawed to some degree” – recognizing and dealing with this truth is one of the challenges for chemoinformatics.
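As a rough illustration of what flagging an activity cliff can look like in practice, the sketch below scans pairs of compounds for high structural similarity combined with a large activity difference. Everything in it is an assumption made for illustration: the compounds, feature labels and activity values are invented, the features are plain Python sets rather than real fingerprints, and the similarity and activity thresholds are arbitrary.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity between two sets of structural features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def activity_cliffs(compounds, min_similarity=0.7, min_delta=2.0):
    """Return pairs that look alike structurally but differ strongly in activity.

    `compounds` maps a name to (feature_set, activity on a log scale).
    """
    cliffs = []
    for (n1, (f1, p1)), (n2, (f2, p2)) in combinations(compounds.items(), 2):
        similarity = tanimoto(f1, f2)
        delta = abs(p1 - p2)
        if similarity >= min_similarity and delta >= min_delta:
            cliffs.append((n1, n2, round(similarity, 2), round(delta, 2)))
    return cliffs


# Invented toy data: A and B differ by a single substituent but by 2.7 log units.
core = {"aryl", "amide", "pyridine", "ether", "fluoro"}
toy = {
    "A": (core | {"methyl"}, 8.1),
    "B": (core | {"tert-butyl"}, 5.4),
    "C": ({"alkyl", "ester", "ketone"}, 5.0),
}

print(activity_cliffs(toy))
```

Swapping in a different feature set for the same compounds changes the similarity values, and with them which pairs are flagged, which is the editorial's point about each descriptor set drawing a different map of chemical space.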

On outliers and activity cliffs – why QSAR often disappoints.

Maggiora J. Chem. Inf. Model. 2006, 46, 1535.

#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe


BucketListPapers 42/100: How (not) to build a model

Picture BLP 42 100

As has been discussed in many of the bucket list papers, medicinal chemists are often called upon to build and/or use statistical models.  There are many ways of doing this incorrectly, some of which are easy to do without realising it.  In this delightfully frank set of instructions, Dearden, Cronin and Kaiser describe lots of the common errors (summarised as the 21 types in the table shown) and acknowledge examples that include some from their own work that show these problems in action. This paper is an essential checklist and note of caution for all those involved in QSAR or QSPR in its many guises.


How not to develop a quantitative structure–activity or structure–property relationship (QSAR/QSPR).

Dearden, Cronin, Kaiser SAR QSAR in Environ. Res. 2009, 20, 241-266.


#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe
