Bucket List 2019-05-15T14:25:40+00:00

Accelerating the life sciences ecosystem

MedChemica Bucket List


BucketListPapers 56/100: How have acids and bases fared in drug discovery?

“It is fairly common for drugs to be classified as weak acids or bases or perhaps more accurately as acids, bases, neutral, or zwitterionic.” Often the acidic or basic group is a key part of the pharmacophore, and as such tends not to be optimised by fine-tuning the pKa. This very useful review is a comprehensive study of the effects of acidic and basic compounds.  Tables 1 to 4 should be printed out by any compound-designing chemist and carried around as a reference. They summarise the effect of ionisation on ADMET properties, drawing on several dozen papers. The selected plot (Figure 5) shows the clear effect of having an ionisable group on aqueous solubility.  However, having read the rest of the paper you will be left with the view that having a neutral compound as a drug is the best outcome, given that both lipophilic acids and bases tend to have some kind of ADMET issue.

Acidic and Basic Drugs in Medicinal Chemistry: A Perspective J. Med. Chem. 2014, 57, 23, 9701–9717

#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe

BucketListPapers 55/100: Methyl, Ethyl, Futile – We have all said it…

We distinctly remember when this paper came out, and it was not long before the phrase entered the common lexicon of medicinal chemists. The study came not many years after the Rule-of-Five paper, and within our discipline the noughties became a decade of looking at compounds and defining further guidelines (some becoming un-useful “rules”). This is a must-read, as it discusses the key principle of finding the lipophilicity ‘sweet spot’: the range where binding affinity and absorption are sufficient, but lipophilicity is not so high that metabolism and safety concerns arise. The reason for the “methyl, ethyl, futile” phrase is simply that it is too easy to increase the lipophilicity of a compound series and see “potency” improve, leading to a false sense of progress on a project. Later on came the concept of efficiency in drug design: getting the most out of each atom and lipophilic group. Read the paper and it will improve your thinking in compound design.
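
The efficiency idea can be made concrete with two widely used metrics: ligand efficiency (binding free energy per heavy atom) and lipophilic ligand efficiency (pIC50 minus logP). The sketch below is not taken from the paper. The three-compound series and its numbers are hypothetical, and pIC50 is assumed to stand in for a true binding constant.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)

def ligand_efficiency(pic50, heavy_atoms, temperature_k=298.15):
    """Approximate binding free energy per heavy atom (kcal/mol per atom)."""
    return R_KCAL * temperature_k * math.log(10) * pic50 / heavy_atoms

def lipophilic_ligand_efficiency(pic50, logp):
    """LLE (also called LipE): potency that is not explained by lipophilicity."""
    return pic50 - logp

# Hypothetical homologous series: (name, pIC50, logP, heavy atom count)
series = [
    ("methyl", 6.0, 2.0, 20),
    ("ethyl", 6.3, 2.5, 21),
    ("propyl", 6.6, 3.0, 22),
]

for name, pic50, logp, heavy_atoms in series:
    le = ligand_efficiency(pic50, heavy_atoms)
    lle = lipophilic_ligand_efficiency(pic50, logp)
    print(f"{name:7s} pIC50={pic50:.1f} LE={le:.2f} LLE={lle:.1f}")
```

In this made-up series potency rises with each added carbon while LLE falls, which is the “false sense of progress” described above.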

Lipophilicity in PK design: methyl, ethyl, futile.   J Comput Aided Mol Des. 2001 Mar;15(3):273-86. 

#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe

BucketListPapers 54/100: The biggest screening libraries ever made: DELs

The last post concerned fragment libraries, built around the philosophy that “small fragments can represent massive libraries”. At the other end of the scale are DNA-encoded combinatorial libraries (DELs).  DELs represent the technological offspring of combinatorial chemistry and molecular biology, with a little classical protein biochemistry for good measure. With split-pool synthesis to make vast libraries, and the sequence of chemistry steps encoded in a DNA sequence attached to each compound, huge libraries can be made and potent ligands identified.  Affinity chromatography with the protein target as ‘bait’ fishes out the most potent compounds, and PCR followed by sequencing of the DNA tag establishes the identity of the best binders.  If you can do affinity chromatography with your protein target, DELs represent the other extreme approach to lead generation.  The first paper is a pure classic – Brenner and Lerner’s PNAS publication contains the essence of the technique in highly readable form.  It contains the brilliant line “we recently, in principle, solved the synthetic procedure for peptides”.   But industrialising the method actually took another 17 years.  There were a huge number of synthetic chemistry devils to outwit in making the method scalable, and Morgan et al’s 2009 paper shows one version of the production and screening of billion-compound libraries, with the identification of inhibitors reduced to practice.
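
The encoding idea can be sketched in a few lines: each synthesis cycle appends one building block and one matching DNA codon, so the tag read back by sequencing spells out the synthetic history. This is a toy illustration only. The building blocks, the codons and the three-base codon length are invented for the example and are not taken from either paper.

```python
from itertools import product

# Hypothetical building blocks for three synthesis cycles, keyed by a made-up codon.
CYCLES = [
    {"AAC": "Gly", "AGT": "Ala", "CAG": "Phe"},
    {"CCA": "acetyl", "GTA": "benzoyl"},
    {"TGC": "methylamine", "TTG": "morpholine"},
]

def enumerate_library(cycles):
    """Yield (dna_tag, building_blocks) for every split-and-pool product."""
    for combo in product(*(cycle.items() for cycle in cycles)):
        tag = "".join(codon for codon, _ in combo)
        blocks = tuple(name for _, name in combo)
        yield tag, blocks

def decode_tag(tag, cycles, codon_length=3):
    """Recover the building blocks from a sequenced tag, one codon per cycle."""
    codons = [tag[i:i + codon_length] for i in range(0, len(tag), codon_length)]
    return tuple(cycle[codon] for cycle, codon in zip(cycles, codons))

library = dict(enumerate_library(CYCLES))
print(len(library), "encoded compounds")
print(decode_tag("CAGGTATTG", CYCLES))
```

Library size is the product of the number of building blocks per cycle, which is why three cycles of 1000 building blocks each already reach a billion compounds.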

“Encoded combinatorial chemistry”, Brenner and Lerner,  Proc. Natl. Acad. Sci (1992), 89, 5381-5383

“Design, synthesis and selection of DNA-encoded small-molecule libraries”, Morgan et al, Nature Chem. Bio. (2009), 5, 647 – 654

#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe


BucketListPapers 53/100: The rise and rise of Fragment Based Drug Discovery (FBDD)


FBDD is now an established method of drug discovery, having resulted in drugs delivered to patients and multiple compounds in clinical trials.  For groups without access to a compound collection, or where the belief is that the target belongs to a class where you have few ligands, FBDD is a logical choice.  The key requirement is that you can access structural information to drive synthesis to make the small, weak ligands more potent.  FBDD has also provided a framework for people to think about what constitutes a good ligand, via the debate around ligand efficiency, and how to improve potency.

The first paper to read is Hajduk, Fesik et al’s “SAR by NMR”

“Discovery of Potent Nonpeptide Inhibitors of Stromelysin Using SAR by NMR”  J. Am. Chem. Soc., (1997), 119, 5818–5827

And then to follow up:

Murray & Rees,  Nature Chemistry (2009), 1,187–192

Congreve, et al, J. Med. Chem. (2008), 51, 3661–3680

And finally:

“Twenty years on: the impact of fragments on drug discovery”

Erlanson,  Fesik, Hubbard, Jahnke & Jhoti, Nature Reviews Drug Discovery (2016), 15 , 605-619


#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe

BucketListPapers 52/100: Structural alerts for Mutagenicity – a must for every compound designer.

We feature here, as the bucket list paper, one of the first papers published in this key area, by John Ashby, but in fact there are a number of vital publications. Observations and testing results in the critical Ames assay were taken by the likes of Ashby and Tennant to derive and categorize a set of structural alerts for DNA reactivity that can identify potentially mutagenic compounds. There is some danger in doing this, as Ashby states in his paper: “It is obviously dangerous to simplify so complex an issue as chemical-structure/biological-activity relationships in chemical carcinogenicity and mutagenicity.”  Nonetheless, it is important to know which chemical groups frequently cause this type of toxicity, to ensure correct screening and due process; avoiding them altogether is best.

We include a few other papers that work further to increase knowledge and develop computer models to predict tox. The tables in these papers should be printed out and stuck on the wall above your desk!


“Fundamental structural alerts to potential carcinogenicity or non-carcinogenicity” Ashby Environ. Mutagen. (1985),7, 919-921


This paper uses corresponding Ames test data (2401 mutagens and 1936 non-mutagens) to construct new criteria and alerts. SMARTS string representations of the specific toxicophores are available in the Supplementary Information:


“Derivation and Validation of Toxicophores for Mutagenicity Prediction” Kazius, McGuire, Bursi J. Med. Chem. (2005), 48, 312-320


And in vivo rat studies

“Structure−Activity Relationship Analysis of Rat Mammary Carcinogens” Cunningham, Moss, Iype, Qian, Qamar & Cunningham Chem. Res. Toxicol. (2008), 21, 10, 1970-1982


#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe

BucketListPapers 51/100: Neural Networks – back where it began

Our previous entry talked about the current theme of using deep neural networks; however, it’s worth remembering that the field has been here before.  For a really clear and thoughtful exposition of the use of artificial neural networks see Salt and Livingstone’s 1992 paper, which in 6 clear pages covers the essentials of the technique, examples of how ANNs can fit to different functions, many of the issues and two case studies. For an even more succinct and prescient view, Ichikawa’s 1990 paper is a great read, particularly for the phrase:


“the difficulty of the convergence is not caused by the structure of the network but the quantity of the information included in the given data”.


Salt, Yildiz, Livingstone and Tinsley, “The use of artificial neural networks in QSAR.” Pestic. Sci. (1992) 36, 161–170

Aoyama, Suzuki and Ichikawa, “Neural Networks applied to Structure-Activity Relationships” J. Med. Chem. (1990), 33, 905–908


#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe

BucketListPapers 50/100: Going Deep – it was bound to happen – Deep Neural Nets (DNN) in compound prediction.



As a streaker flashed across the stage at the 1974 Oscars, the forever cheerful and charming co-host David Niven turned back to the audience and said, “Well ladies and gentleman, that was almost bound to happen…” Given the long history of efforts to predict properties of virtual molecules, and the interest in Neural Nets in the ’90s, then Random Forest, “it was bound to happen” that Deep Neural Nets (DNN) would be applied to chemical data sets. Even less surprising was that Bob Sheridan would be one of the first to publish.

The importance of encoding molecules in the right form (descriptors) rings true in these publications, as does the reliance on the quality (not quantity) of data. Equally, pay attention to the amount of gain DNN provides over previous methods: we still have a way to travel.


The great volume of DNN papers currently being submitted led us to select several papers – enjoy them all.


“Deep Neural Nets as a Method for Quantitative Structure–Activity Relationships”

Ma, Sheridan, Liaw, Dahl & Svetnik J. Chem. Inf. Model. 2015, 55, 2, 263-274


“DeepTox: Toxicity Prediction using Deep Learning”

Mayr, Klambauer, Unterthiner  & Hochreiter Frontiers in Env. Sci, 2016, 3, 2 – 15


“PotentialNet for Molecular Property Prediction”

Feinberg, Sur, Wu, Husic, Mai,  Li,  Sun,  Yang, Ramsundar & Pande ACS Cent. Sci. 2018, 4, 1520−1530


A word of caution: think about the errors in any prediction. Frequently a new virtual compound requiring a prediction is out-of-domain, even for these new DNN models.

The Relative Importance of Domain Applicability Metrics for Estimating Prediction Errors in QSAR Varies with Training Set Diversity

Sheridan, J. Chem. Inf. Model. 2015, 55, 6, 1098-1107

#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe

BucketListPapers 49/100: First exploration of Random Forests in SAR modelling.

Artificial intelligence (AI) in life science is everywhere at the moment, but those of us that have been around the block a while know that many of the machine learning (ML) techniques have already been explored and used for some time. This paper was the first exploration of Random Forest, RF (or Regression Tree), modelling applied to drug discovery datasets to predict properties. If you have no idea about ML in drug discovery, this paper is a good entry point, as the authors make a good stab at explaining how RF works and where it is applicable.

As the authors point out, there is “no free lunch” in molecular modelling: one technique does not work for all situations, datasets and compound types. However, since 2002 RF has been shown to be pretty good a lot of the time. It has a great advantage, as this work shows, in that it can be used “off-the-shelf” with its default settings. Recent ML work (see Deep Learning papers – next BucketListPaper) shows that the encoding of the molecules (descriptors) is important, as is the quality of the dataset submitted. Looking back with modern experience, we find it remarkable that good models in this work were produced with just a few hundred compound measurements. The final reason, and why we selected this paper, is the rigour and quality of the process of performing the work, compared with other techniques and write-ups.


Random Forest: A Classification and Regression Tool for Compound Classification and QSAR Modeling

Svetnik, Liaw, Tong, Culberson, Sheridan, and Feuston J. Chem. Inf. Comput. Sci. 2003, 43, 1947-1958

#BucketListPapers #DrugDiscovery #MedicinalChemistry


BucketListPapers 48/100: How do chemists actually improve molecules?

The basis of structure activity relationships (SAR) is identifying a well-defined chemical difference between two molecules and examining the difference in activity and properties. Over time compound designers build experience through a mental “bag-of-tricks” for designing a new molecule with the desired properties. If these tricks did not work then there would be no such thing as the art of medicinal chemistry; we might as well make random compounds. Inherently, designing a new compound involves mentally “changing” atoms into other atoms, even if that is as simple as changing hydrogen into fluorine.

Given this, what are all the combinations of atoms, or groups of atoms, that could be changed in a molecule? Well, that would be a very high number (90 billion), but a sensible place to start would be examining what chemists have made in known drug molecules. Given that these molecules have made it to patients, and so have low, if not no, toxicity, that gives us an idea of “acceptable” groups.  This early work by Sheridan presents the first results of such a study, and perhaps produced the first large-scale database of chemical transformations. The paper discusses the techniques and challenges involved in finding the chemical groups, principally by finding and using the maximum common substructure (MCSS) to identify what we now call matched pairs. Interestingly, the most common “transformations” are still the changes that chemists most often make to molecules (see Figs 5 and 6). This paper certainly inspired us ‘back in the day’ to explore and develop further Matched Molecular Pair Analysis.
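
Once each molecule has been split into a shared core and a variable group, counting the most common replacements is simple bookkeeping. The sketch below assumes that fragmentation has already been done and skips the hard part (the substructure search) entirely. The core labels and substituents are hypothetical placeholders, not data from the paper.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical, pre-fragmented molecules: (core identifier, substituent at the variable position).
MOLECULES = [
    ("core_A", "H"), ("core_A", "F"), ("core_A", "Cl"),
    ("core_B", "H"), ("core_B", "F"),
    ("core_C", "H"), ("core_C", "CH3"), ("core_C", "F"),
]

def count_transformations(molecules):
    """Count how often each unordered pair of substituents shares a core."""
    substituents_by_core = defaultdict(set)
    for core, substituent in molecules:
        substituents_by_core[core].add(substituent)
    counts = Counter()
    for substituents in substituents_by_core.values():
        # Sorting makes (F, H) and (H, F) count as the same replacement.
        for pair in combinations(sorted(substituents), 2):
            counts[pair] += 1
    return counts

counts = count_transformations(MOLECULES)
for pair, n in counts.most_common(3):
    print(pair, n)
```

A real analysis would derive the cores from structures and keep the direction of each change together with the measured property difference; the counting step itself stays this simple.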

The Most Common Chemical Replacements in Drug-Like Compounds

Robert P. Sheridan J. Chem. Inf. Comput. Sci. 2002, 42, 1, 103-108

#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe

BucketListPapers 47/100: Confirmation of conformation

This is a great personal favourite because it illustrates a clear link between two worlds that I enjoy working in – quantum chemistry and crystal structures.  Both of these are rich sources of information about drug-like molecules. A particular challenge that both face is whether they are relevant to the behavior of molecules in solution.  In this paper, the question of whether the two at least agree with one another is addressed, and the answer is pleasingly positive – as you can see in the paper’s figure, in which the curve is the energy variation with dihedral angle (computed at the RHF/STO-3G level) and the columns are the frequency with which each dihedral range is observed in crystal structures.  This evolved towards the MOGUL tool from the Cambridge Crystallographic Data Centre.  A follow up (DOI: 10.1039/c2ce25585e) probed the extent to which the solid state influences the observed torsional preferences in crystal structures and found this to be an infrequent concern.  For those interested in understanding the conformational preferences of molecules, the approaches presented here are a great starting point.  Presumably if the preferences hold in the gas phase (in the quantum calculations) and the solid state (in crystal structures) there is a high likelihood of a similar preference prevailing in solution.
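
One simple way to put a computed torsional energy profile and an observed frequency histogram on the same footing is to convert the energies into Boltzmann populations. This is offered as an illustrative assumption, not as the method used in the paper, and crystal structure frequencies are not guaranteed to follow a Boltzmann distribution. The energies below are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)

def boltzmann_populations(energies_kcal, temperature_k=298.15):
    """Convert relative energies (kcal/mol) into fractional populations."""
    weights = [math.exp(-e / (R_KCAL * temperature_k)) for e in energies_kcal]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical relative energies at three staggered dihedral angles (degrees).
dihedrals = [60, 180, 300]
energies = [0.9, 0.0, 0.9]

populations = boltzmann_populations(energies)
for angle, population in zip(dihedrals, populations):
    print(f"{angle:3d} deg: {population:.2f}")
```

The predicted populations can then be laid next to the fraction of crystal structures falling in each dihedral range to see whether the two sources agree.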


Comparison of conformer distributions in the crystalline state with conformational energies calculated by ab initio techniques. Allen, F. H.; Harris, S. E.; Taylor, R.  J. Comput.-Aided. Mol. Design 1996, 10, 247-254.


#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe

BucketListPapers 46/100: A window into a time gone by and some lessons still worth learning.

One of the joys of compiling and reporting on the bucket list is that we are often reading and describing papers that we did not select ourselves.  That is certainly true of this one which is from 1993 – and it shows.  It is also a revealing insight into how a lot of the progress and overhyping of artificial intelligence and computers in chemistry has come about. Some brilliant folk in computer science who had lived through and driven many of the important developments in artificial intelligence were looking for a scientific problem to apply it to. They stumbled on structure elucidation from mass spectrometry. It is hard to be excited at this remove about the particular application but it is clear that they made a big noise about this application even though they also describe Carl Djerassi’s rather unimpressed response to the program.  However, the general rules suggested by the authors are of pretty general use and interest to those developing scientific software of all kinds:


Lesson 1. The efficiency of the generator is extremely important. It is particularly important that constraints can be applied effectively.

Lesson 2. The use of depth-first search, which provides a stream of candidates, is generally better (in an interactive program) than breadth-first search, in which no candidates emerge for examination until all are generated.

Lesson 3. Planning is in general not simply a nice additional feature but is essential for the solution of difficult problems.

Lesson 4. Every effort to make the program uniform and flexible will be rewarded.

Lesson 5. An interactive user interface is not merely a nicety but is essential.

Lesson 6. An interesting extension of the plan-generate-test paradigm could improve its power: search and generation might be combined into a single problem solver.

Lesson 7. Choice of programming language is becoming less of an issue.

Lesson 8. Providing assistance to problem solvers is a more realistic goal than doing their jobs for them.

Lesson 9. Record keeping is an important adjunct to problem solving.

Lesson 10. In order to use a program intelligently, a user needs to understand the program’s scope and limits.

Lesson 11. The context in which problem solving proceeds is essential information for interpreting the solutions.

Lesson 12. DENDRAL employs uniformity of representation in two senses: (a) in the knowledge used to manipulate chemical structures, and (b) in the data structures used to describe chemical structures and constraints.
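
Lessons 1 and 2 can be illustrated with a toy generator. The sketch below is not DENDRAL: it enumerates element counts that add up to a target nominal mass, with no valence or chemistry checks. It does show a depth-first search that streams candidates as they are found and applies the mass constraint during generation rather than afterwards. The element set and integer masses are simplifying assumptions.

```python
from itertools import islice

MASSES = {"C": 12, "N": 14, "O": 16, "H": 1}  # nominal integer masses

def formulas(target_mass, elements=("C", "N", "O", "H"), counts=()):
    """Depth-first generator of element counts whose masses sum to target_mass."""
    used = sum(MASSES[element] * n for element, n in counts)
    if not elements:
        if used == target_mass:
            yield dict(counts)
        return
    element, rest = elements[0], elements[1:]
    n = 0
    # Constraint applied inside the generator: stop extending a branch
    # as soon as it would exceed the target mass.
    while used + n * MASSES[element] <= target_mass:
        yield from formulas(target_mass, rest, counts + ((element, n),))
        n += 1

# Candidates arrive one at a time, so a user can inspect the first few
# without waiting for the whole search to finish.
first_candidates = list(islice(formulas(46), 3))
print(first_candidates)
```

A breadth-first version would have to build every partial formula at one level before moving on to the next, so nothing could be shown to the user until the final level had been reached.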


DENDRAL: a case study of the first expert system for scientific hypothesis formation. Lindsay, R. K.; Buchanan, B. G.; Feigenbaum, E. A.; Lederberg, J. Artificial Intelligence. 1993, 61, 209-261.


#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe

BucketListPapers 45/100: Scrambling to find a better model

We have discussed before (bucket list #26 and #27) the risk of cherry picking from a large pool of descriptors.  This paper presents one of the ways to check if your model building has benefited from this effect: y-scrambling.  Here, the set of descriptors calculated for the set of molecules is retained but the value of the property (y) that you are trying to model is scrambled – the descriptors no longer correspond to the relevant molecule but the numerical set that the model is built from remains the same.  A repeated set of scramblings of the y-values should give an estimate of the type of model statistics that any credible model must improve upon. A comparison with models built instead using random descriptors shows that these achieve better r2 statistics than y-scrambled models, likely because real descriptors include some that correlate with one another. The authors divide models into three regimes:

r2(model) > r2(random descriptors) – probably a good model with physical link between descriptors and the property being modelled.

r2(model) < r2(y-scrambling) – unlikely to be a meaningful model

r2(model) > r2(y-scrambling) BUT r2(model) < r2(random descriptors) – possible suggestion that there is a link between the physical description of the molecules captured by the descriptors and the property being modelled BUT this is not as good as can be achieved by random descriptors.
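
A minimal sketch of y-scrambling follows, under simplifying assumptions that are not from the paper: the “model” is just the single best-correlated descriptor picked from a pool (a deliberately cherry-picking procedure), and the data are synthetic, with one informative descriptor and thirty columns of noise. All names and numbers are hypothetical.

```python
import random
from statistics import mean

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def best_single_descriptor_r2(descriptors, y):
    """Cherry-pick the descriptor column with the highest r2 against y."""
    return max(r_squared(column, y) for column in descriptors)

def y_scrambling(descriptors, y, n_rounds=100, seed=0):
    """Repeat the whole model selection on shuffled copies of y."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        shuffled = list(y)
        rng.shuffle(shuffled)
        scores.append(best_single_descriptor_r2(descriptors, shuffled))
    return scores

# Synthetic data: 20 "molecules", one informative descriptor, 30 noise descriptors.
rng = random.Random(42)
n_molecules = 20
informative = [i / (n_molecules - 1) for i in range(n_molecules)]
y = [2.0 * value + rng.gauss(0.0, 0.05) for value in informative]
noise = [[rng.random() for _ in range(n_molecules)] for _ in range(30)]
descriptors = [informative] + noise

r2_model = best_single_descriptor_r2(descriptors, y)
r2_scrambled = y_scrambling(descriptors, y)
print(f"r2(model) = {r2_model:.2f}")
print(f"r2(y-scrambling): mean = {mean(r2_scrambled):.2f}, max = {max(r2_scrambled):.2f}")
```

The important design point is that the descriptor selection is repeated inside every scrambling round. Scrambling y only after the descriptors have been chosen would hide exactly the selection bias the test is meant to expose.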


y-Randomization and Its Variants in QSPR/QSAR. Rücker, C.; Rücker, G.; Meringer, M. J. Chem. Inf. Model. 2007, 47, 2345-2357.


#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe

BucketListPapers 44/100: Reflections and dreams of the future

This retrospective by one of the great names in chemoinformatics (and beyond) provides an encouraging overview of the many advances in the field over the previous 40 years or so that have been of great impact and value. Notable examples include the creation of vast databases of chemical information and tools to exploit them. The perspectives for the future are astonishing because at almost any point in the history of the discipline, similar targets could have been highlighted.  These include: 1) better structural representations and tools for abstracting chemical data, 2) better ways to link between structure and real world effects, 3) predicting chemical reactions/reactivity, 4) helping humans to elucidate chemical structures, 5) elaborating and elucidating biological networks, 6) toxicity prediction. Well worth a read for the optimistic review of achievements and for motivation when selecting new research directions.

Some solved and unsolved problems of chemoinformatics: Gasteiger

SAR QSAR in Environ. Res. 2014, 25, 443-455.

#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe

BucketListPapers 43/100: Why is my QSAR not working?

This is an unusual pick for the bucket list – an editorial. However, it’s an editorial that contains much that echoes through the years as one over-hyped method for prediction is replaced by another. Maggiora notes the mismatch between the dimensionality of chemical space and that of the representations of it that we use in many statistical models.  He also highlights the importance of activity cliffs. These are the large discontinuities in activity that we expect when thinking about molecules fitting into active sites, where it is easy to imagine how a small change in structure might take a molecule from binding tightly to not binding at all (because it is now too big, or places a hydrogen bond donor towards a donor on the protein, etc). These activity cliffs undermine the similarity principle that much QSAR modeling relies upon and are often not well characterized in the activity data – inactive compounds tend not to be followed up experimentally. The fact that each set of descriptors provides a very different map of chemical space is also a problem – every molecule’s nearest neighbor set can change when different sets of descriptors are used. The chastening conclusion is that “all QSAR models are flawed to some degree” – recognizing and dealing with this truth is one of the challenges for chemoinformatics.
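
One later way of putting a number on an activity cliff, not part of this editorial, is the Structure-Activity Landscape Index (SALI): the activity difference between two compounds divided by one minus their similarity. The sketch below uses Tanimoto similarity on sets of fingerprint bits. The three compounds, their bit sets and their activities are hypothetical.

```python
from itertools import combinations

def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two sets of fingerprint bit indices."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 1.0

def sali(activity_a, activity_b, similarity):
    """Large values flag similar compounds with very different activity."""
    if similarity >= 1.0:
        return float("inf")
    return abs(activity_a - activity_b) / (1.0 - similarity)

# Hypothetical compounds: (fingerprint bits, pIC50)
COMPOUNDS = {
    "cpd_1": (frozenset({1, 2, 3, 4, 5, 6, 7, 8, 9}), 8.5),
    "cpd_2": (frozenset({1, 2, 3, 4, 5, 6, 7, 8, 10}), 5.0),
    "cpd_3": (frozenset({1, 2, 20, 21, 22, 23}), 6.0),
}

ranked = sorted(
    (
        ((a, b), sali(COMPOUNDS[a][1], COMPOUNDS[b][1],
                      tanimoto(COMPOUNDS[a][0], COMPOUNDS[b][0])))
        for a, b in combinations(sorted(COMPOUNDS), 2)
    ),
    key=lambda item: item[1],
    reverse=True,
)
for pair, value in ranked:
    print(pair, round(value, 2))
```

Swapping in a different fingerprint changes the similarities and can reorder the list, which is the “different map of chemical space” problem raised above.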

On outliers and activity cliffs – why QSAR often disappoints.

Maggiora J. Chem. Inf. Model. 2006, 46, 1535.

#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe

BucketListPapers 42/100: How (not) to build a model

As has been discussed in many of the bucket list papers, medicinal chemists are often called upon to build and/or use statistical models.  There are many ways of doing this incorrectly, some of which are easy to do without realising it.  In this delightfully frank set of instructions, Dearden, Cronin and Kaiser describe lots of the common errors (summarised as the 21 types in the table shown) and acknowledge examples that include some from their own work that show these problems in action. This paper is an essential checklist and note of caution for all those involved in QSAR or QSPR in its many guises.


How not to develop a quantitative structure–activity or structure–property relationship (QSAR/QSPR).

Dearden, Cronin, Kaiser SAR QSAR in Environ. Res. 2009, 20, 241-266.


#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe

BucketListPapers 41/100: Generating sets of conformers for when 1 just won’t do.

Chemists understand that most drug-sized molecules have some flexibility, and so may have multiple conformations that are accessible at room temperature.  Therefore, if we want to consider modelling the binding of a ligand to a protein, or the 3D similarity between two proteins, we need to have access to multiple conformations of ligands.  This is an essential step in virtual screening. We could explore the ‘conformational space’ each time we look at a molecule, but obviously once you have a set of conformations for a molecule, it doesn’t change with the task in hand.  So why not pre-calculate the sets of conformers and store them? The question is how to generate such conformer ensembles and, philosophically more challenging, how to know if the set of generated conformations is ‘good’.  The team at OpenEye produced a combined approach, partly using data from crystal structures to identify the well-populated torsions for bonds, and then a more ‘first principles’ approach to generate low-energy fragment structures and combine the fragments while avoiding clashes.  Sets of conformers are then compared to an extremely well curated set of x-ray crystal structures. It’s worth reading the paper just for the exploration of what makes a good crystal structure.

If you’re using crystal structures, docking sets of conformers, comparing sets of molecules in 3D or looking at the results of any of those calculations, this paper is essential background.

Conformer Generation with OMEGA: Algorithm and Validation Using High Quality Structures from the Protein Databank and Cambridge Structural Database.

Hawkins, Skillman, Warren, Ellingson, & Stahl J. Chem. Inf. Model. (2010), 50, 572–584.

#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe

BucketListPapers 40/100: The instant 3D structure revolution

Generating three-dimensional structures of molecules from the 2D structure is a classic computational chemistry problem. To generate the best possible structure requires quantum mechanical calculations, modelling of solvation, ionisation and tautomerization, and then generating an ensemble of conformers. But to generate a starting point, a “best guess given what we know”, surely something simpler could be done?  The next step down would be a ball-and-spring force field model.  Treat all the atoms as balls and bonds as springs, and use classical mechanics to search for the most stable configurations. But that will still take minutes per compound to minimise structures, searching all the bond torsions and relaxing the compounds. What about something even simpler? Surely from x-ray structures there are some fragments that don’t change much.  Enter the high-speed rule-based methods.  If you lived in the US you probably used CONCORD, and in Europe CORINA. Both are blazingly fast and allow a “best first guess” at the 3D structure of a molecule. Like the electronic equivalent of the molecular building kit, they allow chemists to generate sets of 3D structures almost instantly.  For high-precision work the full QM treatment is still needed, but for a quick look-and-see, or to provide a quick start to QM and force field minimisation, CORINA and CONCORD are still excellent.

Automatic generation of 3D-atomic coordinates for organic molecules.

Gasteiger, Rudolph and Sadowski, Tetrahedron Computer Methodology (1990), 3, 537–547.


Using CONCORD to construct a large database of three-dimensional coordinates from connection tables

Rusinko, Sheridan, Nilakantan, Haraki, Bauman and Venkataraghavan, J. Chem. Inf. Comput. Sci. (1989), 29, 251–255.


#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe

BucketListPapers 39/100: getting a view on ligand protein interactions


Ligand:protein interactions are intrinsically difficult to view: they are three-dimensional and highly complex.  Picking out the critical interactions can be an exercise in rotating the structure, zooming, cutting, and creating annotations.  What chemists and drug hunters often want, though, is a summary – “what are the key interactions?” – a map to orientate themselves by, showing not all the details but the most important features.  Such a summary is vital for communication and for comparing different structures.  As structure-based design has grown and expanded into fragment-based drug discovery, with protein structures at the centre of the make-test cycle, the ability to rapidly summarise protein:ligand interactions becomes vital.  The original approach to this is Ligplot which, with almost 4000 citations, is the classic view and has become so ubiquitous you may not even know its name.  The follow up, Ligplot+, brings the classic up to date.


LIGPLOT: a program to generate schematic diagrams of protein-ligand interactions

Wallace, Laskowski and Thornton Protein Engineering (1995), 8, 127-134

LigPlot+: Multiple Ligand Protein Interaction Diagrams for Drug Discovery

Laskowski & Swindells J. Chem. Inf. Model. (2011), 51, 2778–2786

#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe

BucketListPapers 38/100: Open Season on Virtual Screening Compounds

When starting a drug (or agrochemical) hunting program, one of the first vital steps is having compounds that bind to your target. Getting to this matter has been revolutionised in the last 15 years by three pieces of technology: virtual screening methods “good enough” to enrich screens by 10-50 fold, accessible compute resource via high-powered desktops or increasingly as cloud resource, and databases of accessible, well-curated compounds.  Now start-up companies and academic groups are adopting the strategy of “virtual screen, order compounds, low throughput testing” to generate the first hits for a project.  The first compounds get the biology going and can demonstrate to investors or grant funding bodies the glimmer of progress needed for follow-on funding.  The grandparent of databases for virtual screening is ZINC.  Initially a library of just over 700k compounds with 3D structures in 2005, ZINC15 currently holds 120 million purchasable “drug-like” compounds.
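
The “10-50 fold” figure is usually expressed as an enrichment factor: the hit rate among the top-scoring compounds divided by the hit rate of the whole library. The sketch below shows that arithmetic on made-up data; the scores, the library size and the positions of the actives are all hypothetical.

```python
def enrichment_factor(scores, is_active, top_fraction=0.01):
    """Hit rate in the top-scoring fraction divided by the overall hit rate."""
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    hits_top = sum(active for _, active in ranked[:n_top])
    hits_total = sum(is_active)
    return (hits_top / n_top) / (hits_total / len(ranked))

# Hypothetical screen: 1000 compounds, 10 actives, 5 of them ranked in the top 10.
n_compounds = 1000
scores = [float(i) for i in range(n_compounds)]  # higher score = better rank
active_ids = {999, 998, 997, 996, 995, 500, 400, 300, 200, 100}
is_active = [i in active_ids for i in range(n_compounds)]

ef_1pct = enrichment_factor(scores, is_active, top_fraction=0.01)
print(f"EF at 1% = {ef_1pct:.0f}")
```

In practical terms, an enrichment of this size means ordering and testing tens of compounds rather than thousands to find the same number of hits.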

An invaluable resource and a sign of how compound discovery is changing:

“ZINC − A Free Database of Commercially Available Compounds for Virtual Screening”

Irwin and Shoichet J. Chem. Inf. Model. (2005), 45, 177-182

With the follow up:

ZINC 15 – Ligand Discovery for Everyone  J. Chem. Inf. Model. (2015), 55, 2324-2337


#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe

BucketListPapers 37/100: Tuning amine pKas: a regular medchem task

Ionisation influences so many of a compound’s properties: solubility, binding to ionic sub-sites, permeability and interactions with critical targets like the hERG ion channel.  This makes tuning the pKa of amines a frequent job for a medicinal chemistry team.  Ionisation is, however, notoriously hard to predict accurately from first principles, so chemists fall back on mental “milestones” of pKas in well-studied sets of compounds and rules of thumb to adjust the pKa for different substituents.  One of the best collections of pairs and small sets of amines is in this joint publication led by Diederich at the ETH in Zurich, collaborating with Roche in Basel, the University of Vienna and the Johannes Gutenberg-Universität in Mainz.  A detailed exploration of the transmission of electronic effects through the sigma skeleton of organic bases is undertaken, with a forensic analysis of the effect of the conformation of substituents on basicity.

If you’re stuck trying to get the pKa of an amine just right – this is perfect background reading for you.
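
To see why a couple of pKa units matter, the Henderson-Hasselbalch relationship gives the fraction of an amine that is protonated at a given pH. The sketch below treats a simple monobasic amine; the three pKa values are hypothetical examples, not data from the paper.

```python
def fraction_protonated_base(pka, ph=7.4):
    """Fraction of a monobasic amine present as the protonated (charged) form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

# Hypothetical amines spanning four pKa units around physiological pH.
for pka in (9.4, 7.4, 5.4):
    charged = fraction_protonated_base(pka)
    print(f"pKa {pka:.1f}: {100 * charged:.1f}% protonated at pH 7.4")
```

Dropping the pKa from 9.4 to 7.4 moves the compound from about 99% charged to an even split, the kind of shift that can change permeability and hERG binding.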

“Predicting and Tuning Physicochemical Properties in Lead Optimization: Amine Basicities”

Diederich et al, ChemMedChem (2007), 2, 1100 – 1115

#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe

BucketListPapers 36/100: Critical reading for Fluorine fans


Most medicinal chemists know that fluorine is the “wild-child” of the halogens, shifting adjacent pKas and blocking metabolism.  This paper, however, is a deeply analytical study of how fluorine alters conformational preferences and undergoes very specific interactions with amino acid residues, such as C-F:HN, C-F:C=O and interactions with arginines and electropositive cavities.  With systematic analyses of data from the PDB and CSD, this paper should be essential reading, particularly if you’re working in structure-based compound design and have thoughts of ‘adding a fluorine as a bioisostere’. It may behave quite unlike what you were expecting.  With over 3800 citations on Google Scholar, it’s the grandparent of systematic fluorine chemistry reviews.

“Fluorine in Pharmaceuticals: Looking Beyond Intuition.”

Müller, K., Faeh, C., & Diederich, F. Science (2007), 317, 1881–1886.

#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe

BucketListPapers 35/100: Chemical Structure Checking – essential hygiene for chemists

This paper should be compulsory reading for every chemist who ever creates a chemical structure that could end up in a database, just so they understand the dreadful variety of errors that can be made in recording chemical structures.  Like supplying clean water to a population, supplying valid chemical structures to other chemists is an under-rated but essential task.  Much of this area of cheminformatics is hidden in large companies, but this paper shows the essential steps in cleaning a set of structures so that even the simplest tasks, such as duplicate identification and clustering, can take place, let alone any QSAR or other modelling.  It also opens with a quote attributed to both Ronald Reagan and Felix Dzerzhinsky.

“Trust, But Verify: On the Importance of Chemical Structure Curation in Cheminformatics and QSAR Modeling Research” Fourches, Muratov & Tropsha  J. Chem. Inf. Model. (2010), 50, 1189–1204

#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe

BucketListPapers 34/100: How do we define a ‘scaffold’ in medchem? How do you get a computer to do it?

Another easy thing for medicinal chemists to talk about is the ‘scaffold’ of a molecule, or indeed ‘scaffold hopping’. Between chemists it might be possible to build a fair definition, which works well until the next weird ‘chemical class’ appears for a new set of protein targets. For cheminformaticians working in computer space, a scaffold needs to be firmly defined so that programs can be scripted to work consistently across chemical space. The technical approach to breaking down organic molecules into defined scaffolds described in this paper is so well done that we talk about ‘Bemis / Murcko scaffolds’. So if you want to know how it is done, and to use the scaffolds, then you had better read this paper.
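
As a taster of the idea, here is a minimal graph-only sketch of our own (not the authors’ code): strip terminal atoms repeatedly until only ring systems and the linkers between them remain. The function name and the atom numbering are hypothetical, and the sketch ignores element types and bond orders, which real cheminformatics toolkits do take into account.

```python
def framework_atoms(bonds):
    """Return the atoms left after repeatedly stripping terminal atoms.

    `bonds` is an iterable of (atom_a, atom_b) index pairs describing the
    heavy-atom graph. What survives is the ring systems plus their linkers.
    """
    neighbours = {}
    for a, b in bonds:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Start from atoms with at most one neighbour (the ends of side chains).
    terminal = [atom for atom, nbrs in neighbours.items() if len(nbrs) <= 1]
    while terminal:
        atom = terminal.pop()
        if atom not in neighbours:
            continue  # already removed in an earlier step
        for nbr in neighbours.pop(atom):
            neighbours[nbr].discard(atom)
            if len(neighbours[nbr]) <= 1:
                terminal.append(nbr)  # removal exposed a new chain end
    return set(neighbours)


# Hypothetical numbering: a six-membered ring (atoms 0-5) with one
# substituent atom (6) attached to ring atom 0.
ring_with_substituent = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
print(sorted(framework_atoms(ring_with_substituent)))  # [0, 1, 2, 3, 4, 5]
```

An acyclic molecule has no framework in this sense, so the function returns an empty set for a plain chain.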

Bemis, G.W.; Murcko, M.A. The Properties of Known Drugs. 1. Molecular Frameworks J. Med. Chem. 1996, 39, 15, 2887-2893


#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe

BucketListPapers 33/100: Two papers discussing Molecular Similarity. They are different, honest.

Molecular similarity is an important concept, and vital for rapid database searching and for activities like clustering that are critical to the day-to-day work of chemists. But what do we mean by similarity? It is rather subjective; for example, two molecules may have the same molecular weight but be completely different, so what is the correct measure and method to quantify this? The first paper describes traditional a priori and algebraic methods and the second is a cracking review of the concepts of similarity (see excellent diagram above as a taster).
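
One commonly used measure is the Tanimoto coefficient between two fingerprints. The sketch below is our own illustration rather than anything lifted from the two papers; the fingerprints are represented simply as sets of ‘on’ bit positions, and the bit values are made up.

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto (Jaccard) coefficient between two sets of 'on' bits."""
    if not bits_a and not bits_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


# Made-up bit positions, purely for illustration.
print(tanimoto({1, 5, 9, 12}, {1, 5, 7, 12, 20}))  # 0.5
```

The score runs from 0 (no bits in common) to 1 (identical bit sets), but what counts as ‘similar enough’ still depends on the fingerprint and the question being asked.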

Johnson, M.; Basak, S.; Maggiora, G. A characterization of molecular similarity methods for property prediction Mathematical and Computer Modelling, 1988, 11, 630-634

Maggiora, G.; Vogt, M.; Stumpfe, D.; Bajorath, J. Molecular Similarity in Medicinal Chemistry J. Med. Chem. 2014, 57, 8, 3186-3204

#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe

BucketListPapers 32/100: The next step in representing molecules as a single number – Extended-Connectivity Fingerprints

We previously highlighted one of the first papers describing a method to represent chemical structures in a computer as a unique fingerprint. Fingerprints allow very rapid computer comparisons between molecules (similarity – more later), but this important work goes further. The authors describe a method for generating extended topological fingerprints designed for more detailed SAR work. This style of fingerprint is widely adopted in the chemical industries.
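
To give a flavour of the ‘circular’ idea, here is a deliberately simplified toy of our own. It is not the published ECFP algorithm: it leaves out the paper’s fuller atom descriptors, bond orders and the removal of duplicate environments. All names and parameters are hypothetical.

```python
import hashlib


def _hash(*parts) -> int:
    """Deterministic 32-bit hash of the given values."""
    digest = hashlib.sha1(repr(parts).encode()).digest()
    return int.from_bytes(digest[:4], "big")


def circular_fingerprint(atoms, bonds, radius=2, n_bits=1024):
    """Toy circular fingerprint: returns the set of 'on' bit positions.

    atoms: list of element symbols; bonds: list of (i, j) index pairs.
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    # Round 0: describe each atom by its element and heavy-atom degree only.
    ids = [_hash(sym, len(neighbours[i])) for i, sym in enumerate(atoms)]
    features = set(ids)

    # Each round folds the sorted neighbour identifiers into every atom's own,
    # so an identifier describes an environment one bond larger than before.
    for layer in range(1, radius + 1):
        ids = [
            _hash(layer, ids[i], tuple(sorted(ids[j] for j in neighbours[i])))
            for i in range(len(atoms))
        ]
        features.update(ids)

    # Fold the identifiers down onto a fixed number of bit positions.
    return {feature % n_bits for feature in features}


# Ethanol's heavy atoms written in two different atom orders.
fp_a = circular_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
fp_b = circular_fingerprint(["O", "C", "C"], [(0, 1), (1, 2)])
print(fp_a == fp_b)  # True
```

Because neighbour identifiers are sorted before hashing, the result does not depend on how the atoms happen to be numbered, which is what makes fingerprints like this comparable between molecules.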

Rogers, D.; Hahn, M. Extended-Connectivity Fingerprints J. Chem. Inf. Model. 2010, 50, 742–754

#BucketListPapers #DrugDiscovery #MedicinalChemistry #BeTheBestChemistYouCanBe