One Chiral Fingerprint To Find Them All

[Figure 1: the MAP4C method illustrated on a single chiral SMILES]

Markus Orsi & Jean-Louis Reymond. J. Cheminform., 2024, 16:53

doi: 10.1186/s13321-024-00849-6

In chemoinformatics a few problems keep rearing their ugly heads, and stereochemistry is one of them: many computational tools rely on fingerprints that generally do not encode it. Orsi and Reymond have created one fingerprint that attempts to rule them all by including chirality information.

The paper describes how this new chiral fingerprint, MAP4C, is constructed. It builds upon the MinHashed Atom-Pair fingerprint (MAP4) to include stereochemistry for both chiral centres (in accordance with Cahn-Ingold-Prelog nomenclature) and double bonds. Essentially, all circular substructures are found for radii from 1 up to 4. When a circular substructure has a chiral central atom, that atom's string in the SMILES is replaced with its chirality label, or with a question mark if the chirality is unknown; for example, ‘C(C)(O)CC’ becomes ‘$R$(C)(O)CC’. At each radius, shingles are calculated for all possible pairs of circular substructures; an example is given in Figure 1. The list of shingles is then hashed using the MinHash technique.
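To make the shingle and MinHash steps more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The shingle layout (two substructure strings in sorted order around their topological distance), the helper names `make_shingles` and `minhash`, and the toy substructure strings are all assumptions for illustration. In practice the substructure SMILES and the distances would come from a cheminformatics toolkit.

```python
import hashlib
from itertools import combinations


def make_shingles(environments, distances):
    """Build one shingle per atom pair and per radius.

    environments: atom index -> list of substructure strings, one per radius
    distances: (i, j) with i < j -> topological distance between the atoms
    The "a|distance|b" layout is an assumed format for this sketch.
    """
    shingles = set()
    for i, j in combinations(sorted(environments), 2):
        dist = distances[(i, j)]
        # Pair up the two atoms' environments radius by radius.
        for env_i, env_j in zip(environments[i], environments[j]):
            first, second = sorted((env_i, env_j))  # order-independent pair
            shingles.add(f"{first}|{dist}|{second}")
    return shingles


def minhash(shingles, n_hashes=64):
    """Keep the minimum hash value of the set under n_hashes seeded hashes."""
    signature = []
    for seed in range(n_hashes):
        signature.append(min(
            int.from_bytes(hashlib.sha1(f"{seed}:{s}".encode()).digest()[:4], "big")
            for s in shingles
        ))
    return signature


# Hypothetical toy input: three atoms, two radii each; atom 1 is a chiral
# centre whose atom string has been replaced by the $R$ label.
environments = {
    0: ["CC", "CC(C)O"],
    1: ["$R$(C)(O)C", "$R$(C)(O)CC"],
    2: ["CO", "OC(C)C"],
}
distances = {(0, 1): 1, (0, 2): 2, (1, 2): 1}

shingles = make_shingles(environments, distances)
signature = minhash(shingles)
```

Each position of `signature` stores the smallest hash seen under one seeded hash function, so two molecules that share most of their shingles agree at most positions.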

The authors then investigate how well this fingerprint (FP) performs. Examining the shingles generated for a dataset shows that chirality appears in only a small fraction of them, so a molecule with its stereochemistry defined and the same molecule without it have a high fingerprint similarity. The new chiral fingerprint was then evaluated in a virtual screen across different radii and compared with chiral and non-chiral FPs. MAP(C) FPs outperform ECFP(C) and AP(C) in differentiating between active and inactive compounds, with larger radii outperforming smaller ones.
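The effect of having only a few chirality-bearing shingles can be illustrated with a small worked example on raw shingle sets. The numbers below are invented for illustration and are not taken from the paper. The `$?$` notation for an undefined centre is an assumption made by analogy with the `$R$` example, and `jaccard_distance` is a hypothetical helper name.

```python
def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# 18 made-up shingles that do not touch the stereocentre and are therefore
# identical for the stereo-defined and the stereo-undefined molecule.
shared = {f"achiral_shingle_{k}" for k in range(18)}

# Only two shingles per molecule involve the chiral centre.
with_stereo = shared | {"$R$(C)(O)CC|1|CC", "$R$(C)(O)CC|2|CO"}
without_stereo = shared | {"$?$(C)(O)CC|1|CC", "$?$(C)(O)CC|2|CO"}

distance = jaccard_distance(with_stereo, without_stereo)
similarity = 1.0 - distance
```

With 18 shared shingles out of 22 in the union, the similarity is 18/22 (about 0.82) even though every chirality-bearing shingle differs. The fingerprint therefore separates the two forms while still ranking them as close neighbours.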

Finally, MAPC is shown to distinguish between different stereoisomers, from small molecules up to large natural products and peptides. Additionally, the Jaccard distances computed from these FPs are smaller between enantiomers or other stereoisomers than between structural isomers. The authors recommend using MAPC with a radius of 4 rather than the slightly superior radius of 6 because of computation time. It will be interesting to see the uptake of this fingerprint in the chemoinformatics community.
