The release of OpenBind’s first public dataset
The UK-led OpenBind project has reached a key milestone.

Dr. Ed Griffen at Diamond Light Source. Credit: Stuart March-DNDi
The first public dataset and AI prediction model have been released, marking an important step forward in the use of artificial intelligence to develop new medicines. MedChemica is proud to be part of the OpenBind Consortium, led by Diamond Light Source. Special recognition goes to everyone involved: this release shows that the consortium is delivering measurable results.
In just seven months, the project has generated 800 high-quality measurements, a significant gain in speed and efficiency. Previously, datasets of this scale typically took years to generate and reach the public. This structure–affinity dataset contains 925 crystallographic binding events from 699 compounds, along with affinity measurements for 601 compounds. The release includes standardised experimental data and ‘OpenBind V1’, which can be used immediately to drive the next generation of AI models.
MedChemica’s Contribution to OpenBind
OpenBind’s first data release is a snapshot of progress over the last year. Working at the pace OpenBind needs means working like a factory: compounds are designed, predictions are made, and compounds are then tested and their protein–ligand structures determined, all in parallel and at pace. To build an AI for this purpose, it is essential to have processes in place that produce consistent and reproducible results.
At MedChemica, our work over the last year has spanned both the design phase and helping to refine and harden the processes and systems that can now begin delivering compounds at scale into the measurement phase. It has been quite the journey so far, but compound designs have now moved into the make stage for the next round of the process, and the next targets are in sight. The machine is really beginning to move.
The Benefits

Researcher Jasmin Aschenbrenner loading samples at the crystallography beamline, Diamond Light Source. Credit: Stuart March-DNDi
Although AI models have improved the accuracy with which we can predict protein structures, their impact on drug discovery is limited by a lack of reliable experimental data showing in detail how drug molecules actually bind to disease-related proteins. OpenBind aims to change this.
Now, a large open dataset of real drug–protein interactions has been released to the public, along with an AI model trained on the binding data. Together they help researchers predict drug binding quickly and accurately, speeding up the drug discovery process and making the information easier for scientists around the world to access. The release establishes a platform for major advances in drug discovery, with upcoming data releases aimed at global health challenges such as COVID-19, dengue, Zika, malaria and cancer, where the need to develop new treatments quickly remains critical.
You can find out more about the project by visiting
