
Boltz-2: democratising the future of drug design

Open source structure prediction with binding affinity

In November 2024, Regina Barzilay’s group at MIT released Boltz-1, a model for predicting the structure of biomolecular complexes that rivals AlphaFold3 but, crucially, is open source. The team has now released an improved version, Boltz-2, which can also predict binding affinity, an advance that could change the face of drug development.

We talked to Regina, who is a member of team MATCHMAKERS, and Jeremy Wohlwend, one of the lead developers from Regina’s group, about the new features and the developments the team is planning next as it works towards tackling the T-cell receptor challenge.

Predicting protein structure

AlphaFold2 was a game changer. Published in 2021, it allowed protein structure prediction with near-experimental accuracy. Its impact was immediate. A mere three years later, Demis Hassabis and John Jumper were awarded a share of the 2024 Nobel Prize in Chemistry.

Also in 2024, the scientific community was waiting with bated breath for the next AlphaFold release. AlphaFold3 can model not just protein-protein interactions, but all kinds of interactions, expanding its repertoire to RNA, DNA and small molecules, as well as complexes with modified residues. The model has the potential to transform biomedical research and accelerate drug discovery. But when it arrived there was a caveat: Nature published AlphaFold3 without the accompanying code. Access was restricted to a web platform, with limits placed on the number and types of predictions possible. 

Regina Barzilay, team MATCHMAKERS 

In November 2024, Regina Barzilay’s group released Boltz-1, led by Jeremy Wohlwend, Gabriele Corso and Saro Passaro. Boltz-1 provides accuracy rivalling AlphaFold3, with no restrictions on who can use it or how many predictions they can run. The group also released the code, bolts and all, under the open MIT license. This means users can freely use, modify and build upon the code, in academia or industry, developing it further for their own applications. Jeremy reflects, “Boltz-1 has rapidly become the most widely utilised model of structure prediction.”

Introducing Boltz-2

Today the group released Boltz-2, which predicts not just the structure of interactions but, for protein-small molecule interactions, also the affinity with which they bind. Jeremy emphasises, “For now, small molecules remain the main drug modality, and we think these machine learning models are going to allow hit discovery to be done completely in silico, enabling virtual screening at scale.”

 

One of the most expensive and time-consuming parts of the drug development pipeline is the initial hit discovery. By allowing small molecule screening to be done cheaply in silico, at speed and scale, Boltz-2 could transform pre-clinical drug development. Molecular dynamics simulations, which are routinely used to predict binding affinity, currently take hours, whereas similar predictions using Boltz-2 can be carried out roughly 1,000x quicker, taking just 15-30 seconds for an average-sized protein and small molecule.

 

Regina emphasises, “Affinity is integral to any drug design problem. It has been an open problem for decades, but it really required novel machine learning approaches to address. This is not only a life sciences advance, but also an important advance in core machine learning and computer science.” 

 

Like AlphaFold, Boltz-2’s structure prediction models were trained using the Protein Data Bank (PDB), a well-curated database where researchers have been depositing structures for over 50 years. To enable affinity prediction, the team trained the model using the PubChem database. Jeremy comments, “A lot of our effort has actually gone into curating that data for binding affinity training.”

 

Affinity prediction is not the only improvement in the Boltz-2 release. While structural data is routinely deposited in the PDB, every lab has additional knowledge about its interaction of interest, which is not necessarily contained in public databases. Critically, the team has given users the opportunity to utilise this existing knowledge. Jeremy explains, “We've generalised a way of telling the model about any contacts or pocket that you're aware of on the target, so that you can steer the model towards using them. This can improve the accuracy of the structure prediction pretty dramatically.”

Across the board, Boltz-2 incorporates significantly more training data sources than previous models. Additionally, the team has integrated Boltz-steering, which forces the model to observe the general rules of chemistry, ensuring the physical plausibility of its predictions. This has resolved hallucination issues found with previous models, such as incorrect chirality and clashes. Jeremy comments, “These steering techniques and contact conditioning mean users can leverage their domain knowledge to further improve accuracy without having to modify, retrain or fine-tune the model.”

  

Gabriele Corso, Jeremy Wohlwend and Saro Passaro.

Boltz-2 and beyond

The group are continuing to innovate. While at the moment Boltz-2 can predict binding affinities only for protein-small molecule interactions, Jeremy and his colleagues are actively working on expanding its capabilities to allow protein-protein interaction affinity prediction. This will be critical for team MATCHMAKERS’ work, as they take on the T-cell receptor (TCR) challenge and attempt to accurately predict TCR antigen specificity.  

 

This is a mammoth task, given the complexity of TCR biology, and a critical hurdle in establishing personalised immunotherapies. TCRs don’t directly bind to antigens, but instead recognise peptides displayed on the cell surface by major histocompatibility complex (MHC) molecules. Each T cell has a unique TCR to allow recognition of the many possible peptide-MHC (pMHC) combinations that it may encounter. To aid this surveillance, individual TCRs exhibit high levels of cross-reactivity, recognising thousands of different pMHC combinations. But this diversity isn’t reflected in the experimental data currently available, meaning existing prediction models perform poorly.

 

Team MATCHMAKERS aims to change this and develop AI tools for accurate TCR-pMHC prediction. Armed with $25m from Cancer Grand Challenges, the team combines machine learning leaders like Regina and Nobel Prize winner David Baker with world-leading structural biologists and immunologists. Jeremy comments on MATCHMAKERS’ approach, “The experimental people on the team are generating diverse data at a large enough scale to feed the models, so we can properly train them.”

 

Beyond drug development and team MATCHMAKERS, Regina is already thinking about the bigger picture: “Hopefully Boltz-2 will broaden people’s perspectives of what is possible and help the discovery of biological phenomena that would not have been possible through traditional experimental methods.”


 

Through Cancer Grand Challenges, team MATCHMAKERS is funded by Cancer Research UK, the National Cancer Institute in the US and The Mark Foundation for Cancer Research. 

 

In addition to MATCHMAKERS funding, part of the GPU resources necessary to complete the Boltz-2 project was provided by the National Energy Research Scientific Computing Center (NERSC), a Department of Energy Office of Science User Facility, via NERSC award GenAI@NERSC. The team from MIT was also supported by the Abdul Latif Jameel Clinic for Machine Learning in Health, the NSF Expeditions grant (award 1918839: Collaborative Research: Understanding the World Through Code), the DTRA Discovery of Medical Countermeasures Against New and Emerging (DOMANE) Threats program, the Machine Learning for Pharmaceutical Discovery and Synthesis (MLPDS) consortium, The Mark Foundation for Cancer Research, the Centurion Foundation and the BT Charitable Foundation.

 

Article written by Rebecca Eccles, with thanks to Jeremy Wohlwend and Regina Barzilay.