Each year, SCI’s Scotland group runs a competition in which PhD students are invited to write a short article describing how their research relates to SCI’s strapline: ‘where science meets business’.
Hayley Russell, a Pure and Applied Chemistry PhD student at the University of Strathclyde, was the overall winner of this year’s competition. Her article ‘Streamlining Reaction Discovery with Machine Learning: Can we predict the outcome of a reaction without doing it in the lab?’ is reproduced below:
Streamlining Reaction Discovery with Machine Learning: Can we predict the outcome of a reaction without doing it in the lab?
If you had asked my 17-year-old self, in the first year of her chemistry degree, what her future career would look like, the answer would have been quite different from the reality. I pictured long days spent in the lab, lab coat buttoned, safety specs donned. Yet as I approach the end of my PhD, I spend very little time in the lab: most of my research takes place on my computer.
The rise of artificial intelligence (AI) is reaching every industry, and chemistry is no exception. Chemistry is an ideal application, as the large amounts of data generated experimentally and computationally can serve as inputs for a machine learning (ML) model. In the simplest terms, ML algorithms perform many statistical calculations on input data and use the results to predict an output or to highlight trends in the data that a human might never spot.
Using data and ML to inform decision-making in chemistry is not new. The field of chemoinformatics emerged in the 1990s with the specific aim of using information gained from statistical analysis to make better decisions in drug lead identification, often by applying ML methods, though it is now used well beyond the pharmaceutical industry. One of the most common examples of chemoinformatics is the development of quantitative structure-activity relationships (QSARs), now a standard part of the medicinal chemist’s toolkit. My project applies machine learning to organic chemistry to predict reaction outcomes.
Organic chemistry is a constantly evolving field, but reaction development often involves a huge amount of trial and error. That trial and error feeds ever-growing reaction databases, which improve our understanding of how reactions behave. But it also means that scientists spend a lot of time, money and resources running reactions that fail. Again. And again. And again. What if we could eliminate that? What if we could feed an AI model the details of the reaction we hope to run, and it told us the outcome? What if we could find out what the yield would be? The favoured isomer?
Many chemical companies have decades’ worth of experimental data sitting untouched in archives and folders. With the help of AI, these unused data dumps become gold mines, shimmering with untapped potential.
One reaction where such predictions would be especially valuable is the C-H borylation of aromatics, which is important to the synthetic chemistry industry because the organoborane products are useful building blocks in a range of key reactions. In particular, the Suzuki-Miyaura cross-coupling of boronic acids and esters with suitable electrophiles is regarded as one of the most important C-C bond-forming processes in modern organic chemistry. Organoboranes are also used in the Chan-Lam cross-coupling and can serve as masked hydroxyl groups.
My project aims to deepen our understanding of iridium-catalysed C-H borylation reactions using ML. The literature has shown that ML can predict the outcome of chemical reactions, provided a sufficiently large, high-quality dataset is available. In my project, I have generated an experimental dataset using high-throughput equipment and trained an AI model to predict the outcome of iridium-catalysed C-H borylations. The model sorts potential substrates into three categories: those that will not borylate, those that will borylate with some side products, and those that will borylate fully.
By predicting the outcomes of chemical reactions with AI, scientists can reach their target molecules through more efficient routes, saving time and money. This acceleration of synthetic chemistry could allow scientists to make more discoveries and to keep up with the growing demand for more.
More pharmaceutical drugs, more non-toxic pesticides, more environmentally friendly materials.
Hayley Russell
PhD student
University of Strathclyde