AI can come up with 40,000 potential new chemical weapons in 6 hours?

Written by: Green Apple

Scientific papers are usually models of detail: author teams are generally obliged to disclose all the information others need to reproduce their findings.

But this study is an exception.

A recent paper published in Nature Machine Intelligence, titled “Dual use of artificial-intelligence-powered drug discovery,” clearly alarmed its own authors, as shown both by the tone of the text and by the key information it leaves out.

A proof of concept

In 2021, Collaborations Pharmaceuticals, a company based in Raleigh, North Carolina, USA, that uses computers to help clients identify molecules with potential as drugs, was invited to give a presentation on “Potential misuse of drug discovery technology” at a conference organized by the Spiez Laboratory in Switzerland.

The conference is part of the “Convergence” series, established by the Swiss government to identify technological developments that could have implications for the Chemical Weapons Convention and the Biological Weapons Convention. Held every two years, the conferences bring together international scientific and disarmament experts to explore the state of the art and the trajectory of development in the chemical and biological fields, consider potential security implications, and discuss how these can best be addressed internationally.

To prepare for the talk, researchers at Collaborations conducted what they called a “thought experiment,” which became a computational proof of concept for making biochemical weapons.

For the Swiss conference, Collaborations Pharmaceuticals decided to explore how AI could be used to design toxic molecules. The company had previously built a molecule generator called MegaSyn, which uses machine learning models to predict biological activity and find new therapeutic inhibitors for human disease targets. Normally, this generative model penalizes predicted toxicity and rewards predicted target activity.

In the new experiments, they tweaked the model so that it rewards both toxicity and bioactivity and trained it using molecules from a public database.

Their method and results were disturbingly simple. The software was trained on the chemical structures of a set of drug-like molecules (substances that are easy to synthesize and readily absorbed by the body) taken from a public database, together with the known toxicity of those molecules. The modified software then took less than six hours to generate 40,000 potentially lethal molecules that met the researchers’ predefined parameters and could, in principle, be used as chemical weapons.

The Verge interviewed the paper’s first author, Fabio Urbina, a senior scientist at drug discovery company Collaborations Pharmaceuticals, about the potential misuse of AI technology in drug development.

The research team had never considered this before and were only vaguely aware of the security concerns around work with pathogens or toxic chemicals. Urbina's work is rooted in building ML models for therapeutic and toxic targets. The aim is not to create anything harmful but to better assist in the design of new molecules for drug discovery, using ML models to predict the toxicity of newly designed drugs.

Imagine a wonderful drug that lowers blood pressure but also hits an important channel in the heart. Such a drug is a non-starter: it is too dangerous to bring to market.

For decades, teams have used computers and AI to improve human health, and whatever kind of drug you are trying to develop, you first need to make sure it is not toxic.

The company had recently released a number of computational ML models for toxicity prediction in different areas. For the conference talk, Urbina chose to flip the switch and focus on toxicity itself, exploring how AI could be used to design toxic molecules.

It was a thought exercise the team had never attempted before, and it ultimately evolved into a computational proof of concept for making biochemical weapons.

Urbina is deliberately vague about some of the details, withholding them so that they cannot be exploited.

In simple terms, the experiment starts from existing datasets of molecules that have already been tested for toxicity and uses those results as labeled training data.

It is important to note that the team focused on VX.

So what exactly is VX?

Technically, it's a man-made chemical warfare agent classified as a nerve agent, and nerve agents are the most toxic and fastest-acting chemical warfare agents known. Specifically, VX is an inhibitor of an enzyme called acetylcholinesterase. Whenever you do anything with your muscles, your neurons release acetylcholine as the signal to "move your muscles," and acetylcholinesterase breaks that signal down once it has done its job. This is where VX is deadly: by blocking the enzyme, it disrupts control of the diaphragm and the other muscles that work your lungs, leaving them paralyzed and making it impossible to breathe.

Obviously, this is something people want to avoid. Historically, therefore, many different types of molecules have been tested to see whether they inhibit acetylcholinesterase, and Urbina built a large dataset of these molecular structures and their toxicity.

The team can then use these datasets to create an ML model that learns which parts of a molecular structure matter for toxicity and which do not. The model can then be given new molecules, perhaps new drugs that have never been tested before, and it predicts which of them are likely to be toxic and which are not.

This is what speeds up drug screening: researchers can run a large number of molecules through the model very quickly and eliminate those predicted to be toxic.

In this study, however, the team reversed the approach. Instead of using the model to steer away from toxicity, they used it to steer toward toxicity.

The other key component is the new generative models. The team can feed a generative model many different structures, and it learns how to put molecules together. You can then ask it to generate new molecules. At this point, the model can generate new molecules across the whole of chemical space, but they are essentially random molecules with no particular purpose. What the researchers can do is tell the generative model which direction to go in.

They do this by designing a scoring function that gives the model a high score when the molecule it generates is what the researchers are looking for. In this experiment, that meant giving high scores to toxic molecules.

The experimental results show that the model begins to generate these molecules, many of which look like VX and some other chemical agents.

Urbina said the team wasn’t really sure what to expect, because generative models are still relatively new and not widely used.

What is particularly concerning is that many of the generated compounds were predicted to be more toxic than VX. That is alarming because VX is one of the most potent compounds known, which means that only a very, very small amount is needed to cause death.

These predictions have not been validated in real life, and the researchers say they do not want to validate them themselves. But the prediction models generally perform quite well, so even if there are a lot of false positives, some of the molecules are likely to be more potent than VX.

The team also looked at the structures of many of the newly generated molecules. Many of them looked like VX and other warfare agents, and some of the model's outputs were actual chemical warfare agents, even though the model had never seen those agents before. That told the team the model was generating molecules that made sense, because some of them had already been made.

The worrying question, then, is how easy is it to achieve?

The researchers said that many of the things they used are free. Toxicity datasets can be downloaded from many places. Someone who knows how to program in Python and has some ML skills could probably build a generative model like this one, driven by toxicity datasets, in about a weekend of work.

That is why the researchers thought hard about publishing this paper: the barrier to this type of misuse is simply too low.

“We still cross a grey ethical line by demonstrating that it is possible to design virtual potentially toxic molecules without much effort, time or computational resources,” Urbina said in the paper. “While we can easily delete the thousands of molecules we created, we cannot delete the knowledge of how to recreate them.”

Urbina said that this is a very unusual topic and that the team wanted to get the information out and have a real conversation about it. At the same time, they did not want it to fall into the wrong hands.

He made it clear that, as scientists, they have to make sure that what they publish is published responsibly.

Beyond that, Urbina said, what’s being done is really easy to replicate because a lot of it is open source — the science is shared, the data is shared, the models are shared.

Urbina noted that when you start working in chemistry, you are told about the dangers of chemical misuse and that it is your responsibility to avoid it as far as possible. In ML, by contrast, there is no guidance at all about misuse of the technology.

“We just hope more researchers acknowledge and are aware of the potential for abuse,” Urbina said.

As the models keep getting better, raising this awareness publicly can help draw attention to the problem: at the very least, it gets discussed in wider circles and becomes something researchers keep in mind.
