The following article is from Nature Portfolio.

Today's AI image-generation technology can now be used to custom-design useful biological molecules on demand.

“Okay, let's start now.” David Juergens, a computational chemist at the University of Washington, is about to design a protein that has never appeared in the past 3 billion years of evolution. On a video call this morning, Juergens opened a cloud version of RFdiffusion, an artificial intelligence (AI) tool he helped develop. This neural network and similar tools are bringing custom protein design into mainstream research, an endeavor that until recently was difficult and often failed.

An AI tool called RFdiffusion designed a protein that binds to parathyroid hormone (pink). Credit: Ian C. Haydon/UW Institute for Protein Design

These proteins could form the basis for vaccines, therapeutics and biomaterials. “This is absolutely a game-changing moment,” said Gevorg Grigoryan, chief technology officer of Generate Biomedicines, a biotech company in Massachusetts that is trying to apply protein design to drug development.

These tools are inspired by AI software that can synthesize realistic images, such as Midjourney, which became popular this year for generating a photo of the Pope wearing a designer white down jacket. Researchers have found that similar concepts can be used to generate realistic protein shapes that meet a designer's requirements, which means that new proteins that bind tightly to another biological molecule can now be designed quickly. Early experiments show that when researchers synthesize these proteins, a small but useful fraction behave as the software predicted.

Over the past year, these tools have revolutionized the protein design process, researchers say. “The capabilities have suddenly exploded,” says Mohammed AlQuraishi, a computational biologist at Columbia University in New York, whose team developed a protein design tool.
“You can now design a protein that has the function you want.”

“You’re customizing a protein structure for a problem,” says David Baker, a computational biophysicist at the University of Washington. Baker’s team, which developed RFdiffusion and includes Juergens, released the software in March 2023 and published a paper describing the neural network in Nature[1]. (A preprint of the paper was posted in late 2022, around the time that several other groups, including AlQuraishi’s[2] and Grigoryan’s, reported similar neural networks.)

For the first time, protein designers have reproducible and reliable tools on which a whole new industry can be built, Grigoryan said. “The next challenge is, what can you do with it?”

Grand design

Juergens entered some of the properties he wanted for the protein into a web form that resembles an online tax calculator. The protein had to be 100 amino acids long and able to form a symmetrical complex of two proteins, called a homodimer. Many cell receptors have this configuration, and a new homodimer could serve as a synthetic cell-signaling molecule, says Joe Watson, a computational biochemist at the University of Washington in Seattle, who helped develop RFdiffusion and joined today's video call. But this morning's design had no purpose other than to produce a realistic-looking protein.

For decades, researchers have struggled to design new proteins. Initially, they tried to piece together useful parts of existing proteins, such as the part of an enzyme that catalyzes a chemical reaction. This approach requires an understanding of how proteins fold and work, as well as intuition and a lot of trial and error. Scientists sometimes need to screen thousands of designs before they find a structure that meets their expectations.
Baker says the emergence of AlphaFold (developed by London-based AI company DeepMind, now Google DeepMind) and other AI models that can accurately predict protein structure from amino acid sequences was a promising step. Designers have found that these neural networks, trained on real protein sequences and structures, can also be used to create proteins from scratch.

Baker’s group and others have released several AI-based protein design tools over the past few years. Some of these tools use a method called “hallucination,” in which a random chain of amino acids is optimized by tools like AlphaFold or RoseTTAFold until the neural network judges that it can fold into a specific structure. Another method, called “inpainting,” takes a specific protein sequence or structure and uses RoseTTAFold to build the rest of the molecule around it.

None of these tools is perfect, however. Experiments have shown that structures designed using the “hallucination” method do not always fold into their intended configuration when synthesized in the lab, ending up as a pile of material at the bottom of a test tube. The “hallucination” method also has difficulty generating anything other than small proteins (although other researchers suggested in a preprint in February that the method could be used to design longer molecules[4]). The “inpainting” method also has a limited ability to form proteins from short fragments. Although it yields a theoretical protein structure, it does not offer alternative solutions to the same problem, which would improve the chances of success.

That’s where RFdiffusion and similar protein-design AI tools released in recent months come in. They work on the same principles as neural networks that synthesize realistic images, such as Stable Diffusion, DALL-E, and Midjourney.
These “diffusion” networks are trained on data, whether images or protein structures. During training, noise is progressively added to the data until the result bears no resemblance to the original image or structure. The network then learns to “de-noise” the data, performing the process in reverse.

Networks like RFdiffusion are trained using tens of thousands of real protein structures stored in the Protein Data Bank (PDB). When the network designs a new protein, it starts with total noise: random combinations of amino acids. “You want to know what protein is producing this noise,” Watson said. After several rounds of denoising, it produces proteins that look realistic but are actually completely new.

When Baker's team tested RFdiffusion without giving any instructions other than protein length, the network came up with a variety of realistic-looking proteins that were quite different from the PDB structures it was trained on. The team was also able to direct the program to design proteins that meet specific requirements during the denoising process, an approach called “conditioning.”

For example, Baker’s team used RFdiffusion to design proteins with particular folds or proteins that can attach to the surface of another molecule, the interaction that underlies binding. Grigoryan’s team also developed a diffusion network called Chroma, which uses conditioning to design proteins that resemble the 26 capital letters of the English alphabet and Arabic numerals[3].

AI-designed proteins that resemble the English alphabet. Source: John Ingraham, Wujie Wang, Max Baranov, Gevorg Grigoryan

Noise signal

Juergens’ computer screen starts out noisy, as the AI system starts with random sequences of amino acids. They’re red, jumbled lines that look like a child’s finger paintings. Frame by frame, they morph into more complex structures with protein-like features, such as tight spirals called alpha helices and ribbon-like folds called beta sheets.
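The noise-then-denoise idea described earlier in this section can be illustrated with a toy sketch. This is not RFdiffusion's code: the helix coordinates, the linear noise schedule and every function name here are assumptions made for illustration. The sketch blends a stand-in "structure" with Gaussian noise, then shows that the structure can be recovered exactly if the added noise is known. A trained diffusion network has to predict that noise rather than being handed it.

```python
import math
import random


def signal_fractions(n_steps, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal left after each noising step.

    Uses a linear noise schedule (an assumption for this sketch).
    """
    fractions, remaining = [], 1.0
    for i in range(n_steps):
        beta = beta_start + (beta_end - beta_start) * i / (n_steps - 1)
        remaining *= 1.0 - beta
        fractions.append(remaining)
    return fractions


def toy_helix(n_points, radius=2.3, rise=1.5, turn_deg=100.0):
    """Stand-in 'structure': points spaced along an ideal helix."""
    return [
        (radius * math.cos(math.radians(turn_deg * i)),
         radius * math.sin(math.radians(turn_deg * i)),
         rise * i)
        for i in range(n_points)
    ]


def add_noise(coords, a_bar, rng):
    """Blend coordinates with Gaussian noise; a_bar is the signal kept."""
    keep, mix = math.sqrt(a_bar), math.sqrt(1.0 - a_bar)
    noise = [tuple(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in coords]
    noisy = [
        tuple(keep * c + mix * e for c, e in zip(point, eps))
        for point, eps in zip(coords, noise)
    ]
    return noisy, noise


def remove_noise(noisy, noise, a_bar):
    """Invert add_noise exactly, given the noise that was added.

    A trained network must predict this noise; here it is passed in directly.
    """
    keep, mix = math.sqrt(a_bar), math.sqrt(1.0 - a_bar)
    return [
        tuple((x - mix * e) / keep for x, e in zip(point, eps))
        for point, eps in zip(noisy, noise)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    a_bars = signal_fractions(1000)
    structure = toy_helix(100)
    for step in (0, 499, 999):
        noisy, noise = add_noise(structure, a_bars[step], rng)
        restored = remove_noise(noisy, noise, a_bars[step])
        err = max(abs(a - b)
                  for p, q in zip(structure, restored)
                  for a, b in zip(p, q))
        print(f"step {step}: signal kept {a_bars[step]:.5f}, "
              f"round-trip error {err:.2e}")
```

By the last step almost none of the original signal remains, which is why generation can start from pure noise. The quality of a real design then depends entirely on how well the network has learned to predict the noise.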
“It’s a perfect marriage of alpha-beta topology,” Juergens says, looking at the protein he designed in just a few minutes. “It looks good.”

“The design process is very different than it was a year ago,” said Baker, who uses the tool throughout his lab. The neural network can handle tasks that other methods performed slowly, with difficulty or not at all. In one analysis reported in the study[1], the team started with a fragment of another protein, such as a portion of a viral protein that immune cells recognize, and asked the AI tool to generate 100 different new proteins to see how many contained the motif they were looking for. The team posed this challenge with 25 different starting shapes. The final results didn’t always contain the starting fragment, but RFdiffusion produced at least one protein containing the target motif for 23 of the 25 shapes, compared with 15 for the “hallucination” method and 12 for the “inpainting” method.

RFdiffusion has also been shown to design proteins that self-assemble into complex nanoparticles that can deliver drug or vaccine components. Previous AI methods[5] were also able to design such proteins, but Watson believes that RFdiffusion’s designs are more sophisticated.

Neural networks like RFdiffusion seem to be good at designing proteins that bind to another specific protein. Baker’s team has used the network to design proteins that bind tightly to proteins involved in diseases such as cancer and autoimmune disorders. In one successful experiment, which has not yet been published, they designed proteins that bind tightly to a hard-to-target immune-signaling molecule, a target of antibody drugs that generate billions of dollars in revenue each year, he says. “It expands the range of proteins that we can find to bind to and develop useful therapeutics for,” Watson says.

Real-world testing

Because Baker’s team designed so many proteins, testing whether they had the expected functions became a daunting task.
“One machine learning researcher could generate enough designs to keep 100 biologists busy for months,” said Kevin Yang, a researcher at Microsoft Research in Massachusetts who studies biomedical machine learning[6]. But early signs are that RFdiffusion’s designs are the real deal. In another task described in the paper, Baker’s team used the tool to design proteins containing key fragments of p53, a signaling molecule that’s overactive in many cancers (and an attractive drug target). When the team synthesized 95 of the software’s designs by expressing them in engineered bacteria, more than half retained p53’s ability to bind its natural target, MDM2. The best design bound more than 1,000 times more strongly than natural p53. When the researchers used the “hallucination” approach, designs that were predicted to succeed didn’t actually work in the test tube, Watson says.

Overall, Baker says his team has found that 10-20% of RFdiffusion's designs bind tightly enough to their intended targets to produce useful effects, compared with success rates of less than 1% for previous approaches using AI. (Previous machine-learning approaches couldn't reliably design binding proteins, Watson says.) Matthias Gloegl, a biochemist and colleague at the University of Washington, says his recent success rates are approaching 50%, meaning that it might take just a week or two, rather than months, to get a useful design. “It's a little crazy,” he says.

A funnel-shaped protein assembly (top row) and a ring structure with six protein chains (bottom row), designed from noise by the diffusion-based AI generator. Credit: Ian C. Haydon/UW Institute for Protein Design

Sergey Ovchinnikov, an evolutionary biologist at Harvard University, said that by the end of June, the cloud version of RFdiffusion had about 100 users a day.
Joel Mackay, a biochemist at the University of Sydney in Australia, tried using RFdiffusion to design proteins that could bind to other proteins studied in his lab, including transcription factors that control gene activity in cells. He found the design process simple and used computer modeling to confirm that the designed proteins could theoretically bind to these transcription factors.

Mackay is now testing whether the proteins alter gene expression as expected when produced in cells. He hopes that success will lead to simple ways to switch specific transcription factors on and off in cells, without resorting to drugs that take years to develop, if such drugs can be developed at all. “If this approach works for our protein, it’s a game-changer,” he says.

Future optimization

Charlotte Deane, an immunoinformatician at the University of Oxford in the United Kingdom, says the latest models, such as RFdiffusion, are “a quantum leap.” But key questions remain. “What it can do is open people’s eyes to the potential of these diffusion models,” she says.

One application that she and other scientists and biotech companies are interested in is designing more complex binding proteins, such as antibodies or the protein receptors used by T cells (a type of immune cell). These proteins have flexible coiled structures that can interlock with their targets, whereas RFdiffusion is currently best at flat, sandwich-like interfaces. Baker said his team has made progress on antibodies.

In general, Ovchinnikov and others believe that design is more difficult for biomolecules whose function depends on loose regions folding into different shapes. These features have proven difficult to simulate with AI. “If your question is whether it can bind to something else and inhibit it,” Ovchinnikov said, “then the problem can be solved with these methods. But if you want to accomplish more complex tasks, such as simulating natural phenomena, you have to add some flexibility.”
Tanja Kortemme, a computational biologist at the University of California, San Francisco, is using RFdiffusion to design proteins that can act as sensors or cell control switches. She said that the AI network performs well if a protein's active site depends on the arrangement of a few amino acids, but that it has difficulty designing proteins with more complex active sites, which require more key amino acids. She and her colleagues are working to solve this problem.

Another limitation of the latest diffusion models is that they can’t design proteins that look radically different from natural ones, Yang said. Because such AI systems are trained on existing proteins that researchers have characterized, he said, the proteins they design are “copycats.” Designing proteins with unusual appearances may require a better understanding of the physical mechanisms that give proteins their functions. “This makes it easier to design proteins that can perform tasks that natural proteins cannot,” Yang said. “There is still a lot of room for improvement.”

The latest protein-design tools have proven very powerful at designing proteins for specific tasks, as long as the function can be described in terms of structure, such as the surface to which the protein binds, AlQuraishi said. But he stressed that tools like RFdiffusion cannot yet handle other specifications, such as designing a protein that performs a specific reaction no matter what shape it takes. In such cases, you know what you want but you don't know what the geometry should be.

Grigoryan said that future protein design tools will also need to design proteins that satisfy several requirements at once. Potential therapeutic proteins must not only bind to their targets, but also avoid binding to anything else, and they must be easy to mass-produce.
One direction the researchers are exploring is whether they can describe the design of these proteins in plain language, much like the prompts given to image-generating tools like Midjourney. “You could imagine writing down a description of a protein, then having it synthesized and tested,” Watson said.

Grigoryan and his colleagues have taken a step toward this goal. In a preprint posted in December 2022[3], they trained Chroma to generate designs based on text descriptions, including “protein with CHAD domain” (a protein structure that combines multiple helices) or “crystal structure of aminotransferase” (an enzyme involved in the synthesis and degradation of amino acids).

The protein that Juergens spent a few minutes designing this morning is just a model of the protein's 3D structure. Juergens then used another AI tool to obtain an amino acid sequence that can fold into this structure. In a final check, he plugged the sequence into AlphaFold to see whether the software would predict a folded structure that matched the design. The results were nearly identical, with AlphaFold's prediction and the designed structure differing by an average of only 1 angstrom (about the width of a hydrogen atom).

“With this level of accuracy we can call it a successful design,” Watson said. The only thing left to do is to see how the protein performs in the real world.

References:
1. Watson, J. L. et al. Nature https://doi.org/10.1038/s41586-023-06415-8 (2023).
2. Lin, Y. & AlQuraishi, M. Preprint at https://arxiv.org/abs/2301.12485 (2023).
3. Ingraham, J. et al. Preprint at bioRxiv https://doi.org/10.1101/2022.12.01.518682 (2022).
4. Frank, C. et al. Preprint at bioRxiv https://doi.org/10.1101/2023.02.24.529906 (2023).
5. Wicky, B. I. M. et al. Science 378, 56–61 (2022).
6. Wu, K. E. Preprint at https://arxiv.org/abs/2209.15611 (2022).
The original article was published in the News Feature section of Nature on July 11, 2023, under the title “AI tools are designing entirely new proteins that could transform medicine.” doi: 10.1038/d41586-023-02227-y

Copyright notice: This article was translated by Springer Nature Shanghai. The Chinese content is for reference only; the English original prevails. Unauthorized translation is an infringement, and the copyright owner reserves the right to pursue legal liability.

© 2023 Springer Nature Limited. All Rights Reserved