Changing the course of biological research: AI models unlock the information encoded in life

The life sciences span many research directions, from cell biology and molecular biology at the microscopic level to ecology, which studies the relationships between organisms and their environment. The research that comes closest to the laws of life activity, the mechanisms of development, and the essence of life is the study of biological macromolecules, such as the structures of proteins and nucleic acids.

Systematic, in-depth research on proteins lets us understand the composition and workings of living organisms at a deeper level, ultimately revealing the mechanisms by which life operates and develops, and driving progress in the biological sciences, drug development, and synthetic biology. For this reason, academia and industry are both heavily invested in protein research and protein structure prediction. In the AI era, thanks to great improvements in computing power and algorithms, we have witnessed a historic moment in protein structure prediction.

Held every two years, CASP (Critical Assessment of protein Structure Prediction) is known as the "Olympic Games of protein structure prediction." In an evaluation on the protein test set of CASP14, the 14th edition held in 2020, Tianrang's TRFold achieved the best result of any publicly available protein structure prediction model in China, second only to DeepMind's AlphaFold2, which ranked first in the world. Strong results in such a prestigious international competition indicate that Chinese computational biology has made a breakthrough and entered the world's first tier.

Whether it is the world-renowned AlphaFold2 or the new Chinese model TRFold, these cutting-edge AI models have acted as catalysts for research in the life sciences. By following how protein research creates value, let us look at how they are reshaping the life sciences and medicine.

Opening up new research ideas and research space

Most of us first learned about proteins in middle school. Proteins are the main functional molecules in cells and take part in almost every cellular function: enzymes catalyze the digestion of food; hemoglobin carries oxygen, and some carbon dioxide, in the blood; hormones such as insulin regulate metabolism; actin builds part of the cytoskeleton, along which myosin motors move; and proteins are also involved in immunity, cell differentiation, apoptosis, and other processes.

To carry out their cellular functions, proteins must fold into specific three-dimensional structures. Because amino acids can be arranged and positioned in so many ways, the number of possible structures is enormous: by one widely cited estimate, a typical protein could in principle adopt about 10^300 conformations. Different folds give proteins different activities and biological properties, and this complexity has always made protein research difficult.
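A rough calculation in the spirit of Levinthal's paradox shows why exhaustive search is hopeless. Assume, purely for illustration (this rate is not from the article), that a computer could test $10^{13}$ conformations per second:

```latex
t_{\text{search}} = \frac{10^{300}\ \text{conformations}}{10^{13}\ \text{conformations/s}} = 10^{287}\ \text{s}
\qquad\text{vs.}\qquad
t_{\text{universe}} \approx 13.8 \times 10^{9}\ \text{yr} \approx 4.4 \times 10^{17}\ \text{s}
```

Even at that optimistic rate, enumeration would take more than $10^{269}$ times the age of the universe, which is why practical structure prediction cannot rely on brute-force enumeration.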

There are three main traditional methods for determining protein structures: nuclear magnetic resonance (NMR), X-ray crystallography, and cryo-electron microscopy. These methods rely on expensive equipment and much trial and error, and a single structure study can take years. Historically, scientists have sometimes spent decades obtaining a clear three-dimensional structure for one protein, making structure determination one of the hardest problems in biology. Without the help of AI, only 170,000 three-dimensional structures had been resolved, a tiny fraction of all known proteins.

The latest applications of AI to protein structure, such as the AlphaFold2 and TRFold models, can predict structures with high confidence in days or even minutes, work that once took years or decades. Compared with experimental methods, prediction is both fast and cheap, making it well suited to high-throughput structure acquisition. At this pace, structures for 130 million proteins are expected to be predicted by the end of this year, which may completely change how life science research is done.

This means that large-scale, AI-guided protein structure prediction will become an important tool for researchers, allowing them to ask new scientific questions from a structural perspective. For example, researchers can use structural analysis to annotate the functions of uncharacterized or newly discovered proteins and then design experiments to confirm those functions. They can also analyze a protein's structure to identify functional units or domains, provide targets for genetic manipulation, and lay a reliable foundation for designing new proteins or modifying existing ones. For in-depth biological research, AI models such as the Tianrang team's TRFold can widen the space for innovative computational biology around questions of protein structure and function, and accelerate the field's development. Beyond structural biology, AI models also have much to offer in medical and pharmacological research.

Rapidly analyzing virus structures and shortening drug development time

New drug development is one of the riskiest, most complex, and most time-consuming areas of technical research. According to the Tufts Center for the Study of Drug Development, bringing a new drug to market takes an average of $2.6 billion and about 10 years. The high cost reflects the high failure rate: over the past decade, only 7.9% of drug development projects that entered Phase 1 clinical trials went on to win FDA approval.
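To make the 7.9% figure concrete, here is a small Python sketch. It uses only the Phase 1-to-approval rate quoted above and assumes, as a simplification the article does not make, that each candidate's outcome is independent of the others:

```python
import math

# Phase 1 -> FDA approval rate quoted in the text.
PHASE1_TO_APPROVAL = 0.079

def candidates_per_approval(p: float) -> float:
    """Expected Phase 1 entrants per approved drug (mean of a geometric distribution)."""
    return 1.0 / p

def prob_at_least_one(p: float, n: int) -> float:
    """Chance that at least one of n candidates is approved, ASSUMING independent outcomes."""
    return 1.0 - (1.0 - p) ** n

def candidates_for_confidence(p: float, target: float) -> int:
    """Smallest n with prob_at_least_one(p, n) >= target (independence assumed)."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))

if __name__ == "__main__":
    print(f"{candidates_per_approval(PHASE1_TO_APPROVAL):.1f}")    # 12.7
    print(f"{prob_at_least_one(PHASE1_TO_APPROVAL, 10):.2f}")      # 0.56
    print(candidates_for_confidence(PHASE1_TO_APPROVAL, 0.9))      # 28
```

Under these assumptions, roughly 13 candidates must enter Phase 1 for each approval on average, which is one reason failed projects weigh so heavily on the cost of every drug that reaches the market.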

With advances in artificial intelligence, AI-assisted drug development has, by some estimates, cut costs by 35% and shortened the development cycle from 5-10 years to 1-3 years. Drug development is a system-level undertaking, and within that system AI can tackle core pain points such as compound screening and design optimization, reducing trial and error and rework and saving development costs.

The TRFold model can predict the structures of disease-related proteins at low cost, after which potential drugs for those diseases can be found through drug repositioning, virtual screening, and other methods. Rare diseases such as albinism and osteogenesis imperfecta, for example, get little attention from pharmaceutical companies because returns are low and many patients are poor. China alone has more than 20 million patients with such diseases. Although these neglected diseases account for 12% of the global disease burden, only 1.1% of newly developed drugs target them. Today, AI models such as AlphaFold2 and TRFold bring hope to drug development for these diseases. By predicting protein structures quickly and accurately, and by providing plausible target molecules and structures for designing new drug molecules, they make it feasible to develop drugs for diseases concentrated almost entirely among the poor.

In clinical trials of new drugs, the TRFold model can also serve as a "toxicity early-warning system." Animal models are very valuable for testing drug toxicity, but risk must be reduced further before moving into high-stakes human trials; otherwise, unexpected toxic side effects can force a drug out of clinical development and waste all the prior effort. The general solution is to build high-fidelity simulations of human biological systems, but this remains difficult. By providing 3D models of human proteins, AI models such as TRFold may help us build better simulations of human biology.

In research on bacterial and viral proteins, the TRFold model will also broaden the range of protein types open to functional analysis and downstream applications, such as research on viral infections, the development of antibiotics and targeted drugs, and the development of new, efficient enzymes, contributing to drug research and human health.

However, much research still demands extremely accurate protein structures. For example, the displacement of the iron ion in hemoglobin is discussed at the scale of a few tenths of an angstrom. For such fine structural analysis, predicted structures cannot serve as the basis for discussion, because even slight uncertainty could lead to completely different conclusions. The generality and accuracy of AI models still have room to improve, particularly for complexes of proteins with their ligands and for the analysis of protein dynamics.

AI prediction models dig deeper into the information of life

Using AI models to predict the structure of a single protein is only the beginning. A predicted structure merely points the way; further progress still requires experiments and careful thinking. Some structures also cannot be predicted by AI models and remain a mystery, which leaves plenty of room for researchers, companies, and research institutions.

Protein structure prediction models developed in China and abroad will each find their own areas of strength and play their part across the vast fields of life sciences and biotechnology. Shi Yigong, a structural biologist and academician of the Chinese Academy of Sciences, has commented on AI prediction models: "The three-dimensional structures of proteins in the human proteome that can be predicted have basically been predicted by AlphaFold. Overall, the predictions are credible and relatively accurate. This is a remarkable historical achievement in humanity's scientific exploration of nature, and one of the most important scientific breakthroughs of the 21st century."

There is no doubt that AlphaFold2 is a major breakthrough in protein structure prediction. The high-quality structures predicted by this type of AI model will drive new technologies for efficient compound screening and benefit the entire drug development life cycle.

Some may wonder: if AlphaFold2 is already so powerful, why spend the effort to build our own algorithm? In fact, although DeepMind has open-sourced AlphaFold2, it released the inference code, not the training code. The code downloaded from GitHub can only run the trained AlphaFold2 model to predict protein structures directly. To focus on questions of protein structure and function, or to build an AI algorithm accurate enough for real-world applications, you cannot push the technology toward deeper problems without experience in training models, or without the ability to reproduce AlphaFold2's results through training.

As with chips, without core technical capabilities, our exploration of the deeper questions of life science will be limited wherever protein structure prediction is involved. The TRFold algorithm platform built by the Tianrang team was developed entirely in China, step by step from the lowest-level code up. It has achieved strong results in international competition, second only to AlphaFold2.

Over two and a half years of research and development, TRFold has gone through dozens of iterations. Its current training architecture was designed at the beginning of this year, followed by 10 months of data processing, model training, and continuous iteration and optimization. The latest version's prediction accuracy is close to AlphaFold2's, and it overcomes AlphaFold2's bottleneck of requiring massive computing power. TRFold is not a copy of AlphaFold2; it reflects its own thinking and design. For example, it uses weight sharing to save computing resources. With limited training resources and computing power, the Tianrang team improved both the data and the network design, training on only a small amount of real data, so that the model learns to recognize genuine co-evolution signals better and predicts the distances and coordinates of amino acid residues more accurately.
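The article does not describe TRFold's architecture, so the Python sketch below is only a generic illustration of weight sharing. The block is a toy stand-in for a real network layer, and the width (`d=256`) and depth (48) are hypothetical values chosen for the example. It compares a stack of independent blocks with a single block reused at every layer:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class DenseBlockSpec:
    """Hypothetical stand-in for one network block: a d x d weight matrix plus a bias."""
    d: int

    def num_params(self) -> int:
        return self.d * self.d + self.d

    def flops_per_token(self) -> int:
        # One matrix-vector product: d*d multiplies plus d*d adds (bias add ignored).
        return 2 * self.d * self.d

def trunk_cost(spec: DenseBlockSpec, depth: int, shared: bool) -> Tuple[int, int]:
    """Return (stored parameters, forward FLOPs per token) for a stack of `depth` blocks.

    shared=True reuses one set of weights at every layer, so only one copy is stored;
    the forward compute is unchanged because the block still runs `depth` times.
    """
    params = spec.num_params() * (1 if shared else depth)
    flops = spec.flops_per_token() * depth
    return params, flops

if __name__ == "__main__":
    spec = DenseBlockSpec(d=256)
    print(trunk_cost(spec, depth=48, shared=False))  # (3158016, 6291456)
    print(trunk_cost(spec, depth=48, shared=True))   # (65792, 6291456)
```

In this toy setting, sharing cuts the stored parameters by a factor equal to the depth (and with them the memory for gradients and optimizer state during training), while the per-pass arithmetic stays the same. How TRFold turns weight sharing into its reported savings is not specified in the source.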

Its computing power consumption is about 1/32 that of AlphaFold2, and it takes no more than 16 seconds to predict most protein chains, compared with more than 70 seconds for AlphaFold2 to predict a chain of about 400 amino acids. This gives it clear advantages for training and for generating small-sample data. As work moves on to building protein interaction networks, where the demand for computing power grows exponentially, efficient structure prediction becomes even more significant. It also opens the door to in-depth domestic research in areas such as structural biology and drug research, so that this work need not depend on others because of technical limitations.

The Tianrang team's TRFold also has its own development direction: focusing on protein structure and function, meeting the accuracy requirements of real applications, and from there tackling deeper problems. One example is protein-protein interaction, where proteome-wide co-evolution analysis can be used to establish precise links between interacting proteins. Studying these interactions helps researchers build large-scale interaction network maps, find new drug binding targets, and develop new approaches to precision treatment. In new drug development, antibody simulation, and vaccine design, it can improve the accuracy and success rate of protein design and help validate the protein designs of vaccines such as the COVID-19 vaccine.
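The article does not say how TRFold's co-evolution analysis works. As a minimal, hypothetical sketch of the underlying idea, the Python code below scores co-evolution between residue positions of two proteins using mutual information (MI) over a paired multiple sequence alignment, assuming each row concatenates the sequences of protein A and protein B from the same species. The toy alignment is invented for illustration; real pipelines add sequence weighting, pseudocounts, and corrections such as the average product correction (APC), or use models such as direct coupling analysis.

```python
import math
from collections import Counter
from typing import List, Tuple

def mutual_information(col_a: List[str], col_b: List[str]) -> float:
    """Mutual information (bits) between two alignment columns; no pseudocounts or weighting."""
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi

def rank_interface_pairs(paired_msa: List[str], len_a: int) -> List[Tuple[float, int, int]]:
    """Score every (position in A, position in B) pair, highest MI first.

    Each row is protein A (first len_a columns) followed by protein B.
    """
    length = len(paired_msa[0])
    columns = [[row[k] for row in paired_msa] for k in range(length)]
    scores = [
        (mutual_information(columns[i], columns[len_a + j]), i, j)
        for i in range(len_a)
        for j in range(length - len_a)
    ]
    return sorted(scores, reverse=True)

if __name__ == "__main__":
    # Hypothetical toy alignment: A = first 3 columns, B = last 3 columns.
    # A position 0 (K/R) and B position 1 (E/D) change together, like a salt bridge.
    paired = ["KLAGEV", "RLSGDV", "KLSGEI", "RLAGDI"]
    print(rank_interface_pairs(paired, len_a=3)[0])  # (1.0, 0, 1)
```

Positions that co-vary across species, like the K/R and E/D columns here, are candidate contacts across the interface; with only four sequences the signal is purely illustrative.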

Throughout the history of science, every major advance has depended on the technology of its time. From the hard years of protein purification to the era of cryo-electron microscopy, when scientists could see proteins through the electron microscope, researchers' tools have always reflected the most advanced technology available. As AI keeps achieving breakthroughs, companies such as DeepMind and Tianrang, pushing into the uncharted frontier of the life sciences, continue to equip researchers with AI, freeing them from relying solely on human prior knowledge to predict protein structures.

In the foreseeable future, standing on the shoulders of AI giants, this field is sure to make a qualitative leap. Proteins are macromolecules that shape the processes of life, and AI models have opened up a new world in the life sciences for us. The vast number of protein structures unlocked by this technology holds a "rich mine" of life information, waiting for us to interpret, explore, and mine.
