What is biocomputing?

If a scientist tells you that he does protein structure research, most people will picture him in a white coat, sitting at a microscope and observing intently. This has become the standard image of a biochemist. But when you visit his office one day, you find no microscope and no test tubes at all. Instead, he is dressed in loose, comfortable casual clothes and typing code at a computer. This scene would surprise many people.

To be precise, this scientist is a computational biologist. His research objects are indeed tiny biologically active substances such as proteins or DNA, but unlike traditional biochemists, his research tools are computers rather than test tubes and microscopes.

When many people first see news headlines like "Company X enters biocomputing," the field sounds mysterious to them. Some even think the company is going to build computers out of biologically active substances, like the soft brain tissue studded with electrodes that appears in science fiction movies. This is a big misunderstanding. What these news reports actually say is that the company is going to design an AI algorithm that can accurately predict the three-dimensional structure of a protein from limited information about that protein. To use a metaphor, it is like helping the police design a program that can accurately draw a suspect's face from a witness's description.

Working out the three-dimensional structure of a protein is of great significance for the development of new drugs and vaccines, such as a vaccine for the novel coronavirus. However, the problem is also extremely difficult. It is one of the critical challenges facing science today. To find out why it is so difficult and why it is so important, read on.
The protein folding problem

Proteins are a class of organic macromolecules and a basic component of life. Each protein has a specific three-dimensional structure, and this structure has a special feature: it is folded from a long chain.

To understand what the three-dimensional structure of a protein is like, you only need to play with a children's toy called the "Magic Ruler". The Magic Ruler is made up of sections, and each section can be turned to various angles, so a long magic ruler can be folded into many shapes. The more sections the magic ruler has, the more shapes it can form.

The basic unit of a protein is the amino acid, which is like one "section" of a magic ruler. When a protein has just been produced, it is like a long magic ruler with dozens to hundreds of sections. It then folds quickly into a specific shape, within a few microseconds to milliseconds. This is why, under an electron microscope, each protein looks like a tangled mess.

The properties and functions of a protein are therefore determined by two things: the amino acid sequence that makes up the protein and the shape that the protein eventually folds into.

For example, when our immune system faces invading viruses and bacteria, it produces "Y"-shaped antibody proteins. Their shape is like the claw of a claw machine, which can accurately target and clamp these invaders.

Image: Antibodies targeting and recognizing the virus

There is a lot of collagen in our ligaments, bones and skin. Its shape is like a braid made of three thick ropes, which gives our skin tension and keeps it elastic.

Figure: Collagen in a twisted shape

As another example, CRISPR, the targeted gene-editing technology that was recognized with the Nobel Prize in 2020, uses a Cas9 protein that looks like a crab claw. It "tightly clamps" a specific section of DNA in the genome and cuts it.
Image: The crab claw-like Cas9 protein (orange) tightly clamps the DNA (green)

Scientists are therefore most interested in two pieces of information about a protein. One is its amino acid sequence, which you can imagine as the "sections" of the magic ruler. The other is its structure, which is the shape of the magic ruler after folding. Sequence information is relatively easy to obtain, but structural information is extremely difficult to obtain. Structural information is also the more important of the two, because knowing the structure of an unknown protein lets us understand its role in cells more accurately. If the protein is associated with a certain disease, scientists can develop corresponding drugs based on its shape.

Christian Anfinsen, who won the Nobel Prize in Chemistry in 1972, proposed a hypothesis: in fact, we only need to know one piece of information. He found in his experiments that as long as the sequence of a protein does not change and it stays in the same chemical environment, it folds into the same three-dimensional structure every time. How a protein should fold in three-dimensional space is therefore already contained in its amino acid sequence. In other words, if we know the amino acid sequence of a protein, we should in theory be able to infer its three-dimensional structure.

Anfinsen's hypothesis was accepted by his peers all over the world. However, scientists soon discovered that knowing this theory was of little practical use. Although we can measure the amino acid sequence of a protein relatively easily in the laboratory, we still cannot accurately infer its three-dimensional structure from any physical law. Scientists have been studying this for nearly 50 years, and to this day they have not completely figured out the laws of protein folding.
This problem is called the "protein folding problem" in the biochemical community, and it is one of the major challenges facing science in the 21st century.

Money-burning industry

For now, the only reliable way for scientists to work out the three-dimensional structure of a protein is to spend huge amounts of manpower and material resources on a large number of repetitive experiments, using extremely clumsy methods. The experimental equipment required, such as cryo-electron microscopes, X-ray crystal diffractometers, and nuclear magnetic resonance instruments, is all expensive. A cryo-electron microscope, for example, costs from millions to tens of millions of RMB. Whether the process of solving a structure goes smoothly depends largely on luck. When you are unlucky, you may not get any results even after repeating the experiment thousands of times. As a result, the cost of solving each protein structure is usually between tens of thousands and hundreds of thousands of dollars.

Since the end of the last century, some computer technology companies led by IBM have proposed a bold idea: use supercomputers to predict the three-dimensional structure of a protein from its amino acid sequence. This is equivalent to moving the experiments originally conducted in test tubes into the digital space of computers. The idea was very bold and avant-garde at the time, because the computing power it required was astronomical for the computers of that era.

You may be curious: how can predicting the folding of one protein require such a huge amount of computing? Roughly speaking, the computing process is like drawing prizes from a lottery box. Imagine a protein with 100 amino acids as a magic ruler with 100 sections. It can form about 10^94 different shapes, a number that far exceeds the number of elementary particles in the entire universe.
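The 10^94 figure can be reproduced with a back-of-the-envelope count. The sketch below is only an illustration and rests on assumptions that the article does not state: it assumes the number comes from a Levinthal-style estimate, in which a chain of 100 amino acids has 99 links, each link has two rotatable angles, and each angle can settle into one of three states. The function names, the rate of 10^15 shapes checked per second, and the rough figure of 10^80 particles in the universe are likewise illustrative assumptions.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # average year length in seconds


def count_conformations(n_residues: int, states_per_angle: int = 3) -> int:
    """Count chain shapes under the assumed Levinthal-style model.

    Assumption: n_residues - 1 links, two rotatable angles per link,
    and states_per_angle possible states for each angle.
    """
    n_links = n_residues - 1
    n_angles = 2 * n_links
    return states_per_angle ** n_angles


def log10_years_to_enumerate(n_conformations: int, shapes_per_second: float) -> float:
    """Return log10 of the years needed to check every shape once.

    Working in log10 avoids floating-point overflow for very long chains.
    """
    return (
        math.log10(n_conformations)
        - math.log10(shapes_per_second)
        - math.log10(SECONDS_PER_YEAR)
    )


if __name__ == "__main__":
    shapes = count_conformations(100)
    print(f"shapes: about 10^{math.log10(shapes):.1f}")

    # Assumed rough estimate of the particle count in the universe.
    particles = 10 ** 80
    print("more shapes than particles:", shapes > particles)

    # Assumed hypothetical machine checking 10^15 shapes per second.
    exponent = log10_years_to_enumerate(shapes, 1e15)
    print(f"years to try them all: about 10^{exponent:.0f}")
```

Under these assumptions the count is 3^198, which is on the order of 10^94, so even a machine far faster than any real one could never try every shape one by one. This is why a pure brute-force search is not an option.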
What the computer has to do is essentially a process of elimination. According to certain rules, it first rules out whole classes of impossible structures in batches, and then eliminates the remaining candidates one by one according to the characteristics of the protein. In the final stage, it is like drawing again and again from a huge lottery box, and each draw requires a huge amount of computing power.

IBM spent five years on research and development, and finally announced in 2004 the launch of Blue Gene, then the world's fastest supercomputer. Its main goal was to solve the protein folding problem. However, things did not go as well as the computer experts expected. Ten years later, Blue Gene had gone through three generations, but the supercomputer still could not replace test tubes, X-ray crystal diffraction, and nuclear magnetic resonance. IBM regretfully terminated the development of the Blue Gene series [1].

However, IBM's failure does not mean that computer simulation of protein structure has failed. On the contrary, spurred on by IBM, more and more teams took up the challenge and achieved more and more results. Ingenious solutions emerged one after another, and the most interesting example was the invention of Professor David Baker of the University of Washington. In 2008, his team developed a puzzle game called "Foldit". In this game, users fold proteins based on their intuition and earn points according to certain rules. The results were very gratifying. After a monkey HIV-related protein that had troubled biologists for 15 years was uploaded to the game as a puzzle, players worked out its most likely fold in just 10 days.

Figure: Monkey HIV-related proteins

Since 1994, an international protein structure prediction competition called CASP has been held every two years, attracting more and more teams and technology giants from all over the world.
In this competition, the referees score the structure predicted by each group out of 100. In the 14th competition, which ended in December 2020, shocking news arrived: DeepMind, the Google artificial intelligence team that developed the famous Go program AlphaGo, won the championship with its AlphaFold program and scored 92.4 points. It was also the champion of the previous competition, but its score then was below 60 points. This pace of progress is astonishing. The protein structures predicted by AlphaFold are very close to the results of real experiments. We are only one step away from conquering the protein folding problem with computers.

China should enter

By now, you should have a preliminary understanding of "biocomputing". You may have noticed that, after all of the above, China was not mentioned once. This is such an important scientific undertaking, yet for the past few decades it has been pursued almost entirely abroad, and we Chinese have had little part in it. This really distresses me.

For the future development of new drugs, vaccines, precision medicine and other biomedical technologies, I can say with near certainty: whoever has biocomputing will win the world. The traditional research and development model of test tubes plus electron microscopes will eventually be replaced by AI. This scientific research undertaking should be elevated to the level of national strategy.

Source
1. https://en.wikipedia.org/wiki/IBM_Blue_Gene

Source: Science Voice
Author: Wang Jie