Preface

Neurological diseases such as stroke and dementia are among the leading causes of illness and disability. According to the World Health Organization (WHO), more than one-third of the world's population is affected by such diseases. Among them, neurodegenerative diseases are common, chronic conditions that seriously threaten human health and quality of life.

A deeper understanding of the structure and behavior of proteins provides an important basis for tackling these diseases. The study of protein folding began as early as the 1950s, and the emergence of AlphaFold has transformed how scientists approach the problem. Now AI for protein science has made a new breakthrough.

Recently, a research team from the University of Copenhagen, St. Jude Children's Research Hospital and Illinois Institute of Technology introduced a general algorithm for designing protein variants with specific structural properties, extending protein design to intrinsically disordered proteins (IDPs).

IDPs are proteins that do not fold into a stable, ordered three-dimensional structure. They are considered to be of great biological significance both in healthy systems and in the pathophysiology of various diseases. Unlike folded proteins, IDPs are highly disordered, locally flexible and highly dynamic, which makes them particularly challenging for existing prediction tools.

The study not only proposed a new design method but also validated the designed IDP variants experimentally and used machine learning models to predict the collective properties of IDPs, providing new tools for computational protein design. It may help us understand the pathogenesis of various neurodegenerative diseases (such as Alzheimer's disease, Parkinson's disease and amyotrophic lateral sclerosis) and of various types of cancer, and may promote the discovery of new drugs and the development of biomaterials.
The related research paper, titled "Design of intrinsically disordered protein variants with diverse structural properties", has been published in the journal Science Advances.

Why should we care about IDPs?

A sheet of origami paper is nothing more than pressed wood pulp until it is folded in a specific way; once folded, it becomes something new. After a few precise folds and flips, it becomes a paper fortune teller, a toy said to predict your future. The same sheet, folded with slightly different steps, becomes a crane with its wings spread, a symbol of good luck.

Similarly, a long string of amino acids has no function until it spontaneously folds into its specific shape. Cells make proteins by stringing amino acids together into long polypeptide chains, and which amino acids are chosen depends on the instructions provided by DNA. Moments after it is made, the polypeptide chain bends and folds precisely into the protein's final 3D shape.

If proteins cannot complete this folding process efficiently, a cascade of problems can unfold in the body. Misfolded or unraveled proteins can become toxic and cause cell death. Many diseases and disorders, such as sickle cell anemia, are caused by misfolded proteins. Misfolded proteins can also aggregate into clumps, a hallmark of neurodegenerative diseases such as Alzheimer's and Parkinson's. Predicting the 3D shape of protein molecules is therefore very important for understanding and treating neurodegenerative diseases.

However, structural biology has historically focused on proteins and nucleic acids that fold into stable 3D structures, and much of the current understanding of how proteins function in cells rests on the concept of "sequence-structure-function" relationships. Yet approximately 30% of proteins in eukaryotes do not fold into stable 3D structures.
These dynamically deforming proteins are IDPs; when such a segment sits alongside structured protein domains, it is called an "intrinsically disordered region" (IDR). IDPs and IDRs play many important roles in molecular and cellular functions, challenging the sequence-structure-function paradigm.

Dysregulated cellular function of IDPs has been implicated in several neurodegenerative diseases (Alzheimer's disease, Parkinson's disease, amyotrophic lateral sclerosis) and in many cancers. Their ability to self-associate into biomolecular condensates, which give rise to a variety of membraneless organelles in the cell, is increasingly recognized as important in cell biology and disease.

To understand biology and human disease more fully, Paul Robustelli, assistant professor at Dartmouth College, stressed in a related feature: "Structural biology must move beyond the study of proteins with stable 3D structures and develop rules that explain how the sequence of IDRs determines the distribution of shapes they adopt in solution and how this distribution determines their function in cells and malfunction in disease."

Extending computational protein design to IDPs

IDPs show extreme but generally non-random structural heterogeneity and cannot form stable folded structures. As a result, predicting the structure of IDPs is more challenging than for folded proteins, and their computational design remains limited.

Francesco Pesce and colleagues set out to address this challenge. Building on a previously published computational model called CALVADOS, they designed a general algorithm to generate IDPs with predefined global properties and used it to produce four IDPs with different properties. They also focused on an IDP known as A1-LCD and experimentally validated the sequence-ensemble relationships that the model predicted for several A1-LCD variants.
The algorithm combines coarse-grained simulations and free energy calculations with Monte Carlo sampling to search sequence space and generate protein sequences with target structural features. The researchers used it to design a variety of protein variants and validated them experimentally. The results showed that the algorithm can effectively design protein variants that differ in compactness, long-range interactions and phase separation propensity.

In more detail, the algorithm links each sequence to its conformational properties through efficient coarse-grained simulations. The CALVADOS model is used to run coarse-grained molecular dynamics (MD) simulations that generate an ensemble of conformations for an IDP. The algorithm then samples sequence space with Markov chain Monte Carlo (MCMC) and predicts the conformational properties of each candidate sequence through MD simulations and free energy calculations. Over the course of this optimization, it searches for specific arrangements of amino acids that yield the target structural features.

Figure | Overview of the research team's algorithm for designing IDP sequences with target conformational properties.

Beyond reaching predefined targets, the algorithm can explore sequence space and find IDP sequences with novel conformational features. The research team also used machine learning models to accelerate the algorithm. For future work, the team recommends sampling a wider range of sequence space and combining MCMC sampling with other methods, such as reinforcement learning and Bayesian optimization, to explore sequence space more effectively.
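The search loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `STICKINESS` table, the `toy_property` scoring function and the swap-based move are hypothetical stand-ins chosen for illustration. In the real workflow the property of each candidate would come from CALVADOS simulations and free energy calculations rather than from a closed-form formula.

```python
import math
import random

# Hypothetical per-residue "stickiness" values, for illustration only.
# These are NOT CALVADOS parameters.
STICKINESS = {"G": 0.5, "S": 0.4, "Y": 1.0, "F": 0.9,
              "R": 0.7, "D": 0.1, "K": 0.2, "N": 0.4}


def toy_property(seq):
    """Toy stand-in for a simulated ensemble property.

    Sums pairwise stickiness products, down-weighted by sequence separation,
    so the value depends on how residues are arranged, not only on composition.
    """
    total = 0.0
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            total += STICKINESS[seq[i]] * STICKINESS[seq[j]] / (j - i)
    return total / len(seq)


def design(start, target, steps=2000, temperature=0.01, seed=0):
    """Metropolis search over residue arrangements at fixed composition."""
    rng = random.Random(seed)
    current = list(start)
    cost = abs(toy_property(current) - target)
    best, best_cost = current[:], cost
    for _ in range(steps):
        # Propose a move: swap the residues at two random positions.
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]
        new_cost = abs(toy_property(current) - target)
        # Metropolis criterion: always accept improvements, and accept
        # worse sequences with a probability that decays with the cost increase.
        accept = (new_cost <= cost
                  or rng.random() < math.exp((cost - new_cost) / temperature))
        if accept:
            cost = new_cost
            if cost < best_cost:
                best, best_cost = current[:], cost
        else:
            current[i], current[j] = current[j], current[i]  # undo the swap
    return "".join(best), best_cost


# Usage: ask for a sequence whose toy property is 10% above the starting value.
start = "GSYGSRDKFNGSYGSRDKFN"
target = toy_property(start) * 1.1
designed, final_cost = design(start, target)
print(designed, round(final_cost, 4))
```

Swapping residues keeps the amino acid composition fixed, so only the arrangement changes; this is one simple move set, assumed here for the sketch. The expensive step in practice is evaluating each proposed sequence, which is where faster estimates such as free energy calculations or a machine learning surrogate would replace `toy_property`.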
The authors also point out that combining machine learning with simulation will be particularly important when designing sequences against more complex structural observables, where simulations may be more expensive and free energy calculations less efficient. The algorithm can also be applied to other structural features, and the study demonstrates that sequences can be designed to match target contact maps.

AI for Proteins Keeps Getting Better

Since the 1960s, scientists have studied proteins mainly with traditional experimental techniques such as X-ray crystallography and nuclear magnetic resonance (NMR) to determine their structures. As the understanding of protein biochemistry deepened and computing technology advanced rapidly, researchers began turning to computational methods to predict protein structures.

In 2016, Xu Jinbo's team pioneered the use of deep residual networks (ResNet) for structure prediction, significantly improving the accuracy of residue contact prediction. Building on this achievement, a series of methods combining co-evolution with deep learning emerged, such as DeepMind's AlphaFold (which focused on predicting residue distances) and trRosetta (which added information such as dihedral angles), developed by Yang Jianyi and David Baker's team. Both adopted the ResNet architecture.

In 2020, AlphaFold2 made a splash in the CASP14 competition, achieving a median GDT_TS score of 92.4 across all targets. In 2021, David Baker's team published the open source protein structure prediction tool RoseTTAFold in the journal Science. The tool uses attention-based techniques borrowed from natural language processing (NLP) to extract co-evolution information directly from multiple sequence alignments (MSAs), and its prediction accuracy is comparable to that of AlphaFold2 in CASP14.

Since then, pre-trained models based on protein sequences, also known as protein language models (PLMs), have come into wide use in protein structure prediction.
At the end of 2022, Meta launched ESM-2 and ESMFold, which ranked among the largest and most complex protein language models released at the time. In 2024, David Baker's team launched RoseTTAFold All-Atom (RFAA), a new structure prediction method that can accurately model the 3D coordinates of all atoms in biological units, including proteins, nucleic acids, small molecules, metals and chemical modifications.

Beyond structure prediction, artificial intelligence (AI) has continued to advance many other areas of protein research, such as predicting interactions between proteins and other biomolecules, protein design and proteomics.

Looking ahead, AI will continue to expand its influence and fill many gaps in the field of protein research.