AI predicts the effects of tens of millions of missense mutations, a step toward solving a major challenge in human genetics

Artificial intelligence (AI) promises to solve one of the biggest challenges in human genetics.

A research team at Google DeepMind has built AlphaMissense on the AlphaFold methodology. By drawing on databases of related protein sequences and the structural context of each variant, it can help identify pathogenic missense mutations and previously unknown disease-causing genes.

AlphaMissense reportedly outperforms many existing tools of the same kind, known as variant effect predictors (VEPs).

Specifically, AlphaMissense scored the pathogenicity of all 216 million possible single amino acid changes across 19,233 canonical human proteins, yielding predictions for 71 million missense variants. It classified 89% of these missense variants: 57% as likely benign and 32% as likely pathogenic.
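As a rough arithmetic check on these figures, the short sketch below converts the reported percentages into approximate variant counts. The 71 million, 57% and 32% values come from the text above; the function and variable names are illustrative, and the resulting counts are rounded estimates rather than figures taken from the paper.

```python
# Approximate counts implied by the reported percentages (illustrative only).
TOTAL_MISSENSE = 71_000_000      # possible missense variants, as reported above
SHARE_LIKELY_BENIGN = 0.57       # reported share classified as likely benign
SHARE_LIKELY_PATHOGENIC = 0.32   # reported share classified as likely pathogenic


def approximate_counts(total, benign_share, pathogenic_share):
    """Return rounded counts per class plus the unclassified remainder."""
    benign = round(total * benign_share)
    pathogenic = round(total * pathogenic_share)
    return {
        "likely_benign": benign,
        "likely_pathogenic": pathogenic,
        "unclassified": total - benign - pathogenic,
    }


counts = approximate_counts(TOTAL_MISSENSE, SHARE_LIKELY_BENIGN, SHARE_LIKELY_PATHOGENIC)
for label, n in counts.items():
    print(f"{label}: about {n / 1e6:.1f} million")
```

The remainder, roughly 11% of the 71 million, corresponds to variants that received a score but fell into neither class.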

The related research paper, titled “Accurate proteome-wide missense variant effect prediction with AlphaMissense”, has been published in the authoritative scientific journal Science.

In an accompanying Perspective article, Joseph A. Marsh, Professor of Computational Protein Biology at the University of Edinburgh, and Sarah A. Teichmann, Head of Cellular Genetics at the Wellcome Sanger Institute and Research Fellow at the University of Cambridge, commented:

"While this research undoubtedly helps with variant interpretation and prioritization, it is important not to confuse these labels with the specific clinical definitions of these terms, which rely on multiple lines of evidence."

It is worth mentioning that Google DeepMind has made all AlphaMissense predictions freely available to the research community and has open-sourced the code of the AlphaMissense model.

Classifying 89% of missense variants

Missense variants are genetic variants that change the amino acid sequence of a protein. Pathogenic missense variants can severely disrupt protein function and reduce the fitness of an organism, while benign missense variants have limited effects.

Of the more than 4 million observed missense variants, only approximately 2% have been clinically classified as pathogenic or benign, and classifying the remaining unknown variants is a major challenge in human genetics. The lack of accurate missense variant function prediction limits the diagnostic rate of rare diseases and the development and application of clinical treatments targeting the underlying genetic causes.

Multiplexed assays of variant effect (MAVEs) systematically measure the effects of protein variants and can accurately predict their clinical outcomes. However, MAVE experiments are costly and labor-intensive, so proteome-wide surveys of variant pathogenicity remain incomplete.

Machine learning methods can help close this variant interpretation gap by leveraging patterns in biological data to predict the pathogenicity of unannotated variants. The success of AlphaFold has shown that protein structures can be predicted at scale and with high accuracy from protein sequences alone, and such structure models can serve as a basis for understanding other aspects of protein biology, such as variant pathogenicity.

In this study, AlphaMissense builds on AlphaFold's methodology and combines three elements of existing strategies:

1) Training on weak labels derived from population frequency data, which avoids relying on manual annotations and therefore avoids circularity;

2) Learning amino acid distributions conditioned on sequence context through an unsupervised protein language modeling task;

3) Incorporating structural context by using a system derived from AlphaFold.

According to the paper, AlphaMissense is trained in two stages: structural pre-training and variant fine-tuning. The pre-training stage follows the procedure described for AlphaFold, but gives a higher weight to the masked multiple sequence alignment (MSA) reconstruction loss. During fine-tuning, the model is optimized to predict the pathogenicity of the variant and the structure of the reference sequence at the same time.
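One way to picture the two stages is as two weighted sums of loss terms. The sketch below is only a schematic under that assumption: the weight values and function names are hypothetical, the article does not give the actual weights, and any further loss terms the real model may use are not shown.

```python
# Hypothetical weights: the article says only that the masked-MSA term is
# weighted more heavily than in AlphaFold, not by how much.
HYPOTHETICAL_MSA_WEIGHT = 2.0
HYPOTHETICAL_PATHOGENICITY_WEIGHT = 1.0


def pretraining_loss(structure_loss, masked_msa_loss,
                     msa_weight=HYPOTHETICAL_MSA_WEIGHT):
    """Stage 1: structure prediction plus up-weighted masked-MSA reconstruction."""
    return structure_loss + msa_weight * masked_msa_loss


def finetuning_loss(structure_loss, pathogenicity_loss,
                    pathogenicity_weight=HYPOTHETICAL_PATHOGENICITY_WEIGHT):
    """Stage 2: joint objective over reference structure and variant pathogenicity."""
    return structure_loss + pathogenicity_weight * pathogenicity_loss


# Toy loss values, chosen only to show how the terms combine.
print(pretraining_loss(structure_loss=1.0, masked_msa_loss=0.5))
print(finetuning_loss(structure_loss=1.0, pathogenicity_loss=0.25))
```

Raising `msa_weight` pushes the pre-training stage to pay more attention to the language-modeling signal in the alignment, which is the adjustment the text describes.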

Following the PrimateAI approach, benign training variants are drawn from variants that are frequently observed in humans and other primate species. Pathogenic training variants are sampled from variants that have never been observed in the human population, with sampling weights that depend on the trinucleotide context and the gene.
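The weak-labeling scheme described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the frequency cutoff, the field names and the weight table are all hypothetical, since the article gives neither a numeric cutoff nor the actual weights.

```python
import random

# Hypothetical cutoff for "frequently observed"; the article gives no number.
HYPOTHETICAL_COMMON_FREQ = 0.001


def build_weak_labels(variants, sampling_weights, n_pathogenic, seed=0):
    """Assign weak labels: 0 = benign proxy, 1 = pathogenic proxy.

    variants: list of dicts with keys 'id', 'gene', 'context' (trinucleotide
        context) and 'freq' (population frequency, or None if never observed).
    sampling_weights: dict mapping (context, gene) to a non-negative weight.
    """
    labels = {}
    unobserved = []
    for v in variants:
        if v["freq"] is None:
            unobserved.append(v)
        elif v["freq"] >= HYPOTHETICAL_COMMON_FREQ:
            labels[v["id"]] = 0  # frequently observed, treated as benign
        # Rare but observed variants stay unlabeled in this sketch.

    if unobserved and n_pathogenic > 0:
        weights = [sampling_weights.get((v["context"], v["gene"]), 1.0)
                   for v in unobserved]
        rng = random.Random(seed)
        # random.choices samples with replacement; repeats collapse in the dict.
        for v in rng.choices(unobserved, weights=weights, k=n_pathogenic):
            labels[v["id"]] = 1
    return labels


# Made-up example variants, for illustration only.
example_variants = [
    {"id": "a", "gene": "G1", "context": "ACG", "freq": 0.05},
    {"id": "b", "gene": "G1", "context": "ACG", "freq": 0.00001},
    {"id": "c", "gene": "G1", "context": "TCG", "freq": None},
    {"id": "d", "gene": "G2", "context": "ACA", "freq": None},
]
example_weights = {("TCG", "G1"): 1.0, ("ACA", "G2"): 0.0}
print(build_weak_labels(example_variants, example_weights, n_pathogenic=5))
```

Because neither label comes from clinical annotation, a model trained this way can later be evaluated on clinical databases without having seen their labels.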

AlphaMissense does not predict how a mutation changes protein structure, nor its effect on protein stability. Instead, it uses a database of related protein sequences and the structural context of the variant to generate a score between 0 and 1 that estimates the probability that the variant is pathogenic. Because the score is continuous, users can choose a threshold that meets their own accuracy requirements for classifying a variant as pathogenic or benign.
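Turning a continuous score into a class label might look like the sketch below. The two default cutoffs are placeholders assumed for illustration, not values stated in this article, and the function name is hypothetical. Users would substitute cutoffs calibrated to the precision they need.

```python
def classify_variant(score, benign_below=0.34, pathogenic_above=0.564):
    """Map a pathogenicity score in [0, 1] to a coarse label.

    The default cutoffs are illustrative assumptions; scores between the
    two cutoffs are reported as 'ambiguous' rather than forced into a class.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0 and 1")
    if benign_below > pathogenic_above:
        raise ValueError("benign cutoff must not exceed pathogenic cutoff")
    if score < benign_below:
        return "likely_benign"
    if score > pathogenic_above:
        return "likely_pathogenic"
    return "ambiguous"


for s in (0.10, 0.50, 0.90):
    print(s, classify_variant(s))

# A stricter pathogenic cutoff trades coverage for confidence.
print(classify_variant(0.90, pathogenic_above=0.95))
```

Widening the gap between the two cutoffs leaves more variants unclassified but reduces the chance of a wrong call, which is the trade-off the continuous score is meant to expose.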

AlphaMissense classified 89% of the 71 million possible missense variants as either likely pathogenic or likely benign. By contrast, only 0.1% of these variants have been confirmed by human experts.

AlphaMissense achieves state-of-the-art predictions on a wide range of genetic and experimental benchmarks, without being explicitly trained on such data.

The model also outperformed other computational methods when used to classify variants in ClinVar, a public data archive on the relationship between human variation and disease.
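A comparison of this kind is often summarized with the area under the ROC curve (AUROC). The article does not say which metric was used, so that choice is an assumption here. The sketch below computes AUROC directly from its pairwise definition on made-up scores and labels. None of the numbers are real ClinVar data or real predictor outputs.

```python
def auroc(scores, labels):
    """AUROC as the probability that a pathogenic variant outscores a benign one.

    labels: 1 = pathogenic, 0 = benign. Ties count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one variant of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: two hypothetical predictors scored on the same four variants.
toy_labels = [1, 1, 0, 0]
predictor_a = [0.9, 0.8, 0.2, 0.1]   # ranks every pathogenic variant higher
predictor_b = [0.9, 0.2, 0.8, 0.1]   # misranks one pathogenic/benign pair

print("predictor A:", auroc(predictor_a, toy_labels))
print("predictor B:", auroc(predictor_b, toy_labels))
```

An AUROC of 1.0 means every pathogenic variant outscores every benign one, while 0.5 is what random scores would give.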

Potential solution to human genetics puzzle

AlphaMissense's predictions can help elucidate the molecular impact of variants on protein function, which in turn helps identify pathogenic missense mutations and previously unknown disease-causing genes and could improve the diagnostic rate of rare genetic diseases. AlphaMissense may also spur the further development of specialized protein variant effect predictors.

Marsh and Teichmann also pointed out a limitation of AlphaMissense:

The structural component of the predictor currently does not take into account that most proteins assemble into complexes or condensates with diverse quaternary structures. Mutations in proteins that form complexes may cause disease in ways that are not obvious when only the monomeric structure is considered.

Furthermore, while many disease-associated mutations cause loss of function through protein instability or disruption of complex assembly, in other cases the mutant protein causes disease through dominant-negative or gain-of-function effects.

It will therefore be interesting to see how AlphaMissense performs on non-loss-of-function variants, which tend to cause only mild structural perturbations and which almost all previously tested variant effect predictors (VEPs) struggle to predict accurately.

Ultimately, it may become possible to incorporate information about protein quaternary structure through algorithms that predict the structures of protein complexes, which is expected to bring further improvements to variant effect prediction.

Reference Links:

www.science.org/doi/10.1126/science.adg7492

www.science.org/doi/10.1126/science.adj8672

Author: Yan Yimi

Editor: Academic
