Did you know that it has been about 20 years since the Human Genome Project released its human genome sequence? Most of the data in that sequence came from one volunteer, who accounted for more than 70% of it; the rest was a mixture of the genomes of at least three other people. Why was it done this way? Probably because sequencing a person's genome was too expensive at the time. According to the final tally, scientists from six countries spent a total of $4.2 billion to complete the first draft of the human genome. Scientists may have hoped to capture as much human genetic information as possible in a single genome project, so they mixed the DNA of different people. There were also technical constraints: building physical maps and other techniques of the day required large amounts of DNA, and taking all of it from one person would have meant drawing a great deal of blood. Although it came from at least four people, this genome map was presented as if it belonged to "one person."

In the past 20 years, tens of thousands of studies on human genes have relied on the genome sequence completed by the Human Genome Project. However, this reference genome still has many problems. First, sequencing technology was limited at the time, so the chromosomes were not complete but had many "holes", especially in regions rich in repetitive sequences such as telomeres and centromeres. When it was published in 2003, the genome was actually only 92% complete, and it took scientists about another 20 years to finish the rest. Second, although the genomes of different people are on average more than 99.6% identical, the remaining 0.4% underlies human diversity: differences in traits such as hair color, height, and skin color trace back to that 0.4%.
However, these characteristics cannot be fully described by the sequence map from the Human Genome Project, because it represents the genes of only "one person".

Over the past 20 years, technology has advanced and scientists have kept working. In 2022, scientists published a telomere-to-telomere human genome sequence, filling almost all the "holes" left by the Human Genome Project. For the first time we could see "one" truly complete human genome, and the gaps mentioned above were finally closed. Then, in early May 2023, four articles published in Nature and Nature Biotechnology pushed the human genome into the "pan-genome" era, that is, an era that captures everyone's genetic characteristics. Today, let's talk about this latest series of advances.

First of all, what is a pan-genome? A pan-genome is the sum of all genomic information within a species, so it covers more genetic diversity than a single reference genome. The most complete pan-genome would be the sum of the genes of every individual of the species.

The four articles are "A draft human pangenome reference", "Increased mutation and gene conversion within human segmental duplications", and "Recombination between heterologous human acrocentric chromosomes", all published in Nature, and "Pangenome graph construction from genome alignments with Minigraph-Cactus", published in Nature Biotechnology.

Let us summarize the results of these four studies. First, the pan-genome draft was built from complete, independently assembled personal genomes of 47 individuals of diverse origins. Compared with GRCh38, the currently widely used human reference sequence, the draft adds 119 million base pairs (a base pair is two complementary bases paired across the DNA double helix) and 1,115 gene duplications.
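The idea that a pan-genome is a sum over individuals, and therefore holds more than any single reference, can be sketched with simple sets. This is only a toy illustration: the individuals and gene labels are invented, the function name is hypothetical, and real pan-genomes are built from full sequences rather than lists of gene names.

```python
# Toy illustration only: individuals and gene labels are invented,
# and real pan-genomes are built from sequences, not gene-name sets.
genomes = {
    "individual_A": {"gene1", "gene2", "gene3"},
    "individual_B": {"gene1", "gene2", "gene4"},
    "individual_C": {"gene1", "gene3", "gene5"},
}


def pan_genome(gene_sets):
    """Return the union of all gene sets: everything seen in any individual."""
    return set().union(*gene_sets)


# The "pan-genome" of this toy population.
pan = pan_genome(genomes.values())

# Pretend individual_A alone serves as the single reference genome.
single_reference = genomes["individual_A"]

# Genes that exist in the population but are absent from the single reference.
missed_by_reference = pan - single_reference

print(sorted(pan))
print(sorted(missed_by_reference))
```

In this toy case the single reference misses two of the five genes present in the population, which is the kind of blind spot a pan-genome is meant to remove.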
Compared with GRCh38, this draft detects 104% more structural variants. It also makes up for 210 Mb (megabases) of DNA sequence missing from GRCh38, of which 151 Mb was previously completely unknown and 59 Mb had only been predicted by computer modeling. Such gaps bias the data in related studies, and they also mean that many regions of the human genome map are still unknown to us and need further work.

Second, scientists produced a single nucleotide variant (SNV) map containing millions of previously uncharacterized SNVs. The new pan-genome map also describes how variable some genomic regions are, namely those containing segmental duplications: highly identical DNA sequences repeated at more than one site in the genome. Such duplications can lead to genomic variation, which in turn affects an individual's phenotypic traits and disease risk.

Third, using the human pan-genome draft, scientists observed a recombination pattern between the short arms of heterologous acrocentric chromosomes. This points to a DNA exchange mechanism between chromosomes that had been suspected but not confirmed.

Fourth, researchers used the human pan-genome draft data to improve how accurately a pan-genome reference can be built. In this study, scientists presented the "Minigraph-Cactus" pan-genome pipeline, which can create a pan-genome directly from whole-genome alignments and has been applied to genome data from both humans and fruit flies. This provides more comprehensive information for understanding genomic variation between species and between individuals in the future.

Of course, the results achieved this time are only a transitional stage in human pan-genome research.
The overall plan is to observe and describe the genetic diversity of 350 individuals, and what has been completed so far is only a small part of that. The researchers plan to finish sequencing the genomes of these 350 people by mid-2024.

Finally, from the Human Genome Project to today's pan-genome project, we have seen contributions from Chinese scientists, and their share has grown steadily since BGI represented China in the Human Genome Project with a 1% contribution. This time, two of the four articles have Dr. Li Heng, who is from China and is a leading figure in genome research, as corresponding author. There are also many Chinese names on the author lists. We hope that the future brings more Chinese voices and more contributions from China to human genome research.

This article is a work supported by the Science Popularization China Starry Sky Project
Author: Tiangeng
Reviewer: Tao Ning (Associate Researcher, Institute of Biophysics, Chinese Academy of Sciences)
Produced by: China Association for Science and Technology, Department of Science Popularization
Producer: China Science and Technology Press Co., Ltd., Beijing Zhongke Xinghe Culture Media Co., Ltd.