The first draft of the human "pan-genome" has been released. What is its significance?

Did you know that it has been more than 20 years since the first draft of the human genome sequence was released? Most of the data in that sequence came from a single volunteer, mixed with the genomes of at least three other people. That one volunteer accounted for more than 70% of the data.

Why was it done this way? Probably because sequencing a person's genome was extremely expensive at the time. By the final tally, scientists from six countries spent a total of $4.2 billion to complete the first draft of the human genome. They may have hoped to capture as much human genetic information as possible in a single project, so they pooled the DNA of different people. Technical constraints also played a role: producing physical maps and other steps at the time required large amounts of DNA, and if all of it had come from one person, a great deal of blood would have had to be drawn. So although it came from at least four people, this genome map was presented as the genome of "one person."

Over the past 20 years, tens of thousands of studies on human genes have relied on the genome sequence produced by the Human Genome Project. However, this reference genome still has several problems. First, sequencing technology was limited at the time, so the chromosomes were not complete but full of "holes," especially in regions rich in repetitive sequences such as telomeres and centromeres. When it was published in 2003, the genome was in fact only about 92% complete, and it took scientists nearly another 20 years to fill in the rest. Second, although the genomes of any two people are on average more than 99.6% identical, the remaining 0.4% accounts for much of human diversity: the genetic basis of differences in hair color, height, skin color and so on lies in that 0.4%. The sequence produced by the Human Genome Project cannot fully capture this diversity, because it represents the genome of only "one person."
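
To put the 0.4% figure in perspective, here is a back-of-the-envelope calculation. It assumes a haploid human genome of roughly 3.1 billion base pairs (a round number not given in this article) and treats the difference as if it were spread over individual positions, which is a simplification; the function name is hypothetical.

```python
# Rough scale of a 0.4% difference between two human genomes.
# Assumption: haploid genome size of about 3.1 billion base pairs (round figure).
GENOME_SIZE_BP = 3_100_000_000


def differing_bases(genome_size_bp: int, percent_diff_tenths: int) -> int:
    """Return how many bases a difference of N tenths of a percent covers.

    Integer arithmetic avoids floating-point rounding: 0.4% == 4 / 1000.
    """
    return genome_size_bp * percent_diff_tenths // 1000


if __name__ == "__main__":
    diff = differing_bases(GENOME_SIZE_BP, 4)  # 0.4%
    print(f"{diff:,} bases")  # 12,400,000 bases
```

Under this assumption, even two genomes that are "99.6% identical" differ at something on the order of ten million bases.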

Over these two decades, as technology advanced and scientists kept working on the problem, the Telomere-to-Telomere (T2T) consortium published in 2022 a complete human genome sequence running from telomere to telomere, filling almost all of the "holes" left by the Human Genome Project. For the first time we could see a truly complete genome of "one" human, and the gap problem described above was largely resolved. Then, in early May 2023, four papers published in Nature and Nature Biotechnology moved human genomics into the "pan-genome" era, an era that aims to capture the genetic characteristics of everyone. Today, let's look at this latest series of advances.

First, what is a pan-genome? A pan-genome is the sum of all genomic information within a species, so it covers far more genetic diversity than a single reference genome. The most complete pan-genome would be the combined genomes of every individual in the species.
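
The union idea in this definition can be shown with a toy model. The sketch below assumes each genome is just a set of named sequence segments (real pan-genomes are built from whole-genome assemblies and alignments, not string sets), and the individuals, segments and function names are hypothetical. The "core" set, segments shared by everyone, is included for contrast.

```python
# Toy model: each genome is represented as a set of sequence segments.
# Assumption: plain string sets stand in for real assemblies, for illustration only.


def pan_genome(genomes: dict[str, set[str]]) -> set[str]:
    """Union of all segments seen in any individual."""
    result: set[str] = set()
    for segments in genomes.values():
        result |= segments
    return result


def core_genome(genomes: dict[str, set[str]]) -> set[str]:
    """Segments shared by every individual."""
    sets = list(genomes.values())
    if not sets:
        return set()
    return set.intersection(*sets)


if __name__ == "__main__":
    # Hypothetical individuals and segments.
    genomes = {
        "reference": {"A", "B", "C"},
        "person_1": {"A", "B", "D"},
        "person_2": {"A", "C", "E"},
    }
    pan = pan_genome(genomes)
    print(sorted(pan))                         # ['A', 'B', 'C', 'D', 'E']
    print(sorted(core_genome(genomes)))        # ['A']
    print(sorted(pan - genomes["reference"]))  # ['D', 'E']: absent from the single reference
```

The last line shows the point of the article in miniature: segments D and E exist in the population but are invisible if only the single reference is used.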

The four papers are "A draft human pangenome reference," "Increased mutation and gene conversion within human segmental duplications" and "Recombination between heterologous human acrocentric chromosomes," published in Nature, and "Pangenome graph construction from genome alignments with Minigraph-Cactus," published in Nature Biotechnology.

Here is a summary of what the four studies found. First, the pan-genome draft was built from high-quality, phased diploid genome assemblies of 47 genetically diverse individuals. Compared with GRCh38, the version of the human reference genome currently in wide use, the draft adds 119 million base pairs (a base pair is two complementary bases paired across the DNA double helix) and 1,115 gene duplications.
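
The parenthetical above describes base pairing: A pairs with T and G pairs with C, so one strand fully determines the other. A minimal sketch of that rule follows; the function names are hypothetical, and the partner strand is read in reverse because, by convention, each strand is written 5' to 3'.

```python
# Watson-Crick pairing: A<->T, G<->C. Each pair counts as one base pair (bp).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5'->3' (hence the reversal)."""
    return strand.translate(COMPLEMENT)[::-1]


def base_pairs(strand: str) -> int:
    """A double-stranded stretch has one base pair per base on either strand."""
    return len(strand)


if __name__ == "__main__":
    s = "ATGCGT"
    print(reverse_complement(s))  # ACGCAT
    print(base_pairs(s))          # 6
```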

Compared with GRCh38-based analysis, the draft detects 104% more structural variants per haplotype, roughly twice as many. It also makes up for 210 Mb (megabases) of DNA sequence missing from GRCh38: 151 Mb of it was previously completely unknown, and 59 Mb was sequence that had only been predicted by computational modeling. Gaps like these bias the results of studies that depend on the reference, and they show that there are still many regions of the human genome we know little about. The map still needs improvement.

Second, scientists produced a single-nucleotide variant (SNV) map containing millions of previously uncharacterized SNVs. The new pan-genome map also describes how variable the genome is in regions containing segmental duplications, that is, long stretches of highly identical DNA repeated at one or more other sites in the genome. Such repeats can give rise to genomic variation, which in turn affects an individual's traits and disease risk.
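
The two ideas in this paragraph can be sketched in a few lines: an SNV is a single-base difference at one position between aligned sequences, and a duplication is the same stretch of DNA occurring at more than one site. The sketch below assumes the sequences are already aligned and uses exact k-mer matches as a crude stand-in for segmental duplications. Real studies use dedicated aligners and variant callers. The function names and sequences are hypothetical.

```python
from collections import defaultdict


def count_snvs(ref: str, alt: str) -> list[tuple[int, str, str]]:
    """List (position, ref_base, alt_base) for each mismatch between two aligned sequences."""
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


def repeated_kmers(seq: str, k: int) -> dict[str, list[int]]:
    """Map each length-k substring found at two or more sites to its start positions.

    Exact matches are a simplification: segmental duplications are conventionally
    defined as copies at least 1 kb long with at least 90% identity.
    """
    positions: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: sites for kmer, sites in positions.items() if len(sites) > 1}


if __name__ == "__main__":
    print(count_snvs("ACGTACGT", "ACTTACGA"))  # [(2, 'G', 'T'), (7, 'T', 'A')]
    print(repeated_kmers("ACGTTTACGTGG", 4))   # {'ACGT': [0, 6]}
```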

Third, using the pan-genome draft, scientists observed a pattern of recombination between the short arms of heterologous (different) acrocentric chromosomes, the chromosomes 13, 14, 15, 21 and 22, whose centromeres sit very close to one end. This points to a DNA exchange mechanism between different chromosomes that had been suspected but never confirmed.
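
Recombination here means a crossover: two DNA molecules break at matching points inside a region they share and swap the pieces. The toy sketch below assumes the two sequences are aligned so that the breakpoint falls inside their shared stretch; the placeholder sequences and the function name are hypothetical, and the model says nothing about how the exchange happens inside a cell.

```python
def crossover(chrom_a: str, chrom_b: str, breakpoint: int) -> tuple[str, str]:
    """Swap everything after `breakpoint` between two aligned sequences.

    Toy model only: assumes position `breakpoint` lies inside a region
    the two sequences share.
    """
    if not 0 <= breakpoint <= min(len(chrom_a), len(chrom_b)):
        raise ValueError("breakpoint outside both sequences")
    return (chrom_a[:breakpoint] + chrom_b[breakpoint:],
            chrom_b[:breakpoint] + chrom_a[breakpoint:])


if __name__ == "__main__":
    # Placeholder sequences: distinct flanks around an identical shared middle.
    a, b = "aaaSHAREDxxx", "bbbSHAREDyyy"
    print(crossover(a, b, 6))  # ('aaaSHAREDyyy', 'bbbSHAREDxxx')
```

Because the shared middle is identical on both molecules, the products look like the original flanks stitched together in a new combination.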

Fourth, researchers used the pan-genome draft to improve the accuracy of pan-genome references. In this study, scientists presented the Minigraph-Cactus pan-genome pipeline, which builds a pan-genome directly from whole-genome alignments. It is not limited to humans and can also process genome data from other species such as the fruit fly. This will provide more comprehensive information for understanding genomic variation between species and between individuals.
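
A pan-genome is usually stored as a graph whose paths spell out individual genomes. The sketch below is not Minigraph-Cactus, which works from whole-genome alignments and handles large structural variants. It only illustrates the data structure, assuming a made-up reference and a few SNVs: invariant stretches become single segments, each variant site becomes a two-way "bubble," and every path through the chain is one possible haplotype. The function names are hypothetical.

```python
from itertools import product


def build_graph(reference: str, variants: dict[int, str]) -> list[list[str]]:
    """Turn a reference plus SNVs {position: alt_base} into a chain of sites.

    Each site lists its options: one merged segment for invariant stretches,
    or [ref_base, alt_base] for a variant position (a "bubble").
    """
    sites: list[list[str]] = []
    run = ""
    for i, base in enumerate(reference):
        if i in variants:
            if run:
                sites.append([run])
                run = ""
            sites.append([base, variants[i]])
        else:
            run += base
    if run:
        sites.append([run])
    return sites


def haplotypes(sites: list[list[str]]) -> list[str]:
    """Enumerate every path through the graph (one option per site)."""
    return ["".join(path) for path in product(*sites)]


if __name__ == "__main__":
    graph = build_graph("ACGTAC", {2: "T", 4: "G"})
    print(graph)              # [['AC'], ['G', 'T'], ['T'], ['A', 'G'], ['C']]
    print(haplotypes(graph))  # ['ACGTAC', 'ACGTGC', 'ACTTAC', 'ACTTGC']
```

The design choice worth noting is that the shared sequence is stored once, while the differences appear as branches; that is what lets one structure represent many genomes.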

Of course, these results are only an intermediate stage in human pan-genome research. The full project aims to document the genetic diversity of 350 individuals, so what has been completed so far is only a small part of it. The researchers plan to finish sequencing the genomes of all 350 people by mid-2024.

Finally, from the Human Genome Project to today's pan-genome project, we have seen contributions from Chinese scientists, and their share has grown steadily since BGI represented China in the Human Genome Project and completed 1% of the work. This time, Dr. Li Heng, a Chinese scientist and a leading figure in genome research, is the corresponding author of two of the four papers, and many other Chinese names appear in the author lists.

We hope that in the future, China will have a stronger voice and make even greater contributions to human genome research.

This article is supported by the Science Popularization China Starry Sky Project.

Author: Tiangeng

Reviewer: Tao Ning (Associate Researcher, Institute of Biophysics, Chinese Academy of Sciences)

Produced by: China Association for Science and Technology Department of Science Popularization

Producer: China Science and Technology Press Co., Ltd., Beijing Zhongke Xinghe Culture Media Co., Ltd.
