After 20 years of gene sequencing, we finally figured out what junk DNA is for

In 1990, the international Human Genome Project was launched. By 2003, most of the human genome had been sequenced. People were surprised to find that human genes were not continuous stretches of coding information, but were interrupted by many sequences that do not encode proteins. At the time, this non-coding DNA was called "junk". Why did nature put so much junk in human genes? Over the past 20 years, through the efforts of scientists, the truth has gradually surfaced: this "junk" DNA has functions of its own, and one very important type is the "intron".

Written by Yubao (PhD, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences)

Discovery of introns

Like father, like son; like mother, like daughter. Heredity is a phenomenon that can be seen everywhere in our lives. Scientists long speculated that some substance must pass the characteristics of one generation on to the next. In the middle of the 19th century, based on many years of plant hybridization experiments, the Austrian scientist Gregor Johann Mendel proposed an independent hereditary unit, the "genetic factor", and argued that all the characteristics of organisms are passed on through genetic factors. In 1903, American biologist Walter Sutton and German biologist Theodor Heinrich Boveri proposed that genetic factors are located on chromosomes, and that chromosomes are the carriers of genetic material. In 1909, Danish geneticist Wilhelm Johannsen proposed the term "gene" to replace Mendel's "genetic factor". The term has been used in genetics ever since.

Johannsen believed that there should be a chemical entity behind the "gene". People believed that once the structure of the gene was understood, it would be easy to explain how genes encode genetic information and how that information is passed on. Before the 1950s, the structure of the gene was unclear. The problem was solved only in 1953, when the American molecular biologist James Watson and the British biologist Francis Crick discovered the double helix structure of DNA. How genes encode information, however, remained a subject of many competing theories. For example, "one gene, one enzyme (protein)" was a popular theory in the 1940s. Later, more and more exceptions to it turned up: many genes function as RNA molecules, several genes may together encode one protein, and one gene may encode several proteins. As a result, the definition of "gene" has become more and more complicated.

In 1977, American scientist Philip Sharp and British scientist Richard Roberts independently discovered introns using electron microscopy while studying adenovirus genetics [1, 2] and proposed the "split gene theory", for which they were awarded the 1993 Nobel Prize in Physiology or Medicine. Electron microscopy played an important role in the discovery: its resolution is high enough to visualize nanoscale DNA and RNA molecules. The name "intron", however, came from someone else. In a short article in 1978, American scientist Walter Gilbert proposed using "intron" to refer to the non-coding sequences in the mRNA precursor [3]. mRNA is the template used to "translate" the information in a gene's DNA sequence into a protein sequence.

In 1980, Gilbert and Frederick Sanger shared one half of the Nobel Prize in Chemistry for developing DNA sequencing methods; the other half went to Paul Berg for his work on recombinant DNA.

The split gene theory holds that gene sequences in eukaryotic genomes are discontinuous: the coding regions of a gene are separated by a large number of non-coding sequences, which interrupt the sequence that specifies the amino acids of the corresponding protein. Introns generally refer to DNA sequences in eukaryotic genes that do not encode protein and are removed during mRNA processing. This removal is carried out by the "spliceosome", a very complex molecular machine made up of more than 100 "parts".

Figure 1. Schematic diagram of intron splicing during transcription. During gene transcription in eukaryotic cells, the "spliceosome" removes introns and joins the exons (green) together to form mature mRNA. Image source: Li Hongbin et al.
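The cut-and-join step described above can be sketched in a few lines of Python. This is only a toy model, not how a spliceosome actually recognizes its targets: the sequence and the intron coordinates are invented for illustration, and the function name `splice` is hypothetical.

```python
# Toy model of splicing: exons are kept, introns are removed.
# The sequence and the intron coordinates are invented for illustration.

def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Return the mature mRNA after removing the given intron spans.

    Each span is (start, end) in 0-based, end-exclusive coordinates.
    """
    mature = []
    position = 0
    for start, end in sorted(introns):
        mature.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                           # skip over the intron itself
    mature.append(pre_mrna[position:])           # keep the final exon
    return "".join(mature)


# Exons are written in upper case and introns in lower case for readability.
pre_mrna = "AUGGCC" + "guaaguuuuag" + "UUUGGA" + "guacgcacag" + "UAA"
introns = [(6, 17), (23, 33)]

mature_mrna = splice(pre_mrna, introns)
print(mature_mrna)  # AUGGCCUUUGGAUAA
```

In a real cell the intron boundaries are not handed over as coordinates; the spliceosome finds them from sequence signals in the RNA itself.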

Function of introns

The biggest difference between protein-coding genes in eukaryotic cells and those in prokaryotic cells is that the former contain introns while the latter generally do not. Introns are usually much longer than the exon sequences that encode proteins. Because of introns, eukaryotic cells spend a great deal of material and energy on DNA replication and gene expression, which undoubtedly adds to the organism's survival burden. So what is the use of such long non-coding segments embedded in genes?

In the 20 years after the discovery of introns, little research was done on their origin and function. In fact, at the beginning of the 21st century, when the draft of the human genome had just been completed, there was a popular saying: "95% of the sequences in the human genome are junk DNA." Some readers may remember it. The "junk" sequences people talked about at that time included introns. Thanks to the continued efforts of researchers, the notion of "junk DNA" has gradually been overturned, and the important functions of introns have become clearer.

A series of studies have found that introns help maintain gene stability and take part in gene expression and its regulation. Specifically, alternative splicing joins the exons of a gene in different combinations, which are then translated into a variety of proteins, increasing the complexity of the proteome; regulatory elements within introns, such as enhancer sequences, can tune the transcription efficiency of genes; and some RNA elements in introns can prevent premature termination of transcription.
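To see how quickly alternative splicing can multiply the number of possible transcripts, consider a minimal sketch that models only one kind of alternative splicing, exon skipping. The exon labels and the function name `isoforms` are hypothetical, and the assumption that every internal exon can be skipped independently is a simplification: in real cells, splicing is regulated and only some combinations are produced.

```python
from itertools import combinations

def isoforms(exons: list[str]) -> list[str]:
    """Enumerate transcripts that keep the first and last exon and include
    any subset of the internal exons, always in their original order.

    Expects at least two exons.
    """
    first, *internal, last = exons
    results = []
    for k in range(len(internal) + 1):
        for kept in combinations(internal, k):
            results.append("-".join([first, *kept, last]))
    return results


exons = ["E1", "E2", "E3", "E4"]
variants = isoforms(exons)
for variant in variants:
    print(variant)
# E1-E4
# E1-E2-E4
# E1-E3-E4
# E1-E2-E3-E4
```

Under this simplified model, a gene with n internal exons could yield up to 2^n transcripts, which gives a feel for how a limited number of genes can support a much larger proteome.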

Early studies found that introns can maintain the stability of DNA sequences during gene transcription by preventing genes from forming "R-loops". The R-loop, named for the RNA strand it contains, is a three-stranded structure: the newly transcribed RNA base-pairs with one strand of the opened DNA double helix to form an RNA-DNA hybrid, while the other, unpaired DNA strand is left free (see Figure 2). The presence of introns can reduce the formation of R-loops and maintain the stability of genomic DNA. However, R-loops are not all "bad". It was later discovered that R-loops also have biological functions in cells: they can regulate gene expression, for example transcription initiation and elongation, and epigenetic regulation. On the other hand, dysregulation of R-loops is associated with DNA damage, genomic instability, and high-frequency gene recombination.

Figure 2. Two ways of forming an "R-loop" during gene transcription. Image source: Zhang Yiyun et al.

Introns have many other functions. In 2019, the Abou Elela team at the University of Sherbrooke in Canada and the Bartel team at the Massachusetts Institute of Technology in the United States published two papers at the same time [4, 5] showing that introns can help an organism cope with the stress of nutrient deficiency, enabling it to "stand up to starvation".

Abou Elela's team deleted more than 200 introns from brewer's yeast one by one to see whether this would affect the yeast's ability to survive. Through sequencing and the corresponding phenotypic analysis, the researchers found that introns regulate the yeast's adaptation to nutrient deprivation (starvation). Bartel's team found that 34 yeast introns persist in cells as full-length linear molecules after being spliced out. They are regulated by the classic TOR pathway and can slow the growth rate of yeast when nutrients are scarce, thereby improving its adaptability and survival rate. These introns help the cell cope with adversity, and this role has nothing to do with the function of the genes in which they are located. Because such introns can be a matter of life and death for the organism, it is understandable that they have been preserved during evolution.

Introns can be divided into four categories: group I introns, group II introns, spliceosomal introns, and tRNA introns. Introns in the general sense are spliceosomal introns, which, as the name suggests, are removed by the spliceosome; the three-dimensional structure of the spliceosome has been resolved. The splicing reaction that generates mRNA is very precise and has a very low error rate. It has to be: if a splice site is misplaced by even one base, the reading frame shifts, the subsequent translation goes wrong, and either no protein or the wrong protein is produced.
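The effect of a one-base splicing error can be illustrated with a short sketch. The mRNA below is invented, the function name `translate` is hypothetical, and the codon table is a deliberately tiny excerpt of the standard genetic code containing only the codons this example needs.

```python
# A tiny excerpt of the standard genetic code (only the codons used below).
CODON_TABLE = {
    "AUG": "Met", "GCC": "Ala", "UUU": "Phe", "GGA": "Gly",
    "UUG": "Leu", "GAU": "Asp", "UAA": "Stop",
}

def translate(mrna: str) -> list[str]:
    """Read the mRNA three bases at a time until a stop codon or the end."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "?")
        if amino_acid == "Stop":
            break
        peptide.append(amino_acid)
    return peptide


exon1 = "AUGGCC"
exon2 = "UUUGGAUAA"

correct = translate(exon1 + exon2)       # exons joined at the right position
shifted = translate(exon1 + exon2[1:])   # splice site misplaced by one base

print(correct)  # ['Met', 'Ala', 'Phe', 'Gly']
print(shifted)  # ['Met', 'Ala', 'Leu', 'Asp']
```

Every codon after the faulty junction is read differently, and in this toy case the stop codon is lost as well, so the ribosome would no longer know where the protein should end.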

Group I introns are found in bacteria, bacteriophages, protists, and fungi and are capable of self-splicing. Group II introns are found in bacterial and organelle genomes and are also capable of self-splicing, but by a mechanism that differs from that of group I introns and resembles the splicing of spliceosomal introns. tRNA introns are found in eukaryotic cells and archaea, and their splicing requires endonucleases and ATP.

The mechanism of intron production

How do introns appear in eukaryotic cells?

Regarding the mechanism of intron production, the most popular explanation is the "introner theory" [6], which can explain the origin of spliceosomal introns. Introners can be regarded as "parasites" in the genome, which "produce" a large number of introns by "copying" and "pasting" themselves. In 2009, scientists discovered introners in Micromonas, and traces of them were subsequently found in dinoflagellates, some fungi, and urochordates.

Many studies have shown that this "copy" and "paste" process can be repeated on a large scale throughout the genome: over the course of evolution, introners have continuously produced introns in different eukaryotic organisms. For example, most of the introns gained by fungal genomes over the past 100,000 years were introduced by introners [7].

Figure 3. How does an introner "make" introns? The introner inserts an intron sequence into the genome, thereby "splitting" the original DNA sequence and generating new exons. Image source: Merrill Sherman

Studies have found that in some species, such as the algae Polarella glacialis and Micromonas, the sequences produced by introners strongly resemble DNA transposons. DNA transposons belong to a larger family of genetic elements known as transposable elements or "jumping genes", which can copy their own sequences in large numbers and insert them into the genome. The similarity between introners and transposons suggests that some introns may derive from transposons. Introns produced by the introner mechanism often appear in the genome in large numbers within a short period of time and at largely random positions, which can explain why introns are not evenly distributed in eukaryotic genomes.
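The "copy and paste" behavior of an introner can be mimicked with a toy string model. Everything here is invented for illustration: the gene, the introner sequence, the insertion positions, and the function name `paste_introner`. Real insertion sites are not chosen this way.

```python
def paste_introner(genome: str, introner: str, positions: list[int]) -> str:
    """Insert a copy of the introner at each position of the original genome.

    Inserting from right to left keeps the remaining positions valid.
    """
    for pos in sorted(positions, reverse=True):
        genome = genome[:pos] + introner + genome[pos:]
    return genome


# Coding DNA in upper case, the introner in lower case so it is easy to spot.
gene = "ATGGCCTTTGGATAA"
introner = "gtaagttttcag"

split_gene = paste_introner(gene, introner, [6, 12])
print(split_gene)
# ATGGCCgtaagttttcagTTTGGAgtaagttttcagTAA

# The single continuous coding sequence is now split into three exons...
print(split_gene.split(introner))  # ['ATGGCC', 'TTTGGA', 'TAA']

# ...and removing every introner copy restores the original sequence.
print(split_gene.replace(introner, "") == gene)  # True
```

The sketch also hints at why introns created this way carry near-identical sequences: every inserted copy starts out as the same introner.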

However, introners have so far been found in only some species, and they appear to be more common in aquatic organisms: introners are more than six times as likely to appear in the genomes of aquatic organisms as in those of terrestrial organisms. In addition, nearly three-quarters of the aquatic species that contain introners have genomes carrying multiple introners with similar sequences. This sequence similarity is thought to result from horizontal gene transfer, that is, the transfer of gene sequences from one species to another. This form of gene transfer often occurs in aquatic environments or between species living in close association, such as hosts and parasites.

Aquatic environments facilitate horizontal gene transfer because genetic material can move freely through water. Single-celled organisms can easily take up or fuse with foreign DNA in water; more complex multicellular organisms that lay eggs or fertilize them in water also have the opportunity to come into contact with foreign DNA or RNA. Studies have found that nearly 1,000 horizontal gene transfer or intron insertion events have occurred in the genomes of nearly 300 bony fish [8]. In contrast, horizontal gene transfer between terrestrial organisms is much less frequent.

The significance of introns to biological evolution

Although both are eukaryotes, mammals have more and longer introns than yeast. In humans, for example, intron sequences account for about 25% of the genome, and each gene has about 9 introns on average, which helps genes achieve complex and diverse functions. The length of introns in human mRNA precursors varies greatly, ranging from 50 bases to millions of bases.

The distribution of introns between and within species is also uneven. For the same gene, some individuals of a species carry introns and others do not; and across species, the length, number, and location of introns in the same gene differ. For example, the introns of the two homologous genes Sccoxl.2b and Ancoxl.3 are 70% identical in sequence, but the order of the exons next to the introns is very different, which may be the result of intron transfer between species.

The existence of introns depends on supporting mechanisms. Eukaryotic cells have a nuclear membrane, which allows gene transcription and translation to be separated in space. At the same time, the large number of mitochondria in the cell supply energy, so introns have a material basis for their existence. Prokaryotes, on the other hand, have no nuclear membrane, and their transcription and translation take place simultaneously, so prokaryotes do not need introns to maintain the stability of DNA sequences.

Scientists believe that introns help drive the evolution of gene families and species. Through alternative splicing, the genome combines exons and introns in new ways to create new variants, generating new regulatory patterns or functional modules (enzymes, proteins, pathways, etc.). For example, venomous species often need to recombine sequences rapidly at the genetic level to generate new venom components (complex peptide mixtures) in order to adapt to different prey or deal with natural enemies. An animal's immune system needs to rearrange MHC genes quickly and continuously produce new antibodies or antigen-presenting cells to cope with the changing antigens in its environment. Such rapid evolutionary mechanisms are common in nature, and introns are often involved in them.

References

[1] Berget SM, et al. Spliced segments at the 5' terminus of adenovirus 2 late mRNA. PNAS. 1977, 74(8): 3171-3175.

[2] Chow LT, et al. An amazing sequence arrangement at the 5' ends of adenovirus 2 messenger RNA. Cell. 1977, 12(1): 1-8.

[3] Gilbert W. Why genes in pieces? Nature. 1978, 271(5645): 501.

[4] Parenteau J, et al. Introns are mediators of cell response to starvation. Nature. 2019, 565(7741): 612-617.

[5] Morgan JT, Fink GR, Bartel DP. Excised linear introns regulate growth in yeast. Nature. 2019, 565(7741): 606-611.

[6] Worden AZ, et al. Green evolution and dynamic adaptations revealed by genomes of the marine picoeukaryotes Micromonas. Science. 2009, 324(5924): 268-272.

[7] van der Burgt A, et al. Birth of new spliceosomal introns in fungi by multiplication of introner-like elements. Current Biology. 2012, 22(13): 1260-1265.

[8] Zhang HH, et al. Horizontal transfer and evolution of transposable elements in vertebrates. Nat Commun. 2020, 11(1): 1362.

This article is supported by the Science Popularization China Starry Sky Project

Produced by: China Association for Science and Technology Department of Science Popularization

Producer: China Science and Technology Press Co., Ltd., Beijing Zhongke Xinghe Culture Media Co., Ltd.
