Machine Learning: Finding the Needle in a Haystack

Fast radio bursts (FRBs) are extremely bright bursts of radio waves whose instantaneous brightness can exceed that of the Sun by hundreds of millions of times. They release enormous amounts of energy within milliseconds, about a hundred times faster than the blink of an eye.

Schematic diagram of fast radio bursts ▏ Source: Danielle Futselaar

What kind of celestial object produces these mysterious bursts, and in what extreme environment? These questions have attracted the interest of many astronomers. After more than ten years of research, scientists have made considerable observational breakthroughs, with many surprises along the way. However, the origin and emission mechanism of fast radio bursts remain unsolved mysteries that astronomers are eager to crack.

"Census"

There are many ways to study fast radio bursts, but first you have to find them, and the more the better! Because fast radio bursts last only a fraction of a blink, observations with high time resolution are key. Australia's 64-meter Parkes radio telescope has made great contributions here: it has been carrying out high-time-resolution multibeam surveys for 25 years. The world's first fast radio burst was discovered in Parkes data in 2007.

Aerial view of the Parkes Telescope ▏Image source: aussietowns.com

But observation alone is not enough. Because fast radio bursts are so distant, the signals that reach Earth are far weaker than the radiation from Bluetooth headphones, and picking them out from instrumental noise and human-made electromagnetic interference is very difficult. This is why, despite many years of radio observations, the first fast radio burst was not found until 2007.

So are there any undiscovered fast radio bursts in Parkes' historical data?

A joint research team led by researchers from the Purple Mountain Observatory decided to conduct a thorough "census" of these historical data. The hard work paid off: two new fast radio bursts were found in Parkes data from 1997 to 2001, an early success. But what about the astonishing 560 million candidate signals that remained?

These candidates include a great deal of noise and human-made interference. The brighter genuine fast radio bursts are picked out first because they can be identified with high confidence. The remaining signals, however, can only be classified in the traditional way: by experienced astronomers inspecting them by eye. Even at 30,000 images per day without a break, reading all 560 million candidates would take about 50 years, which is clearly impossible!
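The 50-year figure is simple arithmetic. A quick check, using only the two numbers given in the text and assuming an average year of 365.25 days:

```python
# Back-of-the-envelope check of the manual-inspection estimate.
TOTAL_CANDIDATES = 560_000_000  # suspected signals (from the text)
IMAGES_PER_DAY = 30_000         # images one person can inspect per day (from the text)
DAYS_PER_YEAR = 365.25          # assumption: average year length

def years_to_inspect(total: int, per_day: int) -> float:
    """Years of non-stop viewing needed to look at every candidate once."""
    return total / per_day / DAYS_PER_YEAR

days = TOTAL_CANDIDATES / IMAGES_PER_DAY
years = years_to_inspect(TOTAL_CANDIDATES, IMAGES_PER_DAY)
print(f"{days:,.0f} days, or about {years:.0f} years")
```

This gives roughly 18,700 days, or about 51 years, consistent with the round figure of 50 years.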

Artificial signals (top) and fast radio bursts (bottom) appear as similar bright streaks on time-frequency graphs. ▏Image source: Author/top; Nature/bottom
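A real burst's streak is usually curved because of dispersion: lower radio frequencies travel more slowly through ionized gas, so they arrive later. The standard cold-plasma approximation gives a delay of about 4.15 ms × DM × (f_lo⁻² − f_hi⁻²), with frequencies in GHz and the dispersion measure (DM) in pc cm⁻³. The sketch below evaluates it; the band of 1.2 to 1.5 GHz and the DM of 500 pc cm⁻³ are illustrative values, not parameters taken from the study.

```python
K_DM_MS = 4.15  # approximate dispersion constant, in ms GHz^2 pc^-1 cm^3

def dispersion_delay_ms(dm: float, f_lo_ghz: float, f_hi_ghz: float) -> float:
    """Extra arrival delay (ms) at f_lo relative to f_hi for a given DM (pc cm^-3)."""
    return K_DM_MS * dm * (f_lo_ghz ** -2 - f_hi_ghz ** -2)

# Illustrative values only (not from the study).
DM = 500.0
F_TOP = 1.5  # GHz, top of the assumed band
for f in (1.5, 1.4, 1.3, 1.2):
    delay = dispersion_delay_ms(DM, f, F_TOP)
    print(f"{f:.1f} GHz arrives {delay:7.2f} ms after {F_TOP} GHz")
```

Because the delay depends on DM, a search pipeline typically tries many trial DMs and keeps the one that best straightens the sweep. A DM near zero means no measurable dispersion, which points to a nearby, usually human-made, source.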

Astronomy's Big Data Challenge

Such a vexing problem is not an isolated case in astronomy. With the advancement of observation technology, modern astronomy faces the challenge of how to deal with big data.

In 1997, Parkes recorded data at relatively low precision, yet it still produced about 10 GB per day. Today, advanced radio telescopes such as China's "Sky Eye" (FAST) can produce terabytes of data in an hour, which requires high-performance servers to store and process. With Moore's Law running out of steam, this looks like a race that astronomers are destined to lose.
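Why do volumes grow so fast? For a search-mode (filterbank) recording, the raw data volume is roughly beams × frequency channels × bits per sample × (duration / sampling time). The sketch below multiplies these out for two hypothetical configurations; the parameter values are assumptions chosen for illustration, not the actual settings of Parkes in 1997 or of FAST.

```python
def data_volume_bytes(n_beams: int, n_chans: int, bits_per_sample: int,
                      t_samp_s: float, duration_s: float) -> float:
    """Raw search-mode data volume in bytes for one recording."""
    n_samples = duration_s / t_samp_s  # time samples per channel per beam
    total_bits = n_beams * n_chans * bits_per_sample * n_samples
    return total_bits / 8

HOUR = 3600.0
# Hypothetical configurations, for illustration only.
legacy = data_volume_bytes(n_beams=13, n_chans=96, bits_per_sample=1,
                           t_samp_s=250e-6, duration_s=HOUR)
modern = data_volume_bytes(n_beams=19, n_chans=4096, bits_per_sample=8,
                           t_samp_s=50e-6, duration_s=HOUR)
print(f"legacy-like: {legacy / 1e9:.1f} GB per hour")
print(f"modern-like: {modern / 1e12:.1f} TB per hour")
print(f"ratio: about {modern / legacy:,.0f}x")
```

Every factor multiplies, so more beams, finer frequency channels, deeper bit depth, and faster sampling together push the volume up by orders of magnitude.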

Large servers needed for astronomical data processing ▏Image source: must.edu.mo

Eliminate the dross and retain the essence

However, not all data are equally valuable. To meet the big data challenge, we need a set of methods that filter out and keep only the truly valuable data.

The first step is to build a single-pulse database from the 560 million candidate signals, recording information such as arrival times and observation parameters, as a basis for future observations, comparisons, and research.
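A single-pulse database of this kind can be sketched with Python's built-in `sqlite3` module. The table layout, column names, and example rows below are hypothetical; the text only says the real database stores information such as arrival times and observation parameters. The query shows the sort of cut such a database makes cheap, for example keeping events with a large dispersion measure (DM) and a high signal-to-noise ratio (S/N).

```python
import sqlite3

# Hypothetical schema; the real Parkes single-pulse database may differ.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE single_pulse (
        id       INTEGER PRIMARY KEY,
        obs_id   TEXT NOT NULL,     -- observation identifier
        beam     INTEGER NOT NULL,  -- receiver beam number
        mjd      REAL NOT NULL,     -- arrival time (Modified Julian Date)
        dm       REAL NOT NULL,     -- dispersion measure, pc cm^-3
        snr      REAL NOT NULL,     -- signal-to-noise ratio
        width_ms REAL NOT NULL      -- pulse width in milliseconds
    )
""")
conn.execute("CREATE INDEX idx_dm_snr ON single_pulse (dm, snr)")

rows = [  # made-up example rows
    ("obs_a", 1, 51000.1, 3.0, 12.0, 0.5),
    ("obs_a", 4, 51000.2, 650.0, 9.5, 2.0),
    ("obs_b", 7, 51100.3, 420.0, 7.1, 4.0),
    ("obs_b", 7, 51100.4, 0.0, 40.0, 1.0),  # bright but undispersed: likely interference
]
conn.executemany(
    "INSERT INTO single_pulse (obs_id, beam, mjd, dm, snr, width_ms) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    rows,
)

def select_candidates(conn: sqlite3.Connection, min_dm: float, min_snr: float):
    """Return events above the DM and S/N thresholds, highest S/N first."""
    cur = conn.execute(
        "SELECT obs_id, beam, mjd, dm, snr FROM single_pulse "
        "WHERE dm >= ? AND snr >= ? ORDER BY snr DESC",
        (min_dm, min_snr),
    )
    return cur.fetchall()

for row in select_candidates(conn, min_dm=100.0, min_snr=7.0):
    print(row)
```

An index on the columns used in the cut helps such queries avoid scanning every row, which matters once the table holds hundreds of millions of events.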

This database is like a gold mine waiting to be mined, and a data analysis method is urgently needed to extract the "real gold" from it.

Machine Learning

With the Parkes single-pulse database as a new starting point, the next step is to bring in machine learning, a rapidly developing approach to artificial intelligence (AI).

Face recognition, now part of everyday life, relies on machine learning. The astronomers' task is like training a machine to pick out a few cats from more than 500 million dogs, except that fast radio bursts are harder to tell apart than cats and dogs, and training samples are much harder to collect.
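One reason the "few cats among 500 million dogs" problem is hard is extreme class imbalance: when real bursts are a tiny fraction of the data, a model can score near-perfect accuracy while finding nothing. The counts below are hypothetical, chosen only to show the effect; recall, the fraction of real bursts recovered, is a more telling measure here.

```python
def accuracy_and_recall(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Accuracy over all items, and recall = fraction of real positives found."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall

N_TOTAL = 500_000_000  # hypothetical number of candidate images ("dogs" plus "cats")
N_FRB = 100            # hypothetical number of real bursts ("cats")

# Model A labels everything as non-FRB, so it never finds a burst.
acc_a, rec_a = accuracy_and_recall(tp=0, fp=0, fn=N_FRB, tn=N_TOTAL - N_FRB)

# Model B finds 90 of the 100 bursts but also flags 10,000 false positives.
FP_B = 10_000
acc_b, rec_b = accuracy_and_recall(tp=90, fp=FP_B, fn=N_FRB - 90,
                                   tn=N_TOTAL - N_FRB - FP_B)

print(f"model A: accuracy={acc_a:.7f}, recall={rec_a:.2f}")
print(f"model B: accuracy={acc_b:.7f}, recall={rec_b:.2f}")
```

Model B has slightly lower accuracy but is far more useful: its 10,090 flagged images are few enough for astronomers to check by eye.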

Jokes about artificial intelligence ▏Source: hornydragon.blogspot.com

The researchers used an image-recognition algorithm called a residual neural network (ResNet), downsampled the images to reduce their size, and screened the candidates by arrival time and dispersion measure (DM), which greatly reduced the number of images that needed to be checked. Using the trained model, they found 81 new fast radio burst candidates in the database.
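The text does not specify how the images were downsampled. One common approach, assumed here, is block averaging: each output pixel is the mean of a small block of input pixels along frequency and time. Below is a dependency-free sketch; the function name and block factors are illustrative, not taken from the study.

```python
def downsample(image: list[list[float]], f_factor: int, t_factor: int) -> list[list[float]]:
    """Block-average a 2D time-frequency image (rows = frequency channels,
    columns = time samples). Edge rows/columns that do not fill a whole
    block are dropped."""
    n_f = len(image) // f_factor
    n_t = len(image[0]) // t_factor
    out = []
    for i in range(n_f):
        row = []
        for j in range(n_t):
            block = [image[i * f_factor + a][j * t_factor + b]
                     for a in range(f_factor) for b in range(t_factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

# Tiny example: a 4 x 4 image reduced to 2 x 2.
img = [[1, 1, 2, 2],
       [1, 1, 2, 2],
       [3, 3, 4, 4],
       [3, 3, 4, 4]]
print(downsample(img, 2, 2))  # [[1.0, 2.0], [3.0, 4.0]]
```

Averaging cuts the pixel count by `f_factor × t_factor` and smooths random noise, which tends to help a faint, broad burst stand out; the trade-off is that very narrow features can be blurred.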

These 81 candidates resemble previously known FRBs in many features, such as their energy and burst-width distributions, which strongly supports their authenticity.

At the same time, compared with the existing FRB sample, a larger fraction of the new candidates lies at the low-energy end. This suggests that earlier search methods may have missed many low-energy events, and that future searches should pay more attention to signals with low signal-to-noise ratios.

Comparison of the 81 new candidates with previously known fast radio bursts ▏ Image source: Author

This series of work is a promising step toward meeting the big data challenges faced in the study of astronomical transients, and it also offers a way for the Square Kilometre Array (SKA), now under construction, to capture valuable data efficiently. FAST and the SKA offer much higher sensitivity and will produce even more staggering volumes of data, placing extremely high demands on signal screening. The joint research team led by researchers from the Purple Mountain Observatory plans to keep developing transient-data processing pipelines for these advanced facilities in order to discover more valuable signals.

SKA conceptual diagram ▏ Source: Wikipedia

References:

[1] Yang et al. 2021, "81 New Candidate Fast Radio Bursts in Parkes Archive", MNRAS, stab2275

[2] Zhang et al. 2020, "Parkes Transient Events. I. Database of Single Pulses, Initial Results, and Missing Fast Radio Bursts", ApJS, 249, 14

About the Author

Tang Zhenfan

PhD student in the High Energy Time Domain Astronomy Group of Purple Mountain Observatory, Chinese Academy of Sciences. Research direction: observation and data processing of radio transient sources.

Zhang Songbo

Postdoctoral fellow at the China-Australia Joint Astronomical Research Center (ACAMAR). Research areas: observation of radio transients, data processing and theoretical research.

Rotating Editor-in-Chief: Du Fujun

Editor: Wang Kechao
