Huh? Music has "fingerprints" too! Here is how song recognition works

You have probably had this experience:

You hear a familiar melody

but just can't remember the name of the song.

So you open a song recognition feature,

and a few seconds later

the matching song appears on the screen.

How does this feature work?

How can it identify a song so accurately in such a short time?

Audio fingerprints are key to identifying songs

Song recognition relies on audio fingerprinting. Just as every person's fingerprint is unique, each song has a fingerprint of its own. An audio fingerprint is the digital DNA of an audio signal. Generating one can be roughly divided into the following steps:

Audio signal digitization

The first step in music recognition is to "listen" to the sound. But how does a machine "hear" a song? Sound is essentially vibration. When it reaches the human ear, the eardrum and other structures turn the vibration into a signal that the brain can interpret. A machine listens to music on a similar principle: it converts the sound vibration into an electrical signal, and then converts the electrical signal into a digital signal that a computer can process.

Real-world sound is an analog signal, which is continuous (like a line), while the digital signals that computers process are discrete (like a series of points). The continuous sound waveform therefore has to be converted into a discrete digital signal through sampling. The sampling rate determines how faithfully the signal is captured: the higher the sampling rate, the denser the points and the more completely the original sound is preserved.
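To make the idea of sampling concrete, here is a minimal sketch that samples the same pure tone at two different rates. The function name `sample_sine` and the chosen rates are illustrative assumptions, not part of any particular song recognition product.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, duration_s):
    """Sample a pure sine tone at evenly spaced instants (hypothetical helper)."""
    n_samples = round(sample_rate_hz * duration_s)
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

# The same 10 ms of a 440 Hz tone, captured at two sampling rates.
coarse = sample_sine(440.0, 8000, 0.01)    # telephone-like rate
fine = sample_sine(440.0, 44100, 0.01)     # CD-like rate

print(len(coarse), len(fine))  # 80 441
```

Both lists describe the same stretch of sound, but the second one has more than five times as many points, so it follows the original waveform much more closely.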

Feature extraction

Once digitized, the signal is sent to the audio processing module for sound feature extraction. This includes converting the signal from the time domain to the frequency domain, typically with the Fourier transform (a mathematical transformation algorithm) that decomposes the audio signal into components of different frequencies.
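The decomposition into frequencies can be sketched with a naive discrete Fourier transform written in plain Python. This is only an illustration under simple assumptions (a short synthetic signal made of two tones, a helper named `dft_magnitudes`); real systems use a fast Fourier transform from an optimized library.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive discrete Fourier transform; returns the strength of each frequency bin."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):          # bins up to half the sampling rate
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc))
    return mags

SAMPLE_RATE = 800   # samples per second (assumed for the example)
N = 200             # number of samples analysed

# A synthetic "sound": a strong 100 Hz tone plus a weaker 240 Hz tone.
signal = [math.sin(2 * math.pi * 100 * t / SAMPLE_RATE)
          + 0.5 * math.sin(2 * math.pi * 240 * t / SAMPLE_RATE)
          for t in range(N)]

mags = dft_magnitudes(signal)
bin_hz = SAMPLE_RATE / N                 # each bin covers 4 Hz
top_two = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:2]
freqs = sorted(k * bin_hz for k in top_two)

print(freqs)  # [100.0, 240.0]
```

The waveform itself only shows a wobbly line, but the transform recovers the two tones that were mixed together.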

The time-domain signal is the most direct representation of sound (the waveform you usually see in recording software), while the frequency-domain signal shows which frequency components the sound contains. Frequency-domain analysis produces a spectrogram, which makes the audio's characteristic information visible. The spectrogram records the frequencies and amplitudes present at each moment of the song, showing at a glance which frequencies appear in the signal, when they appear, and how strong each one is.
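A spectrogram can be sketched as "one Fourier transform per short slice of audio". The example below is a simplified illustration: it assumes non-overlapping slices and no window function, and the names `frame_magnitudes` and `spectrogram` are hypothetical. Production systems normally use overlapping, windowed slices.

```python
import cmath
import math

def frame_magnitudes(frame):
    """Strength of each frequency bin in one short slice of audio (naive DFT)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def spectrogram(samples, frame_size):
    """One row of bin strengths per non-overlapping slice: time runs down the rows."""
    return [frame_magnitudes(samples[start:start + frame_size])
            for start in range(0, len(samples) - frame_size + 1, frame_size)]

SAMPLE_RATE = 800   # assumed for the example
FRAME_SIZE = 80     # 0.1 s per slice, 10 Hz per bin

def tone(freq_hz, t):
    return math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE)

# 0.2 s of a 100 Hz tone followed by 0.2 s of a 200 Hz tone.
samples = [tone(100, t) for t in range(160)] + [tone(200, t) for t in range(160, 320)]

spec = spectrogram(samples, FRAME_SIZE)
dominant_hz = [max(range(len(row)), key=row.__getitem__) * SAMPLE_RATE / FRAME_SIZE
               for row in spec]

print(dominant_hz)  # [100.0, 100.0, 200.0, 200.0]
```

Unlike a single transform over the whole recording, the rows reveal not just which frequencies occur but also when the sound changes from one to the other.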

Audio fingerprint generation

The audio fingerprint is derived from the features of the spectrogram. The audio is generally divided into small blocks, and the prominent frequency peaks in each block are extracted. Together, the peak combinations of all the blocks form the audio fingerprint of the entire song.
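Peak extraction can be sketched as follows. The spectrogram here is a tiny hand-written table rather than real audio, and the helper names, the limit of two peaks per block, and the strength threshold are all assumptions made for the example.

```python
def frame_peaks(magnitudes, max_peaks=2, min_magnitude=1.0):
    """Bin indices of the strongest local maxima in one block (hypothetical helper)."""
    candidates = [
        k for k in range(1, len(magnitudes) - 1)
        if magnitudes[k] >= min_magnitude
        and magnitudes[k] > magnitudes[k - 1]
        and magnitudes[k] >= magnitudes[k + 1]
    ]
    strongest = sorted(candidates, key=lambda k: magnitudes[k], reverse=True)[:max_peaks]
    return sorted(strongest)

def constellation(spectrogram):
    """All (block index, frequency bin) peak points for a whole recording."""
    return [(i, k) for i, row in enumerate(spectrogram) for k in frame_peaks(row)]

# Toy spectrogram: 3 blocks in time, 6 frequency bins each.
toy = [
    [0.1, 5.0, 0.3, 0.2, 3.0, 0.1],
    [0.2, 0.4, 6.0, 0.5, 0.3, 0.1],
    [0.1, 2.0, 0.2, 4.0, 0.9, 0.1],
]

points = constellation(toy)
print(points)  # [(0, 1), (0, 4), (1, 2), (2, 1), (2, 3)]
```

Only a handful of points survive from each block, which is what makes the fingerprint compact and relatively robust to background noise.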

Different frequency ranges are usually processed separately so that bass, midrange, and treble are analyzed evenly and no musical element is confused with another or missed.
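The benefit of handling frequency ranges separately shows up in a small sketch. The band boundaries and the name `strongest_per_band` are assumptions chosen for illustration; real systems pick their own band layout.

```python
def strongest_per_band(magnitudes, band_edges):
    """Pick the strongest bin inside each frequency band (hypothetical helper)."""
    picks = []
    for low, high in zip(band_edges, band_edges[1:]):
        band = range(low, min(high, len(magnitudes)))
        if band:
            picks.append(max(band, key=lambda k: magnitudes[k]))
    return picks

# One block of audio with a very loud bass line (bins 1 and 2).
frame = [0.2, 9.0, 8.0, 0.3, 1.5, 0.4, 0.1, 0.7, 0.2]
BAND_EDGES = [0, 3, 6, 9]   # bass, midrange, treble (assumed split)

global_top3 = sorted(sorted(range(len(frame)), key=lambda k: frame[k], reverse=True)[:3])
per_band = strongest_per_band(frame, BAND_EDGES)

print(global_top3)  # [1, 2, 4]
print(per_band)     # [1, 4, 7]
```

Taking the three loudest bins overall spends two of them on the bass and ignores the treble entirely, while taking one peak per band keeps a point from every range.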

Each song is converted into its own audio fingerprint. Even different versions of the same song produce different fingerprints, because they differ in frequency, amplitude, and timing. This keeps the subsequent matching as accurate as possible.

Fingerprint matching

Finally, once we have a song's "fingerprint", the next step is to look for a matching fingerprint in the existing song database and so identify the specific song. Song recognition technology converts each audio fingerprint into a hash value (a compact code), because comparing hash values directly is much faster than comparing entire audio recordings. The software compares the fingerprint of the user's recording with the fingerprint hash values in the database to find matching songs.
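The hashing and lookup step can be sketched like this. The article does not say which hashing scheme is used, so the example assumes one commonly described approach: pair each peak with a few later peaks, hash the two frequencies and their time gap, and vote for the song whose matches line up at a consistent time offset. The song data and all function names are made up for the example.

```python
import hashlib
from collections import Counter, defaultdict

def fingerprint_hashes(peaks, fan_out=3):
    """Yield (hash, time) pairs from (time, frequency_bin) peak points."""
    peaks = sorted(peaks)
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            key = f"{f1}|{f2}|{t2 - t1}".encode()
            yield hashlib.sha1(key).hexdigest()[:10], t1

def build_index(songs):
    """Map every hash to the songs and positions where it occurs."""
    index = defaultdict(list)
    for title, peaks in songs.items():
        for h, t in fingerprint_hashes(peaks):
            index[h].append((title, t))
    return index

def identify(index, clip_peaks):
    """Return the song with the most hash matches at one consistent time offset."""
    votes = Counter()
    for h, clip_t in fingerprint_hashes(clip_peaks):
        for title, song_t in index.get(h, []):
            votes[(title, song_t - clip_t)] += 1
    if not votes:
        return None
    (title, _offset), _count = votes.most_common(1)[0]
    return title

# Toy database: peak points as (time, frequency_bin), invented for illustration.
SONGS = {
    "song_a": [(0, 10), (1, 25), (2, 14), (3, 30), (4, 18), (5, 22)],
    "song_b": [(0, 12), (1, 12), (2, 40), (3, 17), (4, 33), (5, 11)],
}
index = build_index(SONGS)

# A short recording that starts partway through song_a.
clip = [(0, 14), (1, 30), (2, 18), (3, 22)]
print(identify(index, clip))  # song_a
```

Because lookups are dictionary accesses on short hash strings, the cost depends on the length of the recording rather than on comparing it against every song's audio.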

Other uses of audio fingerprints

In addition to being used to identify songs, audio fingerprint technology can also be applied in the following areas:

1. Personalized music recommendations

Feature extraction and matching technology also provides a basis for personalized music recommendations. The recommendation system mines users’ preferences based on music features such as melody, rhythm, and emotion, which not only improves the accuracy of recommendations but also helps users discover more music that matches their tastes.

2. Copyright detection and protection

Audio fingerprint technology can also be used for copyright detection and protection: checking whether a media library contains songs with identical content, whether videos and audio uploaded by users infringe copyright, and whether a song is being used without authorization.

3. Audio playback monitoring

For example, when advertisers need to verify that television or radio stations air their advertisements on schedule and at the agreed frequency, this technology can be used to monitor and count the broadcasts.
