Most people have had this experience: you hear a familiar melody but cannot remember the name of the song. You turn on a song recognition feature, and a few seconds later the matching song appears on the screen. How does this feature work, and how can it identify the song title accurately in such a short time?

Audio fingerprints are key to identifying songs

The key to identifying a song by listening to it lies in audio fingerprinting. Just as a person's fingerprint is unique, each song has its own unique fingerprint. An audio fingerprint is the digital DNA of the audio signal. Its generation process can be roughly divided into the following steps.

The first step in music recognition is to "listen" to the sound. But how does a machine "hear" a song? Sound is essentially vibration. When the human ear receives it, the eardrum and other tissues convert the vibration into a signal that the brain can recognize. A machine listens to music in a similar way: it converts the vibration of sound into an electrical signal, and then converts the electrical signal into a digital signal that a computer can process.

Sound in the real world is an analog signal, which is continuous (like a line), while the digital signal that a computer processes is discrete (like a series of points). The continuous sound waveform therefore has to be converted into a discrete digital signal through sampling. The sampling rate determines how well the signal is captured: the higher the sampling rate, the denser the points and the more completely the original sound is preserved.
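The effect of the sampling rate can be illustrated with a minimal sketch. It assumes a pure 440 Hz sine tone as a stand-in for real sound, and the two rates (8,000 Hz and 44,100 Hz) are only example values; `sample_sine` is a hypothetical helper name, not part of any recognition product.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, duration_s):
    """Read a continuous sine wave at evenly spaced instants (sampling)."""
    n_samples = round(sample_rate_hz * duration_s)
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

# The same 10 ms of a 440 Hz tone, captured at two sampling rates.
coarse = sample_sine(440, 8000, 0.01)    # fewer, more widely spaced points
fine = sample_sine(440, 44100, 0.01)     # more, denser points

print(len(coarse), len(fine))
```

The higher rate yields more points for the same stretch of sound, which is what "denser" means here.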
The digitized signal is then sent to the audio processing module for feature extraction. This includes converting the signal from the time domain to the frequency domain, in particular by using the Fourier transform (a mathematical transformation) to decompose the audio signal into components of different frequencies. The time-domain signal is the most direct representation of sound (the waveform we usually see in recording software), while the frequency-domain signal shows which frequency components the sound contains. After frequency-domain analysis, the resulting spectrogram makes the characteristic information of the audio visible. The spectrogram records the frequencies and their amplitudes at each moment of the song, and shows at a glance which frequencies appear in the signal, when they appear, and how strong each one is.

Audio fingerprint generation

The audio fingerprint is derived from the features of the spectrogram. The audio is generally divided into small blocks, and the prominent frequency peaks in each block are extracted. The combination of peaks from all the blocks forms the audio fingerprint of the entire song. Normally, different frequency ranges are processed separately so that bass, midrange, and treble are analyzed in a balanced way and no musical element is confused or missed. Because the fingerprint depends on frequency, amplitude, and timing, even different versions of the same song generate different fingerprints, which keeps the subsequent matching accurate.

Finally, once we have a song's "fingerprint", the next step is to find a matching fingerprint in the existing song database in order to identify the specific song. Song recognition technology converts each audio fingerprint into hash values (short codes), because comparing hash values directly is much faster than comparing the entire audio.
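The pipeline described above (blocks, frequency peaks, hashes, database lookup) can be sketched in a toy form. This is an illustration under simplifying assumptions, not the method of any particular recognition service: the "songs" are synthetic sine tones, only the single strongest frequency per block is kept rather than peaks per frequency band, the clip is aligned to block boundaries, and all names (`tone`, `frame_peaks`, `fingerprint`, `identify`) are hypothetical. A naive Fourier transform from the standard library is used to keep the example self-contained.

```python
import cmath
import hashlib
import math
from collections import Counter

SAMPLE_RATE = 8000   # assumed sampling rate in Hz
FRAME_SIZE = 64      # assumed block length in samples (125 Hz per frequency bin)

def tone(freq_hz, n_samples):
    """Synthetic stand-in for real audio: a pure sine tone."""
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(n_samples)]

def dft_magnitudes(frame):
    """Strength of each frequency bin (0 .. N/2) in one block, via a naive DFT."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def frame_peaks(samples, frame_size=FRAME_SIZE):
    """Split the signal into blocks and keep the index of the strongest bin in each."""
    peaks = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        mags = dft_magnitudes(samples[start:start + frame_size])
        peaks.append(max(range(len(mags)), key=lambda k: mags[k]))
    return peaks

def fingerprint(samples):
    """Hash each pair of consecutive peaks into a short code."""
    peaks = frame_peaks(samples)
    return [hashlib.sha1(f"{a}-{b}".encode()).hexdigest()[:10]
            for a, b in zip(peaks, peaks[1:])]

def identify(clip, database):
    """Return the song whose stored hashes match the clip most often, or None."""
    votes = Counter(database[h] for h in fingerprint(clip) if h in database)
    return votes.most_common(1)[0][0] if votes else None

# Two made-up "songs", each a sequence of four tones.
song_a = tone(500, 64) + tone(1000, 64) + tone(1500, 64) + tone(500, 64)
song_b = tone(2000, 64) + tone(750, 64) + tone(250, 64) + tone(1250, 64)
database = {h: name
            for name, song in (("song_a", song_a), ("song_b", song_b))
            for h in fingerprint(song)}

clip = song_a[64:]   # a recording that starts partway through song_a
print(frame_peaks(song_a))       # strongest bin per block
print(identify(clip, database))  # name of the best-matching song
```

Hashing pairs of neighbouring peaks rather than single peaks is a design choice made for this sketch: it lets a clip that starts in the middle of a song still produce codes that appear in the database, and the lookup itself is a dictionary access rather than a comparison of raw audio.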
The software compares the fingerprint of the user's recording with the fingerprint hash values in the database to find matching songs.

Other uses of audio fingerprints

In addition to identifying songs, audio fingerprint technology can be applied in the following areas:

1. Personalized music recommendations
Feature extraction and matching also provide a basis for personalized music recommendations. A recommendation system mines users' preferences from music features such as melody, rhythm, and emotion, which not only improves the accuracy of recommendations but also helps users discover more music that matches their tastes.

2. Copyright detection and protection
Audio fingerprint technology can also be used for copyright detection and protection, for example to detect whether a media library contains songs with the same content, whether video and audio uploaded by users infringe copyright, and whether a song is being used without authorization.

3. Audio playback monitoring
For example, when advertisers need to check whether a television or radio station airs their advertisements on time and at the agreed frequency, this technology can be used to monitor and count the broadcasts.