Google starts testing voice payment. Can you really pay with your mouth?

As technology advances, mobile payment methods keep changing. The cumbersome password entry of the early days has given way to more convenient and secure fingerprint and face recognition. Voiceprint recognition, however, is rarely used for payments even though it is already common on smartphones. According to recent reports, Google has begun bringing voice payment to its products, letting users pay simply by speaking.

According to media reports, Google is testing a new feature that lets consumers use Voice Match to authorize and confirm payments. Google also confirmed that not every purchase will support voice confirmation: at this stage the feature applies only to in-app purchases and restaurant orders, not to Google Shopping.

According to the report, the voice payment feature was originally due to be announced at this year's I/O developer conference, but because of the epidemic Google skipped the announcement and went straight to testing. The payment interface of Google Assistant now shows a "Confirm with Voice Match" option.

In fact, voice payment technology is not new; it is even older than the natural language processing (NLP) that voice assistants rely on. Although both voice payment and NLP deal with speech, the two are very different. At its core, voice payment is voiceprint recognition, and voiceprint recognition is not the same thing as speech recognition.

Sound wave transmission is a mode of data communication that uses sound as the carrier. Speaking can be understood as encoding a message into sound, and listening as decoding the audio signal back into language and text. The mapping between Chinese characters and pinyin plays the role of the audio protocol.
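The idea of encoding a signal into sound and decoding it again can be illustrated with a toy example. The sketch below is only an assumption for illustration, not how speech or any Google product works: it maps each bit to one of two tones (a simple frequency-shift scheme) and recovers the bits by checking which tone is stronger in each time slot. The sample rate, tone frequencies and function names are all hypothetical choices.

```python
import math

SAMPLE_RATE = 8000     # samples per second (assumed)
SAMPLES_PER_BIT = 80   # 10 ms of audio per bit (assumed)
FREQ_ZERO = 1000.0     # tone used for bit 0, in Hz (assumed)
FREQ_ONE = 2000.0      # tone used for bit 1, in Hz (assumed)


def encode(bits):
    """Turn a list of 0/1 bits into audio samples, one tone burst per bit."""
    samples = []
    for bit in bits:
        freq = FREQ_ONE if bit else FREQ_ZERO
        for n in range(SAMPLES_PER_BIT):
            samples.append(math.sin(2 * math.pi * freq * n / SAMPLE_RATE))
    return samples


def tone_power(chunk, freq):
    """Measure how strongly one frequency is present in a chunk of samples."""
    re = sum(s * math.cos(2 * math.pi * freq * n / SAMPLE_RATE)
             for n, s in enumerate(chunk))
    im = sum(s * math.sin(2 * math.pi * freq * n / SAMPLE_RATE)
             for n, s in enumerate(chunk))
    return re * re + im * im


def decode(samples):
    """Recover the bits by comparing the power of the two tones per time slot."""
    bits = []
    for start in range(0, len(samples), SAMPLES_PER_BIT):
        chunk = samples[start:start + SAMPLES_PER_BIT]
        is_one = tone_power(chunk, FREQ_ONE) > tone_power(chunk, FREQ_ZERO)
        bits.append(1 if is_one else 0)
    return bits


message = [1, 0, 1, 1, 0]
recovered = decode(encode(message))
print(recovered == message)
```

Here the agreed pair of tones plays the part of the "audio protocol": sender and receiver only understand each other because both use the same mapping.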

Voiceprint recognition is speaker-specific: it extracts the voiceprint features from speech to determine who is talking, answering the question "who is speaking". Speech recognition is speaker-independent: it determines the content of the speech, answering the question "what was said". For voice payment, what matters most is identifying the person who issued the payment command.

No two people's vocal organs are likely to be exactly the same size and shape. These differences change the airflow during speech, which in turn produces different voiceprints. That is why we can recognize someone by voice before seeing them, judging identity from timbre, pitch and speaking habits. In the same way, algorithms can extract distinctive, abstract, high-dimensional voiceprint features from speech, a deep learning model can be trained on them, and the resulting biometric can be used to prove that "I am who I say I am".
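One common way to compare two voiceprints, once a model has reduced each recording to a feature vector, is cosine similarity against a threshold. The sketch below assumes such vectors already exist; the three-dimensional vectors, the `is_same_speaker` helper and the 0.8 threshold are hypothetical values chosen for illustration, and nothing here reflects how Voice Match is actually implemented.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_same_speaker(enrolled, candidate, threshold=0.8):
    """Accept the candidate if its voiceprint is close enough to the enrolled one."""
    return cosine_similarity(enrolled, candidate) >= threshold


# Hypothetical feature vectors; real ones have far more dimensions.
enrolled_voiceprint = [0.9, 0.1, 0.4]
same_person_again = [0.85, 0.15, 0.38]
different_person = [0.1, 0.9, 0.2]

print(is_same_speaker(enrolled_voiceprint, same_person_again))
print(is_same_speaker(enrolled_voiceprint, different_person))
```

The threshold is the design choice that matters: a lower value rejects the real user less often but makes it easier for an impostor to pass, which is a costly trade-off when the result authorizes a payment.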

In principle, completing a voice payment with voiceprint recognition is straightforward. The user speaks a command; the terminal device captures the sound, turns it into a session, and sends the product information and transaction number to the Google backend. Once the server has matched the voiceprint, the transaction can begin, and the result of the completed transaction is finally pushed to Google Assistant.
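The flow described above can be sketched as a small simulation. Everything in it is hypothetical: the `PaymentRequest` and `Backend` names, the field layout, and the exact-match voiceprint check are assumptions made to keep the example short, not a description of Google's backend.

```python
from dataclasses import dataclass


@dataclass
class PaymentRequest:
    user_id: str
    transaction_id: str
    product: str
    voiceprint: tuple  # stand-in for features extracted from the spoken command


class Backend:
    """Hypothetical server side: match the voiceprint, then run the transaction."""

    def __init__(self, enrolled_voiceprints):
        self.enrolled_voiceprints = enrolled_voiceprints  # user_id -> voiceprint

    def handle(self, request):
        enrolled = self.enrolled_voiceprints.get(request.user_id)
        # Exact equality is a simplification; a real system would score similarity.
        if enrolled is None or enrolled != request.voiceprint:
            return {"transaction_id": request.transaction_id, "status": "rejected"}
        return {
            "transaction_id": request.transaction_id,
            "status": "completed",
            "product": request.product,
        }


def pay_by_voice(backend, user_id, voiceprint, product, transaction_id):
    """Device side: package the command, send it, and return the pushed result."""
    request = PaymentRequest(user_id, transaction_id, product, voiceprint)
    return backend.handle(request)


backend = Backend({"alice": (0.9, 0.1, 0.4)})
accepted = pay_by_voice(backend, "alice", (0.9, 0.1, 0.4), "pizza order", "tx-001")
rejected = pay_by_voice(backend, "alice", (0.1, 0.9, 0.2), "pizza order", "tx-002")
print(accepted["status"], rejected["status"])  # prints: completed rejected
```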

Before Google confirmed its voice payment test, Amazon had already begun letting users pay bills through Alexa by voice command last fall. Once the user approves the transaction with a phrase such as "Alexa, pay my mobile bill," Alexa pays the bill through Amazon Pay and sends a confirmation to the user's registered mobile phone number. In the Chinese market, Tmall Genie has also long supported voice payment. According to data released by Alibaba, 1.05 million Tmall Genie orders were paid by voice during last year's Double 11 alone.

However, Google's goal is clearly more than voice payment on its own Google Home smart speaker; it is aiming at voice assistants that work across a much wider range of scenarios. But if Google can think of this, surely Amazon and Alibaba can too. Fully integrating voice payment into voice assistants would undoubtedly improve the user experience: compared with face and fingerprint recognition, voiceprint recognition is far more convenient.

Yet Amazon and Alibaba chose to limit the feature to smart speakers, which usually sit at home, and for good reason. Compared with fingerprints or facial data, the voice is harder to control. Users can decide whether to place a finger on the fingerprint sensor or hold their face in front of the camera, but they cannot control where their voice travels in the same way.

More importantly, fingerprints are hard to collect and face recognition usually requires liveness detection, whereas a voiceprint is easy to capture and it is hard to tell what state the user was in when the payment command was spoken. In addition, AI technology is now widely available: with deep learning models and waveform editing tools, speech with arbitrary content can be spliced together and the user's voiceprint spectrum reproduced almost perfectly.

The security problems of voice payment are not confined to the client; the server side faces risks too. Voice payment can be viewed as a data exchange that has to keep state. The cookie mechanism keeps state on the client side, while the session mechanism keeps it on the server side. When a user first visits the server, the server creates a session for that client and computes a session ID with a dedicated algorithm to identify it.

Because voice payment is not a one-off action, the next time the user interacts with the server the exchange must be tied back to the session through the session ID. The way session IDs are implemented, however, leaves them open to hijacking through classic attacks such as cross-site scripting (XSS), network sniffing and proxy hijacking. If a session ID is hijacked, the attacker obtains the target user's legitimate session and can then drain the victim's wallet, much as in credit card fraud.
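A minimal sketch shows why a stolen session ID is so dangerous. It assumes a simple in-memory server-side store; the `SessionStore` class and its methods are hypothetical, and `secrets.token_urlsafe` stands in for whatever "special algorithm" a real server would use. The point is that the server identifies the user purely by the ID presented, so anyone holding that string is treated as the user.

```python
import secrets


class SessionStore:
    """Hypothetical server-side session table keyed by session ID."""

    def __init__(self):
        self._sessions = {}

    def create(self, user_id):
        # Generate an unguessable random ID on the user's first visit.
        session_id = secrets.token_urlsafe(32)
        self._sessions[session_id] = {"user_id": user_id}
        return session_id

    def lookup(self, session_id):
        # The server trusts whoever presents a known ID; it cannot tell
        # the legitimate client from someone who copied the ID.
        return self._sessions.get(session_id)


store = SessionStore()
victim_session_id = store.create("alice")

# An attacker who obtains the ID (for example via XSS or sniffing) replays it.
stolen_session_id = victim_session_id
print(store.lookup(stolen_session_id))   # prints: {'user_id': 'alice'}
print(store.lookup("not-a-real-id"))     # prints: None
```

A random ID protects only against guessing, not against theft, which is why the attacks listed above target the ID in transit or in the browser rather than the algorithm that generates it.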

This may be one of the main reasons Google itself has admitted that the feature might never launch publicly if feedback and performance are too negative. Until Google solves these critical security problems, shopping just by opening your mouth may remain possible only on smart speakers.
