Google starts testing voice payment. Can you really pay with your mouth?

As technology advances, mobile payment methods keep changing. The cumbersome password entry of the early days has given way to more convenient and secure fingerprint and face recognition. Voiceprint recognition, however, although now common on smartphones, is still rarely used for payments. Recently it was reported that Google has begun bringing voice payment to its products, letting users pay simply by speaking.

According to media reports, Google is testing a new feature that lets consumers use Voice Match to authorize and confirm payments. Google also confirmed that not every purchase will offer voice confirmation: at this stage the feature applies only to in-app purchases and restaurant orders, not to Google Shopping.

According to the report, the voice payment feature was originally scheduled to be announced at this year's I/O developer conference, but because of the pandemic the announcement was skipped and testing began directly. In the payment settings of Google Assistant, an option labeled "Confirm with Voice Match" is now visible.

In fact, voice payment technology is not new; it is even older than the natural language processing (NLP) that voice assistants rely on. Although both voice payment and NLP deal with voice, the two are worlds apart. Voice payment is essentially voiceprint recognition, and voiceprint recognition is not the same thing as speech recognition.

Sound wave transmission is a form of data communication that uses sound as the carrier. Speaking can be understood as encoding a signal into sound, and listening as decoding the audio signal back into language and text. The mapping between Chinese characters and their pinyin, for example, plays the role of the audio protocol.

Voiceprint recognition, however, identifies an individual: it extracts the voiceprint features from speech to answer the question of who is speaking. Speech recognition is speaker-independent: it determines the content of the speech and answers the question of what was said. For voice payment, what matters most is establishing who issued the payment command.

Since the size and shape of each person's vocal organs are never exactly the same, these differences change the airflow during speech, which in turn produces differences in voiceprints. That is why we can "hear the voice before seeing the person" and identify a speaker from timbre, pitch and speaking habits. Similarly, algorithms can extract distinctive, abstract, high-dimensional voiceprint features from speech, a deep learning model can be trained on them, and the resulting unique biometric can be used to prove that "I am myself".
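To make the idea of comparing voiceprint features concrete, here is a minimal sketch. It assumes a trained model has already turned each utterance into a fixed-length feature vector (an "embedding"); the three-dimensional vectors, the `is_same_speaker` helper and the 0.8 threshold are invented for illustration and are not taken from Google's system.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector carries no speaker information
    return dot / (norm_a * norm_b)


def is_same_speaker(enrolled, sample, threshold=0.8):
    """Accept the sample if it is close enough to the enrolled voiceprint.

    The threshold is a hypothetical value; a real system would tune it
    against measured false-accept and false-reject rates.
    """
    return cosine_similarity(enrolled, sample) >= threshold


# Toy embeddings (assumed, far shorter than anything a real model produces).
enrolled_voiceprint = [0.9, 0.1, 0.4]
same_user_sample = [0.88, 0.12, 0.41]
other_user_sample = [0.1, 0.9, 0.2]

print(is_same_speaker(enrolled_voiceprint, same_user_sample))   # True
print(is_same_speaker(enrolled_voiceprint, other_user_sample))  # False
```

The comparison itself is cheap; the hard part, which this sketch leaves out, is training a model whose embeddings stay stable for one speaker and differ between speakers.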

In principle, completing a voice payment with voiceprint recognition is straightforward. The user speaks a command; the terminal device captures the sound, converts it into a session, and sends the product information and transaction number to the Google backend. Once the voiceprint has been matched on the server side, the transaction can proceed, and finally the result of the completed transaction is pushed to Google Assistant.
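The flow described above can be sketched roughly as follows. Every name here (`PaymentRequest`, `PaymentBackend`, `device_submit`, `close_enough`) is hypothetical, and the matcher is a deliberately crude stand-in for a real voiceprint model; the sketch only mirrors the order of the steps, not Google's actual backend.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, List

Embedding = List[float]


@dataclass
class PaymentRequest:
    session_id: str       # session created by the device for this command
    user_id: str
    product: str          # product information sent to the backend
    transaction_no: str   # transaction number sent to the backend
    voiceprint: Embedding


@dataclass
class PaymentResult:
    transaction_no: str
    approved: bool
    message: str


class PaymentBackend:
    """Hypothetical server side: match the voiceprint, then run the transaction."""

    def __init__(self, enrolled: Dict[str, Embedding],
                 matcher: Callable[[Embedding, Embedding], bool]):
        self._enrolled = enrolled
        self._matcher = matcher

    def process(self, request: PaymentRequest) -> PaymentResult:
        reference = self._enrolled.get(request.user_id)
        if reference is None or not self._matcher(reference, request.voiceprint):
            return PaymentResult(request.transaction_no, False, "voiceprint mismatch")
        # A real backend would charge the payment instrument at this point.
        return PaymentResult(request.transaction_no, True, f"paid for {request.product}")


def device_submit(backend: PaymentBackend, user_id: str, product: str,
                  voiceprint: Embedding) -> PaymentResult:
    """Device side: wrap the captured command in a session and send it."""
    request = PaymentRequest(
        session_id=uuid.uuid4().hex,
        user_id=user_id,
        product=product,
        transaction_no=uuid.uuid4().hex,
        voiceprint=voiceprint,
    )
    # The returned result is what would be pushed back to the assistant.
    return backend.process(request)


def close_enough(a: Embedding, b: Embedding, tol: float = 0.05) -> bool:
    """Toy matcher (assumption): every component within a fixed tolerance."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))


backend = PaymentBackend({"alice": [0.9, 0.1, 0.4]}, close_enough)
approved = device_submit(backend, "alice", "restaurant order", [0.88, 0.12, 0.41])
rejected = device_submit(backend, "alice", "restaurant order", [0.1, 0.9, 0.2])
print(approved.approved, rejected.approved)  # True False
```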

Before Google confirmed that it was testing voice payment, Amazon had already begun letting users pay bills through Alexa by voice command last fall. Once the user approves the transaction with a phrase such as "Alexa, pay my mobile bill," Alexa uses Amazon Pay to pay the bill and sends a confirmation to the user's registered mobile phone number. In China, Tmall Genie has also long supported voice payment. According to data released by Alibaba, during last year's Double 11 period alone, 1.05 million orders on Tmall Genie were paid for by speaking.

However, Google's goal is clearly not just voice payment on its own Google Home smart speaker; it is aiming at smart voice assistants, which apply to a much wider range of scenarios. But if Google can think of this, surely Amazon and Alibaba can too? Fully integrating voice payment into smart voice assistants would undoubtedly improve the user experience greatly. After all, compared with face and fingerprint recognition, voiceprint recognition is much more convenient.

Yet Amazon and Alibaba chose to limit the function to smart speakers, which usually sit at home, and there is a reason for this. Compared with fingerprints or facial information, voice is harder to control. Users can decide whether to put a finger on the fingerprint sensor or hold their face in front of the camera, but they cannot control where their voice travels in the same way.

More importantly, fingerprints are difficult to collect, and facial recognition usually requires liveness detection, whereas a voiceprint is easy to capture and it is hard to determine the user's state when the payment command is spoken. In addition, AI technology is now widespread. With deep learning models and waveform editing tools, speech with any specified content can be spliced together, and the user's voiceprint spectrum can be reproduced almost perfectly.

The security problems of voice payment are not confined to the client side; the server side faces risks as well. Voice payment can be regarded as a data interaction. The cookie mechanism, for example, keeps state on the client side, while the session mechanism keeps state on the server side. When a user first visits the server, a session is created for that client, and a session ID is generated by a dedicated algorithm to identify it.
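As a rough illustration of server-side session state, the sketch below creates a session on first contact and hands back an unpredictable ID. The `SessionStore` class and the 15-minute lifetime are assumptions made for the example; the article does not say how Google generates or stores its session IDs.

```python
import secrets
import time


class SessionStore:
    """Hypothetical in-memory session table kept on the server."""

    def __init__(self, ttl_seconds=900):
        self._sessions = {}
        self._ttl = ttl_seconds

    def create(self, user_id, now=None):
        """Create a session on the first visit and return its ID."""
        now = time.time() if now is None else now
        # secrets draws from the OS's cryptographic random source, so the ID
        # cannot be guessed or computed from earlier IDs.
        session_id = secrets.token_urlsafe(32)
        self._sessions[session_id] = {"user_id": user_id, "expires_at": now + self._ttl}
        return session_id

    def lookup(self, session_id, now=None):
        """Return the user bound to the session, or None if unknown or expired."""
        now = time.time() if now is None else now
        record = self._sessions.get(session_id)
        if record is None or record["expires_at"] <= now:
            self._sessions.pop(session_id, None)  # drop expired entries
            return None
        return record["user_id"]


store = SessionStore(ttl_seconds=10)
sid = store.create("alice", now=100.0)
print(store.lookup(sid, now=105.0))  # alice
print(store.lookup(sid, now=111.0))  # None (expired)
```

Note that `lookup` trusts whoever presents the ID: the server has no other way to tell the legitimate client from anyone else holding the same string.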

However, since voice payment is not a one-off action, the next time the user interacts with the server, the exchange has to be tied together through the session ID. The way session IDs are implemented leaves them open to hijacking through attacks such as classic cross-site scripting (XSS), network sniffing and proxy hijacking. If a session ID is hijacked, the attacker obtains the target user's legitimate session and can then empty the victim's wallet, much as in credit card fraud.
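For web sessions carried in cookies, the usual first line of defense against these attacks is a set of cookie attributes. The sketch below uses Python's standard `http.cookies` module; the cookie name `sid` and the helper `build_session_cookie` are assumptions, and whether Google's voice payment flow relies on cookies at all is not stated in the reports.

```python
import secrets
from http.cookies import SimpleCookie


def build_session_cookie(session_id: str) -> str:
    """Build the value of a Set-Cookie header for a session ID."""
    cookie = SimpleCookie()
    cookie["sid"] = session_id
    cookie["sid"]["httponly"] = True      # page scripts cannot read it (limits XSS theft)
    cookie["sid"]["secure"] = True        # only sent over HTTPS (limits network sniffing)
    cookie["sid"]["samesite"] = "Strict"  # not attached to cross-site requests
    cookie["sid"]["path"] = "/"
    return cookie["sid"].OutputString()


header_value = build_session_cookie(secrets.token_urlsafe(32))
print(header_value)
```

These flags narrow the attack surface but do not remove it; a session ID that leaks through some other channel still works for whoever holds it.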

This may be one of the main reasons why Google itself admitted that, if feedback and performance prove too negative, the feature may never be released to the public. So until Google solves these critical security issues, paying for your shopping just by opening your mouth may only be possible on smart speakers for the time being.
