Five-minute technical talk | Solution for implementing APP intelligent voice interaction based on Speech framework

Part 01

Overview

The system's built-in speech framework is not available to third-party developers, but Apple exposes the Speech framework, which is built on machine learning and offers similar capabilities and behavior. By calling its public interfaces, you can implement keyboard-style dictation in your own app. For example, you can use speech recognition to recognize voice commands or to handle text dictation in other parts of the app. Speech recognition is available in many languages, but each SFSpeechRecognizer object handles a single language. By default the Speech framework also relies on Apple's servers for recognition, so the device generally needs a network connection; on-device recognition is available only for some languages on newer system versions.
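As a minimal sketch of the "one recognizer per language" point, the snippet below tries a list of preferred locale identifiers in order and asks the user for speech recognition permission. The helper names `firstAvailable`, `makeRecognizer`, and `requestSpeechAuthorization` are hypothetical and not part of the Speech framework; the app is also assumed to declare `NSSpeechRecognitionUsageDescription` (and `NSMicrophoneUsageDescription` for live audio) in its Info.plist.

```swift
import Foundation
#if canImport(Speech)
import Speech
#endif

/// Hypothetical helper: returns the first value the factory can build
/// from the given identifiers, tried in order.
func firstAvailable<T>(_ identifiers: [String], make: (String) -> T?) -> T? {
    for identifier in identifiers {
        if let value = make(identifier) { return value }
    }
    return nil
}

#if canImport(Speech)
/// Each SFSpeechRecognizer is bound to one locale. The initializer returns
/// nil when the locale is not supported, so we fall back to the next one.
func makeRecognizer(preferredLocales: [String]) -> SFSpeechRecognizer? {
    firstAvailable(preferredLocales) { identifier in
        SFSpeechRecognizer(locale: Locale(identifier: identifier))
    }
}

/// Asks the user for permission and reports the answer on the main queue.
func requestSpeechAuthorization(_ completion: @escaping (Bool) -> Void) {
    SFSpeechRecognizer.requestAuthorization { status in
        DispatchQueue.main.async { completion(status == .authorized) }
    }
}
#endif
```

A caller might use `makeRecognizer(preferredLocales: ["zh-CN", "en-US"])` and check `isAvailable` on the result before starting a recognition task, since availability can change with network conditions.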

Part 02

Speech Framework: Class Structure

Part 03

Speech framework: speech recognition process

The Speech framework provides a unified, easy-to-use interface for fast speech recognition. There are a few caveats, however:

Handle failures caused by speech recognition limits: Speech recognition is a network-based service, so an individual device may be limited in the number of recognitions it can perform per day, and each app may be throttled globally based on the number of requests it makes per day. Be prepared for a recognition request to fail for this reason.

Plan for a one-minute limit on audio duration: Speech recognition places a relatively high burden on battery life and network usage. To minimize this burden, the framework stops speech recognition tasks that last longer than one minute. This limit is similar to the one that applies to keyboard dictation.

Don't send private or sensitive information for speech recognition: Don't send passwords, health or financial data, or other sensitive speech for recognition.
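The recognition flow and the first two caveats can be sketched roughly as follows, assuming live microphone input through `AVAudioEngine`. The class name `LiveTranscriber`, the error type, and the 55-second margin are illustrative assumptions, not values taken from the article or from Apple's documentation. Failures, including those caused by service limits, are passed to `onFailure` so the caller can tell the user and retry later.

```swift
import Foundation
#if canImport(Speech) && canImport(AVFoundation)
import Speech
import AVFoundation
#endif

/// Assumed safety margin: end each session a little before the one-minute cap.
let maxSessionSeconds: TimeInterval = 55

#if canImport(Speech) && canImport(AVFoundation)
enum TranscriberError: Error { case recognizerUnavailable }

final class LiveTranscriber {
    private let recognizer: SFSpeechRecognizer
    private let audioEngine = AVAudioEngine()
    private var request: SFSpeechAudioBufferRecognitionRequest?
    private var task: SFSpeechRecognitionTask?
    private var timeout: DispatchWorkItem?

    init?(localeIdentifier: String) {
        guard let recognizer = SFSpeechRecognizer(locale: Locale(identifier: localeIdentifier)) else { return nil }
        self.recognizer = recognizer
    }

    func start(onText: @escaping (String, Bool) -> Void,
               onFailure: @escaping (Error) -> Void) throws {
        guard recognizer.isAvailable else { throw TranscriberError.recognizerUnavailable }
        // On iOS, configure and activate AVAudioSession for recording first (omitted here).
        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = true   // stream partial text to the UI
        self.request = request

        // Feed microphone buffers into the recognition request.
        let input = audioEngine.inputNode
        input.installTap(onBus: 0, bufferSize: 1024, format: input.outputFormat(forBus: 0)) { buffer, _ in
            request.append(buffer)
        }
        audioEngine.prepare()
        do {
            try audioEngine.start()
        } catch {
            input.removeTap(onBus: 0)
            self.request = nil
            throw error
        }

        task = recognizer.recognitionTask(with: request) { [weak self] result, error in
            if let result = result {
                onText(result.bestTranscription.formattedString, result.isFinal)
            }
            if let error = error {
                onFailure(error)   // e.g. network failure or a recognition limit
            }
            if error != nil || result?.isFinal == true {
                self?.stop()
                self?.task = nil
            }
        }

        // End the audio ourselves before the framework's duration limit is reached.
        let timeout = DispatchWorkItem { [weak self] in self?.stop() }
        self.timeout = timeout
        DispatchQueue.main.asyncAfter(deadline: .now() + maxSessionSeconds, execute: timeout)
    }

    func stop() {
        timeout?.cancel()
        timeout = nil
        if audioEngine.isRunning {
            audioEngine.stop()
            audioEngine.inputNode.removeTap(onBus: 0)
        }
        request?.endAudio()   // lets the task deliver its final result
        request = nil
    }
}
#endif
```

Ending the audio slightly early is a design choice in this sketch: it lets the task return a final transcription instead of being cut off by the framework.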

Part 04

Speech Framework: Practical Application on Hejiaqin

In Hejiaqin, the Speech framework is mainly used for intelligent voice customer service and intelligent device management and control. With the Speech framework, voice input can be quickly converted to text and displayed, which greatly improves the interactive experience. The main logic flow by which Hejiaqin uses the Speech framework for voice-based device control is shown in the figure below👇

[Figure: logic flow of Hejiaqin's voice-based device control using the Speech framework]

The main process steps are as follows:

1️⃣ The app builds its matching lookup tables locally: the control-action semantic matching table, the device-or-activity semantic matching table, the custom voice control command matching table, and the default voice control command matching table.

2️⃣ Use the Speech framework interfaces to convert the voice input collected by the app into text and display it on the app's interaction page.

3️⃣ Compute the whole-text similarity between the text from step 2 and the entries in the local custom voice control command table and default voice control command table, sort the results, and find the first-level candidate commands with their confidence scores and the third-level candidate commands with their confidence scores.

4️⃣ Run word segmentation on the text from step 2 to extract the verbs, nouns, place names, product names, and so on.

5️⃣ Match the verbs, nouns, and other words from step 4 against the control-action semantic matching table and the device-or-activity semantic matching table respectively, find the best action match and the best activity or device match, and combine the two into a second-level control command with its confidence score.

6️⃣ Rank the first-, second-, and third-level control commands using a per-level weight together with each command's confidence score, return the ranked list to the interaction page, and wait for the user to confirm the final command.

7️⃣ Execute the final control command.
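Steps 3, 5, and 6 can be sketched as follows. This is an illustration under stated assumptions, not Hejiaqin's actual implementation: the article does not name its similarity measure, so normalized Levenshtein distance is swapped in; plain keyword containment stands in for word segmentation and semantic matching; the level weights, the mapping of the custom table to level 1 and the default table to level 3, and all names (`Candidate`, `bestMatch`, `combinedMatch`, `rank`, the sample commands) are hypothetical.

```swift
import Foundation

/// Edit distance between two strings, computed over Characters.
func levenshtein(_ a: String, _ b: String) -> Int {
    let s = Array(a), t = Array(b)
    if s.isEmpty { return t.count }
    if t.isEmpty { return s.count }
    var previous = Array(0...t.count)
    for i in 1...s.count {
        var current = [i] + Array(repeating: 0, count: t.count)
        for j in 1...t.count {
            let cost = s[i - 1] == t[j - 1] ? 0 : 1
            current[j] = min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
        }
        previous = current
    }
    return previous[t.count]
}

/// Similarity in 0...1 (1 means identical). Assumed measure, see lead-in.
func similarity(_ a: String, _ b: String) -> Double {
    let longest = max(a.count, b.count)
    guard longest > 0 else { return 1.0 }
    return 1.0 - Double(levenshtein(a, b)) / Double(longest)
}

struct Candidate {
    let command: String
    let level: Int          // 1, 2 or 3, as in the steps above
    let confidence: Double  // 0...1
}

/// Step 3: best whole-text match against one command table.
func bestMatch(for text: String, in table: [String], level: Int) -> Candidate? {
    table.map { Candidate(command: $0, level: level, confidence: similarity(text, $0)) }
        .max { $0.confidence < $1.confidence }
}

/// Step 5 (simplified): combine the first action and device keywords found in the text.
/// Confidence is the share of the text covered by the two keywords (an assumption).
func combinedMatch(for text: String, actions: [String], devices: [String]) -> Candidate? {
    guard !text.isEmpty,
          let action = actions.first(where: { text.contains($0) }),
          let device = devices.first(where: { text.contains($0) }) else { return nil }
    let covered = Double(action.count + device.count) / Double(text.count)
    return Candidate(command: "\(action) \(device)", level: 2, confidence: min(covered, 1.0))
}

/// Hypothetical per-level weights for step 6.
let levelWeights: [Int: Double] = [1: 1.0, 2: 0.8, 3: 0.6]

/// Step 6: sort by weight × confidence, highest first.
func rank(_ candidates: [Candidate]) -> [Candidate] {
    candidates.sorted {
        (levelWeights[$0.level] ?? 0) * $0.confidence > (levelWeights[$1.level] ?? 0) * $1.confidence
    }
}

// Usage with made-up tables; `text` stands for the transcription from step 2.
let text = "turn on the living room light"
let ranked = rank([
    bestMatch(for: text, in: ["movie mode", "turn on the living room lamp"], level: 1),
    combinedMatch(for: text, actions: ["turn on", "turn off"], devices: ["light", "router"]),
    bestMatch(for: text, in: ["turn off the living room light"], level: 3),
].compactMap { $0 })
```

The ranked list would then be shown to the user for confirmation before anything is executed (steps 6 and 7). For real word segmentation and part-of-speech extraction in step 4, Apple's NaturalLanguage framework is one option worth evaluating.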
