AI technology and the governance of "cyber violence": How does artificial intelligence deal with cyber violence?

To govern "cyber violence", the state is tackling the problem from the legal side, issuing warnings and drawing red lines to clean up cyberspace. Some companies have also begun to explore applying natural language processing, a branch of artificial intelligence, to the analysis of cyber violence.

What is “cyberbullying”?

"Cyber violence" refers to defaming and slandering others on the Internet through text, pictures, videos, and other media, damaging their reputation and privacy and causing mental stress and psychological trauma to those involved. It is an extension of social violence onto the Internet. It is most commonly seen on Weibo, video platforms, news feeds, and forums.

There are three main causes of "cyber violence". First, the anonymity of the Internet protects personal privacy but also lets offenders make reckless remarks. Second, some media outlets, in pursuit of traffic and attention, report one-sidedly and deliberately distort facts to make a topic more sensational. Third, once public opinion has formed, individuals tend to follow the group's values and set aside their own capacity for rational thought.

Natural Language Processing (NLP) and "Cyberbullying"

Cyber violence on social media spreads mainly through comments and bullet comments. For analyzing unstructured language data of this kind, the core AI technology is natural language processing. Built on machine learning and deep learning methods, it lets machines learn language features automatically and thereby gain some ability to understand human language. The technology is already widely used in text classification, automatic summarization, question-answering systems, machine translation, sentiment analysis, and more. Everyday voice assistants and the recently popular ChatGPT are familiar applications. In the governance of "cyber violence", the following directions are involved:

Text entity extraction:

The target of "cyberbullying" is usually a particular person or event, so the first step is to filter the comments about a given cyberbullying incident out of the massive volume of comment data. This mainly relies on named entity recognition (NER). NER algorithms fall into three main families: rule-based methods, statistical methods, and deep learning methods.

Figure 1 Named Entity Recognition Method
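
The filtering step described above can be sketched with the simplest of the three families, a rule-based matcher. This is only an illustration under stated assumptions, not the method any particular product is known to use: the alias table, the sample comments, and the function names `find_entities` and `filter_comments` are all hypothetical, and the word-boundary matching assumes space-delimited text.

```python
import re

# Hypothetical gazetteer: canonical entity name -> aliases seen in comments.
ALIASES = {
    "Zhang San": ["Zhang San", "ZS", "Old Zhang"],
    "Li Si": ["Li Si", "Little Li"],
}

def build_pattern(aliases):
    # Try longer aliases first so a long alias is preferred over a shorter
    # one that happens to be its prefix.
    ordered = sorted(aliases, key=len, reverse=True)
    body = "|".join(re.escape(a) for a in ordered)
    # \b assumes space-delimited text; unsegmented text needs another rule.
    return re.compile(r"\b(?:" + body + r")\b", re.IGNORECASE)

PATTERNS = {name: build_pattern(al) for name, al in ALIASES.items()}

def find_entities(comment):
    """Return the set of canonical entities mentioned in a comment."""
    return {name for name, pat in PATTERNS.items() if pat.search(comment)}

def filter_comments(comments, target):
    """Keep only the comments that mention the target entity."""
    return [c for c in comments if target in find_entities(c)]

comments = [
    "Old Zhang deserves everything he gets",
    "Nice weather today",
    "I think zs and Little Li were both misquoted",
]
about_zhang = filter_comments(comments, "Zhang San")
```

Here `about_zhang` keeps the first and third comments. Statistical and deep learning NER replace the hand-written alias table with a model learned from annotated text, which is what lets them recognize names nobody listed in advance.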

Text Sentiment Analysis:

Sentiment analysis can score a comment as positive or negative, identify finer-grained emotions in its meaning, and extract the keywords that most influence the overall sentiment of the text. This lets us understand the emotional distribution of netizens behind tens of millions of comments, and even analyze how different groups feel about different events by time period, region, and gender. Negative and violent sentiment toward an event can then be addressed promptly, and polarity words can be used to uncover further potential cyber violence.

Figure 2 Different emotion classifications

The main techniques involved are text classification and polarity word mining, using machine learning (SVM, etc.) or deep learning (CNN, etc.). The overall process is shown in the figure:

Figure 3 Sentence-level sentiment analysis solution
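
To make the idea of sentence-level scoring and polarity words concrete, here is a minimal sketch. It deliberately swaps in a hand-written polarity lexicon for the SVM or CNN classifiers mentioned above, so it shows the shape of the output rather than a trained model. The lexicon, its weights, the negation rule, and all function names are assumptions made for illustration.

```python
# Hypothetical polarity lexicon: word -> weight
# (negative weight = hostile, positive weight = supportive).
LEXICON = {
    "love": 2.0, "great": 1.5, "support": 1.0,
    "stupid": -1.5, "hate": -2.0, "disgusting": -2.5,
}
NEGATORS = {"not", "never", "no"}

def tokenize(text):
    # Lowercase, split on whitespace, strip surrounding punctuation.
    return [w.strip(".,!?;:\"'").lower() for w in text.split()]

def score_comment(text):
    """Return (total score, [(polarity word, contribution), ...])."""
    tokens = tokenize(text)
    contributions = []
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            weight = LEXICON[tok]
            # Flip polarity when the previous token is a negator ("not great").
            if i > 0 and tokens[i - 1] in NEGATORS:
                weight = -weight
            contributions.append((tok, weight))
    total = sum(w for _, w in contributions)
    return total, contributions

def label(total):
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"

def top_polarity_word(contributions):
    """Word with the largest absolute contribution, or None if there is none."""
    if not contributions:
        return None
    return max(contributions, key=lambda c: abs(c[1]))[0]

total, contrib = score_comment("I hate this, it is disgusting!")
print(label(total), top_polarity_word(contrib))
```

A learned classifier would replace `LEXICON` with weights estimated from labeled comments, but the per-word contributions play the same role as the polarity words described above.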

Text similarity analysis:

Similarity analysis of comments on the same event helps reveal the trend of public opinion about that event. Similarity analysis across different events can find comments whose wording or expressions resemble those of "cyberbullying" users, and surface recent positive or negative public opinion about a given event or person. There are currently two main deep learning paradigms for similarity analysis, as shown in the following figure:

Figure 4 Two paradigms of similarity analysis

The first paradigm uses a deep neural network to extract a representation vector for each comment, and then computes the similarity between the two comments with a simple distance function over those vectors (such as Euclidean distance). The representation vectors are usually extracted with a Siamese (twin) network. Common models in this category include DSSM and CNTN.
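
The structure of this first paradigm can be sketched as follows: one shared encoder turns each comment into a vector, and a distance function compares the vectors. The encoder here is a normalized bag-of-words count, which is only a stand-in assumption for a trained neural network such as DSSM. The function names, the vocabulary, and the conversion from distance to a similarity score are likewise illustrative.

```python
import math
from collections import Counter

def encode(text, vocab):
    """Stand-in encoder: L2-normalized bag-of-words vector over a fixed vocabulary.
    In a real Siamese model this would be a trained neural network."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def similarity(a, b, vocab):
    # The same encoder is applied to both inputs, which is the "twin" part.
    d = euclidean(encode(a, vocab), encode(b, vocab))
    # Map distance to (0, 1]: identical vectors give 1.0.
    return 1.0 / (1.0 + d)

comments = [
    "you are so stupid",
    "you are really stupid",
    "lovely photo of the lake",
]
vocab = sorted({w for c in comments for w in c.lower().split()})

near = similarity(comments[0], comments[1], vocab)
far = similarity(comments[0], comments[2], vocab)
print(near > far)
```

Because each comment is encoded independently, vectors can be computed once and stored, which is the usual reason to prefer this paradigm when comparing against a very large pool of comments.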

The second paradigm uses a deep model to extract cross-features between the two comments, producing a tensor of matching signals that is then aggregated into a similarity score.
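
A minimal sketch of this second paradigm follows. It builds a word-by-word matrix of matching signals and then aggregates it into a single score. Both pieces are hand-written assumptions for illustration: the matching signal is character-bigram overlap and the aggregation is max-pooling followed by a mean, whereas a real interaction model learns both from data.

```python
def char_bigrams(word):
    # Fall back to the word itself for one-character words.
    return {word[i:i + 2] for i in range(len(word) - 1)} or {word}

def word_match(w1, w2):
    """Matching signal for one word pair: Jaccard overlap of character bigrams."""
    a, b = char_bigrams(w1), char_bigrams(w2)
    return len(a & b) / len(a | b)

def interaction_matrix(a, b):
    """Rows are words of comment a, columns are words of comment b."""
    ta, tb = a.lower().split(), b.lower().split()
    return [[word_match(x, y) for y in tb] for x in ta]

def aggregate(matrix):
    """Max-pool each row (best match per word of a), then average the rows."""
    if not matrix or not matrix[0]:
        return 0.0
    return sum(max(row) for row in matrix) / len(matrix)

def match_score(a, b):
    return aggregate(interaction_matrix(a, b))

print(match_score("you are stupid", "you are so stupid"))
print(match_score("stupid", "stupidity"))
```

Unlike the representation-based sketch, the two comments are compared word against word before anything is pooled, so partial matches such as "stupid" and "stupidity" still contribute to the score.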

Syntactic/lexical analysis:

Syntactic and lexical analysis can uncover the syntactic and lexical habits shared by large numbers of "positive" comments and "cyberbullying" comments. From these we can summarize the rhetoric and vocabulary that "cyberbullying" users commonly use in the current online environment, as well as the linguistic features different users show when expressing the polarity of their opinions.

Syntactic structure analysis identifies the subject, predicate, object, attributive, adverbial, and complement of a sentence and analyzes the relationships between these components. It is generally built on deep learning sequence models such as RNN and LSTM.

The task of lexical analysis is to convert the input comment string into a sequence of words and tag each word with its part of speech. It mainly uses sequence labeling techniques; specific algorithms include conditional random fields (CRF), RNN+CRF, and others.

Figure 5 Lexical analysis example
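
The sequence labeling step can be illustrated with Viterbi decoding, the search a linear-chain CRF uses to pick the best tag sequence. The sketch below is an assumption-heavy toy: the tag set, the emission and transition scores, and the function names are hand-set and hypothetical, standing in for weights a CRF would learn, and whitespace splitting stands in for real word segmentation.

```python
TAGS = ["PRON", "VERB", "ADJ"]

# Hypothetical hand-set scores standing in for learned CRF weights.
EMIT = {
    "you":    {"PRON": 2.0},
    "are":    {"VERB": 2.0},
    "stupid": {"ADJ": 2.0},
    "mean":   {"VERB": 1.0, "ADJ": 1.0},  # ambiguous without context
}
TRANS = {
    ("PRON", "VERB"): 1.0,
    ("VERB", "ADJ"): 1.0,
    ("VERB", "VERB"): 0.5,
}
DEFAULT = -1.0  # score for any word/tag or tag/tag pair not listed above

def emit(word, tag):
    return EMIT.get(word, {}).get(tag, DEFAULT)

def trans(prev, tag):
    return TRANS.get((prev, tag), DEFAULT)

def viterbi(words):
    """Return the highest-scoring tag sequence for the given words."""
    if not words:
        return []
    # best[t] = (score of the best path ending in tag t, that path)
    best = {t: (emit(words[0], t), [t]) for t in TAGS}
    for word in words[1:]:
        new_best = {}
        for t in TAGS:
            # Pick the best previous tag to transition from.
            score, path = max(
                ((best[p][0] + trans(p, t), best[p][1]) for p in TAGS),
                key=lambda sp: sp[0],
            )
            new_best[t] = (score + emit(word, t), path + [t])
        best = new_best
    return max(best.values(), key=lambda sp: sp[0])[1]

def tag_comment(text):
    words = text.lower().split()
    return list(zip(words, viterbi(words)))

print(tag_comment("you are mean"))
print(tag_comment("you mean"))
```

The word "mean" has equal emission scores for VERB and ADJ, so the transition scores decide: it is tagged ADJ after "are" and VERB directly after "you". Using context to resolve such ambiguity is the point of sequence labeling over tagging each word in isolation.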

Summary

"Cyber violence" not only directly harms the rights and interests of victims, but also has a negative impact on network security and social harmony. Drawing on its technical accumulation in deep learning, image recognition, natural language processing, OCR, and related fields, China Mobile Smart Home Operation Center has launched content security products that can detect pornography, violence and terrorism, politically sensitive content, and gambling across pictures, text, video, and audio, with supporting capabilities such as image OCR and face recognition.

As AI technology develops, technology-based governance of Internet violence will play an increasingly important role. China Mobile Smart Home Operation Center will continue to explore advanced technologies for this scenario, draw on the industry's cutting-edge techniques to support the building of the content ecosystem, actively respond to the Cyberspace Administration of China's "Clear and Bright" series of special actions, and contribute to a clean and healthy online environment.

Author: Xu Jingyang

Unit: China Mobile Smart Home Operation Center
