AI technology and the governance of "cyber violence": How does artificial intelligence deal with cyber violence?

To govern "cyber violence", the state is taking action on the legal front, issuing "red cards" and drawing red lines to clean up cyberspace. Some companies have also begun exploring how natural language processing, a branch of artificial intelligence, can be applied to the analysis of cyber violence.

What is "cyber violence"?

"Cyber violence" refers to defaming or slandering others online through text, pictures, videos, and other media, damaging their reputation and privacy and causing mental stress and psychological trauma to those involved. It is an extension of social violence onto the Internet. It is most commonly seen on Weibo, video platforms, news feeds, and forums.

There are three main causes of "cyber violence". First, the anonymity of the Internet protects personal privacy but also lets infringers make reckless remarks. Second, some media outlets, in pursuit of traffic and attention, report one-sidedly or deliberately distort facts to make a topic more sensational. Third, once public opinion has formed, individuals tend to follow the group's values and neglect their own capacity for rational thought.

Natural Language Processing (NLP) and "Cyber Violence"

Cyber violence on social media spreads mainly through comments and bullet comments. For analyzing unstructured language data of this kind, the core AI technology is natural language processing. Natural language processing builds on machine learning and deep learning methods, which let machines learn language features automatically and so gain some ability to understand human language. The technology is now widely used in text classification, automatic summarization, question answering, machine translation, sentiment analysis, and more. Everyday voice assistants and the recently popular ChatGPT are both applications of it. In governing "cyber violence", the following directions are involved:

Text entity extraction:

The target of "cyber violence" is usually a particular person or event, so the first step is to filter the comments about a given incident out of the massive volume of comment data. This mainly relies on named entity recognition (NER). NER algorithms fall into three main families: rule-based methods, statistical methods, and deep learning methods.

Figure 1 Named Entity Recognition Method
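
As a rough illustration of the rule-based family, the sketch below filters comments that mention a target entity by matching a hand-written alias table. The entity names, the aliases, and the helper functions are all hypothetical and not taken from any production system. The word-boundary check assumes space-delimited text; Chinese comments would first need word segmentation.

```python
import re

# Hypothetical alias table: canonical entity name -> surface forms seen in comments.
ENTITY_ALIASES = {
    "Person A": ["person a", "mr. a"],
    "Event B": ["event b", "the b incident"],
}

def compile_patterns(aliases):
    """Build one case-insensitive regex per entity from its alias list."""
    patterns = {}
    for entity, forms in aliases.items():
        alternation = "|".join(re.escape(form) for form in forms)
        # The lookarounds stop an alias from matching inside a longer word.
        patterns[entity] = re.compile(
            r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE
        )
    return patterns

def find_entities(comment, patterns):
    """Return the set of entities mentioned in a comment."""
    return {entity for entity, pattern in patterns.items() if pattern.search(comment)}

def filter_comments(comments, target, patterns):
    """Keep only the comments that mention the target entity."""
    return [c for c in comments if target in find_entities(c, patterns)]

PATTERNS = compile_patterns(ENTITY_ALIASES)
comments = [
    "Mr. A should be ashamed of himself",
    "Lovely weather today",
    "Everyone is talking about the B incident and Person A",
]
about_person_a = filter_comments(comments, "Person A", PATTERNS)
```

Here `about_person_a` keeps the first and third comments. Statistical and deep learning NER replace the hand-written alias table with a model trained on labeled text, which is why they generalize to names the rules never listed.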

Text Sentiment Analysis:

Sentiment analysis can score a comment as positive or negative, identify which finer-grained emotions its wording carries, and extract the keywords that most influence the overall sentiment of the text. This makes it possible to understand the distribution of emotions among netizens across tens of millions of comments, and even to break down attitudes toward different events by time period, region, and gender. Negative and violent sentiment around an event can then be addressed promptly, and polarity words can be used to uncover further potential cyber violence.

Figure 2 Different emotion classifications

The main techniques involved are text classification and polarity word mining, using machine learning (SVM, etc.) or deep learning (CNN, etc.). The overall process is shown in the figure:

Figure 3 Sentence-level sentiment analysis solution
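
To make the idea of text classification plus polarity word mining concrete, here is a minimal sketch using only the Python standard library. It is not the SVM or CNN pipeline named above: it substitutes a perceptron, another linear classifier over bag-of-words features, purely to keep the example self-contained. The four training comments are invented for illustration.

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase whitespace tokenizer; Chinese comments would need word segmentation."""
    return text.lower().split()

def train_perceptron(samples, epochs=10):
    """Learn one weight per word; labels are +1 (positive) or -1 (negative)."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in samples:
            tokens = tokenize(text)
            score = bias + sum(weights[t] for t in tokens)
            if label * score <= 0:  # misclassified: nudge weights toward the label
                for t in tokens:
                    weights[t] += label
                bias += label
    return weights, bias

def predict(text, weights, bias):
    """Return +1 for a positive comment, -1 for a negative one."""
    score = bias + sum(weights.get(t, 0.0) for t in tokenize(text))
    return 1 if score > 0 else -1

def top_negative_words(weights, k=3):
    """Words with the lowest weights: a crude form of polarity word mining."""
    return sorted(weights, key=lambda w: weights[w])[:k]

# Invented toy data, for illustration only.
training_data = [
    ("you are a disgusting idiot", -1),
    ("what a lovely kind person", 1),
    ("get lost you idiot", -1),
    ("thank you for the kind words", 1),
]
weights, bias = train_perceptron(training_data)
label = predict("such a kind person", weights, bias)
```

The learned weights double as a polarity lexicon: strongly negative words can be used to search for other comments that may warrant review.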

Text similarity analysis:

Similarity analysis of comments on the same event helps reveal how public opinion on that event is trending. Similarity analysis across different events can find comments whose wording or expressions resemble those of users engaged in "cyber violence", and can surface recent positive or negative public opinion about a given event or person. There are currently two main deep learning paradigms for similarity analysis, as shown in the following figure:

Figure 4 Two paradigms of similarity analysis

The first paradigm uses a deep neural network to extract a representation vector for each comment, and then computes the similarity between the two comments with a simple distance function over those vectors (such as Euclidean distance). The representation vectors are usually extracted with a Siamese (twin) network. Common models in this category include DSSM, CNTN, etc.
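
The shape of this first paradigm can be sketched as follows. The `encode` function is a deliberately simple stand-in (word counts over a fixed vocabulary) for the trained deep network; only the structure is the point: one shared encoder applied to both comments, followed by a distance function. All names and sample comments are illustrative assumptions.

```python
import math
from collections import Counter

def build_vocabulary(comments):
    """Sorted list of every word seen, so vector positions are stable."""
    return sorted({word for c in comments for word in c.lower().split()})

def encode(comment, vocabulary):
    """Stand-in encoder: a word-count vector. A real system would use a trained network."""
    counts = Counter(comment.lower().split())
    return [float(counts[w]) for w in vocabulary]

def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def similarity(comment_a, comment_b, vocabulary):
    """Apply the same encoder to both inputs, then map distance into (0, 1]."""
    d = euclidean_distance(encode(comment_a, vocabulary), encode(comment_b, vocabulary))
    return 1.0 / (1.0 + d)

corpus = ["you are an idiot", "you are an idiot too", "nice photo"]
vocab = build_vocabulary(corpus)
near = similarity(corpus[0], corpus[1], vocab)  # differ by one word
far = similarity(corpus[0], corpus[2], vocab)   # no words in common
```

Because each comment is encoded independently, vectors can be computed once and reused across many comparisons, which is one reason this paradigm suits large comment collections.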

The second paradigm uses a deep model to extract cross-features between the two comments, producing a tensor of matching signals, and then aggregates that tensor into a similarity score.
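
A toy version of this second paradigm is sketched below. The matching signal for each word pair is a hand-written character-bigram overlap, and the aggregation is a fixed max-then-mean pooling; in a deep model both steps would typically be learned. The function names and sample comments are hypothetical.

```python
def char_bigrams(word):
    """Set of adjacent character pairs; a one-character word maps to itself."""
    return {word[i:i + 2] for i in range(len(word) - 1)} or {word}

def token_match(a, b):
    """Matching signal for one word pair: Jaccard overlap of character bigrams."""
    x, y = char_bigrams(a), char_bigrams(b)
    return len(x & y) / len(x | y)

def interaction_matrix(comment_a, comment_b):
    """Rows are words of comment_a, columns are words of comment_b."""
    tokens_a = comment_a.lower().split()
    tokens_b = comment_b.lower().split()
    return [[token_match(a, b) for b in tokens_b] for a in tokens_a]

def aggregate(matrix):
    """Take the best match for each row, then average the rows."""
    if not matrix or not matrix[0]:
        return 0.0
    return sum(max(row) for row in matrix) / len(matrix)

close = aggregate(interaction_matrix("you idiots", "you idiot"))
distant = aggregate(interaction_matrix("you idiots", "nice photo"))
```

Unlike the representation-based approach, the two comments are compared word by word before any pooling, so near-matches such as "idiots" and "idiot" still contribute to the score.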

Syntactic/lexical analysis:

Syntactic and lexical analysis can uncover the syntactic and lexical habits shared across large numbers of "positive" comments and "cyber violence" comments. From these we can summarize the rhetoric and vocabulary that users engaged in "cyber violence" commonly employ in the current online environment, as well as the linguistic features different users show when expressing opinions of a given polarity.

Syntactic structure analysis identifies the subject, predicate, object, attributive, adverbial, and complement of a sentence and analyzes the relationships between these components. It is generally built on deep learning sequence models such as RNN and LSTM.

The task of lexical analysis is to convert the input comment string into a word sequence and tag each word with its part of speech. It mainly relies on sequence labeling; specific algorithms include conditional random fields (CRF), RNN+CRF, etc.

Figure 5 Lexical analysis example
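
The part-of-speech tagging step of sequence labeling can be illustrated with Viterbi decoding, the standard way to find the best tag sequence under a linear-chain model such as a CRF. In the sketch below the emission and transition scores are made-up numbers rather than learned parameters, the tag set is reduced to three tags, and the input is assumed to be segmented into words already.

```python
TAGS = ["PRON", "VERB", "NOUN"]

# Hypothetical scores (higher is better); a trained CRF would learn these from labeled data.
EMISSION = {
    "they": {"PRON": 2.0},
    "post": {"VERB": 1.0, "NOUN": 1.0},
    "comments": {"NOUN": 1.5, "VERB": 0.5},
}
TRANSITION = {
    ("PRON", "VERB"): 1.0,
    ("VERB", "NOUN"): 1.0,
    ("NOUN", "NOUN"): 0.2,
}

def viterbi(words):
    """Return the highest-scoring tag sequence for the given words."""
    if not words:
        return []
    # best[i][tag] = (score of the best path ending in tag at position i, previous tag)
    best = [{t: (EMISSION.get(words[0], {}).get(t, 0.0), None) for t in TAGS}]
    for word in words[1:]:
        column = {}
        for tag in TAGS:
            emit = EMISSION.get(word, {}).get(tag, 0.0)
            prev = max(TAGS, key=lambda p: best[-1][p][0] + TRANSITION.get((p, tag), 0.0))
            score = best[-1][prev][0] + TRANSITION.get((prev, tag), 0.0) + emit
            column[tag] = (score, prev)
        best.append(column)
    # Trace back from the best final tag to recover the full path.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))

tags = viterbi(["they", "post", "comments"])
```

The word "post" is ambiguous on its own (verb or noun with equal emission score), and the transition scores resolve it: following a pronoun, the verb reading wins.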

Summary

The existence of "cyber violence" not only directly harms victims' rights and interests but also negatively affects network security and social harmony. Drawing on its technical accumulation in deep learning, image recognition, natural language processing, OCR, and related fields, China Mobile Smart Home Operation Center has launched content security products that run security checks on images, text, video, and audio across multiple dimensions, including pornography, violence and terrorism, politically sensitive content, gambling, image OCR, and face recognition.

As AI technology develops, technology-based governance of cyber violence will play an increasingly important role. China Mobile Smart Home Operation Center will continue to explore advanced technologies for this scenario, combine them with cutting-edge industry techniques to support the building of a healthy content ecosystem, actively respond to the Cyberspace Administration of China's "Clear and Bright" series of special campaigns, and contribute to a clean online environment.

Author: Xu Jingyang

Unit: China Mobile Smart Home Operation Center
