Toutiao’s addictive data mining


Due to forces beyond its control, Toutiao's products have been blocked in some overseas markets, as have those of other companies. Setting those forces aside, let us take a look at the products.

This article looks at Toutiao and Douyin from two perspectives, product and technology, to give us an understanding of Toutiao's products.

Of course, this is only my own superficial analysis, based on limited information and knowledge. First, we should note that the two products have one thing in common: they are fun and are liked by people everywhere (all over the world).

TikTok shows that a good product lets everyone create with complete freedom and record their own lives.

Before we begin, we need to have a general understanding of the data of Douyin and Toutiao. The following two sets of data record the development history of Douyin and Toutiao.

Toutiao: A recommendation engine product based on data mining.

As of December 2015, Toutiao had 350 million activated users and more than 35 million daily active users.

Of these, the number of accounts on the "Toutiao" platform has exceeded 41,000, including more than 11,000 belonging to media outlets, government bodies, and institutions; more than a thousand traditional media outlets have signed cooperation agreements, and the number of "Toutiao" self-media accounts exceeds 30,000.

Douyin: A recommendation engine product built on the same technology.

Douyin was launched by Toutiao in September 2016 and positioned as a music short-video community for young Chinese people; it focuses on vertical music UGC (user-generated content) short videos. Since 2017, its user base has grown rapidly.

The download and installation volume of TikTok, the international version of Douyin, once jumped to first place in the US market, and has topped the local App Store or Google Play rankings many times in Japan, Thailand, Indonesia, Germany, France, Russia and other places.

On September 2, 2017, Douyin product manager Wang Xiaowei said: "85% of Douyin users are under the age of 24, and the main influencers and users were basically born after 1995 or even after 2000." As of October 2018, the app had been downloaded by more than 800 million users in over 150 countries.

The latest Sensor Tower data, from May 2020, shows that combined downloads of "Douyin" and "TikTok" on the App Store and Google Play worldwide have exceeded 2 billion.

These two sets of data illustrate the popularity of TikTok and Toutiao. Good products produce good data and user growth, which gives us product managers an example to learn from.

Next, we will look at Douyin and Toutiao from a product perspective, mainly analyzing the similarities between the two.

1. The source of addictive happiness

From the start, Toutiao tried to recommend each piece of news to the people most likely to want it; applying the same technology to TikTok produces the same effect.
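This article does not describe Toutiao's actual recommendation algorithm. As a rough illustration of "recommending each item to the right people", here is a minimal content-based sketch. It assumes that each user and each article is represented by hypothetical topic-tag weights and that articles are ranked by cosine similarity to the user's profile; the tags, weights, and function names are made up for illustration.

```python
import math


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse tag-weight vectors."""
    dot = sum(w * b.get(tag, 0.0) for tag, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(profile: dict[str, float], articles: dict[str, dict[str, float]], k: int = 2) -> list[str]:
    """Return the ids of the k articles whose tags best match the user's profile."""
    ranked = sorted(articles, key=lambda aid: cosine(profile, articles[aid]), reverse=True)
    return ranked[:k]


# Hypothetical interest profile, e.g. learned from a user's reading history.
profile = {"football": 0.9, "music": 0.4}
articles = {
    "a1": {"football": 1.0},
    "a2": {"finance": 1.0},
    "a3": {"music": 1.0, "football": 0.5},
}
print(recommend(profile, articles))  # ['a1', 'a3']
```

Real recommendation systems combine many more signals (clicks, dwell time, collaborative filtering); the point here is only that "the right people" can be computed as a similarity between a user profile and item features.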

If Toutiao proved that the algorithm works, Douyin shows its effect: applied to a product of its own, Toutiao's algorithm made Douyin the most popular short-video social product in the world.

Whether on Douyin or Toutiao, every user can find content they like and rarely sees content they don't.

When you use WeChat or QQ to communicate at work, you are under stress; Douyin is the opposite. There, everyone is free from the worries and pressure of work. During breaks or after work, people open Douyin or Toutiao to relax and entertain themselves without stress.

We all know that happiness is good, and we all like the feeling of happiness, but is addictive happiness still good?

When we blindly pursue happiness on TikTok and spend our attention on it, we rush to finish our work just so we can open TikTok and watch short videos. We are escaping, anxious to get away from a stressful work environment, and each simple escape deepens the feeling.

The feeling is always there, repeating and deepening every day. The only remedy is to gradually give up Douyin or Toutiao: reduce your dependence on the product and cut down how often and how long you use it.

2. Publicity

Nowadays, when people run into a problem, the popular move is to go to reporters rather than to the police as they once did. Why can a simple piece of public-opinion news cause such a big response?

First, we have this information-rich society and country to thank. Information spreads so widely that we can learn whose cat has gone missing; the police may then search the whole city for it, and the story becomes a hot topic. This is thanks to the fans behind the scenes; fans are truly powerful.

As media platforms, Toutiao and Douyin review and control the content they present, partly because of policy.

If it were fully open, as in overseas markets, it would face a series of regulatory issues. In China, Douyin is more of an entertainment platform where you are not allowed to express personal opinions, so what we see is life itself: records of many different lives.

We will not dwell on speech; instead, we will continue with the public opinion behind TikTok as a media product.

You may have noticed that once a public-opinion event becomes a hot topic, etiquette and morality are the first things people consider; however strict the law is, etiquette and morality always come into play.

Of course, everyone has different moral values, but the public's moral values make everyone fall in line. Put plainly, people conform to the moral values of the group, not their own. If you hold different moral values, you will not be able to take part in the trending public-opinion event.

3. Data Mining

Every mature product relies on the support of technology. The difference between technology and scientific research is that technology must create value, while scientific research does not have to produce value directly. The data mining technology behind Toutiao and Douyin is introduced below.

1. Data Mining

Data mining is an interdisciplinary subfield of computer science: the computational process of discovering patterns in large data sets, using methods at the intersection of artificial intelligence, machine learning, statistics, and database systems.

The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.

Besides the raw analysis step, it also involves database and data management, data preprocessing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating; in essence, these belong to the scope of machine learning.

Terms such as "data dredging", "data fishing", and "data snooping" refer to using data mining methods on samples of a larger data set that are (or may be) too small to support reliable statistical inferences about the validity of any patterns discovered. These methods can, however, be used to generate new hypotheses to test against the larger data set.

2. History

Data mining is the result of the rapid growth of massive amounts of useful data.

Since the 1960s, computers have made digital data collection a reality, allowing historical data to be analyzed. In the 1980s, relational databases developed alongside structured query languages that supported dynamic, on-demand data analysis, and data warehouses began to be used to store large amounts of data.

Data mining emerged to meet the challenge of processing large amounts of data in databases. Its main methods for these problems are statistical data analysis and artificial-intelligence search techniques.

3. Definitions

Data mining has several different definitions:

"Extracting hidden, previously unknown, and potentially valuable information from data";

"The science of extracting useful information from large data sets or databases".

Although data mining is usually applied to data analysis, like artificial intelligence it is a term with a broad meaning and is used in many different fields.

As for its relationship with KDD (knowledge discovery in databases): KDD is the process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data, while data mining is the step within KDD that applies specific algorithms to produce patterns within acceptable limits of computational efficiency.

In fact, in current literature, the two terms are often used interchangeably.

4. Essence

Data mining is essentially a part of machine learning.

For example, the book Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations is mostly about machine learning. It was originally to be called "Practical Machine Learning"; the term "data mining" was added later for marketing purposes.

Often it is more accurate to use the more formal terms (large-scale) data analysis and analytics, or to refer to actual research methods such as artificial intelligence and machine learning.

5. Process

The actual work of data mining is the automatic or semi-automatic analysis of large quantities of data to extract previously unknown, potentially valuable patterns, such as groups of data records (cluster analysis), unusual records (anomaly detection), and dependencies between records (association rule mining).
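As a rough illustration of the anomaly-detection case, the sketch below flags unusual records with a simple z-score rule: a value is "abnormal" if it lies more than a chosen number of standard deviations from the mean. The z-score method, the threshold of 2, and the order counts are assumptions for illustration, not a method attributed to Toutiao or Douyin.

```python
import statistics


def zscore_anomalies(values, threshold=2.0):
    """Return indices of values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


# Hypothetical daily order counts; the last day is unusually high.
daily_orders = [102, 98, 101, 97, 103, 99, 100, 250]
print(zscore_anomalies(daily_orders))  # [7]
```

A flagged record is only a candidate for "further investigation"; whether day 7 is an error or a genuine sales spike still has to be checked.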

This often involves database techniques such as spatial indexes. The discovered patterns can be seen as a summary of the input data and can be used in further analysis, such as machine learning and predictive analytics.

For example, when performing data mining operations, you may need to divide the data into multiple groups, and then use a decision support system to obtain more accurate prediction results.

However, data collection, data preprocessing, result interpretation, and report writing are not steps of data mining itself; they belong to the broader knowledge discovery in databases (KDD) process as additional steps.

The knowledge discovery in databases (KDD) process is usually defined as consisting of the following phases:

  1. Selection
  2. Preprocessing
  3. Transformation
  4. Data mining
  5. Interpretation/Evaluation
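The five phases can be pictured as a chain of functions, each consuming the previous phase's output. The following is a toy sketch under stated assumptions: the customer records, field names, the 200-unit "high spend" cut-off, and the "average age per spend group" pattern are all hypothetical and chosen only to make each phase visible.

```python
# Hypothetical raw records: (customer_id, age, monthly_spend); None marks missing data.
RAW = [
    ("c1", 25, 120.0),
    ("c2", None, 80.0),
    ("c3", 41, 300.0),
    ("c4", 33, 95.0),
    ("c5", 52, 310.0),
]


def select(records):
    """Selection: keep only the fields relevant to the question (age and spend)."""
    return [(age, spend) for _, age, spend in records]


def preprocess(rows):
    """Preprocessing: drop rows with missing values."""
    return [r for r in rows if None not in r]


def transform(rows):
    """Transformation: derive a feature, here spend bucketed as 'high' or 'low'."""
    return [(age, "high" if spend >= 200 else "low") for age, spend in rows]


def mine(rows):
    """Data mining: a trivial pattern, the average age in each spend bucket."""
    groups = {}
    for age, bucket in rows:
        groups.setdefault(bucket, []).append(age)
    return {bucket: sum(ages) / len(ages) for bucket, ages in groups.items()}


def evaluate(pattern):
    """Interpretation/evaluation: turn the pattern into readable statements."""
    return [f"{bucket} spenders: average age {avg:.1f}" for bucket, avg in sorted(pattern.items())]


pattern = mine(transform(preprocess(select(RAW))))
for line in evaluate(pattern):
    print(line)
# high spenders: average age 46.5
# low spenders: average age 29.0
```

Keeping each phase as a separate function mirrors the point above: only `mine` is "data mining" in the narrow sense, while the other functions are the surrounding KDD steps.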

1) Preprocessing

Before applying data mining algorithms, the target data set must be collected.

Since data mining can only discover patterns that actually exist in the data, the target data set must be large enough to contain these patterns while remaining concise enough to be mined within an acceptable time frame. Common data sources are data marts or data warehouses.

Before data mining, the data must be preprocessed so that multivariate data can be analyzed, and the target set must then be cleaned. Data cleaning removes observations that contain noise and those with missing data.
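A minimal sketch of the cleaning step described above, assuming a hypothetical rule that treats values outside a plausible range as noise. The field names and valid ranges are invented for illustration; real cleaning rules depend on the data source.

```python
from typing import Optional


def clean(records: list[dict[str, Optional[float]]],
          valid_ranges: dict[str, tuple[float, float]]) -> list[dict[str, Optional[float]]]:
    """Drop records with missing fields or with values outside a plausible range (treated as noise)."""
    cleaned = []
    for rec in records:
        if any(rec.get(field) is None for field in valid_ranges):
            continue  # missing data: discard the observation
        if all(lo <= rec[field] <= hi for field, (lo, hi) in valid_ranges.items()):
            cleaned.append(rec)  # every field is present and plausible
    return cleaned


records = [
    {"age": 34, "watch_minutes": 45},
    {"age": None, "watch_minutes": 30},  # missing age
    {"age": 27, "watch_minutes": -5},    # impossible value, treated as noise
    {"age": 19, "watch_minutes": 120},
]
ranges = {"age": (0, 120), "watch_minutes": (0, 1440)}
print(clean(records, ranges))
# [{'age': 34, 'watch_minutes': 45}, {'age': 19, 'watch_minutes': 120}]
```

Dropping records is the simplest option; imputing missing values instead is a common alternative when data is scarce.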

2) Data Mining

Data mining involves six common tasks:

  1. Anomaly detection (anomaly/change/deviation detection): Identifying unusual data records, or data errors, that require further investigation;
  2. Association rule learning (dependency modeling): Searching for relationships between variables. For example, a supermarket might collect data on customers' purchasing habits. Using association rule learning, the supermarket can determine which products are often bought together and use this information for marketing; this is sometimes called market basket analysis;
  3. Clustering: Discovering groups and structures in data whose structure is not known in advance;
  4. Classification: Generalizing known structure to apply to new data. For example, an email program might try to classify an email as "legitimate" or "spam";
  5. Regression: Finding a function that models the data with the least error;
  6. Automatic summarization: Providing a more compact representation of the data set, including visualization and report generation.
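To make the market basket example concrete, the sketch below mines single-item rules "X → Y" using the standard support and confidence measures. It is a simplified stand-in, not Apriori or any production algorithm: it only considers item pairs, and the baskets and both thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5, min_confidence=0.6):
    """Mine rules X -> Y between single items.

    support(X, Y)    = fraction of baskets containing both X and Y
    confidence(X->Y) = count(X and Y) / count(X)
    """
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in set(b))
    pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(set(b)), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be interesting
        for x, y in ((a, b), (b, a)):  # test both rule directions
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
for x, y, s, c in pair_rules(baskets):
    print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}")
# bread -> butter: support=0.50, confidence=0.67
# bread -> milk: support=0.50, confidence=0.67
# butter -> bread: support=0.50, confidence=1.00
# milk -> bread: support=0.50, confidence=0.67
```

"butter → bread" has confidence 1.00 because every basket with butter also has bread; this is the kind of "bought together" pattern a supermarket would act on.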

3) Result verification

Data mining is generally carried out for a particular purpose, and whether that purpose has been achieved can usually be verified through the results.

Verification means "the determination that specified requirements have been met by providing objective evidence", and the planning, implementation and completion of this "determination" activity are closely related to the content of the "specified requirements".

The setting of "specified requirements" for data validation in the data mining process is often related to the basic goals, process goals and ultimate goals that data mining aims to achieve.

Verification may show that the "specified requirements" are fully met, not met at all, or met to some degree in between. It can be done by the data miner, with the involvement of others, or entirely by someone else, independently of the data miner.

In general, data miners cannot avoid taking part in the verification process. However, collecting objective evidence and evaluating the verification process are often more objective when carried out by people unrelated to whoever proposed the verification.

By verifying the results, data miners can get an assessment of the value of the data they have mined.
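One common way to check results against a "specified requirement" is to measure a mined model on records held out from the mining step. The sketch below assumes a hypothetical requirement expressed as a minimum accuracy, and a hypothetical cut-off (half the requirement) for reporting "partially met"; both are illustrative choices, not a standard.

```python
def verify(predictions, actual, required_accuracy):
    """Compare held-out accuracy with a 'specified requirement' and grade the result."""
    if len(predictions) != len(actual) or not actual:
        raise ValueError("need equally sized, non-empty prediction and label lists")
    accuracy = sum(p == a for p, a in zip(predictions, actual)) / len(actual)
    if accuracy >= required_accuracy:
        status = "met"
    elif accuracy >= required_accuracy / 2:  # hypothetical cut-off for "partially met"
        status = "partially met"
    else:
        status = "not met"
    return accuracy, status


# Hypothetical spam-filter output on records that were held out from mining.
predicted = ["spam", "ham", "ham", "spam", "ham"]
labels = ["spam", "ham", "spam", "spam", "ham"]
print(verify(predicted, labels, required_accuracy=0.9))  # (0.8, 'partially met')
```

Because the labels come from data the miner did not use, this check is closer to the "objective evidence" described above than re-scoring the training data would be.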

Data mining methods include supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. Supervised learning includes classification, estimation, and prediction; unsupervised learning includes clustering and association rule analysis.
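To show what "unsupervised" means in practice, here is a minimal sketch of k-means clustering (Lloyd's algorithm) on one-dimensional data: no labels are given, and the groups emerge from the data alone. The viewing-minutes data and the starting centers are hypothetical.

```python
def kmeans_1d(points, centers, iterations=10):
    """Plain k-means (Lloyd's algorithm) on 1-D data, starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        # Assignment step: each point joins its nearest center.
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its points (keep it if the cluster is empty).
        new_centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # converged: assignments no longer change the centers
        centers = new_centers
    return centers, clusters


# Hypothetical daily viewing minutes for eight users; no labels are given.
minutes = [5, 8, 10, 12, 60, 65, 70, 75]
centers, clusters = kmeans_1d(minutes, centers=[0, 100])
print(centers, clusters)  # [8.75, 67.5] [[5, 8, 10, 12], [60, 65, 70, 75]]
```

A supervised method such as classification would instead start from records already labeled (for example "light" and "heavy" users) and learn to assign new users to those labels.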

6. Examples

Application of data mining in the retail industry: A retail company tracks customer purchases and finds that a certain customer has bought a large number of silk shirts. The data mining system then establishes an association between this customer and silk shirts.

The sales department sees this information and sends the customer the current market price of silk shirts along with other information about them. In this way, retail stores can use data mining systems to discover previously unknown information about customers and expand their business.
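A deliberately simple sketch of the association step in this example: count how often each customer buys each product and pass strong preferences to the sales department. The purchase log, the names, and the fixed threshold of three purchases are hypothetical; a real system would likely weigh time, spend, and product categories.

```python
from collections import Counter


def strong_preferences(purchases, min_count=3):
    """Return (customer, product) pairs bought at least `min_count` times."""
    counts = Counter(purchases)
    return sorted(pair for pair, n in counts.items() if n >= min_count)


# Hypothetical purchase log of (customer, product) events.
log = [
    ("alice", "silk shirt"), ("alice", "silk shirt"), ("bob", "socks"),
    ("alice", "silk shirt"), ("alice", "belt"), ("bob", "socks"),
]
for customer, product in strong_preferences(log):
    print(f"notify sales: {customer} frequently buys {product}")
# notify sales: alice frequently buys silk shirt
```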

7. Data Fishing

Data mining is often regarded as a technology related to data warehousing and analytics, sitting somewhere between the two.

However, it is sometimes applied absurdly, for example to discover patterns (especially causal relationships) that do not exist but look exciting. Such irrelevant, misleading, or worthless associations are jokingly called "data dredging", "data fishing", or "data snooping" in the statistical literature.

Data dredging means scanning the data for every possible relationship and then keeping whatever patterns fit (so-called "overfitted patterns"). Any large data set always contains coincidental data that shows "exciting relationships."
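The effect is easy to reproduce. In the simulation below, every feature and the target are independent random noise, so any correlation found is coincidental. With 30 rows, a threshold of |r| > 0.36 roughly corresponds to the usual 5% two-sided significance cut-off, so testing 200 random features typically flags on the order of ten "relationships", all spurious. The sizes, threshold, and seed are assumptions chosen for illustration.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def dredge(n_rows=30, n_features=200, threshold=0.36, seed=0):
    """Correlate many purely random features with a random target; count 'exciting' hits."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    hits = 0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        if abs(pearson(feature, target)) > threshold:
            hits += 1  # a "pattern" that exists only by chance
    return hits


print(dredge())
```

The more hypotheses are tested, the more coincidences pass any fixed threshold; corrections for multiple comparisons, or testing on fresh data, are the usual defenses.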

As a result, some conclusions are highly doubtful. Nevertheless, some exploratory data analysis still has to apply statistical analysis to search the data, so the boundary between sound statistical methods and data dredging is not very clear.

The danger is finding correlations that do not actually exist, and investment analysts seem most prone to this mistake.

The book Where Are the Customers' Yachts? puts it this way:

"There are always a fair number of poor souls who are busy looking for possible repeating patterns in the thousands of spins of the roulette wheel. Unfortunately, they usually find them."

Most data studies focus on discovering highly detailed patterns in large data sets.

Author: Li Hang

Source: Li Hang
