Without data accumulation and user portraits, this is how I developed Toutiao products...

Toutiao used to row along quietly, drawing no attention when it talked about personalized recommendation at industry exchanges. Now it is undoubtedly under siege by all of BAT. Companies in the content field instinctively regard Toutiao as a competitor, and Internet companies outside the content field also want a piece of the content pie. Overnight the Internet filled up with feed streams, and nobody dares come to the table without talking about content recommendation algorithms.

I had the good fortune to plan a Toutiao-style headlines product from 0 to 1, and I would like to share that practical experience. I would be glad if it helps anyone who is interested, and I welcome criticism and correction from industry veterans. After all, groping forward alone is dangerous.

1. Clear positioning

My strongest impression after heavy use of reading products is that large platforms tend to carry information that lacks depth, while vertical content is only done well in a few fields such as technology and the Internet. My idea at the time was to provide in-depth information within an industry, especially in industries that have not yet been deeply transformed by the Internet. By piloting in one industry, I could build an industry headlines product, accumulate high-quality industry knowledge, and then replicate it in other industries at the lowest cost.

After thinking it over for a long time, I pitched the idea to my boss (I will spare you the 10,000 words of persuasion), and he finally agreed. Because a company associated with our team has ties to a traditional industry, Industry A, that is where we started.

Then came implementation. Looking at a team of just over 10 technical staff in total, I fell into deep thought... The disadvantages could hardly be more obvious:
  • No data accumulation;
  • No user profiles;
  • No one on the team has worked in Industry A.
And so I set out to build a headlines product...

2. Overall design of the headlines product

I built the product in three layers: a type and label layer at the bottom, a data crawling and analysis layer in the middle, and a business application layer at the top.

Bottom layer: types and labels

The bottom layer is organized around the specific industry. Ideally this work would be done by product staff together with practitioners from that industry, but with limited resources I did it myself. It is certainly not detailed enough, but it was enough to get started.

The layer consists of types and labels. Types are hierarchical; the database reserves room for 7 levels, but in practice 3 levels are almost always enough. For example, in Industry A, "Industry A companies" is a first-level type, "Industry A manufacturing companies" is a second-level type, and the name of a specific manufacturer is a third-level type.

Each type is built independently, and a large number of labels are associated with each type in the tables. For example, for the Industry A technology type, we found a dictionary of Industry A technical terms, pruned it, and attached the remaining terms to that type as labels. In the end we had more than 600 types and more than 100,000 labels. The database also reserves status flags, so entries can be enabled or disabled as needed.

Middle layer: data crawling and analysis

The data crawling and analysis layer is divided into crawler deployment, content source processing, and data classification.

1. Crawler deployment

Speaking as a technical layman, I divide crawlers into two categories. The first is non-directional crawlers, which target individual websites. These are very time-consuming because every site has to be handled one by one: the news center on each Industry A company's official website and each Industry A platform site needs its own handling.
The second is directional crawlers, which mainly target large information platforms with a search function, such as Toutiao. Here the code is reusable. Once it was written, I simply created a table to hold the search keywords: one set of code serves any number of keywords, and entering a keyword in the table is enough for news containing it to be crawled. The table now holds more than 700 keywords. The volume of crawled content is very large, so I recommend storing it in MongoDB.

2. Content source processing

Once the data is in, we first sort the sources into high-quality sources and junk sources, and raise the weight of content from high-quality sources. High-quality sources are mainly the official websites of companies. A junk source is one that produces a large amount of content that is meaningless for the specific industry. For example, a source called "xx talking about cars" is a junk source for the construction industry, but when the product is replicated to the automotive field it will no longer be junk. Identifying junk sources is long-term work; there are now about 700 of them, and most of them come from Toutiao.

3. Data classification

After filtering out the junk sources, we classify the data, which essentially means assigning news content to the types we have established. Because we deal in industry news, we wanted high accuracy from the start. I considered two solutions at the time. The first was to build a weighted model for each type from the large set of labels associated with it, and to run full-text word segmentation on every crawled article.
We would count word frequencies across a large number of articles, compute frequency values for all the segmented words in each article, and compare them against the type models, keeping the types with higher relevance. The second solution was to match the labels under each type against every article that survived junk-source filtering, and to assign each article to the types whose labels it contains. The more labels of one type an article contains, the more relevant it is to that type. To launch quickly we chose the second solution, although its accuracy is somewhat lower. With manual intervention, though, as more junk sources were screened out and the types and labels continued to be maintained, content accuracy improved.

Top layer: business application

The business application layer mainly works out the keywords that target users care about and associates those keywords with the types in the type and label layer. Once a user subscribes to a keyword, they see the content that belongs to it. Two front-end products are now online, a subscription platform and industry headlines, both backed by a back-end management center.

1. Subscription platform

The subscription platform is semi-closed and aimed at corporate users and self-media practitioners in Industry A. It offers the keywords they care about, with higher content accuracy. Corporate users subscribe to keywords and see the related information; once they have seen what the platform can do, they tend to ask for more custom keywords. After a back-end review, additional crawlers are deployed to push the data to those users, and all user behavior data is recorded along the way.

2. Industry headlines

Industry headlines is fully open to prospective practitioners and general enthusiasts of the industry, and offers more keywords.
Its content quality is slightly lower than the subscription platform's, but its target audience is wider. We therefore record all user behavior data (comments, read counts, "show me another batch" events, keyword follows, and so on) to collect user feedback and build user profiles, so that we can recommend keywords based on different profiles and lay the groundwork for real recommendation.

3. Back-end management center

This includes news management, source management (high-quality sources and junk sources), type and label management, user behavior management, push management, keyword review and scheduling management, comment search management, and more. I will not go into detail here and will cover it when I have the chance.

I have drawn a simple diagram of the product framework; read together with the discussion above, it may make things easier to follow.
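Separately from the diagram, the type and label layer described in this section can be pictured in code. The following is only a minimal Python sketch under my own assumptions: the class name `TypeNode`, its fields, and the sample type and label names are all hypothetical, and the real system keeps this data in database tables rather than in memory. It shows hierarchical types capped at 7 levels, labels attached to a type, and a status flag.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

MAX_LEVEL = 7  # the article says the database reserves 7 levels


@dataclass
class TypeNode:
    """One type in the hierarchy (hypothetical structure, not the real schema)."""
    name: str
    level: int
    # parent is excluded from repr/compare to avoid walking the tree in cycles
    parent: Optional["TypeNode"] = field(default=None, repr=False, compare=False)
    enabled: bool = True  # status flag: a type can be switched off
    labels: Set[str] = field(default_factory=set)
    children: List["TypeNode"] = field(default_factory=list)

    def add_child(self, name: str) -> "TypeNode":
        """Create a sub-type one level below this one."""
        if self.level >= MAX_LEVEL:
            raise ValueError(f"type hierarchy is limited to {MAX_LEVEL} levels")
        child = TypeNode(name=name, level=self.level + 1, parent=self)
        self.children.append(child)
        return child

    def path(self) -> str:
        """Return the chain of type names from the first level down to this type."""
        parts = []
        node: Optional["TypeNode"] = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return " > ".join(reversed(parts))


# Three levels, mirroring the company example in the text (names are invented).
companies = TypeNode("Industry A companies", level=1)
manufacturers = companies.add_child("Industry A manufacturers")
acme = manufacturers.add_child("Example Manufacturing Co.")
acme.labels.update({"Example Manufacturing", "EMC"})

print(acme.path())
print(acme.level)
```

In a relational database the same idea would more likely be a types table with a parent reference and a status column, plus a labels table keyed by type; the sketch is only meant to make the relationships concrete.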

(Note: Infringement will be prosecuted)
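
To make the content source processing and the second classification solution from section 2 more concrete, here is a minimal Python sketch. It is an illustration under my own assumptions, not the production code: the function name `classify`, the sample labels and sources, and the boost value of 2.0 are all hypothetical (the article does not say how much extra weight high-quality sources receive). Articles from junk sources are dropped, each remaining article is scored per type by how many of that type's labels it contains, and the score is raised for high-quality sources.

```python
from typing import Dict, List, Set, Tuple

# All names and sample values below are hypothetical illustrations.
TYPE_LABELS: Dict[str, Set[str]] = {
    "Industry A technology": {"alloy coating", "widget casting"},
    "Industry A companies": {"Example Manufacturing", "Sample Works"},
}
JUNK_SOURCES: Set[str] = {"xx talking about cars"}  # junk for this industry only
HIGH_QUALITY_SOURCES: Set[str] = {"Example Manufacturing official site"}
HIGH_QUALITY_BOOST = 2.0  # assumed value; the article gives no number


def classify(source: str, text: str) -> List[Tuple[str, float]]:
    """Return (type, score) pairs, highest score first; empty for junk sources."""
    if source in JUNK_SOURCES:
        return []
    lowered = text.lower()
    boost = HIGH_QUALITY_BOOST if source in HIGH_QUALITY_SOURCES else 1.0
    scored = []
    for type_name, labels in TYPE_LABELS.items():
        # One point per distinct label of this type found in the article text.
        hits = sum(1 for label in labels if label.lower() in lowered)
        if hits:
            scored.append((type_name, hits * boost))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


result = classify(
    "Example Manufacturing official site",
    "Example Manufacturing now uses alloy coating and widget casting.",
)
print(result)
# [('Industry A technology', 4.0), ('Industry A companies', 2.0)]
```

Plain substring matching keeps the sketch short. With 100,000 labels a real system would probably want a faster multi-pattern matcher, and the junk source list would be kept per industry so that a source can be junk in one industry and useful in another.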

3. To my peers

Don't keep trying to build another Toutiao. Unless your experience and algorithms are more than 50% better than Toutiao's, you have essentially no chance in a head-on confrontation. Find your own point of entry and recognize your own advantages.

Content recommendation is always risky. Recommending something when users don't need it counts against you, unless it turns out to be a pleasant surprise. Users will put up with products they have to use, but dispensable products will most likely be uninstalled. Anyone who runs a public account knows this feeling well: every push brings the fear of losing followers.

Because I have always been interested in search, I will briefly lay out my suggestions for the kind of content input method products could offer. Users have several kinds of information needs:
  • Active acquisition: RSS crawling (Google subscription), follow/subscribe (immediately)
  • Passive acquisition: platform recommendations (traditional portals, news websites), vertical media information (36Kr, Huxiu, etc., and recently Feng Dahui's Readhub), personalized recommendations (Toutiao, Yidian Zixun)
Competition for this kind of demand is extremely fierce. There is another kind of demand, though: the need for information in a specific scenario. When looking for a job, you want to know more about a certain company; when going out to eat, you want to know more about nearby restaurants. This kind of demand is very long-tail. How is it met today? By actively searching on Baidu, Zhihu, and similar platforms, but the path to the information you want is very long. When you are arranging dinner with friends and want to know which good restaurants are nearby, for example, the cost of searching is extremely high. Where does this scenario come up most often? While chatting and asking around! This is exactly where I think input methods have an opportunity to get into information. Specifically:
  • When you are chatting about changing jobs and mention a certain company, the input method gives a cue as you type (such as a color change) and can conveniently push the latest information about that company;
  • When you are chatting to arrange dinner, it pushes nearby restaurants and reviews;
  • When you tell your boyfriend that you want the same items as Zhao Liying, he can easily see information about those products.
Input method companies should already have plenty of accumulated data behind these needs, including how often words occur. They can prepare content based on word frequency and give users a pleasant surprise as they type, which achieves the goal of information recommendation. I would welcome guidance from friends who work in the input method field.


This article was written by @小呆 and compiled and published by Qinggua Media. Please credit the author and source when reprinting.
