Wang Chuanpeng of Sina Weibo: The Evolution of Weibo Recommendation Architecture


Wang Chuanpeng, Sina Weibo Recommendation and Advertising Technology Director

Wang Chuanpeng was a special guest of the WOT Summit, a featured speaker at the 7th WOT Mobile Internet Developer Conference, and one of the co-producers of this "Internet +" Era Big Data Technology Summit. He graduated from Beihang University in 2006 and joined the Honeywell Beijing Research Center as an engineer. He later co-founded the cloud storage service 99 Disks (a network drive) with partners. After the company was acquired, he joined Dangdang.com, where he was responsible for recommendation and advertising. In 2011 he joined the Sina Weibo Business Product Department, where he has been responsible for recommendation and advertising ever since.

Introduction

Weibo is a broadcast-style social networking platform that shares short, real-time information through a follow mechanism: users subscribe to content by following other accounts. In this setting, the recommendation system can be tightly integrated with the subscription distribution system, and the two reinforce each other. Weibo rests on two core foundations: building user relationships and disseminating content. Weibo recommendation has always focused on optimizing these two areas to drive Weibo's growth, as shown in Figure 1:

Figure 1 The mission of Weibo recommendation

As Weibo recommendation developed, its system direction changed, its business was continually upgraded, and its goals were redefined; its product ideas, architecture, and algorithms changed accordingly. This article focuses on how the recommendation architecture evolved during this process, tracing the full development path along three dimensions: product goals, algorithm requirements, and technology development. We also hope to use this opportunity to discuss the relationship between business and technology with everyone.

To understand how the Weibo recommendation architecture evolved, we first need to explain the components of the Weibo recommendation process. This process is not specific to Weibo: in principle, recommendation pipelines across the industry are basically the same. As shown in Figure 2, recommendation addresses the relationship between users and items, recommending to each user the items he or she is likely to be interested in. A recommended item goes through candidate generation, sorting (ranking), strategy, presentation, feedback, and evaluation, and the evaluation results in turn change the candidates, forming a complete loop.

Figure 2 The recommendation pipeline
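The loop in Figure 2 can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Weibo's implementation: the smoothed click-through-rate ranker, the blocklist rule, and all names (`RecLoop`, `serve`, `feedback`, `evaluate`) are hypothetical. It only shows how feedback collected after presentation changes what the next round puts first.

```python
from dataclasses import dataclass, field


@dataclass
class RecLoop:
    """Toy loop: candidates -> ranking -> strategy -> display -> feedback -> evaluation."""
    candidates: dict[str, list[str]]                        # user -> candidate item ids
    shows: dict[str, int] = field(default_factory=dict)     # item -> impressions (display)
    clicks: dict[str, int] = field(default_factory=dict)    # item -> clicks (feedback)

    def rank(self, items: list[str]) -> list[str]:
        # Ranking: smoothed click-through rate computed from earlier feedback.
        def ctr(item: str) -> float:
            return (self.clicks.get(item, 0) + 1) / (self.shows.get(item, 0) + 2)
        return sorted(items, key=ctr, reverse=True)

    def strategy(self, ranked: list[str], blocked: frozenset[str], k: int) -> list[str]:
        # Strategy: business rules applied after ranking (filter, then truncate).
        return [item for item in ranked if item not in blocked][:k]

    def serve(self, user: str, blocked: frozenset[str] = frozenset(), k: int = 2) -> list[str]:
        shown = self.strategy(self.rank(self.candidates.get(user, [])), blocked, k)
        for item in shown:  # Display: record one impression per shown item.
            self.shows[item] = self.shows.get(item, 0) + 1
        return shown

    def feedback(self, item: str) -> None:
        # Feedback: a click on a shown item.
        self.clicks[item] = self.clicks.get(item, 0) + 1

    def evaluate(self) -> float:
        # Evaluation: overall click-through rate across everything shown so far.
        total = sum(self.shows.values())
        return sum(self.clicks.values()) / total if total else 0.0
```

For example, with candidates `["a", "b", "c"]` the first call to `serve` shows `a` and `b`; after one click on `b`, the next call ranks `b` first, which is the closed loop described above.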

Based on the above overall process, the Weibo recommendation architecture has gone through three stages as shown in Figure 3:

Figure 3 Three stages of Weibo recommendation architecture

An architecture usually grows out of its team and business environment: it is shaped by environmental factors and aims to solve the problems those factors create. Each architecture therefore has distinct characteristics and produces specific effects when it is put into practice. This article describes each of the three stages of Weibo recommendation in terms of its environmental factors, its architecture composition and characteristics, and its results.

1 Standalone 1.0

1.1 Environment

Environmental factors that affect the formation of architecture can be divided into internal environmental factors and external environmental factors. Internal factors are mainly related to the team and its members, while external factors mainly come from external departments, the entire company or the entire industry.

Weibo Recommendation 1.0 was in use from July 2011 to February 2013, and its main goal was to meet immediate business needs. In the independent model, each business project has its own complete architecture and process, and these architectures are largely independent of one another, down to the technology stack. There are several reasons the architecture became independent:

1) The team was newly formed, and its members had little experience working together and lacked overall experience in the recommendation field.

2) Each team member had his or her own understanding of recommendation architecture, but there was no consensus on what architecture suited Weibo recommendation in its current scenario.

Of course, the decisive factors were external, because internal factors are easier to coordinate and change. The external factors at the time included:

1) There were many project requirements. At that time a team of 5 people developed 3-5 projects in parallel on average. Most importantly, the Weibo product was growing rapidly and many features needed recommendation support. Project cycles were also very short and schedules rushed, leaving little time for careful analysis and abstraction. Typical products included micro-bars, micro-groups, micro-magazines, micro-topics, users, and content rankings.

2) The team played a supporting role: most requirements came from external teams, and the differing product directions of those teams made the requirements hard to manage.

3) At that time, recommendation architectures across the industry were developing in different directions, and each company was exploring architectural ideas suited to its own needs.

For these reasons, as projects arrived one after another, we usually built each process with a familiar technology stack and our own understanding, producing a series of independent architectures.

1.2 Architecture composition and characteristics

The previous section explained why an independent architecture formed. You might think there is no need to describe its composition, but that is not the case: the later layered and platform architectures were both built on foundations laid at this stage. Without the pitfalls the team ran into and the lessons it summarized here, the later evolution suited to local conditions would not have happened. We therefore analyze the composition and characteristics of the Recommendation 1.0 architecture.

1) Technical goals

Measured against the process in Figure 2, Weibo Recommendation 1.0, which aimed mainly at delivering business features, had not built a complete feedback and evaluation system, and sorting was replaced by strategy rules. Its focus was therefore on candidates, strategies, and presentation, and the recommendation process was reduced to the simple form candidates → strategies → presentation.
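To make the simplified 1.0 form (candidates, strategies, presentation) concrete, here is a hypothetical Python sketch of a single project's request handler. A plain dict stands in for a redis or keylistdb key-list store; the key format `topic:{uid}`, the filtering rule, and the response shape are assumptions for illustration. Note that there is no ranking model and no feedback path: the stored order plus hand-written rules decide what is shown.

```python
# Stand-in for a key-list store (redis / keylistdb in 1.0): key -> ordered item ids.
# The key naming scheme is hypothetical.
CANDIDATE_STORE: dict[str, list[str]] = {
    "topic:u42": ["t1", "t2", "t3", "t4"],
}


def strategy(candidates: list[str], followed: set[str], limit: int) -> list[str]:
    # Rules replace ranking: keep the precomputed order, drop what the user
    # already follows, and truncate to the number of display slots.
    return [c for c in candidates if c not in followed][:limit]


def present(items: list[str]) -> dict:
    # Presentation: shape the result for an HTTP (JSON) response.
    return {"count": len(items), "items": items}


def handle_request(uid: str, followed: set[str], limit: int = 2,
                   store: dict[str, list[str]] = CANDIDATE_STORE) -> dict:
    candidates = store.get(f"topic:{uid}", [])               # candidates
    return present(strategy(candidates, followed, limit))    # strategy -> presentation
```

Every 1.0 project repeated some variant of this handler with its own store and rules, which is why the later stages focused on extracting the shared parts.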

2) Architecture composition

As shown in Figure 4, we have tried to capture the architecture of each project in one figure. In practice, each project leader chose apache + mod_python as the service framework and redis for storage. Some projects introduced complex computation, which gave birth to the C/C++ service framework woo; projects with specialized storage requirements produced a series of databases, such as mapdb for static data in the early days and keylistdb for key-list data. Deployment was also more ad hoc than the figure below suggests: a project would deploy its web services on a few servers to serve HTTP requests, then find a few more servers to run redis as data support. Source data was transferred with rsync according to rules agreed with the business side. Most strategies were implemented in Python.

The main technology stack can be seen in the figure:

  • Web service: apache + mod_python, later replaced by the more complete community module mod_wsgi. Python was chosen as the web development language mainly because the team already used it for data processing, and it is easy to learn with a gentle learning curve.
  • Computation service: C/C++, which became the internal service framework woo
  • db: redis, mapdb, keylistdb, etc., in two categories: redis and self-developed stores
  • Data source: rsync file transfer, plus firehose (a data queue used internally by Weibo) as the source of Weibo content

Figure 4. Simplified diagram of Weibo Recommendation 1.0 architecture

3) Architecture Features

The architectural features are described below as advantages and disadvantages. The advantages are:

  • Simple and easy to implement, with no need for additional infrastructure support
  • Business features can be delivered quickly
  • Multiple businesses can be developed in parallel without affecting each other

The shortcomings are:

  • The recommendation process is incomplete: it lacks key stages such as feedback and evaluation, and there is almost no unified approach to data processing
  • Without algorithm support, in-depth recommendation is difficult
  • Professional operations and maintenance is almost impossible
  • QA testing can only reach the functional level; module-level testing is almost impossible because the system is too fragmented
  • Team collaboration is difficult, and projects are hard to decompose

1.3 Results

Despite its many shortcomings, this stage laid the foundation for later architecture optimization. Its achievements were:

1) During Weibo's rapid growth, it met Weibo's business requirements for recommendation support, completing more than 20 independent projects.

2) The basic woo framework was born, and the later internal high-performance computing framework grew out of it.

3) The static store mapdb was born, becoming the prototype for Weibo recommendation's later static storage.

4) Recurring needs of the web application layer were summarized into a general recommendation application framework.


2 Layered 2.0

Having introduced the independent 1.0 architecture in the previous section, we now reach a fork in the road of architecture development. On one side is the popular LAMP architecture; on the other is the CELL architecture, suited to advertising and search. LAMP separates data from strategy and uses a scripting language as the main business development language, making it the first choice for rapid project development and iteration. CELL emphasizes local processing, with data and business tightly coupled, and relies on many self-developed services and databases, making it suitable for high-performance, effect-driven products. In the end we chose an architecture compatible with both but leaning toward the business side. Why? Let's look at the environment at the time.

2.1 Environment

The time period of Weibo Recommendation 2.0 is from March 2013 to the end of 2014. The internal environmental factors during this period are:

1) Team members had been working together for a long time, knew each other well, and had reached a degree of consensus on technology selection.

2) The team focused on products, organizing recommendation into three categories: content, user, and vertical. It also concentrated on key scenarios: the feed stream, the text page, and the right sidebar of the PC homepage. This focus helped unify the architecture and bought time for technical work.

The external factors are:

1) The company defined the role of recommendation more clearly: improving the efficiency of relationship building and content dissemination, while laying a solid foundation in technical exploration, scenario entry, and user experience for recommendation-based advertising.

2) Various companies in the recommendation field had published their architectures, which provided good guidance for Weibo recommendation.

2.2 Architecture composition and characteristics

While delivering the core business, the team continued to evolve its tools and frameworks, and building 2.0 became an urgent goal.

1) Technical goals

Unlike 1.0, simply meeting business needs was no longer the technical goal of 2.0. To support the complete recommendation process, we needed to solve the following problems:

First, implement a complete recommendation process, with an architecture covering candidates, sorting, strategy, presentation, feedback, and evaluation.

Second, treat data as the top priority and refine the data architecture: enable data comparison so that effects are judged by data; build data channels so that feedback is captured; and persist data so that business needs are supported.

Third, provide a convenient way for algorithms to plug in.

Fourth, support both rapid business iteration and efficient computation.

2) Architecture composition

The architecture of Weibo Recommendation 2.0 is shown in Figure 5. It is no longer a set of separate systems, and developers no longer use different technologies to solve similar problems. The architecture consists of the following parts:

Application layer: mainly responsible for recommendation strategy and presentation; it takes full advantage of a scripting language to respond to iterative requirements. Most recommended content could be displayed directly after sorting, but front-end product strategies require merging, deletion, and reordering, which this layer handles; technically it is IO-intensive. For technology selection, we built on the original apache + mod_python setup to produce common_recom_frame, a framework for secondary developers on which recommendation business processes can be implemented cleanly. Its core idea is to extract three interface layers: project, work, and data. A project corresponds to one recommendation project, a work to one recommendation method within a project, and data to the way downstream data is accessed. The framework also sets two specifications: one unifies the recommendation interface across user, content, and vertical businesses; the other hides the access protocols of different databases, which greatly improves development efficiency. common_recom_frame essentially met the product's various recommendation strategy requirements and let development stay ahead of product needs.

Figure 5 Schematic diagram of Weibo Recommendation 2.0 architecture
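The project/work/data split of the application layer can be illustrated with a small Python sketch. The class names mirror the three layers described above, but every method name, signature, and key format is hypothetical; the actual common_recom_frame interfaces are not reproduced here. The sketch shows the two specifications in miniature: every project answers through the same `recommend(uid, num)` call, and works reach storage only through a `Data` object, so the protocol behind it can change without touching strategy code.

```python
from abc import ABC, abstractmethod


class Data(ABC):
    """Data layer: hides how a downstream store is accessed (redis, lushan, HTTP, ...)."""

    @abstractmethod
    def get(self, key: str) -> list[str]: ...


class DictData(Data):
    """In-memory stand-in for a real store."""

    def __init__(self, table: dict[str, list[str]]):
        self.table = table

    def get(self, key: str) -> list[str]:
        return self.table.get(key, [])


class Work(ABC):
    """Work layer: one recommendation method inside a project."""

    def __init__(self, data: Data):
        self.data = data

    @abstractmethod
    def run(self, uid: str, num: int) -> list[str]: ...


class FriendsOfFriends(Work):
    def run(self, uid: str, num: int) -> list[str]:
        return self.data.get(f"fof:{uid}")[:num]


class PopularAccounts(Work):
    def run(self, uid: str, num: int) -> list[str]:
        return self.data.get("popular")[:num]


class Project:
    """Project layer: combines its works behind one unified interface."""

    def __init__(self, name: str, works: list[Work]):
        self.name = name
        self.works = works

    def recommend(self, uid: str, num: int) -> dict:
        merged: list[str] = []
        for work in self.works:  # works are consulted in priority order
            for item in work.run(uid, num):
                if item not in merged and item != uid:  # dedupe, never recommend the user to themself
                    merged.append(item)
        return {"project": self.name, "uid": uid, "items": merged[:num]}
```

In this design, adding a recommendation method means writing one more `Work` subclass; the project-level interface that callers see does not change.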

Computing layer: mainly responsible for recommendation sorting computations, which are CPU-intensive. This layer gives algorithms a way to plug in and supports iteration of algorithm models. For technology selection, we inherited the original WOO protocol framework, an internal high-performance communication framework developed in C/C++, and extended it substantially. Following the same ideas as common_recom_frame, we implemented project/work/data management on top of WOO, giving secondary developers more efficient development tools. This tooling is included in the team's open source project: https://github.com/wbrecom/lab_common_so

Data layer: mainly responsible for the flow and storage of recommendation data, that is, the IN/OUT/STORE problem: IN is how data enters the system, OUT is how data is accessed, and STORE is how data is stored. When planning the data layer, we analyzed the characteristics of Weibo recommendation data and divided it into two categories, static and dynamic. Static data is large-scale data that is updated in full at low frequency; dynamic data is incremental data that is updated continuously at high frequency. Within the overall IN/OUT/STORE framework, static and dynamic data are handled separately, which produced tools and frameworks such as RIN/R9-interface, redis/lushan, and the tmproxy/gout proxies. In more detail: RIN handles the ingestion of dynamic data. It receives data through web services, uses ckestrel for queue management in the backend, and is backed by a consumption framework running on multiple service clusters; users only need to develop their own business logic to go online quickly and consume dynamic data. R9-interface handles the ingestion of static data. Much of the static data for recommendation is computed on the Hadoop cluster, and the r9-interface framework handles notification, management, and data loading for these static computations (MR, Hive SQL, and Spark jobs). For storage, a redis cluster holds dynamic data and a lushan cluster holds static data; lushan is also included in the team's open source project: https://github.com/wbrecom/lushan. tmproxy/gout solves the OUT problem: gout is proxy middleware that serves requests needing combined access to dynamic and static data, reducing the impact of backend data changes on the business.
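The static/dynamic split and the role of a gout-style OUT proxy can be sketched as follows. This is a hypothetical in-memory model, not the lushan, redis, or gout API: `StaticStore` is replaced wholesale from a batch job (as a lushan-style store would be), `DynamicStore` takes incremental pushes (as RIN consumers writing to redis would), and `Proxy` gives callers one lookup that merges both, so business code does not care which store holds the data. The fresh-first merge order and the list cap are assumptions.

```python
class StaticStore:
    """Low-frequency, full updates: the whole table is swapped at once."""

    def __init__(self) -> None:
        self.version = None                      # tag of the currently loaded batch
        self.table: dict[str, list[str]] = {}

    def load(self, version: str, table: dict[str, list[str]]) -> None:
        # Build the new table completely, then swap the reference,
        # so lookups never see a partially loaded table.
        new_table = {key: list(items) for key, items in table.items()}
        self.version, self.table = version, new_table

    def get(self, key: str) -> list[str]:
        return self.table.get(key, [])


class DynamicStore:
    """High-frequency, incremental updates: items are pushed one at a time."""

    def __init__(self, max_len: int = 100) -> None:
        self.max_len = max_len
        self.table: dict[str, list[str]] = {}

    def push(self, key: str, item: str) -> None:
        items = self.table.setdefault(key, [])
        items.insert(0, item)          # newest first
        del items[self.max_len:]       # keep the list bounded

    def get(self, key: str) -> list[str]:
        return self.table.get(key, [])


class Proxy:
    """OUT layer: one lookup that combines dynamic and static data."""

    def __init__(self, static: StaticStore, dynamic: DynamicStore) -> None:
        self.static = static
        self.dynamic = dynamic

    def get(self, key: str, limit: int = 10) -> list[str]:
        merged: list[str] = []
        for item in self.dynamic.get(key) + self.static.get(key):  # fresh data first
            if item not in merged:
                merged.append(item)
        return merged[:limit]
```

Because callers only see `Proxy.get`, a static table can be reloaded or a dynamic feed added without changing business code, which is the decoupling the text attributes to gout.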

Basic services: the basic services of the recommendation system mainly include monitoring, alerting, and evaluation. Monitoring covers two categories: performance monitoring and effect monitoring. The evaluation system is mainly used for offline evaluation, so that the effect of a change can be estimated before it goes online and ineffective launches are reduced. Figure 6 shows the UI of the basic services.

Figure 6 UI of the basic service system
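Offline evaluation of this kind can be as simple as replaying held-out user behavior against a candidate model's output before launch. The sketch below is hypothetical: the text does not name the metric, so precision@k, the `min_gain` margin, and the function names are assumptions chosen only to show the gating idea.

```python
def precision_at_k(recs: dict[str, list[str]], truth: dict[str, set[str]], k: int) -> float:
    """Mean fraction of each user's top-k recommendations found in held-out engagement."""
    if not recs:
        return 0.0
    hits = [len(set(items[:k]) & truth.get(user, set())) / k for user, items in recs.items()]
    return sum(hits) / len(hits)


def should_launch(new: dict[str, list[str]], baseline: dict[str, list[str]],
                  truth: dict[str, set[str]], k: int = 2, min_gain: float = 0.0) -> bool:
    # Only send a model online if it beats the current one offline by more than min_gain.
    return precision_at_k(new, truth, k) > precision_at_k(baseline, truth, k) + min_gain
```

A gate like this does not replace online testing; it only filters out changes that are unlikely to help before they consume online traffic.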

3) Features

The advantages are:

  • Supports the complete recommendation process, with a unified approach to data processing
  • Delivers business features quickly while still allowing effect-related technology to keep improving
  • Provides good support for algorithms
  • Putting data first allows results to be compared comprehensively, so the recommendation effect can keep improving
  • The self-contained system is easy to deploy and lets QA take part in testing

The shortcomings are:

  • It is still some distance from the core of recommendation and is not fully tailored to recommendation
  • Recommendation strategies and algorithms are left entirely to individual developers, which makes it hard to generalize recommendation methods
  • Algorithm training is not covered; it is only an online serving system, which is not enough to constitute a complete recommendation system

2.3 Results

Weibo Recommendation 2.0 delivered good returns. Its achievements were:

1) The core businesses of Weibo recommendation were built on this system: main text page recommendation, trending user recommendation, trending content recommendation, user recommendation in various scenarios, fan economy, account recommendation, and other products

2) The basic framework lab_common_so was created and open-sourced

3) Lushan, a static storage cluster solution, was created and open-sourced

4) The RUF framework was created, greatly improving business development efficiency and contributing to the OpenResty community


3 Platform 3.0

When describing 2.0 in the previous section, we mentioned an important shortcoming: "It is somewhat distant from the core of recommendation and is not completely tailored for recommendation." Recommendation 3.0 aims to solve this problem. What problems does this shortcoming cause? Why move the recommendation architecture forward again when it already meets business needs? Before presenting the 3.0 design of the Weibo recommendation platform, let's first look at the environment.

3.1 Environment

The time period of Weibo Recommendation 3.0 is from the end of 2014 to the present. The current internal environmental factors are:

1) Recommendation products were no longer expanding and placed more emphasis on results, so the focus of work shifted from business development and iteration to results-driven technology iteration.

2) When launching new projects or iterating on existing recommendation businesses, we found a lot of repetitive work that the architecture did not address, resulting in redundant effort.

The external factors are:

1) The company had also shifted its focus from business expansion to efficiency, improving user experience and content quality.

2) Weibo recommendation still lagged somewhat behind the industry in recommendation technology, but the conditions to catch up were now in place.

3.2 Architecture composition and characteristics

This environment also shaped the technical goals of 3.0:

1) Technical goals

Unlike 2.0, full coverage of the recommendation process was no longer the goal of 3.0. Its goals were:

Abstract common methods for candidate generation, ranking, training, and feedback in the recommendation process.

Recommendation is fundamentally an algorithm and data problem, so the recommendation system should be built from an algorithmic perspective and sit closer to algorithmic strategy.

2) Architecture composition

Figure 7 shows the architecture of Weibo Recommendation 3.0, which is the architecture currently in use. It is clearly developed from 2.0 and retains many of 2.0's layers and tool frameworks. The key differences are:

Two standards: the first applies to the application layer, which, as the output of the overall framework, defines an all-in-one interface standard covering input and output parameters. The second applies to RIN, the dynamic input. Because data structures are known in advance in offline computation, the input-layer tool R9-interface needs no specification, but RIN does; its standard is organized by data type, such as attributes, interaction data, and logs.
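One way to picture the two standards is as typed schemas. In this hypothetical Python sketch, `RecRequest` and `RecResponse` play the role of the application layer's all-in-one input and output parameters, and `RinMessage` the standardized dynamic input, tagged with one of the data types the text lists (attribute, interaction, log). All field names are assumptions; the actual Weibo specifications are not reproduced here.

```python
from dataclasses import asdict, dataclass, field

RIN_KINDS = {"attribute", "interaction", "log"}   # data types named in the text


@dataclass
class RecRequest:
    """Unified input for every recommendation scene (user, content, vertical)."""
    uid: str
    scene: str                      # e.g. "feed"; values are illustrative
    num: int = 10
    exclude: list[str] = field(default_factory=list)


@dataclass
class RecItem:
    item_id: str
    score: float
    reason: str = ""


@dataclass
class RecResponse:
    """Unified output: the same shape whichever project produced it."""
    code: int
    items: list[RecItem]

    def to_dict(self) -> dict:
        return asdict(self)         # nested RecItem objects become dicts too


@dataclass
class RinMessage:
    """Standardized dynamic input accepted by a RIN-style ingestion service."""
    kind: str
    key: str
    payload: dict
    ts: int

    def __post_init__(self) -> None:
        # Reject input that does not declare one of the agreed data types.
        if self.kind not in RIN_KINDS:
            raise ValueError(f"unknown RIN data kind: {self.kind!r}")
```

Validating at construction time means malformed dynamic input is rejected at the entry point rather than discovered by a downstream consumer.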

The computing layer adds standard candidate generation methods, such as the Artemis content candidate module and the item-cands user candidate module. In project development, developers only need to select among these candidate generation methods.
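Selecting standard candidate generators per project resembles a plug-in registry. The sketch below is hypothetical: the registry, the generator names `content` and `user`, and the project configuration are invented stand-ins loosely modeled on the roles of Artemis (content candidates) and item-cands (user candidates), not their real interfaces.

```python
from typing import Callable

CandidateFn = Callable[[str, dict], list[str]]
GENERATORS: dict[str, CandidateFn] = {}


def register(name: str) -> Callable[[CandidateFn], CandidateFn]:
    """Decorator that adds a candidate generation method to the shared registry."""
    def wrap(fn: CandidateFn) -> CandidateFn:
        GENERATORS[name] = fn
        return fn
    return wrap


@register("content")   # hypothetical stand-in for an Artemis-style content module
def content_candidates(uid: str, ctx: dict) -> list[str]:
    return ctx.get("hot_posts", [])


@register("user")      # hypothetical stand-in for an item-cands-style user module
def user_candidates(uid: str, ctx: dict) -> list[str]:
    return ctx.get("friends_of_friends", {}).get(uid, [])


# A project only declares which registered generators it uses.
PROJECTS = {
    "feed_content_rec": ["content"],
    "follow_rec": ["user"],
}


def candidates_for(project: str, uid: str, ctx: dict, limit: int = 50) -> list[str]:
    merged: list[str] = []
    for name in PROJECTS[project]:
        for item in GENERATORS[name](uid, ctx):
            if item not in merged:      # dedupe across generators
                merged.append(item)
    return merged[:limit]
```

The design point is that candidate logic is written once and reused; a new project becomes a configuration entry rather than new code.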

The strategy platform EROS was added to address algorithm models. Its main functions are: 1) model training, 2) feature selection, and 3) online comparative testing.
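For the online comparative testing function, a common technique (not necessarily the one EROS uses) is deterministic hash bucketing: each user is hashed into a fixed bucket per experiment, so the same user always sees the same model while traffic is split by configured shares. The function names, the 100-bucket granularity, and the choice of MD5 below are assumptions.

```python
import hashlib


def bucket(uid: str, experiment: str, buckets: int = 100) -> int:
    """Stable bucket in [0, buckets) for a user within one experiment."""
    digest = hashlib.md5(f"{experiment}:{uid}".encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def assign(uid: str, experiment: str, splits: list[tuple[str, int]]) -> str:
    """Map a user to a model; splits are (model_name, percent) pairs summing to 100."""
    if sum(share for _, share in splits) != 100:
        raise ValueError("traffic shares must sum to 100")
    b = bucket(uid, experiment)
    upper = 0
    for model, share in splits:
        upper += share
        if b < upper:
            return model
    return splits[-1][0]  # not reached when shares sum to 100
```

Salting the hash with the experiment name keeps bucket assignments independent across experiments, so one test's traffic split does not bias another's.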

In the data layer, r9-interface and RIN gain methods for generating candidates, applying general recommendation strategies to produce results both offline and online.

Figure 7 Schematic diagram of the architecture of Weibo Recommendation 3.0

3) Features

The main advantages are:

It inherits the features of 2.0 and retains its advantages.

It reflects a deeper understanding of recommendation and integrates more closely with it.

It solves the most important recommendation problems: candidate generation, ranking, and training algorithms.

3.3 Results

Weibo Recommendation 3.0 has achieved the following:

1) The core businesses of Weibo recommendation are being gradually migrated to this system, with algorithms and data driving effect improvements

2) The EROS training process was created, establishing a standard training method

3) Standard input and output methods were defined for recommendation

4) A set of abstract, general-purpose methods was produced for candidate generation

4 Conclusion

This article has given a fairly detailed account of how the Weibo recommendation architecture evolved. The team and its members benefited greatly from this process, and the relationship between technology and business is clearly reflected in the architecture. I would like to share a few points:

1) Technology originates from business and promotes business development, while business development in turn promotes the advancement of technology. They are mutually influential and mutually reinforcing. Only technology that develops together with business is viable.

2) The right way to select a recommendation architecture is to find the shortest path for the present and then keep optimizing and iterating. Trying to do everything at once is unrealistic and unreasonable.

3) The best way to promote a framework or tool is not administrative orders or treating people to meals, but making everyone a participant: as in an open source project, everyone is an owner, so everyone maintains it and everyone uses it.

4) The team advocates simplicity and reliability. This is easier said than done; a good approach is to understand what you should not do, rather than only what you should do.

5) In the particular field of recommendation, setting and tracking goals is important. Once the data and goals are in place, product, architecture, and algorithms will find ways to solve the problem.

Finally, I recommend the official Weibo Recommendation blog: http://www.wbrecom.com/. Suggestions and feedback are welcome. Thank you for your interest in and support for Weibo Recommendation and Weibo!
