Building a data-based operation system from scratch

Before getting into data-driven operations, consider whether you, as an operator, have ever asked questions like these:

Which channels perform well, and which perform poorly?
The number of active users has dropped. Why?
How effective was this promotion?
The new version has been released. Do users like it?
We always talk about word-of-mouth spread, but how far does it actually reach?

Product and operations teams run into these problems every day. Data-driven operations exist to solve them. They have never been exclusive to BAT (Baidu, Alibaba, Tencent), nor to big data; every Internet company has fertile soil for them.

A data-driven operation system is the collection and application of data analysis, and also a data-first strategy. It is not only the job of operators but a shared goal of product, marketing and R&D. From a management perspective it has to be pushed from the top down: if leadership does not pay attention, then no matter how well the people executing use data, the effort will only be half-hearted.

How do you build a data-driven operation system? The following is a summary of my own thinking. I divide the system into a four-layer architecture. Each layer builds on the one below it, and none can be left out. The four layers are the data collection layer, the data product layer, the data operation layer, and the user reach layer. It is a framework seen from the operator's perspective.

Data collection layer

The foundation of the data-driven operation system is data collection; data is the oil of the entire system.

The core of data collection is to collect as much data as possible. Two principles apply:

Sooner rather than later: collect data deliberately from the moment the product is created, rather than waiting until the company reaches its Series B or C round. Data-driven operations run through every stage of a product, with different methods at different stages.
More rather than less: data can be unsuitable for a given purpose, but it is never bad. Historical data, change records and detail records all have value.

Here is an example. A financial product has a credit reporting system that records user behavior in detail. When users upload guarantee information while applying for a loan, the system records their steps and the time spent on those pages. The assumption is that ordinary people are cautious when uploading guarantee information. If a user completes this step very smoothly and quickly, the risk of default is likely higher: someone this practiced may be here to game the system. This is a case of suspicious proficiency, and the credit reporting system uses the data as a feature for assessing risk.

The data to collect falls into four main types: behavioral data, traffic data, business data, and external data.

▍Behavioral data

Behavioral data is the set of user operations on the product, recorded in chronological order. Opening the app, tapping a menu and browsing a page are behaviors; so are saving a song, looping it, fast-forwarding and skipping.

The core of behavioral data is to describe which user completed which operation, at what time, in what place, and in what way. We can use it to analyze user preferences, time spent on a page, browsing frequency, and whether users like a page.

User behavior is also the basis of the user operation system. Behaviors such as purchasing, commenting, replying and adding friends are used to divide users into tiers: core users, important users, ordinary users, and potential users.

Behavioral data is collected through event tracking. Tracking can be implemented in different ways, but the content collected is the same, with user ID, user behavior, and behavior timestamp as the main fields. A simplified model can be drawn as a table with these columns:

userID identifies the user uniquely. It tells us who the user is and can be thought of as an ID card number.
Active is the specific operation performed; it has to be set up and defined at the technical level.
Timestamp is the time at which the behavior occurred. Here it is only accurate to the minute; in practice it is usually accurate to the millisecond.
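
The three fields above can be sketched as a small record type. This is only an illustration: the names `BehaviorEvent`, `user_id`, `action`, `context` and the sample events are assumptions made for the example, not part of any particular tracking tool.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class BehaviorEvent:
    """One row of the simplified behavior log: who did what, and when."""
    user_id: str          # unique identity of the user
    action: str           # the operation, as defined by the tracking plan
    timestamp: datetime   # when the behavior occurred
    # Optional semi-structured context (device, IP, page elements, ...).
    context: dict = field(default_factory=dict)


def events_for_user(events, user_id):
    """Return one user's events in chronological order."""
    return sorted(
        (e for e in events if e.user_id == user_id),
        key=lambda e: e.timestamp,
    )


# Hypothetical sample log, deliberately out of order.
log = [
    BehaviorEvent("u1", "play_song", datetime(2024, 1, 1, 9, 5)),
    BehaviorEvent("u2", "open_app", datetime(2024, 1, 1, 9, 1)),
    BehaviorEvent("u1", "open_app", datetime(2024, 1, 1, 9, 0),
                  {"device": "phone"}),
]

timeline = [e.action for e in events_for_user(log, "u1")]
```

Keeping the extra details in a free-form `context` field mirrors the semi-structured form that real behavior logs usually take.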

The behavior record should be detailed: which pages were browsed, and which elements were on the page at the time (elements such as prices are dynamic). Real records take a semi-structured, NoSQL-style form, which I have simplified here.

Sometimes, for technical convenience, only the pages a user browses are collected, and operations such as taps and swipes are not recorded. This is a compromise.

Behavioral data also records finer details such as the user's device, IP address and geographic location. Screen widths differ between devices: does that affect how users interact with the product and experience the design, and how would we analyze it? This is one application of data-driven operations, and it reflects the principle of collecting more rather than less.

▍Traffic data

Traffic data is the predecessor of behavioral data, a concept that emerged with Web 1.0. It is generally recorded on the web side, while behavioral data is recorded on the product (app) side.

The biggest difference between the two is that traffic data can tell where users came from: a search engine, an external link, or a direct visit. This is the basis of SEO, SEM and channel marketing of all kinds.

Although this is the mobile era, traffic data from the Web era is not outdated. Content shared to WeChat Moments, for example, consists of HTML pages, and campaign operations measure their effect on that basis, so we can treat it as a kind of traffic data. In addition, many products are hybrid native + Web frameworks whose built-in campaign pages are mostly implemented by the front end, so those pages count as both behavioral and traffic data. When a campaign page is shared to WeChat Moments, the corresponding statistics can only be collected as front-end traffic data.

Traffic data is generated from the web pages users visit. The main fields are user ID, page visited, page parameters, and timestamp. The simplified model is as follows:

url is the page visited, recorded in the form ***.com/***.
param holds the parameters that describe the page. Searches and attribute selections made on the page are recorded as parameters.
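
As a rough sketch of how a visited address splits into the url and param fields, the snippet below uses Python's standard `urllib.parse`. The example address and the `utm_source` parameter name are assumptions chosen for illustration; a real site may mark the traffic source differently.

```python
from urllib.parse import urlsplit, parse_qs


def parse_visit(user_id, raw_url, timestamp):
    """Split a visited URL into the page and its parameters."""
    parts = urlsplit(raw_url)
    # parse_qs returns lists because a key may repeat; keep the first value.
    params = {k: v[0] for k, v in parse_qs(parts.query).items()}
    return {
        "user_id": user_id,
        "url": parts.netloc + parts.path,
        "param": params,
        "timestamp": timestamp,
    }


# Hypothetical visit coming from a link shared on WeChat.
record = parse_visit(
    "u1",
    "https://shop.example.com/search?q=mask&utm_source=wechat",
    "2024-01-01 09:00",
)

# Fall back to "direct" when no source parameter is present.
source = record["param"].get("utm_source", "direct")
```

The source parameter is what lets traffic data answer the question of where a user came from.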

As with behavioral data, if traffic data calls for more detailed statistics, semi-structured data that includes operation records works better. Traffic data is the natural companion of campaign and content operations: a campaign's conversion rate and the number of readers of an article shared to WeChat Moments are both recorded as traffic data, mainly collected through JS.

Traffic statistics are already fairly mature. Google Analytics and Baidu Statistics are well-known third-party tools and the most commonly used. However, they do not support private deployment and provide only aggregate statistics. I can learn that 100 people visited a page, but I cannot identify who those 100 people are, and the data cannot be stored in my own database, which is a hassle for data-driven operations.

With reliable and sufficiently advanced technology, behavioral data and traffic data can be unified. This is the future trend.

▍Business data

Business data is generated by the business itself as the product operates. Take an e-commerce product that runs a promotion: how many users received coupons, how many coupons were used, and on which products? These data are closely tied to operations and cannot be explained by behavior or traffic, so they are classified as business data.

Inventory, delivery addresses, product information, product reviews, promotions, friend relationship chains, operational campaigns, product features and so on are all business data. Business data differs between industries and has no fixed structure.

Business data has to be configured by back-end R&D. Because the structure cannot be made generic, it is best to tell the R&D team in advance and state your requirements.

Behavioral data, traffic data, and business data are the three pillars of data sources. Together they are called raw data, meaning data that has not been processed in any way.

▍External data

External data is a special type of data that is not generated internally but obtained from third parties. For example, after users follow a WeChat public account, we can obtain their region, gender and similar data. Alipay's Sesame Credit is used by many financial products. There is also public data, such as weather, population, and national economic indicators.

Another way to obtain external data is crawling. We can crawl Douban movie ratings, Weibo posts, Zhihu answers, and real estate listings for our own use. The third party will not support this kind of acquisition and often has anti-crawler mechanisms in place, so crawling requires some technical capability and is not a stable or easy source.

Because the quality of external data is hard to guarantee, it serves more as a reference and does not carry the same weight as internal data.

These four types of data are the cornerstone of data-driven operations. As Internet companies become more data-oriented, more and more data becomes usable. Data structures have gradually evolved from SQL to NoSQL; information sources have become richer, with more image and audio data; technology has evolved from a single server to distributed systems; and processing has evolved from offline batch to real-time streaming. All of these are challenges for data collection.

Once we have the data, we move to the next layer, the data product layer.

Data product layer

Data products process and make use of data. They belong to the realm of technology and automation: the raw data is processed by computers. A data product here is not a data product in the traditional sense (such as an advertising system); its purpose is to maximize the value and productivity of data. It can also be understood as a product that processes data.

Raw data cannot be used directly for operations; it is usually dirty and messy, and has to be integrated and processed according to defined standards.

Take behavioral data and traffic data. A user sees a campaign on WeChat Moments, likes it, downloads the app, registers and takes part. The behavioral data and traffic data here are completely independent: browsing in WeChat Moments records the user's weixinOpenId and cookies, while after the download the product records its internal userId. The two do not match up, so data integration is needed to map cookies, phone numbers, userId and other identifiers to the same person. This is data cleaning at the technical level, and the whole process is called ETL (extract, transform, load).

Data can create value in many ways. Through BI, raw data can be aggregated by dimensions and metrics for visual decision analysis and data mining of all kinds. How data is used depends on the business and the scenario. The most important thing is to have indicators first.

▍Data indicators

I have stressed collecting as much data as possible, but with so much raw data, how do we use it to guide the business? We have to find direction in a huge volume of data, and for that we establish indicators. Indicators are our direction; they connect the business to the raw data.

Indicators can be called the lubricant that links the upper and lower parts of the data-driven operation system. They are processed from raw data and in turn drive the other products. Need BI? BI is built as dashboards around indicators. Want to use machine learning algorithms? The purpose of an algorithm is to improve an indicator. Want to run operations? The KPIs of the content, user and campaign modules also revolve around indicators.

An indicator is not a data product in the usual sense. I prefer to describe it as the product manager of the data world: it drives and plans the other data products and works with operations to iterate the business, which should make the idea easier to grasp. How indicators are set is decided by business operations, and they are also the primary driving force of operations.

Let's take a quick look at how indicators are processed from raw data. The figure below shows raw records of users opening the app.
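
A minimal sketch of such open records, and of how a few indicators can be derived from them, is given here in plain Python. The sample rows, the names `opens`, `open_count`, `active_users` and `next_day_retention`, and the simplified retention definition (users active on one day who return the next day) are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

# Hypothetical raw records: one row per app open (user_id, timestamp).
opens = [
    ("u1", "2024-01-01 09:00"),
    ("u1", "2024-01-01 21:30"),
    ("u2", "2024-01-01 10:15"),
    ("u3", "2024-01-01 11:00"),
    ("u1", "2024-01-02 08:45"),
    ("u3", "2024-01-02 12:00"),
    ("u4", "2024-01-02 13:20"),
]

open_count = defaultdict(int)     # day -> number of opens
active_users = defaultdict(set)   # day -> distinct users who opened the app

for user_id, ts in opens:
    day = datetime.strptime(ts, "%Y-%m-%d %H:%M").date()
    open_count[day] += 1
    active_users[day].add(user_id)

# Daily active users: the deduplicated user count per day.
dau = {day: len(users) for day, users in active_users.items()}


def next_day_retention(day):
    """Share of users active on `day` who were active again the next day."""
    today = active_users.get(day, set())
    tomorrow = active_users.get(day + timedelta(days=1), set())
    return len(today & tomorrow) / len(today) if today else 0.0


d1 = date(2024, 1, 1)
retention = next_day_retention(d1)
```

The set intersection plays the role that a join between two days of records would play in SQL.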

Each timestamp means that the corresponding user opened the app once. From this table we can count how many times the app was opened each day, which is the open count. Counting distinct users instead gives an important operational indicator: the number of active users. The retention rate requires somewhat more complex operations on the table, such as a SQL LEFT JOIN.

Article views, daily sales, number of campaign participants: almost all of these are aggregated from raw data. Once the indicators are computed, they are delivered to operations and product staff as daily reports and dashboards.

Now that we have indicators, let's look at other data products. Because space is limited, I will focus on user portraits.

▍User portraits

User portraits are a commonly used data product that product and operations staff often find mysterious. The term has two meanings, which is a source of confusion for many newcomers.

One kind of user portrait belongs to marketing and user research. It is called a Persona, more accurately translated as a user role. It depicts the social attributes of a natural person and is used to identify user needs and scenarios.

The user portrait in the data field is called a Profile. It is a set of data labels that describe a person's attributes, produced by processing a series of data. The best-known example is Taobao's "a thousand faces for a thousand people" personalization: a user who buys maternity products is very likely to be labeled as a pregnant woman, and one who browses car-related products is labeled as interested in cars.

The user portrait is a complex system that relies on big data and machine learning. Accurate and rich user portraits can greatly improve operational results.

User portraits also have simple uses, and it does not matter if there is no data mining behind them. Information such as a user's gender, age and region should not be hard to obtain, and it is not hard to infer preferences from simple user behavior. That already gives us User Portrait V1.0.

Recommendation systems, precision marketing and advertising are all common applications built on user portraits. If you push a cosmetics promotion, targeting users with the female label will generally give a higher success rate. If the operator also knows which category of cosmetics those users prefer, the result will be better still.

User portraits can be obtained by refining existing data. For example, a user's ID card information yields three accurate labels: gender, place of origin, and date of birth. Labels can also be computed by algorithms. For example, from the recipient name left on a Taobao order, machine learning can estimate the probability that the buyer is a man or a woman: Jianguo is very likely a man, and Cuilan is very likely a woman.

User portraits are built by processing raw data. The more complete the raw data, the richer the portraits.

In the data product layer, we process data into indicators and use them as the core around which data products are built and planned: how to display indicators (BI), how to improve indicators (algorithms), how to compute indicators (ETL), and how to combine indicators (user portraits).

Now that we have these "products", the next step is to use them. Operations and product staff are their users.

Data operation layer

The data operation layer is where operations staff turn data into operational strategies. Here people are the main productive force, in contrast to the computer automation of data products.

Before getting to specific methods, let me stress the role of people. No matter how good our data products are, if employees' awareness of data-driven operations does not improve, everything comes to nothing. There are three requirements for people.

01 Making decisions based on data

You need to know both what data can do and what it cannot do. The former is easy to understand, yet I have seen it ignored many times at work: even when data is available to support a decision, people still trust personal experience. This way of thinking should be avoided, not just by individuals but by the whole team.

It must also be acknowledged objectively that data-driven operations are not a panacea. The larger the company, the better they tend to work. In a startup or small company there are constraints: a lack of technical support, too little improvement to justify the effort, or too little data can all push the work down the priority list. This trade-off cannot be avoided; solving the problem at hand has to come first.

02 Data analysis and operation skills are not up to standard

Even when data is used deliberately, if employees can do no more than compute averages, do not expect much. This has to be solved through continuous, systematic training and through recruitment. Top-down advocacy works best. If top management has the strategy and awareness, middle management has experience guiding data-driven operations, and the people executing can carry them out, the whole system can be implemented successfully.

03 Use of product tools

This is a skill requirement for employees: querying data in MySQL, multi-dimensional analysis in BI, precision marketing, A/B testing and conversion rate analysis are all necessary.
Only when employees are proficient with data-related tools can they get the most value out of them.

There are too many specific techniques and methodologies for data-driven operations to cover here, so I will introduce only the core ideas. The point is to understand the thinking.

1. Not blanket but refined; not only refined but lean

Blanket operation is a centralized strategy. If campaigns, content pushes, marketing and user relationship maintenance all target every user, operational resources are wasted. No single approach can satisfy all users or get the best result. Users differ, and refined operations exist to account for those differences.

Refinement means breaking goals down to a finer granularity: national sales become Shanghai sales and Beijing sales, annual sales become first-quarter and second-quarter sales, and users become new users and existing users. When an e-commerce business sells masks, is it better to target users in Beijing or in Hainan? When promoting cosmetics, the obvious first split is between men and women. Refinement (splitting) is both a data analysis idea and an operational technique.

Lean goes one step further than refinement. Refinement is a means; lean is a goal. What is lean? Lean is the 80/20 rule: finding the most critical users. We all know cosmetics should be sold to women, but some women will certainly spend more than others, and 20% of them may account for 80% of sales. Lean is about identifying that 20%.

Take the most appropriate measures, at the most appropriate time, for the most suitable users, to generate the greatest value. The first three "most" refer to refinement, and the last "most" refers to lean: maximizing value against the goal.
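
One simple way to look for the critical users behind the 80/20 rule is to rank users by sales and keep adding them until the chosen share is covered. The function name `top_contributors`, the 0.8 default and the sample figures below are assumptions for illustration only.

```python
def top_contributors(sales_by_user, share=0.8):
    """Return the smallest group of top spenders covering `share` of sales."""
    total = sum(sales_by_user.values())
    ranked = sorted(sales_by_user.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0.0
    for user_id, amount in ranked:
        # Stop as soon as the selected users already cover the target share.
        if running >= share * total:
            break
        selected.append(user_id)
        running += amount
    return selected


# Hypothetical sales per user.
sales = {"u1": 500, "u2": 320, "u3": 80, "u4": 50, "u5": 30, "u6": 20}
key_users = top_contributors(sales)
```

In this made-up data, two of the six users cover more than 80% of sales; real distributions will rarely be this neat, so the share threshold is something to tune.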
With a CRM, I can find the most valuable customers and look after them. With risk management, I can find the investments most likely to default. When I run a campaign, I want the users who produce the most value rather than those who only come for the freebies. For a points center, the best results come only from the highest-quality customers.

2. The future matters more than the present, and the present more than the past

The second core idea is that data-driven operations can predict the future and seize the present. Traditional operations know what happened in the past: how much was sold, how many users were active. In an increasingly competitive environment this is not enough.

Seizing the present means getting immediate feedback from data. If you want to run a campaign, you can select 5% of users in advance for a test and collect their feedback promptly, such as whether the conversion rate is high and whether users respond to the campaign. Then decide, based on the data, whether to continue or to improve the follow-up operations. This is the advantage of incremental rollout that technology brings.

Predicting the future is the domain of machine learning. Through data modeling we can obtain probabilistic predictions: whether users are likely to churn, whether they will like and buy a product, whether they will enjoy a newly released movie. Operations can use these probabilities to target their work. If machine learning is out of reach for technical reasons, estimates have to be made from existing data trends, which depends on the operator's experience and feel for data.

3. Systematization and automation

In the process of building a data-driven operation system, operators will use many tools.
Once the number of users reaches a certain level, we consider introducing a points center to increase user stickiness. If the product involves field promotion and sales staff, we need a CRM (customer relationship management) system to maintain the customer base. For O2O and e-commerce, coupon distribution is part of the basic setup. As feedback grows, we also need a customer service center to handle all kinds of questions.

These tools are closely tied to operations and make up a large part of the data-driven operation system. To serve their goals better, they are separated out into an independent operation module, or operation back end. A good operation back end is as important as the user-facing product and also needs planning by a back-end product manager.

Take coupons, which we all encounter often. They need a set of rules. The core goal is financial: balancing the cost of coupons against the revenue they bring. Issue them indiscriminately and you will lose money; issue too few and users will not even know they exist. Which coupons exist, how they are issued, how many have been issued and used, how many will be issued in the future, and how many have been issued but not used: all of this forms a large framework, which is why coupon issuance systems exist.

Coupons can be combined with the CRM, which divides users into value tiers and groups using several indicators. A user who especially likes to spend money will respond better to 100 yuan off purchases over 1,000 yuan than to 20 yuan off purchases over 200 yuan. A user who has not yet made a purchase needs a first-order discount as an incentive. For users who have not purchased for a while, operators have to put more effort into marketing.
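
The coupon examples above can be written down as a handful of explicit rules. This is a sketch under stated assumptions: the function name `pick_coupon`, the 5,000 yuan and 60-day thresholds, and the "win-back coupon" label are all hypothetical and would in practice come from the CRM's own segmentation.

```python
from datetime import date


def pick_coupon(order_count, total_spent, last_order, today,
                high_value=5000, dormant_days=60):
    """Choose a coupon for one user from simple, hand-written rules."""
    if order_count == 0:
        # No purchase yet: stimulate the first order.
        return "first-order discount"
    if (today - last_order).days > dormant_days:
        # No purchase for a while: needs extra marketing effort.
        return "win-back coupon"
    if total_spent >= high_value:
        # Heavy spender: the larger threshold coupon fits better.
        return "100 off orders over 1000"
    return "20 off orders over 200"


today = date(2024, 3, 1)
new_user = pick_coupon(0, 0, None, today)
big_spender = pick_coupon(12, 8000, date(2024, 2, 20), today)
dormant_user = pick_coupon(3, 600, date(2023, 11, 1), today)
regular_user = pick_coupon(2, 400, date(2024, 2, 1), today)
```

Keeping the rules in one place makes it easier to compare the cost of each coupon against the revenue it brings in later.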
Seen from a higher level, all of the above is an evaluation of effect, ROI and profitability. This is using data to shape operational strategy.

The CRM can also be combined with the customer service center. Phone numbers should be bound to user data: when a VIP user calls in, we assign an account supervisor to take the call and make the customer feel at home. Ordinary users should not be neglected either. Customer service should at least be able to see the user's situation through the user portrait in the back end and provide targeted service.

The data-driven operation system does not serve only operations and product. Systematization requires us to treat the entire operational process and strategy flow as a product: which methods work well, which techniques are effective, which campaigns can run continuously. We should fix all of these in place and build an operation back end that turns them into daily routines. This kind of systematic thinking is also called "reuse".

The next step is to make the system more automated and more powerful, which is another form of lean.

All of the above combines data, product operations, systems and people. A system is a system because it has moved past the rough-and-ready stage: everything is orderly, rule-based and strategic.

Data is the lubricant of the system. Without data, how would you selectively issue coupons, run campaigns, push messages and maintain users? The labels, user portraits and models produced at the data product layer are meant to be used as fully as possible by staff at the data operation layer. Data has no value in itself; it becomes valuable only when it is turned into strategy.

To summarize the three points: we use processed data systematically, take refinement as the means and lean as the goal, take grasping the future as the direction, and formulate operational strategies. This is the core of the data operation layer.

User reach layer

When the whole system reaches its final stage, it has to face the user. No matter how much data is collected, how well it is processed, or how carefully operations are planned, if nothing is delivered to users, the system fails.

Users are not aware of the first three layers of the system. What they perceive directly are the product's push notifications, banners, ad slots, campaigns, copywriting, and the order in which items are displayed.

When interacting with a product, users express their likes and dislikes through direct feedback. Those who are interested click, those who like it buy, and those who dislike it leave. This forms a new round of behavioral data and also the feedback indicators: click-through rate, conversion rate, bounce rate, purchase rate, and so on.

These indicators reflect the results of the user reach layer and of data-driven operations as a whole. Good or bad, the results need to be verified.

The result is not the end. Management has a concept called PDCA, the Plan-Do-Check-Act cycle. The user reach layer is not the end of the data-driven operation system but another beginning: use the data obtained from feedback to optimize and improve.
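
The feedback indicators mentioned above reduce to simple ratios over counts collected at the reach layer. The sketch below compares two hypothetical iterations of the same push; the function name `funnel_rates`, the choice to measure conversion against clicks, and all the counts are assumptions for illustration.

```python
def funnel_rates(sent, clicked, converted):
    """Turn raw feedback counts into click-through and conversion rates."""
    if sent == 0:
        return {"click_through_rate": 0.0, "conversion_rate": 0.0}
    return {
        "click_through_rate": clicked / sent,
        # Conversion is measured against clicks here; other definitions exist.
        "conversion_rate": converted / clicked if clicked else 0.0,
    }


# Hypothetical counts from two rounds of the same push campaign.
before = funnel_rates(sent=10_000, clicked=500, converted=50)
after = funnel_rates(sent=10_000, clicked=800, converted=96)

# Change in click-through rate between the two iterations.
lift = after["click_through_rate"] - before["click_through_rate"]
```

Comparing the same indicators across rounds is the "Check" step of the cycle: it shows whether the last change actually helped.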

My click-through rate is 5%; can operational optimization bring it to 10%? A user uninstalled after receiving a push; what can we do to win that user back? The retention rate improved; can the same strategy be applied to other users?

Data-driven operations may not deliver a satisfactory result, but without optimization and improvement we will not even have the chance of a good one. Excellent employees are not complacent about the results of data-driven operations; they start a new round. The result is both an end point and a starting point. This process is iteration, and it is the core of the system.

Summary

Let's look at the four layers in series. The figure below shows a simplified closed loop of data-driven operations for a product.

Data collection layer: when users open the app and browse news, their behavior is recorded through event tracking: who read which news item, and when.
Data product layer: computers process the collected behavioral data and count how many users read each type of news, such as military, science and technology, and economy. A chi-square test is used to determine a user's reading preference for technology news, which is written into the user portrait/tag system.
Data operation layer: a technology-related campaign is coming up and needs a certain number of participants. The operator cannot push to all users, so users interested in technology are selected from the user pool.
User reach layer: the selected users receive a targeted push on their phones. The back end records whether each user opens the push, browses the page, and takes part in the campaign. The conversion rate is recorded as feedback and used for improvement in the next iteration.
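
The chi-square step in the data product layer could look roughly like the following. It is one possible setup, not necessarily the one the example had in mind: it assumes a 2x2 table comparing one user's reads with everyone else's, uses the Pearson statistic without continuity correction, and relies on the closed-form p-value for one degree of freedom. The function names, the 0.05 threshold and the read counts are hypothetical, and the approximation is only reasonable when the counts are not too small.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value for a 2x2 table.

    Rows: this user / all other users.
    Columns: technology reads / reads of other news.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0, 1.0
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With one degree of freedom the p-value has this closed form.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


def prefers_technology(user_tech, user_other, rest_tech, rest_other,
                       alpha=0.05):
    """Tag the user only if they over-read tech news significantly."""
    stat, p = chi_square_2x2(user_tech, user_other, rest_tech, rest_other)
    if p >= alpha:
        return False
    user_share = user_tech / (user_tech + user_other)
    rest_share = rest_tech / (rest_tech + rest_other)
    return user_share > rest_share


# Hypothetical read counts: the first user reads mostly technology news,
# the second reads it at the same rate as everyone else.
tech_fan = prefers_technology(30, 10, 2000, 8000)
average_reader = prefers_technology(2, 8, 2000, 8000)
```

Users for whom the function returns True would receive the technology tag and become candidates for the targeted push.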

This example is a complete closed loop. A data-driven operation system can be as simple as something done in Excel, or it can bring in advanced technologies such as machine learning, data mining, and distributed systems. It all depends on the thinking and the application.

To aid understanding, we can simplify the four layers of the system into four models:

Data collection: takes the interaction between users and the product as input and produces raw data (behavioral, business, traffic, external) as output.
Data products: take raw data as input and produce processed data (labels, portraits, dimensions, indicators, algorithm results) as output.
Data operation: takes processed data as input and produces operational strategies (users, content, campaigns, e-commerce) as output.
User reach: takes operational strategies as input and produces feedback behavior (conversion rate, click-through rate, response rate) as output.

The feedback behavior users generate becomes new interaction input, and through iteration and optimization the data-driven operation system keeps running well.

A good data-driven operation system is also highly automated. Personalized recommendation, for example, can bypass the data operation layer: the server computes recommendations in real time and delivers them directly to users, with no people involved.

These four layers, connected in sequence, make up the data-driven operation system. The implementation will differ with the technical means available; even Excel can do data-driven operations well.

The above is the data-driven operation system from the product and operations perspective. It does not go deep into R&D technology, and the actual complexity is higher. The applications are endless, and which ones you use is up to you. I hope what readers take away is the concepts and the thinking. In actual work, there are still many approaches waiting to be explored.


This article was written by @Harvard Business Review and was compiled and published by (Qinggua Media). Please indicate the author information and source when reprinting!

