How to quickly and comprehensively build your own big data knowledge system?

Many people have read books and articles about big data, but the material is scattered and unsystematic, so it helps them little. This article therefore walks through big data product design architecture and technical strategy from a whole-system perspective.

Viewed as a whole system, a big data product breaks down into five steps:

  • First, instrument data collection points for each front-end channel and gather multi-dimensional data per channel. This is step one of big data: without complete data, there is no big data analysis to speak of.
  • Second, use ETL to structure and load the various types of collected multi-dimensional data.
  • Third, build a data storage management subsystem for the standardized, structured data produced by ETL, and aggregate it into an underlying data warehouse. This step is critical: within the warehouse, the data is further decomposed into basic, homogeneous data marts.
  • Fourth, on the aggregated and decomposed data marts, perform data modeling and algorithm design over the data sets, for example with R function packages. Some algorithms must be designed in-house, while others can reuse existing R functions. This step involves product and operations staff the most; done well, it also underpins many companies' user-portrait systems.
  • Finally, based on the established data models and algorithms, combined with the business characteristics of each front-end channel, the back end automatically matches models to channel touchpoints and presents personalized products and services to users.
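The five steps above can be sketched end to end. A minimal, purely illustrative Python sketch (every function and field name here is an assumption, not a real framework):

```python
# Toy sketch of the five-stage pipeline described above.
# Stage names, field names, and signatures are illustrative assumptions.

def collect(channel_events):
    """Stage 1: gather raw multi-dimensional events from channel trackers."""
    return [e for e in channel_events if e]  # drop empty payloads

def etl(raw_events):
    """Stage 2: normalize each event into a flat, structured record."""
    return [{"user": e["user"], "channel": e["channel"], "action": e["action"]}
            for e in raw_events]

def load_to_marts(records):
    """Stage 3: aggregate into a warehouse, then split into homogeneous marts."""
    marts = {}
    for r in records:
        marts.setdefault(r["channel"], []).append(r)
    return marts

def model(marts):
    """Stage 4: per-mart modelling, e.g. each user's action history."""
    profiles = {}
    for mart in marts.values():
        for r in mart:
            profiles.setdefault(r["user"], []).append(r["action"])
    return profiles

def match_and_push(profiles, user, channel):
    """Stage 5: pick content for a user and channel from modelled behaviour."""
    actions = profiles.get(user, [])
    return f"{channel}: promo-for-{actions[-1]}" if actions else f"{channel}: default-promo"

events = [{"user": "u1", "channel": "app", "action": "view_shoes"},
          {"user": "u1", "channel": "web", "action": "add_to_cart"}]
marts = load_to_marts(etl(collect(events)))
print(match_and_push(model(marts), "u1", "app"))  # app: promo-for-add_to_cart
```

A real pipeline would replace each function with a subsystem (trackers, ETL jobs, a warehouse, model training, a push service); the sketch only shows how the stages hand data to one another.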


Establish a systematic data collection indicator system

Establishing a data collection and analysis indicator system is the foundation of the marketing data mart, and it determines how broadly and deeply the mart can cover user behavior data. The system must include touchpoint data for the user's full range of activity, plus the user's structured and unstructured related data. Only when classified and summarized against this indicator system can the data form the attributes and attribute values used to filter users, which in turn is the basis for discovering new marketing events.

Build a marketing data indicator analysis model and keep improving and upgrading indicator collection. Relying on the user's full-process behavioral touchpoints, establish the user's consumption characteristics and individual attributes, and form a user behavior characteristic analysis model along three dimensions: user behavior analysis, business operation data analysis, and marketing data analysis. User-dimension data indicators come from crossing the analysis elements of each dimension with each touchpoint along the user's full life-cycle trajectory.

Today, the indicators collected and the visual reports produced by most companies building big data platforms share several key problems:

  • Data is aggregated by channel, date, and region, so no individual user can be located;
  • The computed statistics are all aggregate figures, which cannot support further mining and analysis;
  • The data cannot support the system in user acquisition, retention, or marketing push.

Therefore, for the collected indicators to support personalized behavior analysis at the platform front end, portrait design must center on the user: starting from the initial visual reports, aggregate statistics at every granularity must be segmented down to individual users, so that every data point carries a user attribute.

Scattered, disordered statistics are then connected through the user. On the existing product interface, each statistic gets a label; clicking the label shows the underlying behavior data of each user and links to related statistics pages.
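As a toy illustration of this per-user segmentation, assuming a simple event log (all field names are hypothetical):

```python
# Hypothetical sketch: the same event log summarized two ways —
# the usual channel-level aggregate, and a per-user view in which
# every statistic carries a user attribute and can be drilled into.

events = [
    {"user_id": "u1", "channel": "app", "event": "click"},
    {"user_id": "u2", "channel": "app", "event": "click"},
    {"user_id": "u1", "channel": "web", "event": "purchase"},
]

# Channel-level aggregate (where most dashboards stop):
by_channel = {}
for e in events:
    by_channel[e["channel"]] = by_channel.get(e["channel"], 0) + 1

# Per-user segmentation: the same data keyed by user, so a dashboard
# label can reveal the individuals behind each number.
by_user = {}
for e in events:
    by_user.setdefault(e["user_id"], []).append((e["channel"], e["event"]))

print(by_channel)     # {'app': 2, 'web': 1}
print(by_user["u1"])  # [('app', 'click'), ('web', 'purchase')]
```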

From this we can derive user-centered data collection dimensions: user identity, social life, assets, behavior preferences, shopping preferences, user value, user feedback, user loyalty, and so on. Each dimension can then be subdivided into concrete data indicators or attribute items.

① User identity information dimension

Gender, age, zodiac sign, city of residence, active areas, ID information, education, income, health, etc.

② User social life information dimension

Industry, occupation, whether they have children, children's ages, vehicle, housing type, communications situation, data usage...

③ User behavior preference information dimension

Whether they shop online, risk sensitivity, price sensitivity, brand sensitivity, profit sensitivity, product preference, channel preference...

④ User shopping preference information dimension

Category preference, product preference, shopping frequency, browsing preference, marketing/advertising preference, shopping time preference, maximum amount per purchase...

⑤ User feedback information dimension

Activities participated in, discussions joined, products favorited, purchased, recommended, and reviewed...
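The dimensions above can be gathered into a single profile record. A hedged sketch, with field names chosen only to mirror the listed attribute items:

```python
# Illustrative user-profile record combining the five dimensions above.
# All field names and types are assumptions for illustration.
from dataclasses import dataclass, field

@dataclass
class UserProfile:
    # ① identity
    gender: str = ""
    age: int = 0
    city: str = ""
    # ② social life
    industry: str = ""
    has_children: bool = False
    # ③ behavior preference
    price_sensitive: bool = False
    brand_sensitive: bool = False
    # ④ shopping preference
    category_preference: list = field(default_factory=list)
    shopping_frequency: int = 0
    # ⑤ feedback
    reviewed_products: list = field(default_factory=list)

u = UserProfile(gender="F", age=29, city="Shanghai",
                category_preference=["sportswear"])
print(u.category_preference)  # ['sportswear']
```

In practice each dimension would be a table (or mart) of its own, joined on the user ID; the dataclass just shows the shape of one assembled profile.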



Based on the collected multi-dimensional data, use ETL to structure and load each type of data

  • Data filling: fill in empty and missing values, and flag data that cannot be repaired
  • Data replacement: replace invalid data
  • Format normalization: convert data extracted from the source into the target format expected by the warehouse
  • Primary/foreign key constraints: establish key constraints so that illegal data is replaced or exported to an error file for reprocessing
  • Data merging: implemented via multi-table joins (index every join field to keep the joins efficient)
  • Data splitting: split data according to defined rules
  • Row/column transposition, sorting and renumbering, and removal of duplicate records
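A minimal Python sketch of several of these cleaning rules applied to one record batch (field names and sentinel values are assumptions):

```python
# Sketch of the ETL cleaning rules above on a toy batch.
# Field names, the -1 "missing" marker, and the valid-gender set
# are illustrative assumptions.

def clean(records, valid_genders={"M", "F"}):
    seen, out, errors = set(), [], []
    for r in records:
        r = dict(r)
        # data filling: mark missing values that cannot be recovered
        r["age"] = r.get("age") if r.get("age") is not None else -1
        # data replacement: replace invalid values with a sentinel
        if r.get("gender") not in valid_genders:
            r["gender"] = "U"
        # format normalization: unify dates to YYYY-MM-DD
        r["date"] = r.get("date", "").replace("/", "-")
        # primary-key constraint: route duplicate keys to an error file
        if r["id"] in seen:
            errors.append(r)
            continue
        seen.add(r["id"])
        out.append(r)
    return out, errors

rows = [{"id": 1, "gender": "M", "age": None, "date": "2024/01/05"},
        {"id": 1, "gender": "x", "age": 30,   "date": "2024-01-06"}]
cleaned, rejected = clean(rows)
print(cleaned)        # [{'id': 1, 'gender': 'M', 'age': -1, 'date': '2024-01-05'}]
print(len(rejected))  # 1
```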

The data processing layer is a Hadoop cluster: it reads business data from the collection sources, runs the business processing logic as parallel computation, and filters and merges the data into the target datasets.
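The filter-and-merge logic can be pictured as a toy map/reduce pass in plain Python; a real job would run on Hadoop Streaming, MapReduce, or Spark:

```python
# Toy map/reduce pass mirroring what the Hadoop cluster does here:
# map filters malformed business records, reduce merges the rest
# into the target data. Pure-Python stand-in, not Hadoop itself.
from functools import reduce
from collections import Counter

records = [("u1", "view"), ("u2", "view"), ("u1", "buy"), ("u3", None)]

# map: drop malformed records and emit (key, 1) pairs
mapped = [(user, 1) for user, action in records if action]

# reduce: merge the counts per key into the target dataset
def merge(acc, pair):
    acc[pair[0]] += pair[1]
    return acc

target = reduce(merge, mapped, Counter())
print(dict(target))  # {'u1': 2, 'u2': 1}
```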

Data modeling, user profiling and feature algorithms

Extract the customer, product, and service data relevant to marketing and build data models with cluster analysis and association analysis. Through rule attribute configuration, rule template configuration, and user-portrait labeling, form a user data rule set. A rule engine then drives marketing pushes, including condition-triggered real-time pushes, synchronizes them to the front-end channel interaction platform for execution, and returns the marketing execution results to the big data system in real time.
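For the cluster-analysis step, here is a minimal k-means sketch in plain Python, segmenting users by hypothetical (monthly spend, visit frequency) features; a production system would use R packages, as the text suggests, or an equivalent library:

```python
# Minimal k-means for user segmentation. Features and initial
# centres are made up for illustration.

def kmeans(points, centers, iters=10):
    groups = [[] for _ in centers]
    for _ in range(iters):
        # assignment step: each point joins its nearest centre
        groups = [[] for _ in centers]
        for p in points:
            d = [(p[0] - c[0])**2 + (p[1] - c[1])**2 for c in centers]
            groups[d.index(min(d))].append(p)
        # update step: move each centre to its group's mean
        centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
            if g else c
            for g, c in zip(groups, centers)
        ]
    return centers, groups

# (monthly spend, visit frequency) per user
users = [(10, 1), (12, 2), (11, 1), (90, 9), (95, 10)]
centers, groups = kmeans(users, centers=[(0, 0), (100, 10)])
print(len(groups[0]), len(groups[1]))  # 3 2
```

The two resulting groups would then be labeled (e.g. "low-spend occasional" vs "high-spend frequent") and fed to the rule configuration step.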


Automatically match rules and trigger push content based on different personalized behaviors of front-end users

Following the user's full-process activity trajectory, analyze every touchpoint where the user contacts online and offline channels, label marketing users, and form user behavior portraits. From these portraits, refine and summarize the marketing screening rule attributes and attribute values, which finally become the conditions for segmenting user groups. Each user attribute maps to multiple attribute values, the values can be personalized per activity, and user blacklists and whitelists are supported.

Activity rules and models for different user identity characteristics can be pre-configured. When a user triggers a configured marketing event, the data system pushes marketing rules in real time on a best-match basis, delivering the configured activity content, discount information, and product information through the real-time push function. Effect data fed back from the front end is then aggregated to optimize and adjust the push rules and content.
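The "best match" principle can be sketched as a tiny rule engine: each rule lists required attribute values, and among the rules a user satisfies, the one with the most conditions wins. Rule contents here are illustrative assumptions:

```python
# Toy best-match rule engine. Rules and attributes are made up.

rules = [
    {"name": "new-parent-promo", "require": {"has_children": True}},
    {"name": "vip-city-promo",   "require": {"city": "Shanghai", "tier": "vip"}},
]

def best_match(user, rules):
    scored = []
    for rule in rules:
        req = rule["require"]
        if all(user.get(k) == v for k, v in req.items()):
            # more satisfied conditions = tighter, better match
            scored.append((len(req), rule["name"]))
    return max(scored)[1] if scored else None

user = {"city": "Shanghai", "tier": "vip", "has_children": True}
print(best_match(user, rules))  # vip-city-promo
```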

Building on the existing user portraits, attribute labeling, rule-based push configuration, and the grouping of similar user characteristics into sub-libraries, the big data system will be combined with the customer marketing system and gradually extended with machine deep-learning capabilities. The system will automatically collect and analyze real-time changes in front-end user data, and compute the function parameters and rules that match user needs from the constructed learning model; the marketing system will then push closely matching marketing activities and content in real time based on the computed rule model.


Machine self-learning model algorithms are the core of deep learning in future big data systems. Only through large-scale sampled training, repeated validation, and parameter adjustment can reasonably accurate function factors and parameter values be determined. On that basis, the system can automatically derive the corresponding marketing rules and recommendation models from users' real-time behavioral data.
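The train/validate/adjust loop can be illustrated with a toy one-parameter model: sweep a threshold on training data, then verify it on held-out data. The data and the "model" are made up purely for illustration:

```python
# Toy parameter-adjustment loop: choose the score threshold that best
# separates buyers (1) from non-buyers (0), then check it on held-out
# data. Real systems tune many parameters over far larger samples.

data = [(5, 0), (15, 0), (40, 1), (55, 1), (70, 1), (18, 0)]  # (score, bought)
train, holdout = data[:4], data[4:]

def accuracy(threshold, rows):
    return sum((score >= threshold) == bool(label)
               for score, label in rows) / len(rows)

# parameter sweep on the training sample ...
best = max(range(0, 101, 5), key=lambda t: accuracy(t, train))
# ... then verification on the held-out sample
print(best, accuracy(best, holdout))  # 20 1.0
```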

Beyond self-learning, the big data system will gradually open up to cooperation: connecting with external third-party platforms, expanding the scope of customer data and behavioral touchpoints, covering users' online and offline trajectories across the whole life cycle as much as possible, and growing the customer data mart and event library. Only then can the full range of customer needs be deeply mined and, combined with machine self-learning, fundamentally improve product sales capability and the customer's all-round experience.
