How to quickly and comprehensively build your own big data knowledge system?

Many people have read books and articles about big data, but the material is scattered and unsystematic, so it helps them little. This article therefore walks through big data product design architecture and technical strategy from a whole-system perspective.

Viewed as a whole system, a big data product breaks down into five steps:

  • First, instrument data collection points for each front-end channel and gather multi-dimensional data per channel. This is step one of big data: without complete data, there is no big data analysis to speak of.
  • Second, use ETL to structure and load the various types of collected multi-dimensional data.
  • Third, build a data storage management subsystem for the standardized, structured data produced by ETL, and aggregate it into an underlying data warehouse. This step is critical: within the warehouse, the data is further decomposed into basic, homogeneous data marts.
  • Fourth, on the aggregated and decomposed data marts, perform data modeling and algorithm design over the data sets, for example with R function packages. Some algorithms must be designed in-house, while others can reuse existing R functions. This step involves product and operations staff the most; done well, it also underpins many companies' user-portrait systems.
  • Finally, based on the established data models and algorithms, combined with the business characteristics of each front-end channel, the back end automatically matches models to channel touchpoints and presents personalized products and services to users.
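The five steps above can be sketched end to end. A minimal, purely illustrative Python sketch (every function and field name here is an assumption, not a real framework):

```python
# Toy sketch of the five-stage pipeline described above.
# Stage names, field names, and signatures are illustrative assumptions.

def collect(channel_events):
    """Stage 1: gather raw multi-dimensional events from channel trackers."""
    return [e for e in channel_events if e]  # drop empty payloads

def etl(raw_events):
    """Stage 2: normalize each event into a flat, structured record."""
    return [{"user": e["user"], "channel": e["channel"], "action": e["action"]}
            for e in raw_events]

def load_to_marts(records):
    """Stage 3: aggregate into a warehouse, then split into homogeneous marts."""
    marts = {}
    for r in records:
        marts.setdefault(r["channel"], []).append(r)
    return marts

def model(marts):
    """Stage 4: per-mart modelling, e.g. each user's action history."""
    profiles = {}
    for mart in marts.values():
        for r in mart:
            profiles.setdefault(r["user"], []).append(r["action"])
    return profiles

def match_and_push(profiles, user, channel):
    """Stage 5: pick content for a user and channel from modelled behaviour."""
    actions = profiles.get(user, [])
    return f"{channel}: promo-for-{actions[-1]}" if actions else f"{channel}: default-promo"

events = [{"user": "u1", "channel": "app", "action": "view_shoes"},
          {"user": "u1", "channel": "web", "action": "add_to_cart"}]
marts = load_to_marts(etl(collect(events)))
print(match_and_push(model(marts), "u1", "app"))  # app: promo-for-add_to_cart
```

A real pipeline would replace each function with a subsystem (trackers, ETL jobs, a warehouse, model training, a push service); the sketch only shows how the stages hand data to one another.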


Establish a systematic data collection indicator system

Establishing a data collection and analysis indicator system is the foundation of the marketing data mart, and it determines how broadly and deeply the mart can cover user behavior data. The system must include touchpoint data for the user's full range of activity, plus the user's structured and unstructured related data. Only when classified and summarized against this indicator system can the data form the attributes and attribute values used to filter users, which in turn is the basis for discovering new marketing events.

Build a marketing data indicator analysis model and keep improving and upgrading indicator collection. Relying on the user's full-process behavioral touchpoints, establish the user's consumption characteristics and individual attributes, and form a user behavior characteristic analysis model along three dimensions: user behavior analysis, business operation data analysis, and marketing data analysis. User-dimension data indicators come from crossing the analysis elements of each dimension with each touchpoint along the user's full life-cycle trajectory.

Today, the indicators collected and the visual reports produced by most companies building big data platforms share several key problems:

  • Data is aggregated by channel, date, and region, so no individual user can be located;
  • The computed statistics are all aggregate figures, which cannot support further mining and analysis;
  • The data cannot support the system in user acquisition, retention, or marketing push.

Therefore, for the collected indicators to support personalized behavior analysis at the platform front end, portrait design must center on the user: starting from the initial visual reports, aggregate statistics at every granularity must be segmented down to individual users, so that every data point carries a user attribute.

Scattered, disordered statistics are then connected through the user. On the existing product interface, each statistic gets a label; clicking the label shows the underlying behavior data of each user and links to related statistics pages.
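As a toy illustration of this per-user segmentation, assuming a simple event log (all field names are hypothetical):

```python
# Hypothetical sketch: the same event log summarized two ways —
# the usual channel-level aggregate, and a per-user view in which
# every statistic carries a user attribute and can be drilled into.

events = [
    {"user_id": "u1", "channel": "app", "event": "click"},
    {"user_id": "u2", "channel": "app", "event": "click"},
    {"user_id": "u1", "channel": "web", "event": "purchase"},
]

# Channel-level aggregate (where most dashboards stop):
by_channel = {}
for e in events:
    by_channel[e["channel"]] = by_channel.get(e["channel"], 0) + 1

# Per-user segmentation: the same data keyed by user, so a dashboard
# label can reveal the individuals behind each number.
by_user = {}
for e in events:
    by_user.setdefault(e["user_id"], []).append((e["channel"], e["event"]))

print(by_channel)     # {'app': 2, 'web': 1}
print(by_user["u1"])  # [('app', 'click'), ('web', 'purchase')]
```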

From this we can derive user-centered data collection dimensions: user identity, social life, assets, behavior preferences, shopping preferences, user value, user feedback, user loyalty, and so on. Each dimension can then be subdivided into concrete data indicators or attribute items.

① User identity information dimension

Gender, age, zodiac sign, city of residence, active areas, ID information, education, income, health, etc.

② User social life information dimension

Industry, occupation, whether they have children, children's ages, vehicle, housing type, communications situation, data usage...

③ User behavior preference information dimension

Whether they shop online, risk sensitivity, price sensitivity, brand sensitivity, profit sensitivity, product preference, channel preference...

④ User shopping preference information dimension

Category preference, product preference, shopping frequency, browsing preference, marketing/advertising preference, shopping time preference, maximum amount per purchase...

⑤ User feedback information dimension

Activities participated in, discussions joined, products favorited, purchased, recommended, and reviewed...
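The dimensions above can be gathered into a single profile record. A hedged sketch, with field names chosen only to mirror the listed attribute items:

```python
# Illustrative user-profile record combining the five dimensions above.
# All field names and types are assumptions for illustration.
from dataclasses import dataclass, field

@dataclass
class UserProfile:
    # ① identity
    gender: str = ""
    age: int = 0
    city: str = ""
    # ② social life
    industry: str = ""
    has_children: bool = False
    # ③ behavior preference
    price_sensitive: bool = False
    brand_sensitive: bool = False
    # ④ shopping preference
    category_preference: list = field(default_factory=list)
    shopping_frequency: int = 0
    # ⑤ feedback
    reviewed_products: list = field(default_factory=list)

u = UserProfile(gender="F", age=29, city="Shanghai",
                category_preference=["sportswear"])
print(u.category_preference)  # ['sportswear']
```

In practice each dimension would be a table (or mart) of its own, joined on the user ID; the dataclass just shows the shape of one assembled profile.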



Based on the collected multi-dimensional data, use ETL to structure and load each type of data

  • Data filling: fill in empty and missing values, and flag data that cannot be repaired
  • Data replacement: replace invalid data
  • Format normalization: convert data extracted from the source into the target format expected by the warehouse
  • Primary/foreign key constraints: establish key constraints so that illegal data is replaced or exported to an error file for reprocessing
  • Data merging: implemented via multi-table joins (index every join field to keep the joins efficient)
  • Data splitting: split data according to defined rules
  • Row/column transposition, sorting and renumbering, and removal of duplicate records
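A minimal Python sketch of several of these cleaning rules applied to one record batch (field names and sentinel values are assumptions):

```python
# Sketch of the ETL cleaning rules above on a toy batch.
# Field names, the -1 "missing" marker, and the valid-gender set
# are illustrative assumptions.

def clean(records, valid_genders={"M", "F"}):
    seen, out, errors = set(), [], []
    for r in records:
        r = dict(r)
        # data filling: mark missing values that cannot be recovered
        r["age"] = r.get("age") if r.get("age") is not None else -1
        # data replacement: replace invalid values with a sentinel
        if r.get("gender") not in valid_genders:
            r["gender"] = "U"
        # format normalization: unify dates to YYYY-MM-DD
        r["date"] = r.get("date", "").replace("/", "-")
        # primary-key constraint: route duplicate keys to an error file
        if r["id"] in seen:
            errors.append(r)
            continue
        seen.add(r["id"])
        out.append(r)
    return out, errors

rows = [{"id": 1, "gender": "M", "age": None, "date": "2024/01/05"},
        {"id": 1, "gender": "x", "age": 30,   "date": "2024-01-06"}]
cleaned, rejected = clean(rows)
print(cleaned)        # [{'id': 1, 'gender': 'M', 'age': -1, 'date': '2024-01-05'}]
print(len(rejected))  # 1
```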

The data processing layer is a Hadoop cluster: it reads business data from the collection sources, runs the business processing logic as parallel computation, and filters and merges the data into the target datasets.
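The filter-and-merge logic can be pictured as a toy map/reduce pass in plain Python; a real job would run on Hadoop Streaming, MapReduce, or Spark:

```python
# Toy map/reduce pass mirroring what the Hadoop cluster does here:
# map filters malformed business records, reduce merges the rest
# into the target data. Pure-Python stand-in, not Hadoop itself.
from functools import reduce
from collections import Counter

records = [("u1", "view"), ("u2", "view"), ("u1", "buy"), ("u3", None)]

# map: drop malformed records and emit (key, 1) pairs
mapped = [(user, 1) for user, action in records if action]

# reduce: merge the counts per key into the target dataset
def merge(acc, pair):
    acc[pair[0]] += pair[1]
    return acc

target = reduce(merge, mapped, Counter())
print(dict(target))  # {'u1': 2, 'u2': 1}
```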

Data modeling, user profiling and feature algorithms

Extract the customer, product, and service data relevant to marketing and build data models with cluster analysis and association analysis. Through rule attribute configuration, rule template configuration, and user-portrait labeling, form a user data rule set. A rule engine then drives marketing pushes, including condition-triggered real-time pushes, synchronizes them to the front-end channel interaction platform for execution, and returns the marketing execution results to the big data system in real time.
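For the cluster-analysis step, here is a minimal k-means sketch in plain Python, segmenting users by hypothetical (monthly spend, visit frequency) features; a production system would use R packages, as the text suggests, or an equivalent library:

```python
# Minimal k-means for user segmentation. Features and initial
# centres are made up for illustration.

def kmeans(points, centers, iters=10):
    groups = [[] for _ in centers]
    for _ in range(iters):
        # assignment step: each point joins its nearest centre
        groups = [[] for _ in centers]
        for p in points:
            d = [(p[0] - c[0])**2 + (p[1] - c[1])**2 for c in centers]
            groups[d.index(min(d))].append(p)
        # update step: move each centre to its group's mean
        centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
            if g else c
            for g, c in zip(groups, centers)
        ]
    return centers, groups

# (monthly spend, visit frequency) per user
users = [(10, 1), (12, 2), (11, 1), (90, 9), (95, 10)]
centers, groups = kmeans(users, centers=[(0, 0), (100, 10)])
print(len(groups[0]), len(groups[1]))  # 3 2
```

The two resulting groups would then be labeled (e.g. "low-spend occasional" vs "high-spend frequent") and fed to the rule configuration step.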


Automatically match rules and trigger push content based on different personalized behaviors of front-end users

Following the user's full-process activity trajectory, analyze every touchpoint where the user contacts online and offline channels, label marketing users, and form user behavior portraits. From these portraits, refine and summarize the marketing screening rule attributes and attribute values, which finally become the conditions for segmenting user groups. Each user attribute maps to multiple attribute values, the values can be personalized per activity, and user blacklists and whitelists are supported.

Activity rules and models for different user identity characteristics can be pre-configured. When a user triggers a configured marketing event, the data system pushes marketing rules in real time on a best-match basis, delivering the configured activity content, discount information, and product information through the real-time push function. Effect data fed back from the front end is then aggregated to optimize and adjust the push rules and content.
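The "best match" principle can be sketched as a tiny rule engine: each rule lists required attribute values, and among the rules a user satisfies, the one with the most conditions wins. Rule contents here are illustrative assumptions:

```python
# Toy best-match rule engine. Rules and attributes are made up.

rules = [
    {"name": "new-parent-promo", "require": {"has_children": True}},
    {"name": "vip-city-promo",   "require": {"city": "Shanghai", "tier": "vip"}},
]

def best_match(user, rules):
    scored = []
    for rule in rules:
        req = rule["require"]
        if all(user.get(k) == v for k, v in req.items()):
            # more satisfied conditions = tighter, better match
            scored.append((len(req), rule["name"]))
    return max(scored)[1] if scored else None

user = {"city": "Shanghai", "tier": "vip", "has_children": True}
print(best_match(user, rules))  # vip-city-promo
```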

Building on the existing user portraits, attribute labeling, rule-based push configuration, and the grouping of similar user characteristics into sub-libraries, the big data system will be combined with the customer marketing system and gradually extended with machine deep-learning capabilities. The system will automatically collect and analyze real-time changes in front-end user data, and compute the function parameters and rules that match user needs from the constructed learning model; the marketing system will then push closely matching marketing activities and content in real time based on the computed rule model.


Machine self-learning model algorithms are the core of deep learning in future big data systems. Only through large-scale sampled training, repeated validation, and parameter adjustment can reasonably accurate function factors and parameter values be determined. On that basis, the system can automatically derive the corresponding marketing rules and recommendation models from users' real-time behavioral data.
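The train/validate/adjust loop can be illustrated with a toy one-parameter model: sweep a threshold on training data, then verify it on held-out data. The data and the "model" are made up purely for illustration:

```python
# Toy parameter-adjustment loop: choose the score threshold that best
# separates buyers (1) from non-buyers (0), then check it on held-out
# data. Real systems tune many parameters over far larger samples.

data = [(5, 0), (15, 0), (40, 1), (55, 1), (70, 1), (18, 0)]  # (score, bought)
train, holdout = data[:4], data[4:]

def accuracy(threshold, rows):
    return sum((score >= threshold) == bool(label)
               for score, label in rows) / len(rows)

# parameter sweep on the training sample ...
best = max(range(0, 101, 5), key=lambda t: accuracy(t, train))
# ... then verification on the held-out sample
print(best, accuracy(best, holdout))  # 20 1.0
```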

Beyond self-learning, the big data system will gradually open up to cooperation: connecting with external third-party platforms, expanding the scope of customer data and behavioral touchpoints, covering users' online and offline trajectories across the whole life cycle as much as possible, and growing the customer data mart and event library. Only then can the full range of customer needs be deeply mined and, combined with machine self-learning, fundamentally improve product sales capability and the customer's all-round experience.
