vivo Global Mall: E-commerce transaction platform design

1. Background

The vivo official mall has undergone seven years of iteration, gradually evolving from a monolithic architecture to a microservice architecture. Along the way, our development team has accumulated valuable technical experience and a deep understanding of the e-commerce business.

At the beginning of last year, the team took on the task of building the O2O mall. Together with the gift platform that was about to be established and the official mall's need for online purchase with offline store delivery, this required building underlying product, transaction, and inventory capabilities.

In order to save R&D and operation and maintenance costs and avoid reinventing the wheel, we decided to adopt a platform-based approach to build the underlying system, using general capabilities to flexibly support the personalized needs of upper-level businesses.

As a result, a complete set of e-commerce platform systems came into being, including the trading platform, product platform, inventory platform, and marketing platform.

This article will introduce the architectural design concepts and practices of the trading platform, as well as the challenges and reflections in the continuous iteration process after launch.

2. Overall Architecture

2.1 Architecture Goals

In addition to high concurrency, high performance, and high availability, we also hope to achieve the following:

  1. Low cost
    Focus on the reusability of models and services, flexibly support the personalized needs of various businesses, improve development efficiency and reduce labor costs.
  2. High scalability
    The system architecture is simple and clear, the coupling between application systems is low, it is easy to expand horizontally, and business functions can be added and modified conveniently and quickly.

2.2 System Architecture

(1) Trading platform in the overall architecture of e-commerce platform

(2) Trading platform system architecture

2.3 Data Model

3. Key Solution Design

3.1 Multi-tenant design

(1) Background and objectives

  • The trading platform serves multiple tenants (business parties) and needs to be able to store large amounts of order data and provide high-availability and high-performance services.
  • The amount of data and concurrency of different tenants may vary greatly, and storage resources must be flexibly allocated based on actual conditions.

(2) Design plan

  • Considering the OLTP characteristics of the trading system and the proficiency of developers, MySQL is used as the underlying storage, ShardingSphere is used as the sharding middleware, and the user ID (userId) is used as the sharding key to ensure that the orders of the same user fall into the same database.
  • When a new tenant is onboarded, a tenant code (tenantCode) is agreed upon, and all interfaces must carry this parameter. The tenant estimates its data volume and concurrency, and is allocated enough databases and tables to meet its needs for at least the next five years.
  • The mapping between a tenant and its databases and tables is: tenant code -> {number of databases, number of tables, starting database number, starting table number}.

Through this mapping, storage resources can be flexibly allocated to each tenant, and tenants with small data volumes can reuse existing databases and tables.

Example 1:

Before the new tenant joins, there are 4 databases * 16 tables. The new tenant has a small order volume and low concurrency. The existing database 0 and table 0 are directly reused. The mapping relationship is: tenant code -> 1,1,0,0

Example 2:

Before the new tenant joins, there are 4 databases * 16 tables. The new tenant has a large number of orders but low concurrency. Eight new tables are created in the original database 0 to store them. The mapping relationship is: tenant code -> 1,8,0,16

Example 3:

Before the new tenant joins, there are 4 databases * 16 tables. The new tenant has a large number of orders and high concurrency. Four new databases with 8 tables each are created to store them. The mapping relationship is: tenant code -> 4,8,4,0

Formula for the database and table to which a user's order belongs:

Database number = Hash(userId) / number of tables % number of databases + starting database number
Table number = Hash(userId) % number of tables + starting table number

Some readers may ask: why divide by the number of tables when calculating the database number? What is wrong with the following formula?

Database number = Hash(userId) % number of databases + starting database number
Table number = Hash(userId) % number of tables + starting table number

The answer is that when the number of databases and the number of tables share a common factor, the data is skewed. Dividing by the number of tables first makes the database number depend on the quotient rather than the remainder, which removes the correlation between the two numbers.

Take 2 databases and 4 tables as an example. With the simpler formula, a hash value that equals 1 modulo 4 must also equal 1 modulo 2, so table 1 of database 0 never receives data. Similarly, table 3 of database 0, table 0 of database 1, and table 2 of database 1 stay empty.
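The two formulas can be compared with a small Java sketch. The class and method names (ShardRouter, TenantShardConfig, usedSlots) are hypothetical and not taken from the platform's code, and the hash is assumed to be non-negative. It counts which database/table slots are hit for 2 databases and 4 tables under each formula.

```java
import java.util.Set;
import java.util.TreeSet;

final class ShardRouter {
    /** Tenant mapping: {number of databases, number of tables, starting database, starting table}. */
    static final class TenantShardConfig {
        final int dbCount, tableCount, startDb, startTable;

        TenantShardConfig(int dbCount, int tableCount, int startDb, int startTable) {
            this.dbCount = dbCount;
            this.tableCount = tableCount;
            this.startDb = startDb;
            this.startTable = startTable;
        }
    }

    // Formula from the article: divide by the table count before taking the modulo.
    static int dbIndex(long hash, TenantShardConfig c) {
        return (int) (hash / c.tableCount % c.dbCount) + c.startDb;
    }

    static int tableIndex(long hash, TenantShardConfig c) {
        return (int) (hash % c.tableCount) + c.startTable;
    }

    // Simpler variant that skews when dbCount and tableCount share a factor.
    static int naiveDbIndex(long hash, TenantShardConfig c) {
        return (int) (hash % c.dbCount) + c.startDb;
    }

    /** Collects the "db:table" slots hit by the hash values 0..999. */
    static Set<String> usedSlots(TenantShardConfig c, boolean naive) {
        Set<String> slots = new TreeSet<>();
        for (long h = 0; h < 1000; h++) {
            int db = naive ? naiveDbIndex(h, c) : dbIndex(h, c);
            slots.add(db + ":" + tableIndex(h, c));
        }
        return slots;
    }

    public static void main(String[] args) {
        TenantShardConfig c = new TenantShardConfig(2, 4, 0, 0);
        System.out.println(usedSlots(c, false).size()); // all 8 slots are used
        System.out.println(usedSlots(c, true).size());  // only 4 of the 8 slots are used
    }
}
```

With the simpler formula, half of the tables stay empty, which matches the 2 databases and 4 tables example above.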

The routing process is shown in the following figure:

(3) Limitations and countermeasures

  • Globally unique ID

Problem: After the databases and tables are sharded, the database auto-increment primary key is no longer globally unique and cannot be used as an order number. In addition, many interfaces between internal systems carry only the order number and not the user ID, which is the sharding key.

Solution: As shown in the figure below, we generate a globally unique order number based on the snowflake algorithm and embed the database and table numbers in it (two 5-bit fields store the database number and the table number respectively). This way, the database and table can be derived from the order number when no user ID is available.
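As a rough illustration of embedding the database and table numbers in an order number, the sketch below packs them into a 64-bit value. Only the two 5-bit fields come from the text. The field order, the 41-bit timestamp, the 12-bit sequence, and the name OrderIdCodec are assumptions made for this example, and a production generator would also need to keep IDs unique across machines.

```java
final class OrderIdCodec {
    // The two 5-bit fields come from the article; the other widths are assumptions.
    static final int TIME_BITS = 41;
    static final int DB_BITS = 5;
    static final int TABLE_BITS = 5;
    static final int SEQ_BITS = 12;

    /** Packs timestamp | database number | table number | sequence into one positive long. */
    static long encode(long millisSinceEpoch, int db, int table, int seq) {
        requireRange(millisSinceEpoch, TIME_BITS);
        requireRange(db, DB_BITS);
        requireRange(table, TABLE_BITS);
        requireRange(seq, SEQ_BITS);
        return (millisSinceEpoch << (DB_BITS + TABLE_BITS + SEQ_BITS))
                | ((long) db << (TABLE_BITS + SEQ_BITS))
                | ((long) table << SEQ_BITS)
                | seq;
    }

    /** Recovers the database number without knowing the user ID. */
    static int dbOf(long orderId) {
        return (int) ((orderId >>> (TABLE_BITS + SEQ_BITS)) & ((1L << DB_BITS) - 1));
    }

    /** Recovers the table number without knowing the user ID. */
    static int tableOf(long orderId) {
        return (int) ((orderId >>> SEQ_BITS) & ((1L << TABLE_BITS) - 1));
    }

    private static void requireRange(long value, int bits) {
        if (value < 0 || value >= (1L << bits)) {
            throw new IllegalArgumentException("value " + value + " does not fit in " + bits + " bits");
        }
    }
}
```

An interface that receives only an order number can call dbOf and tableOf to route the query directly to the right shard.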


  • Searching across all databases and tables

Problem: The management backend needs to run paginated queries for all orders that match various filter conditions.

Solution: Redundantly store a copy of the order data in the search engine Elasticsearch to meet the needs of fast and flexible queries in various scenarios.

3.2 State Machine Design

(1) Background

  • When we built the official mall, development was customized for that one business, so the status flows of the various order types and after-sales orders were hardcoded. For example, a regular order is pending payment after it is placed, pending shipment after payment, and pending receipt after shipment; a virtual product order does not need to be shipped and has no pending shipment status.
  • What we are building now is a platform system. Customized development for each business party is no longer feasible, because it would lead to frequent changes and releases and to complex, redundant code.

(2) Objectives

  • Introduce an order state machine so that multiple differentiated order processes can be configured for each business party, similar to process orchestration.
  • When a new order process is added, change as little code as possible by reusing existing statuses and operations.

(3) Plan

  • A set of order types is maintained for each tenant in the management backend. The data is converted into JSON and stored in the configuration center, or stored in the database and synchronized to the local cache.
  • The configuration of each order type includes: the initial order status, the operations allowed in each status, and the target status after each operation.
  • When an order performs an action, the order state machine is used to modify the order status.
    The formula for the order state machine is:
    StateMachine(E, S -> A, S')
    This means that, triggered by event E, the order executes action A and moves from the original status S to the target status S'.
  • After each order type is configured, the generated data has the following structure:

    import java.io.Serializable;
    import java.util.Map;
    import lombok.Data;

    /**
     * Order process configuration
     */
    @Data
    public class OrderFlowConfig implements Serializable {
        /**
         * Initial order status code
         */
        private String initStatus;

        /**
         * For each order status, the operations that can be performed and the target status after each operation.
         * Map<original status code, Map<order operation type code, target status code>>
         */
        private Map<String, Map<String, String>> operations;
    }

  • The order line-item state machine and the after-sales order state machine are implemented in the same way.
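To show how such a configuration can drive status changes without per-business code, here is a minimal sketch that mirrors the initStatus and operations fields. The class OrderStateMachine, the fire method, and all status and operation codes (PENDING_PAYMENT, PAY, and so on) are hypothetical examples, not the platform's actual implementation.

```java
import java.util.Map;

final class OrderStateMachine {
    private final String initStatus;
    // Map<original status code, Map<operation code, target status code>>
    private final Map<String, Map<String, String>> operations;

    OrderStateMachine(String initStatus, Map<String, Map<String, String>> operations) {
        this.initStatus = initStatus;
        this.operations = operations;
    }

    String initStatus() {
        return initStatus;
    }

    /** Returns the target status, or fails if the operation is not configured for the current status. */
    String fire(String currentStatus, String operation) {
        Map<String, String> allowed = operations.getOrDefault(currentStatus, Map.of());
        String target = allowed.get(operation);
        if (target == null) {
            throw new IllegalStateException(
                    "Operation " + operation + " is not allowed in status " + currentStatus);
        }
        return target;
    }

    // Hypothetical flow for a regular order: pay, ship, confirm receipt.
    static OrderStateMachine regularOrder() {
        return new OrderStateMachine("PENDING_PAYMENT", Map.of(
                "PENDING_PAYMENT", Map.of("PAY", "PENDING_SHIPMENT", "CANCEL", "CLOSED"),
                "PENDING_SHIPMENT", Map.of("SHIP", "PENDING_RECEIPT"),
                "PENDING_RECEIPT", Map.of("CONFIRM_RECEIPT", "COMPLETED")));
    }

    // Hypothetical flow for a virtual product order: no shipment step.
    static OrderStateMachine virtualOrder() {
        return new OrderStateMachine("PENDING_PAYMENT", Map.of(
                "PENDING_PAYMENT", Map.of("PAY", "COMPLETED", "CANCEL", "CLOSED")));
    }
}
```

The same PAY operation leads to different target statuses depending only on the configured flow, so adding a new order type is a configuration change rather than a code change.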

3.3 General Operation Triggers

(1) Background

Businesses often have delay requirements such as the following, which we used to handle by scanning with scheduled tasks.

  • If an order is not paid within a certain period after it is placed, it is closed automatically.
  • If the merchant does not review a refund application within a certain period, the application is approved automatically.
  • If the buyer does not confirm receipt within a certain period after the delivery is signed for, receipt is confirmed automatically.

(2) Objectives

  • When the business side has similar delay requirements, they can easily implement them in a common way.

(3) Plan

Design a general action trigger. The specific steps are as follows:

  1. Configure triggers at the granularity of the state machine's process type.
  2. When an order or after-sales order is created, or its status changes, a delayed message is sent if a trigger meets the registration conditions.
  3. When the delayed message is received, the execution conditions are checked again and the configured operation is performed.

The configuration of the trigger includes:

  1. Registration time: when the order is created or when the order status changes.
  2. Execution time: a JsonPath expression selects a time field in the order model, and a delay is added to it.
  3. Registration conditions: configured with QLExpress; the trigger is registered only if the conditions are met.
  4. Execution conditions: configured with QLExpress; the operation is executed only if the conditions are met.
  5. The operation to execute and its parameters.
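The trigger configuration can be pictured with the following simplified sketch. Plain Java lambdas stand in for the JsonPath and QLExpress expressions, and an order is modeled as a Map; the class ActionTrigger, the field names, the 30-minute delay, and the status codes are all hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

final class ActionTrigger {
    private final Predicate<Map<String, Object>> registerCondition; // stand-in for a QLExpress rule
    private final Predicate<Map<String, Object>> executeCondition;  // stand-in for a QLExpress rule
    private final Function<Map<String, Object>, Instant> baseTime;  // stand-in for a JsonPath selector
    private final Duration delay;
    private final String operation;

    ActionTrigger(Predicate<Map<String, Object>> registerCondition,
                  Predicate<Map<String, Object>> executeCondition,
                  Function<Map<String, Object>, Instant> baseTime,
                  Duration delay,
                  String operation) {
        this.registerCondition = registerCondition;
        this.executeCondition = executeCondition;
        this.baseTime = baseTime;
        this.delay = delay;
        this.operation = operation;
    }

    /** Step 2: on create or status change, decide whether and when a delayed message is due. */
    Optional<Instant> fireTimeFor(Map<String, Object> order) {
        if (!registerCondition.test(order)) {
            return Optional.empty();
        }
        return Optional.of(baseTime.apply(order).plus(delay));
    }

    /** Step 3: when the delayed message arrives, re-check before executing the operation. */
    Optional<String> onDelayedMessage(Map<String, Object> order) {
        return executeCondition.test(order) ? Optional.of(operation) : Optional.empty();
    }

    // Hypothetical trigger: close an order that is still unpaid 30 minutes after creation.
    static ActionTrigger autoCloseUnpaid() {
        Predicate<Map<String, Object>> unpaid = o -> "PENDING_PAYMENT".equals(o.get("status"));
        return new ActionTrigger(unpaid, unpaid,
                o -> (Instant) o.get("createTime"), Duration.ofMinutes(30), "CLOSE");
    }
}
```

Checking the execution condition a second time matters because the order may have been paid between registration and delivery of the delayed message.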

3.4 Distributed Transactions

Distributed transactions are a classic problem for trading platforms, such as:

  • When creating an order, inventory must be deducted and coupons reserved at the same time. When canceling an order, both must be rolled back.
  • After the user has paid successfully, the delivery system needs to be notified to ship the goods to the user.
  • After the user confirms receipt of the goods, the points system needs to be notified to issue shopping reward points to the user.

How do we ensure data consistency in a microservice architecture? First, we need to distinguish the consistency requirements of business scenarios.

(1) Strong consistency scenario

For example, when an order is created or canceled, the calls to the inventory and coupon systems must be strongly consistent; otherwise inventory may be oversold or coupons may be used more than once.

For strong consistency scenarios, we use Seata's AT mode. The following diagram is taken from Seata's official documentation.

(2) Eventual consistency scenario

For example, after payment succeeds, the delivery system is notified to ship the goods, and after receipt is confirmed, the points system is notified to issue points. The notification only has to succeed eventually; it does not need to succeed or fail together with the original operation.

For eventual consistency scenarios, we use the local message table solution: the asynchronous operations to be executed are recorded in a message table within the local transaction. If execution fails, a scheduled task retries it as compensation.
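The local message table idea can be sketched in memory as follows. This is a simplified illustration, not the platform's implementation: the list stands in for a database table written in the same transaction as the business update, and the names LocalMessageTable, record, and dispatchPending are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class LocalMessageTable {
    static final class Message {
        final String topic;
        final String payload;
        boolean sent;

        Message(String topic, String payload) {
            this.topic = topic;
            this.payload = payload;
        }
    }

    // Stand-in for a message table that lives in the same database as the orders.
    private final List<Message> rows = new ArrayList<>();

    /** Called inside the same local transaction as the business update. */
    void record(String topic, String payload) {
        rows.add(new Message(topic, payload));
    }

    /** Called right after commit and again by the scheduled task; returns how many were delivered. */
    int dispatchPending(Predicate<Message> sender) {
        int delivered = 0;
        for (Message m : rows) {
            if (m.sent) {
                continue;
            }
            try {
                if (sender.test(m)) {
                    m.sent = true;
                    delivered++;
                }
            } catch (RuntimeException e) {
                // Leave the row unsent; the next scheduled run retries it.
            }
        }
        return delivered;
    }

    long pendingCount() {
        return rows.stream().filter(m -> !m.sent).count();
    }
}
```

Because a message may be delivered more than once when a retry follows a timeout, the receiving system needs to handle notifications idempotently.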

3.5 High Availability and Security Design

  • Circuit Breaker

Use the Hystrix component to add circuit breaker protection to dependent external systems to prevent the impact of a system failure from expanding to the entire distributed system.

  • Rate Limiting

Through performance testing, we identify and resolve performance bottlenecks and measure system throughput, which provides a reference for configuring rate limiting and circuit breaking.

  • Concurrent Locks

Before any order update, a database row-level lock is acquired to prevent concurrent updates.

  • Idempotence

All interfaces are idempotent. If an exception such as a timeout occurs when an upstream system calls our interface, it can safely retry.

  • Network Isolation

Only a very small number of third-party interfaces can be accessed through the external network, and they are all protected by whitelists, data encryption, signature verification, etc. Internal systems interact using intranet domain names and RPC interfaces.

  • Monitoring and Alerting

By configuring error log alerts on the log platform and service analysis alerts on the call chain, combined with the monitoring and alerting functions of the company's middleware and basic components, we can detect system anomalies promptly.

3.6 Other considerations

  • Whether to use domain-driven design

Considering the team's non-agile organizational structure and lack of domain experts, we did not adopt it.

  • Performance bottleneck during peak hours

During big sales and promotions, especially when hot items go on sale, traffic may trigger rate limiting, causing some users to be turned away. Because traffic cannot be estimated accurately, it is difficult to scale out in advance.

Throughput can be increased through proactive degradation, such as switching from synchronous to asynchronous database writes, from DB queries to cache queries, and only allowing queries for orders from the last six months.

Because business complexity and data volume are still at an early stage and the team is too small to support them, these designs are in the long-term plan but have not been implemented yet. (This follows the principle of fit-for-purpose architecture: you can use a sledgehammer to crack a nut if you want, but it is not necessary.)

4. Summary and Outlook

When designing the system, we did not blindly pursue cutting-edge technologies and ideas, nor did we directly adopt mainstream solutions in the industry when facing problems. Instead, we selected the most appropriate method based on the actual situation of the team and the system. A good system is not designed by a master at the beginning, but is gradually iterated as the business develops and evolves.

The trading platform has been online for more than a year and has been connected to three business parties. The system runs smoothly. New businesses within the company with trading/commodity/inventory needs, as well as existing businesses that encounter system bottlenecks and need to be upgraded, can reuse this capability.

As the number of upstream business parties grows and versions iterate, new requirements for the platform keep arriving. The platform's functions have been gradually improved and its architecture continues to evolve. We are now separating the fulfillment module from the trading platform and decoupling it further to prepare for the sustainable development of the business.
