An interface launched in 4 hours: Ctrip's practice in building an efficient, unified hotel data service platform

Author: Xiao Feng, R&D Director of Ctrip, focuses on distributed database research, real-time computing in the big data field, and system architecture design for big data applications.

Background

As Ctrip Hotel's data has grown and personalized requirements have increased, each data interface has been built as a one-off, scheduled development task. Because nothing is standardized, every step takes time: requirement discussion, data preparation, interface packaging, online debugging, and writing the API description. An interface takes at least 2 days, often more, to implement and release, and even that depends on getting a slot in the development schedule;

After years of iteration, more than 700 data interfaces have been exposed to external callers, of which more than 500 are still in use, and more than 100 are added each year. Development and maintenance costs are high, especially when tracing upstream offline data logic, which depends heavily on R&D resources;

Different R&D teams use different technology stacks. Algorithm-related teams tend to develop in Python and also implement their external interfaces in Python, whereas the company framework supports Java interfaces better. The stability of external interfaces built on different stacks is therefore hard to guarantee, especially after personnel turnover or changes in team responsibilities, which also raises maintenance costs;

As the business develops, business systems need more and more data, and they expect requests to be fulfilled faster;

An analysis and classification of historical interfaces showed that more than 80% of them simply return offline or real-time data filtered by the caller's query conditions, with no heavy processing or complex business logic implemented in the interface;

To support personalized business needs faster and reduce R&D costs, we designed a data service platform. It aims to cut costs and increase efficiency, avoid chimney-style (siloed) interface development, improve data reuse, and prevent the same data from being exposed through multiple interfaces because different R&D teams each fetched it and built interfaces for their own scenarios, thus reducing data silos.

1. Platform Introduction

The unified data service platform is built on the company's SOA services. It implements a unified technical solution, reduces operating costs, and improves interface stability, maintainability and sustainability;
Configuration-based operation reduces the cost of implementing a data interface: launch time drops from 2 days or more of custom development to 4 hours or less, and the work is largely independent of development scheduling;
Interfaces are configured through the platform's visual interface without the involvement of Java developers. The data warehouse team can produce Hive tables and configure interface output as needed;
A unified data source ensures consistency in data usage;
Standard templates for interface requests improve communication efficiency and the requesters' satisfaction with how their big data needs are met.

System level architecture diagram:

The interface application configuration process is as follows:

2. Implementation

2.1 Platform Consolidation

The first goal is to reduce the number of teams and technical solutions that produce data interfaces. In addition, as business volume, data volume and the variety of business types grow, MySQL alone can no longer support all interfaces. The platform plans the technical solution centrally, so callers do not need to care which database backs a service (ClickHouse, ES, StarRocks, Redis, and so on) or about that database's technical characteristics and query syntax. When configuring an interface, we choose the store based on the caller's scenario and the characteristics, strengths and weaknesses of each database; for example:

ES: core, high-concurrency search scenarios on non-KV structures;
Redis: core, high-concurrency KV scenarios;
MySQL: core, high-concurrency simple queries on small tables of fewer than 10 million rows;
StarRocks: secondary core, low QPS, single tables with data volumes in the tens or hundreds of millions;
ClickHouse: non-core, QPS below 100, data volumes in the tens or hundreds of millions;
Trocks/HBase: non-core KV scenarios.

For each database we also need to consider the update mechanism: which stores suit full updates and which suit incremental updates.
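
The rules of thumb above can be written down as a small decision function. This is only a sketch: the Scenario fields, the function name and the order in which the rules are checked are assumptions made for illustration, and a real choice would also weigh the update mechanism and other factors.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical description of a caller's scenario."""
    tier: str        # "core", "secondary" or "non-core"
    kv_lookup: bool  # True when access is a plain key-value lookup
    rows: int        # approximate number of rows in the table


def suggest_engine(s: Scenario) -> str:
    """Return a candidate store following the rules of thumb listed above."""
    if s.tier == "core":
        if s.kv_lookup:
            return "Redis"
        if s.rows < 10_000_000:
            return "MySQL"  # small table, simple high-concurrency queries
        return "ES"         # larger non-KV search scenarios
    if s.tier == "secondary":
        return "StarRocks"
    # Non-core scenarios.
    return "Trocks/HBase" if s.kv_lookup else "ClickHouse"
```

For example, suggest_engine(Scenario("non-core", False, 200_000_000)) suggests ClickHouse under these assumed rules.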

2.2 Strengthening data utilization

Once data has been synchronized into a table, other business scenarios can reuse it later simply by configuring a different query SQL. Monitoring data lineage reduces repeated synchronization of offline data and widens the use of each dataset, which improves data availability and consistency: data is reused rather than copied.

2.3 Interface Security Verification

Each caller appid must apply in advance for permission to call a given interface. The unified service platform checks the appid + token pair against the authorization token to block calls from applications that have not been granted access to the interface. The appid is obtained automatically from the company's SOA framework so that it cannot be tampered with, which protects the security and stability of interface data.
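
A minimal sketch of the appid + token check follows. The grant table, the interface name and the token value are hypothetical; in the platform the appid would come from the SOA framework rather than from the caller's payload.

```python
import hmac

# Hypothetical grant table: (appid, interface name) -> token issued on approval.
GRANTS = {
    ("100001", "hotel_price_trend"): "token-abc",
}


def is_authorized(appid: str, interface: str, token: str) -> bool:
    """Return True only if this appid was granted this interface and the token matches."""
    expected = GRANTS.get((appid, interface))
    if expected is None:
        return False  # the appid never applied for this interface
    # Constant-time comparison avoids leaking the token through timing differences.
    return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))
```

Keying the grants by (appid, interface) means that a token approved for one interface cannot be reused to call another.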

2.4 Rate Limiting Protection

In a high-concurrency system, traffic control is essential, especially on a unified service platform. If traffic to one interface exceeds its threshold, for example because of external crawlers, and is not blocked, every interface the platform exposes may become unavailable.

To this end, we introduced Sentinel for rate limiting. Sentinel is a lightweight flow control component for distributed service architectures. It takes traffic as its entry point and helps protect service stability along several dimensions, including rate limiting, service degradation, and system load protection.

The principle is to generate a preconfigured number of tokens within a specified time window. Each request consumes one token, and once all tokens have been taken, further requests are rejected. Each interface name currently has its own independent set of tokens, so the limits on different interfaces do not interfere with one another and flow is controlled per interface. If QPS exceeds the configured threshold, the interface's circuit breaker trips automatically.
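
The per-interface token idea can be illustrated with a small sketch. This is not Sentinel's API or its internal algorithm; the class names, the one-second window and the injectable clock are assumptions chosen to keep the example self-contained.

```python
import time


class WindowTokens:
    """Issues `limit` tokens per `window` seconds; each request takes one."""

    def __init__(self, limit, window=1.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.window_start = clock()
        self.remaining = limit

    def try_acquire(self):
        now = self.clock()
        if now - self.window_start >= self.window:
            # A new window has started: hand out a fresh set of tokens.
            self.window_start = now
            self.remaining = self.limit
        if self.remaining > 0:
            self.remaining -= 1
            return True
        return False  # all tokens taken: reject the request


class InterfaceLimiter:
    """Keeps one independent token pool per interface name."""

    def __init__(self, qps_limits, clock=time.monotonic):
        self.pools = {
            name: WindowTokens(qps, 1.0, clock) for name, qps in qps_limits.items()
        }

    def allow(self, interface):
        pool = self.pools.get(interface)
        # Interfaces without a configured limit are rejected in this sketch.
        return pool is not None and pool.try_acquire()
```

Because every interface has its own pool, exhausting the tokens of one interface leaves the others unaffected.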

2.5 Data Caching

Interface configuration is persisted on disk but is read every time an interface is called, so it has to be served from a cache to be fetched quickly and efficiently. Combining active and passive caching keeps the server load from getting too high. Data source configuration is refreshed into the cache on a regular schedule, so an interface can obtain its basic data immediately when it is called, without initialization.
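
One way to read "active and passive caches" is a scheduled preload combined with load-on-miss. The sketch below follows that reading. The class, the loader callbacks and the TTL are hypothetical and stand in for whatever storage the platform actually uses.

```python
import time


class ConfigCache:
    """In-memory cache in front of a slow configuration store."""

    def __init__(self, load_one, load_all, ttl=60.0, clock=time.monotonic):
        self.load_one = load_one  # fetches one interface's config from storage
        self.load_all = load_all  # fetches every config, used by the scheduled refresh
        self.ttl = ttl
        self.clock = clock
        self.entries = {}         # name -> (config, loaded_at)

    def refresh(self):
        """Active path: called on a schedule to preload all configs."""
        now = self.clock()
        self.entries = {name: (cfg, now) for name, cfg in self.load_all().items()}

    def get(self, name):
        """Passive path: serve from memory, reload on a miss or a stale entry."""
        hit = self.entries.get(name)
        now = self.clock()
        if hit is not None and now - hit[1] < self.ttl:
            return hit[0]
        cfg = self.load_one(name)
        self.entries[name] = (cfg, now)
        return cfg
```

A scheduler would call refresh() periodically, while request handling only ever calls get().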

2.6 Unification of service contracts

All requests for interfaces served by this platform now enter through a single entry point. After receiving a request, the platform routes it automatically according to the interface's configuration. The request contract has two parts, head and params. head carries the basic information about the interface, which is used for service verification and business routing. params is a JSON string holding the parameter object; the service parses it dynamically based on the JSON content and the interface configuration. The response contract contains a success flag and a result part, where result is a JSON string that the caller must parse after receiving it.
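
The envelope can be sketched as follows. Only head, params, the success flag and result come from the description above; the fields inside head, the key name "success", the handler table and the error payload are assumptions for illustration.

```python
import json


def build_request(interface, token, params):
    """Wrap call parameters in the head/params envelope; params travels as a JSON string."""
    return {
        "head": {"interface": interface, "token": token},  # hypothetical head fields
        "params": json.dumps(params),
    }


def handle(request, handlers):
    """Single entry point: route by the interface named in head and wrap the result."""
    handler = handlers.get(request["head"]["interface"])
    if handler is None:
        return {"success": False, "result": json.dumps({"error": "unknown interface"})}
    rows = handler(json.loads(request["params"]))
    return {"success": True, "result": json.dumps(rows)}


def parse_response(response):
    """Caller side: result is a JSON string and has to be decoded."""
    return response["success"], json.loads(response["result"])
```

Carrying params and result as JSON strings keeps the outer contract fixed while each interface defines its own fields.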

The request is shown in the following figure:

The response is as shown below:

2.7 Data Service Configuration and Mapping

A service interface consists of a data source, an SQL statement, request parameters, and response parameters. Parameters in the SQL statement are written as placeholders such as ? or {serial number} and are filled from the request parameters, so the number of configured request parameters must equal the number of placeholders in the SQL. At run time, the interface automatically matches the request parameters to the SQL parameters. Query result fields can be mapped so that the output of the SQL query is converted into the desired output parameters; the configured response parameters are what the interface service returns. The following figure shows how to configure the SQL query:
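
As a rough code-level illustration of the binding and mapping described in this section (not the platform's configuration screen), the sketch below runs against an in-memory SQLite database. The table, column and parameter names are hypothetical, and counting ? characters is a simplification that ignores question marks inside string literals.

```python
import sqlite3

# Hypothetical interface configuration.
CONFIG = {
    "sql": "SELECT hotel_id, avg_price FROM hotel_price WHERE city_id = ? AND star >= ?",
    "request_params": ["cityId", "minStar"],  # one name per placeholder, in order
    "response_mapping": {"hotel_id": "hotelId", "avg_price": "avgPrice"},
}


def run_interface(conn, config, params):
    """Bind request parameters to the SQL placeholders and rename the output columns."""
    names = config["request_params"]
    if config["sql"].count("?") != len(names):
        raise ValueError("placeholder count must match the configured request parameters")
    missing = [n for n in names if n not in params]
    if missing:
        raise ValueError(f"missing request parameters: {missing}")
    # Values are bound by the driver, never concatenated into the SQL text.
    cursor = conn.execute(config["sql"], [params[n] for n in names])
    columns = [d[0] for d in cursor.description]
    mapping = config["response_mapping"]
    return [
        {mapping.get(col, col): value for col, value in zip(columns, row)}
        for row in cursor.fetchall()
    ]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hotel_price (hotel_id INTEGER, city_id INTEGER, star INTEGER, avg_price REAL)"
)
conn.executemany(
    "INSERT INTO hotel_price VALUES (?, ?, ?, ?)",
    [(1, 2, 5, 900.0), (2, 2, 3, 300.0), (3, 9, 5, 700.0)],
)
rows = run_interface(conn, CONFIG, {"cityId": 2, "minStar": 4})
```

Binding values through the driver rather than substituting them into the SQL string also guards against SQL injection from caller-supplied parameters.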

2.8 Automatic generation of contract documents

Custom interface development requires documentation that tells the caller how to call the interface. Because both the input and output parameters of an interface are customized, we defined a service document template that contains everything needed to call an interface. Once an interface is defined, its contract document is generated dynamically, which saves the cost of explaining the interface. The online document, shown in the figure below, is also pushed to the team that applied for the service by email.
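
Generating a contract document from an interface definition might look like the sketch below. The template layout and the field names (name, type, desc, required) are assumptions; the platform's actual template is not described in detail here.

```python
def render_contract_doc(name, description, request_params, response_params):
    """Render a plain-text contract document from an interface definition."""
    lines = [
        f"Interface: {name}",
        f"Description: {description}",
        "",
        "Request parameters:",
    ]
    for p in request_params:
        required = "required" if p.get("required", True) else "optional"
        lines.append(f"  {p['name']} ({p['type']}, {required}): {p['desc']}")
    lines += ["", "Response parameters:"]
    for p in response_params:
        lines.append(f"  {p['name']} ({p['type']}): {p['desc']}")
    return "\n".join(lines)
```

Because the document is derived from the same definition that drives the interface, it cannot drift out of date the way a hand-written description can.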

2.9 Service Monitoring

Once a service interface is running normally, the company's clog and ck logging frameworks are used to monitor its calls. clog records the full process from the start of an interface call to its return, including the duration, request parameters and return parameters at each process node, which makes it easy to trace the entire path of a request. ck records the request parameters, returned parameters and response time at the interface layer. Each request is recorded only once, so the number of calls per period, interface response times and similar metrics can be aggregated and monitored.
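
Because each request is logged exactly once, call counts and latency shares can be computed with a simple aggregation. The sketch below assumes a hypothetical record shape of (interface, hour, elapsed_ms); it does not reflect the actual log schema.

```python
from collections import Counter


def summarize(records, thresholds_ms=(10, 100)):
    """Summarize one-row-per-request logs given as (interface, hour, elapsed_ms) tuples."""
    records = list(records)
    # Number of calls per interface and hour.
    calls = Counter((interface, hour) for interface, hour, _ in records)
    total = len(records)
    # Share of requests answered within each latency threshold.
    within = {
        t: (sum(1 for _, _, ms in records if ms <= t) / total if total else 0.0)
        for t in thresholds_ms
    }
    return calls, within
```

The same kind of aggregation yields figures such as the share of requests answered within 10 ms or 100 ms.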

2.10 Production and operation results

Since its launch in early December 2021, the platform has connected 10 caller appids and provides more than 100 interface services. Request volume has grown with the number of interfaces and now exceeds 3.9 million requests per day. Each interface launches in half a day or less and can be used online immediately after it is configured, which greatly shortens the launch cycle. In production, 91.49% of requests are answered within 10 ms and 99.99% within 100 ms.

3. Future Outlook

All interfaces are currently deployed in a single cluster. Callers could instead be divided into three priority levels (high, medium and low), with each level deployed in its own cluster so that their resources are isolated from one another.

Connect the test environment. Most big data systems have only a production environment, with no test environment or test data, so the unified service platform can currently be used only in production. Development and test environments cannot call it for joint debugging, and callers can only test through mocks. We still need to work out how to make a test environment available at the lowest cost so that callers can use the platform more conveniently.
