Service registration and discovery based on Zookeeper

Background

Most systems start out as a single monolithic application. As the company's business grows rapidly, this application becomes larger and larger, which brings several problems:

1. As traffic keeps rising, upgrading the machine it runs on no longer solves the problem, and the system cannot be scaled out horizontally in an effective way.

2. Maintaining this single system becomes increasingly complex.

3. At the same time, as business scenarios changed and large-scale hiring brought in engineers with different technical backgrounds, a Java technology stack was introduced alongside Dada's original Python stack.

How can these problems be solved? Splitting the business into services is an effective way to address the performance bottlenecks and complexity of a large system. Splitting the original large system into smaller ones brings the following benefits:

1. Load is spread away from the original system, which removes its bottleneck and gives better scalability.

2. Each service has its own code base with less business logic, which greatly improves maintainability.

At the same time, it also brings a series of problems:

▪ As the number of services grows, how should they be managed?

▪ How should requests be distributed across multiple hosts that provide the same service (that is, how is load balancing done)?

▪ If the endpoints providing a service change, how is the service caller notified?

Initial solution

Reid Hoffman, the co-founder of LinkedIn, once said:

Building a startup is like throwing yourself off a cliff and assembling a plane on the way down.

This is also true for the startup Dada, whose business is growing at rocket speed. The role of technology is to keep the business running stably while quickly "assembling the airplane." Therefore, in the early days of our move to services, we used Nginx plus the local hosts file to register and discover internal services. The architecture diagram is as follows:

The roles of each system component are as follows:

1. The service consumer calls a service through the service provider's domain name, which the local hosts file binds to the Nginx IP.

2. Nginx performs health checks on the service providers and load-balances requests across them.

3. The service provider serves the requests that Nginx distributes to it on behalf of service consumers.

This solved the problems of service registration, discovery, and load balancing when there were relatively few internal systems and relatively small traffic. However, as the number of internal services increased and the traffic increased, the hidden dangers of this architecture gradually became apparent:

▪ The most obvious problem is that Nginx is a single point of failure (SPOF), and as traffic grows it becomes a performance bottleneck.

▪ As the number of internal services grows, each service consumer needs its own hosts entries. It is easy to forget to update the hosts file when a new host is added, which causes service calls to fail and increases the operations burden.

▪ Service configuration is scattered across the hosts files of many machines, which makes it hard to keep consistent and inconvenient to manage.

▪ Bringing a service host online or taking it offline requires manually editing the Nginx upstream configuration and then rolling out the modified configuration, which slows down service deployment.

How to solve it

Before discussing how to solve this problem, let's review the goals of service registration and discovery:

▪ Service registration information is stored in one place, which makes services easier to manage.

▪ Services are discovered automatically by service name, without the caller having to know which hosts provide the endpoint.

▪ Load balancing and fail-over are supported.

▪ Adding or removing a service endpoint is transparent to service consumers.

▪ Both Python and Java are supported.

Alternative 1: DNS

DNS is a relatively simple solution for service registration and discovery. All you need to do is configure the mapping between a DNS name and its IP addresses on the DNS server. To locate a service, a caller only needs to query the DNS server, which returns one of those IP addresses at random. Because of DNS caching, the DNS server itself does not become a bottleneck.

However, this pull-based approach cannot pick up changes to a service (for example, a changed service IP) in a timely manner. If a service provider fails, the DNS cache means that callers keep sending requests to the failed provider; conversely, a newly added provider receives no traffic until the cached entries expire.
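The staleness described above can be illustrated with a toy cache. This is only a sketch: `CachingResolver`, the name `helloworld.internal`, the 30-second TTL, and the fake clock are all assumptions made for illustration, and no real DNS library is involved.

```python
import time


class CachingResolver:
    """Toy stand-in for a DNS client cache; not a real DNS implementation."""

    def __init__(self, records, ttl_seconds, clock=time.monotonic):
        self._records = records      # authoritative name -> IP mapping
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}             # name -> (ip, expires_at)

    def resolve(self, name):
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]         # served from cache, possibly stale
        ip = self._records[name]     # "pull" from the authoritative source
        self._cache[name] = (ip, now + self._ttl)
        return ip


# A fake clock keeps the example deterministic.
fake_now = [0.0]
records = {"helloworld.internal": "192.168.1.1"}
resolver = CachingResolver(records, ttl_seconds=30, clock=lambda: fake_now[0])

first = resolver.resolve("helloworld.internal")   # cached until t = 30
records["helloworld.internal"] = "192.168.1.2"    # the provider moves
fake_now[0] = 10.0
stale = resolver.resolve("helloworld.internal")   # cache still holds the old IP
fake_now[0] = 31.0
fresh = resolver.resolve("helloworld.internal")   # TTL expired, new IP is pulled
```

Until the cached entry expires, the caller has no way of learning that the provider changed, which is the limitation a push model avoids.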

Alternative 2: Dubbo

Dubbo is a distributed service framework launched by Alibaba, dedicated to solving service registration and discovery, orchestration, and governance. Its advantages are as follows:

1. Comprehensive features and easy to extend

2. Supports various serialization protocols (JSON, Hessian, Java serialization, etc.)

3. Support various RPC protocols (HTTP, Java RMI, Dubbo's own RPC protocol, etc.)

4. Support multiple load balancing algorithms

5. Other advanced features: service orchestration, service governance, service monitoring, etc.

The disadvantages are as follows:

1. It only supports Java, with no corresponding support for Python

2. Although it is open source, there is no mature community operating and maintaining it, so future upgrades may be a hassle

3. It is a heavyweight solution that brings new complexity

Alternative 3: Zookeeper

What is Zookeeper? According to the description on the Apache website:

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.

According to the definition on the official website, it can serve as:

1. A central server for storing configuration information

2. A naming service

3. A distributed coordination service

4. A mechanism for master election, etc.

The definition specifically mentions naming services. Our research showed that Zookeeper is a viable solution for service registration and discovery, with the following advantages:

1. It provides a simple API

2. Internet companies (e.g. Pinterest, Airbnb) already use it for service registration and discovery

3. It has clients for multiple languages

4. Its Watcher mechanism implements a push model, so changes to service registration information are promptly pushed to service consumers

The disadvantages are:

1. Introducing Zookeeper as a new component brings additional complexity and operations work

2. You need to implement service registration and discovery logic through the API it provides (including Python and Java versions)

After weighing the pros and cons of the above solutions, we decided to implement our own service registration and discovery based on Zookeeper.

Service registration and discovery architecture based on Zookeeper

There are three types of roles in this architecture: service provider, service registry, and service consumer.

Service Provider

The service provider registers its service information with the service registry. The service information includes:

▪ Which system it belongs to

▪ Service IP, port

▪ The request URL of the service

▪ Weight of service, etc.
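As a rough illustration of the information listed above, the record for one provider could be modelled as a small data class and serialized into the bytes stored in a Znode. The class name `ProviderRegistration`, its field names, and the choice of JSON are assumptions made for this sketch, not the format Dada actually uses.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProviderRegistration:
    """Hypothetical record for one service provider instance."""

    system: str       # which system the service belongs to
    host: str         # service IP
    port: int         # service port
    path: str         # request URL (path) of the service
    weight: int = 1   # relative weight of this provider

    def to_znode_data(self) -> bytes:
        # Znodes hold a small byte payload, so the record is encoded as JSON.
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def from_znode_data(cls, data: bytes) -> "ProviderRegistration":
        return cls(**json.loads(data.decode("utf-8")))


registration = ProviderRegistration(
    system="category1", host="192.168.1.1", port=8080,
    path="/helloworld", weight=5,
)
payload = registration.to_znode_data()
restored = ProviderRegistration.from_znode_data(payload)
```

Keeping the payload small and self-describing makes it easy for both Python and Java consumers to read the same record.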

Service Registry

The service registry provides central storage for all service registration information and is responsible for pushing updates to that information to service consumers in real time (mainly through Zookeeper's Watcher mechanism).

Service Consumer

The main responsibilities of the service consumer are as follows:

1. The service consumer obtains the required service registration information from the service registration center at startup

2. Cache service registration information locally

3. Monitor changes in service registration information. If a service change notification is received from the service registration center, the service registration information is updated in the local cache.

4. Build a service call request based on the service registration information in the local cache, and forward the request based on the load balancing strategy (random load balancing, Round-Robin load balancing, etc.)

5. Check the survival of the service provider. If a service provider is unavailable, it will be removed from the local cache.

Service consumers depend on the service registry only at startup and when service registration information changes. At those points, running Zookeeper as a cluster guards against a single point of failure. During the service call itself, service consumers do not depend on any third-party service.

Implementation Mechanism Introduction

Introduction to Zookeeper Data Model

In the entire service registration and discovery design, the most important thing is how to store the service registration information.

Before designing the service registration structure based on Zookeeper, let's take a look at the data model of Zookeeper. The data model of Zookeeper is shown in the following figure:

The Zookeeper data model is very similar to the Unix file system: it is a tree-like hierarchy. Each node is called a Znode; it can have child nodes and can store a small amount of data. By watching a node's data and its list of children (the Watcher mechanism), a client learns about Znode changes in real time.
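To make the tree and Watcher ideas concrete, here is a small in-memory imitation. It is a sketch only: `InMemoryZnodeTree` and its methods are hypothetical and are not the Zookeeper client API. It models watches as one-shot triggers that the callback re-registers, which is how classic Zookeeper watches behave.

```python
class InMemoryZnodeTree:
    """Toy znode tree that only imitates the behaviour described above."""

    def __init__(self):
        self._data = {"/": b""}       # path -> small data payload
        self._child_watches = {}      # path -> list of pending callbacks

    @staticmethod
    def _parent(path):
        return path.rsplit("/", 1)[0] or "/"

    def create(self, path, data=b""):
        parent = self._parent(path)
        if parent not in self._data:
            raise KeyError(f"parent znode {parent} does not exist")
        self._data[path] = data
        self._fire_child_watches(parent)

    def delete(self, path):
        del self._data[path]
        self._fire_child_watches(self._parent(path))

    def get_children(self, path, watch=None):
        if watch is not None:
            self._child_watches.setdefault(path, []).append(watch)
        prefix = path.rstrip("/") + "/"
        names = []
        for candidate in self._data:
            name = candidate[len(prefix):]
            # Keep direct children only (non-empty name without a slash).
            if candidate.startswith(prefix) and name and "/" not in name:
                names.append(name)
        return sorted(names)

    def _fire_child_watches(self, path):
        # Each registered watch fires once and is then discarded.
        for callback in self._child_watches.pop(path, []):
            callback(path)


tree = InMemoryZnodeTree()
providers = "/dada/services/category1/helloworld/providers"
for node in ("/dada", "/dada/services", "/dada/services/category1",
             "/dada/services/category1/helloworld", providers):
    tree.create(node)

events = []


def on_change(path):
    # Re-read the children and re-register the one-shot watch.
    events.append(tree.get_children(path, watch=on_change))


tree.get_children(providers, watch=on_change)
tree.create(providers + "/192.168.1.1:8080", b'{"weight": 1}')
tree.create(providers + "/192.168.1.2:8080", b'{"weight": 1}')
tree.delete(providers + "/192.168.1.1:8080")
```

After these three changes, `events` holds one snapshot of the provider list per change, which is the information a service consumer would use to refresh its local cache.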

Service registration structure

The service registration structure is shown in the figure above.

▪ /dada is used to indicate the company name dada, and can be easily distinguished from other application directories (for example, Kafka brokers registration information is placed under /brokers)

▪ /dada/services: all service providers are placed under this directory

▪ The /dada/services/category1 directory defines the specific service provider id: category1. At the same time, the Znode node allows the storage of some metadata information of the service provider, such as: name, service provider Owner, context path (Java Web project), health check path, etc. This information can be freely expanded according to actual needs.

▪ The /dada/services/category1/helloworld node defines a service under the service provider category1: helloworld. Helloworld is the ID of the service, and the metadata information of the service is allowed to be stored under the Znode, such as the service name, service description, service path, service call schema, service call HTTP METHOD, etc. This information can be freely expanded according to actual needs.

▪ The /dada/services/category1/helloworld/providers node is the parent node of the service's provider instances. The IP and port of each provider could be placed directly under the helloworld node, but a separate node is used so that service consumer information can later be mounted under helloworld as well, for example as /dada/services/category1/helloworld/consumers.

▪ The /dada/services/category1/helloworld/providers/192.168.1.1:8080 node defines the IP and port of one service provider instance; the weight of that instance is also stored in the node.
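The path layout described above can be captured in a few helper functions. This is a minimal sketch; the function names are hypothetical, and only the path layout itself comes from the structure described here.

```python
SERVICES_ROOT = "/dada/services"


def category_path(category_id):
    """Znode of a service provider (category), e.g. /dada/services/category1."""
    return f"{SERVICES_ROOT}/{category_id}"


def service_path(category_id, service_id):
    """Znode of one service under a category."""
    return f"{category_path(category_id)}/{service_id}"


def providers_path(category_id, service_id):
    """Parent znode of all provider instances of a service."""
    return f"{service_path(category_id, service_id)}/providers"


def provider_path(category_id, service_id, host, port):
    """Znode of one provider instance, named 'host:port'."""
    return f"{providers_path(category_id, service_id)}/{host}:{port}"


def parse_provider_node(node_name):
    """Split a provider node name such as '192.168.1.1:8080' into (host, port)."""
    host, _, port = node_name.rpartition(":")
    return host, int(port)


example = provider_path("category1", "helloworld", "192.168.1.1", 8080)
host, port = parse_provider_node(example.rsplit("/", 1)[1])
```

Centralizing path construction like this keeps the Python and Java clients from drifting apart on the layout.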

Implementation Mechanism

Service registration is currently done through our service registry UI, so this part of the logic is relatively simple: the UI builds the service registration structure defined above.

The following is a brief introduction to how our service discovery works:

In the above class diagram, the ServiceDiscovery class obtains service information through the Zookeeper API (Python/Java version) and adds a Watcher to the providers node of each service in the service registration structure to monitor node changes. The service registration information it obtains is saved in the variable service_repos. Load balancing across service providers is achieved by setting a LoadBalanceStrategy implementation (Round-Robin algorithm or Random algorithm) during initialization. Main methods:

1. init obtains Zookeeper's service registration information and caches it in service_repos

2. The get_service_repos method gets the instance variable service_repos

3. get_service_endpoint returns the URL address of a service based on the service_repos built by init and the load balancing strategy provided by lb_strategy

4. update_service_repos uses Zookeeper's Watcher mechanism to update the local cache service_repos in real time

5. heartbeat_monitor is a heartbeat detection thread, which is used to detect the health and survival of the service provider. If there is a problem, the service provider will be removed from the provider list of the service; otherwise, it will be added to the provider list of the service.
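The methods listed above might fit together roughly as follows. This is a sketch under several assumptions: `FakeRegistry` is a hypothetical in-memory stand-in for the Zookeeper client, `heartbeat_once` is a single pass of what heartbeat_monitor would run periodically in its thread, the load balancing strategy is reduced to a plain callable, and endpoints are assumed to be plain `http://host:port` URLs.

```python
class FakeRegistry:
    """Hypothetical in-memory stand-in for the Zookeeper-backed registry."""

    def __init__(self, providers_by_service):
        self._providers = {k: list(v) for k, v in providers_by_service.items()}
        self._watchers = {}   # service -> callbacks to notify on change

    def list_services(self):
        return list(self._providers)

    def get_providers(self, service, watch=None):
        if watch is not None:
            self._watchers.setdefault(service, []).append(watch)
        return list(self._providers[service])

    def set_providers(self, service, providers):
        # Simulates a registry-side change followed by a push to watchers.
        self._providers[service] = list(providers)
        for callback in self._watchers.get(service, []):
            callback(service)


class ServiceDiscovery:
    def __init__(self, registry, lb_strategy, is_alive):
        self._registry = registry
        self.lb_strategy = lb_strategy   # callable: provider list -> provider
        self._is_alive = is_alive        # callable: provider -> bool
        # init step: load registration info into the local cache and watch it.
        self.service_repos = {
            service: registry.get_providers(service, watch=self.update_service_repos)
            for service in registry.list_services()
        }

    def get_service_repos(self):
        return self.service_repos

    def get_service_endpoint(self, service):
        providers = self.service_repos.get(service)
        if not providers:
            raise LookupError(f"no live provider for {service}")
        return "http://" + self.lb_strategy(providers)

    def update_service_repos(self, service):
        # Called on a registry push: refresh the local cache for that service.
        self.service_repos[service] = self._registry.get_providers(service)

    def heartbeat_once(self):
        # Drop providers that fail the probe; recovered ones are added back.
        for service in self.service_repos:
            registered = self._registry.get_providers(service)
            self.service_repos[service] = [p for p in registered if self._is_alive(p)]


registry = FakeRegistry({"helloworld": ["192.168.1.1:8080", "192.168.1.2:8080"]})
down = set()
discovery = ServiceDiscovery(
    registry,
    lb_strategy=lambda providers: providers[0],   # trivial strategy for the demo
    is_alive=lambda provider: provider not in down,
)

endpoint = discovery.get_service_endpoint("helloworld")
down.add("192.168.1.1:8080")                      # first provider stops responding
discovery.heartbeat_once()
after_failure = discovery.get_service_endpoint("helloworld")
registry.set_providers("helloworld", ["192.168.1.3:8080"])   # registry pushes a change
after_push = discovery.get_service_repos()["helloworld"]
```

Note that a service call only reads the local cache; the registry is touched at startup, on a push, and during the heartbeat pass.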

LoadBalanceStrategy defines how a host and port are selected from the service provider information, that is, it decides which host and port will serve the request.

RoundRobinStrategy and RandomStrategy implement the Round-Robin and random load balancing algorithms, respectively.
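A minimal sketch of these three classes is shown below. The class names come from the design above, but the method name `choose`, the `'host:port'` string representation, and the optional `rng` parameter are assumptions made for illustration; provider weights are ignored here.

```python
import itertools
import random
from abc import ABC, abstractmethod


class LoadBalanceStrategy(ABC):
    @abstractmethod
    def choose(self, providers):
        """Return one 'host:port' entry from a non-empty provider list."""


class RoundRobinStrategy(LoadBalanceStrategy):
    def __init__(self):
        self._counter = itertools.count()

    def choose(self, providers):
        # Walk through the providers in order, wrapping around at the end.
        return providers[next(self._counter) % len(providers)]


class RandomStrategy(LoadBalanceStrategy):
    def __init__(self, rng=None):
        # An injectable random generator makes the strategy easy to test.
        self._rng = rng or random.Random()

    def choose(self, providers):
        return self._rng.choice(providers)


providers = ["192.168.1.1:8080", "192.168.1.2:8080", "192.168.1.3:8080"]

round_robin = RoundRobinStrategy()
picks = [round_robin.choose(providers) for _ in range(4)]

random_pick = RandomStrategy(random.Random(0)).choose(providers)
```

With three providers, the four Round-Robin picks visit each provider once and then return to the first; the random pick is simply one of the three.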

Future Outlook

Currently, Dada's service registration and discovery architecture based on Zookeeper is still in its early stages, and many functions have not yet been perfected, such as: service routing function, integration with deployment platforms, service monitoring, etc.

Of course, there are many other things that can be built on Zookeeper, such as a real-time dynamic configuration system. We have already implemented one based on Zookeeper. If you want to know more, please continue to follow our blog.

Yang Jun, CTO of Dada

Currently managing Dada's R&D department, responsible for products, technology and data. He worked at Google and Facebook headquarters for nearly 7 years. As one of the earliest Chinese engineers at Facebook, he joined and led multiple R&D teams, responsible for the friend recommendation system and multiple advertising products and backends, and optimized advertising through machine learning and big data analysis. Before joining Dada, he led the Growth team at Square, a well-known mobile payment company in Silicon Valley, and was responsible for the company's user growth strategy and implementation.

He graduated from the Zhu Kezhen College of Zhejiang University and later received his Ph.D. from Carnegie Mellon University, where he conducted research in machine learning and multimedia analysis.