Square’s App Visibility Principles

Background

Over the past five years, Square has evolved significantly. Our engineering organization has moved from a single monolithic application to a microservices architecture. The growth and change in our services have brought new challenges for application visibility. In this blog post, we share some guiding principles and show the technology we use to inspect and visualize our service ecosystem. Starting today, we will also open source various parts of our service monitoring and visualization technology!

Principles

Some guiding principles are as follows:

  • Focus on usability as early as possible. With a microservice architecture, it is very easy to collect a large number of signals. A good user interface must help people extract information from those signals.

  • Identify and present the most important aspects of each metric. Humans can only effectively keep track of a few things at a time, so any question about a metric should be answerable in a small number of steps. For example:

    • Top-N API metrics, ranked by an underlying factor or by week-over-week change.

    • Automatic problem detection. The inspect tool can surface obvious system problems.

  • Applications should have good metrics by default, from the start. Our standard application setup ensures good monitoring out of the box, and our dashboards include metrics for:

    • Databases

    • Hosts and containers

    • HTTP/REST endpoint performance

    • The JVM, for service components that run on it

  • Alerts should be concise and relevant. We monitor a large number of metrics, but alerting itself is kept deliberately narrow: the goal is to make alerts more useful and to avoid alerts that have no corresponding action.

    • Alerts should be timely and call for an immediate response.

    • An alert, when it fires, should be treated as an exceptional event.

    • All alerts should require human judgment to handle.

    • All alerts should be reproducible.
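As an illustration of the Top-N principle above, here is a minimal Python sketch that ranks endpoints by week-over-week change in a metric. It is not Square's implementation: the function name, the endpoint paths, and the latency numbers are all hypothetical, and relative change is just one plausible way to rank.

```python
from typing import Dict, List, Tuple


def top_n_by_weekly_change(
    this_week: Dict[str, float],
    last_week: Dict[str, float],
    n: int = 3,
) -> List[Tuple[str, float]]:
    """Return the n endpoints whose metric changed most versus last week.

    Change is the relative difference (this - last) / last. Endpoints with
    a missing or zero baseline are skipped because the ratio is undefined.
    """
    changes = []
    for endpoint, current in this_week.items():
        previous = last_week.get(endpoint)
        if not previous:  # missing or zero baseline
            continue
        changes.append((endpoint, (current - previous) / previous))
    # Largest absolute change first, so regressions and improvements both surface.
    changes.sort(key=lambda item: abs(item[1]), reverse=True)
    return changes[:n]


# Hypothetical p99 latencies in milliseconds (illustrative data only).
last_week = {"/pay": 100.0, "/refund": 200.0, "/login": 50.0, "/items": 80.0}
this_week = {"/pay": 150.0, "/refund": 190.0, "/login": 50.0, "/items": 60.0}

top = top_n_by_weekly_change(this_week, last_week, n=2)
for endpoint, change in top:
    print(f"{endpoint}: {change:+.0%}")
```

Showing only the few endpoints that moved the most keeps the view within what a person can take in at once, which is the point of the principle.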

Applications

Following the principles above, the applications we use at Square are:

  • Appdash. Use this app to quickly get information about your application, including:

    • Operational information, such as which hosts it is running on and what has been released

    • The application's dependency topology

    • Events and exceptions from your application

    • Capacity modeling

  • MetricsDashboard. With this application, you can view metrics for all platforms and applications. One example is the database dashboard in the MetricsDashboard UI.

  • Presidio. A log search application based on Elasticsearch. It gives application developers an interface for finding patterns that may be causing errors, and for tracing a single event across multiple services.

  • Equilibrium. Our next-generation alerting system, which is rapidly replacing our Nagios infrastructure. Equilibrium is easy to use, more reliable, and better balanced. Its design was influenced by our experience with Nagios and with working at other companies, and it is in line with current open source trends.

Today we are open sourcing a seemingly small but very important part of this system: inspect. Inspect is a collection of libraries that we use to collect Linux, MySQL, and PostgreSQL metrics. The project also provides Linux command-line tools that can perform basic problem detection.
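To make "collect metrics, then do basic problem detection" concrete, here is a small Python sketch of the general idea. It is not inspect's code or API. It assumes the standard `/proc/meminfo` text format on Linux, and the function names and thresholds are hypothetical choices for illustration.

```python
from typing import Dict, List


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}.

    Most fields are reported in kB; a few (such as HugePages counters)
    are plain counts and are stored as-is.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def detect_problems(
    mem: Dict[str, int],
    min_available_fraction: float = 0.10,  # hypothetical threshold
    max_swap_used_fraction: float = 0.50,  # hypothetical threshold
) -> List[str]:
    """Return short human-readable descriptions of obvious memory problems."""
    problems = []
    total = mem.get("MemTotal", 0)
    available = mem.get("MemAvailable")
    if total and available is not None:
        fraction = available / total
        if fraction < min_available_fraction:
            problems.append(f"low memory: {fraction:.0%} available")
    swap_total = mem.get("SwapTotal", 0)
    swap_free = mem.get("SwapFree", 0)
    if swap_total and (swap_total - swap_free) / swap_total > max_swap_used_fraction:
        problems.append("heavy swap usage")
    return problems


# Sample input so the sketch runs anywhere; on a Linux host one could
# instead read the text with open("/proc/meminfo").read().
SAMPLE = """\
MemTotal:       16000000 kB
MemAvailable:     800000 kB
SwapTotal:       2000000 kB
SwapFree:        1900000 kB
"""

report = detect_problems(parse_meminfo(SAMPLE))
print(report)
```

Keeping collection (parsing) separate from detection (thresholds) means the same parsed values can feed both a dashboard and a command-line check.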

Conclusion

We hope you find inspect helpful, and that this blog post has given you a good understanding of the monitoring and alerting systems we use at Square. We will go into more detail about each system in subsequent blog posts. As always, please check back at https://corner.squareup.com/ for updates.
