Square’s App Visibility Principles

Background

Over the past five years, our platform has evolved significantly. Our engineering organization has moved from a single monolithic application to a microservices architecture. The change and growth in our services have brought new challenges for application visibility. In this post, we share some guiding principles and show the technology we use to inspect and visualize our service ecosystem. Starting today, we will also open source various parts of our service monitoring and visualization technology!

Principles

Some guiding principles are as follows:

  • Pay attention to ease of use as early as possible. With a microservices architecture, it is very easy to collect a large number of signals. A good user interface must be able to extract information from those signals.

  • Identify and present the most important aspects of the metrics. People can only effectively attend to a few things at a time, so any question about the metrics should be answerable in a small number of steps. For example:

    • Top-N API metrics, ranked by an underlying factor or by week-over-week change.

    • Automatic problem detection. The inspect tool can reveal obvious system problems.

  • Applications should have good metrics by default, from the start. We build good monitoring into our standard application setup, and our dashboards cover the following:

    • Databases

    • Hosts and containers

    • Performance metrics for HTTP/REST endpoints

    • JVM metrics for service components that run on the JVM

  • Alerts should be concise and relevant. We monitor a large number of metrics, but when it comes to alerting, the goal is to keep alerts useful and to avoid alerts that have no corresponding response.

    • Alerts should be timely and call for an immediate response.

    • Alerts should be treated as exceptional events when they occur.

    • All alerts should require human judgment to process.

    • All alerts should be reproducible.
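
The Top-N idea from the list above can be sketched in a few lines. This is only an illustration, not Square's implementation: the function name `top_n_by_weekly_change`, the endpoint names, and the latency figures are all hypothetical. It ranks endpoints by the relative week-over-week change of a single metric, so that a reader sees only the few endpoints that moved the most.

```python
from typing import Dict, List, Tuple


def top_n_by_weekly_change(
    last_week: Dict[str, float],
    this_week: Dict[str, float],
    n: int = 3,
) -> List[Tuple[str, float]]:
    """Return the n endpoints whose metric changed most, as relative change."""
    changes = []
    for endpoint, current in this_week.items():
        previous = last_week.get(endpoint)
        if not previous:
            # Skip endpoints with no usable baseline (new, or zero last week).
            continue
        changes.append((endpoint, (current - previous) / previous))
    # Largest absolute change first, so regressions and improvements both surface.
    changes.sort(key=lambda item: abs(item[1]), reverse=True)
    return changes[:n]


# Hypothetical p99 latencies in milliseconds, keyed by endpoint.
last_week = {"/pay": 120.0, "/refund": 80.0, "/login": 40.0, "/items": 60.0}
this_week = {"/pay": 180.0, "/refund": 84.0, "/login": 30.0, "/items": 60.0}

top = top_n_by_weekly_change(last_week, this_week, n=2)
for endpoint, change in top:
    print(f"{endpoint}: {change:+.0%}")
```

Sorting by absolute change is a design choice made for this sketch; a real dashboard might rank regressions and improvements separately.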

Applications

Following these principles, the applications we use at Square are:

  • Appdash. Use this app to quickly get information about your application, including:

    • Operational information, such as which hosts it is running on and what has been released

    • The application's dependency graph

    • Events and exceptions from your application

    • Capacity modeling

  • MetricsDashboard. Using this application, you can view metrics for all platforms and applications. One example is the database dashboard in the MetricsDashboard UI.

  • Presidio. A log search application based on Elasticsearch. It gives application developers an interface for finding patterns that may be causing errors, and for tracing an event across multiple services.

  • Equilibrium. Our next-generation alerting system, which is rapidly replacing our Nagios infrastructure. Equilibrium is easy to use and offers better reliability and balance. Its design was influenced by our experience with Nagios and by experience from other companies, and it is in line with current open source trends.

Today, we are open sourcing a seemingly small but very important part of this system: inspect. Inspect is a collection of libraries that we use to collect Linux, MySQL, and PostgreSQL metrics. The project also provides Linux command-line tools that can perform basic problem detection.
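
To make "basic problem detection" concrete, here is a minimal sketch of the general idea. It does not use inspect or reproduce its API. The function names, the sample data, and the thresholds (10% available memory, a load average above the CPU count) are assumptions chosen for illustration. The sketch parses text in the `/proc/meminfo` format and flags obvious resource problems.

```python
from typing import Dict, List


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def detect_problems(meminfo: Dict[str, int], load1: float, cpus: int) -> List[str]:
    """Flag obvious problems using illustrative (assumed) thresholds."""
    problems = []
    total = meminfo.get("MemTotal", 0)
    available = meminfo.get("MemAvailable", 0)
    if total and available / total < 0.10:
        problems.append("memory: less than 10% available")
    if cpus and load1 / cpus > 1.0:
        problems.append("cpu: 1-minute load exceeds CPU count")
    return problems


# Hypothetical sample in the /proc/meminfo format; on a real Linux host
# the text would come from reading /proc/meminfo instead.
SAMPLE = """MemTotal:       16384000 kB
MemFree:          204800 kB
MemAvailable:     819200 kB
"""

problems = detect_problems(parse_meminfo(SAMPLE), load1=9.5, cpus=8)
for problem in problems:
    print(problem)
```

Keeping the parsing separate from the checks means the same checks can run against live data or against a saved sample, which helps keep any resulting alert reproducible.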

Conclusion

We hope you find inspect useful, and that this post gave you a good understanding of the monitoring and alerting systems we use at Square. We will go into more detail about each system in subsequent posts. As always, please check back at https://corner.squareup.com/ for updates.
