What do you need to know to ensure mobile app quality?

This article presents practical takeaways from the WOT2016 Internet Operation and Developer Conference. The next session, the WOT2016 Enterprise Security Technology Summit, will be held at the JW Marriott Hotel in Beijing Pearl River Delta on June 24-25, 2016.

My name is Yang Qiang. At Alibaba I am responsible for a wide range of work, from offline R&D management, release processes, and quality assurance to online client monitoring and operations. Today I would like to share some of the operations and optimization work we have done on Taobao Mobile.

After I joined Taobao Mobile in 2012, I found that the app had far more pain points than the PC version. The first was crashes, which made Taobao Mobile unusable. The second was slow response: a response could take five or six seconds, which users would not accept. There were also power and data consumption problems, as well as problems shared with the PC side, such as pages failing to load properly and security issues.

Faced with so many problems, the approach has to be: find the problem, fix it, and then check whether it still exists. That is easier said than done, because the client side brings many complications.

First, device models are complex. According to incomplete statistics, there are more than 20,000 models on the market, and making sure our app works on all of them is very difficult. We once received user feedback that Taobao orders could not be used on a phone. We tried every model we had offline, and it worked on all of them, so why did it fail for the user? In the end, the customer took out a BlackBerry phone; it turns out that BlackBerry can also run Android APKs.

Another issue is the network. The network gives mobile developers a headache, because it is a mobile network and users are in many different environments, such as on high-speed trains, in basements, or in remote mountain areas. We must keep the software highly available in all of these scenarios.

These two characteristics, together with other factors, lead to another problem: many low-probability issues. Taobao Mobile may have hundreds of millions of users, yet a given problem may affect only a few hundred of them, or even ten or so. Once a problem is found, it has to be reproduced and located. Unlike on the server side, where you can simply check the server log when a bug appears, you also need to understand the environment the phone was in at the time.

Another problem is the means of repair. The client is released and delivered through app markets, so how do we fix a problem once it occurs? An APK may now be 30 MB or 40 MB, and pushing a package of that size to fix one small bug is costly.

So how do we handle these problems? The first step is to ensure client stability. We did the following: built custom clustering algorithms, analyzed and distributed issues along multiple dimensions, and set up alarms and monitoring by business dimension.
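As an illustration of the clustering idea, here is a minimal sketch that groups crash reports by a signature built from the top stack frames and by device model. The talk does not describe Taobao's actual algorithm; the report fields (`stack`, `model`), the Java-style `at ...` frame format, and the function names are all assumptions made for this example.

```python
import re
from collections import Counter


def crash_signature(stack_trace, top_frames=3):
    """Reduce a Java-style stack trace to a signature built from its top frames."""
    frames = []
    for line in stack_trace.splitlines():
        line = line.strip()
        if not line.startswith("at "):
            continue  # skip the exception message and other non-frame lines
        # Drop "(File.java:123)" so line-number shifts between builds
        # do not split one bug into several clusters.
        frames.append(re.sub(r"\(.*\)$", "", line[3:]))
        if len(frames) == top_frames:
            break
    return " <- ".join(frames)


def cluster_crashes(reports):
    """Count crash reports per (signature, device model) pair."""
    counts = Counter()
    for report in reports:
        counts[(crash_signature(report["stack"]), report["model"])] += 1
    return counts
```

Sorting the resulting counter with `most_common()` gives a simple ranked list of crash clusters per device model, which is one way to decide what to fix first.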

Let's talk about the more common error monitoring. Errors can be roughly divided into several categories. The first is the network layer: timeouts of various kinds, test failures of various kinds, and so on. Interface calls are calls to interfaces over the network; whether the failure is at the network layer or at the business layer, there is corresponding monitoring. Business errors occur on the client: the client often contains a lot of business logic, so we add monitoring points to that logic. Data errors arise because some data is maintained by hand, or because a code change alters the format or volume of the data, so we also analyze and monitor the data itself.

Following the principle of doing big things at low cost, we split this work into tiers. Real-time alarms, at the minute or second level, cover the key error points, such as crash rate, network errors, certain business errors, and order failures. For the rest, we use Alibaba Cloud's big data computing service for hourly or half-hourly monitoring: an indicator is computed every half hour or every hour and then compared with the previous data.
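The "compare it with the previous data" step can be sketched as a simple window-over-window check. This is only an illustration: the thresholds, the function name `check_metric`, and the alert message format are assumptions, not values from the talk.

```python
def check_metric(name, current, previous,
                 max_relative_increase=0.5, min_absolute=0.001):
    """Return an alert message if the metric rose sharply, otherwise None.

    `current` and `previous` are the values of the same indicator
    (for example, crash rate) in two consecutive windows.
    """
    if current < min_absolute:
        return None  # too small to matter, avoids noisy alerts
    if previous == 0:
        return f"{name}: rose from 0 to {current:.4f}"
    increase = (current - previous) / previous
    if increase > max_relative_increase:
        return f"{name}: up {increase:.0%} ({previous:.4f} -> {current:.4f})"
    return None
```

The absolute floor is a design choice for this sketch: a rate that doubles from almost nothing is usually not worth waking anyone up for.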

With this monitoring and these indicators in place, we can run targeted projects to drive the repair of specific problems, or even a change in architecture. We also have an active monitoring system that probes at the functional and network levels. Put simply, it continuously runs automated cases online and raises an alarm when a case fails.
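A minimal sketch of one round of such active monitoring follows. The case functions and the `alarm` callback are hypothetical stand-ins supplied by the caller; the talk does not describe the real system's interfaces.

```python
def run_probes(cases, alarm):
    """Run each automated case once and raise an alarm for every failure.

    `cases` maps a case name to a zero-argument callable that returns
    True on success. `alarm` is called with (case_name, reason).
    Returns the list of failed case names.
    """
    failed = []
    for name, case in cases.items():
        try:
            ok = case()
            reason = "returned False"
        except Exception as exc:  # a crashing case counts as a failure
            ok = False
            reason = f"raised {type(exc).__name__}: {exc}"
        if not ok:
            failed.append(name)
            alarm(name, reason)
    return failed
```

In practice a scheduler would call `run_probes` repeatedly at a fixed interval; that loop is left out here to keep the sketch short.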

Beyond technical monitoring, the point I most want to stress is that user feedback is very important. Take a simple example: if an app has a problem at the earliest stage of startup, it may not start at all after installation, and you will not see any signal from it. What can you do? You can only rely on user feedback, including what users report in the major app markets.

We have done several things with the public opinion system, which here deals only with user feedback. First, we classify feedback into system failures, experience issues, product issues, and customer inquiries. Problems are discovered through the public opinion system, and then we step in, investigate, and fix them.
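The four categories above can be illustrated with a very simple keyword-based classifier. This is only a sketch: the keyword lists are invented for the example, and a production system would likely use something more robust than substring matching.

```python
# Hypothetical keyword lists; the real system's rules are not described in the talk.
CATEGORY_KEYWORDS = {
    "system failure": ("crash", "cannot open", "white screen", "error"),
    "experience issue": ("slow", "lag", "battery", "data usage"),
    "product issue": ("feature", "design", "layout"),
}


def classify_feedback(text):
    """Assign a piece of user feedback to one of the four categories."""
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    # Anything that matches no keyword is treated as a customer inquiry.
    return "customer inquiry"
```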

What do we do when we hit a very serious business logic error? We perform dynamic deployment: compare against the old package to generate a diff, and then send the diff to the client. Finally, if all other means fail, we can only re-release the version, and re-releasing has its own strategies. So we have a complete set of release strategies, which can be used for releases, dynamic deployment, hotpatch configuration, and so on.
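The "compare the old package to generate a difference" step can be sketched at file granularity. This is an assumption made for illustration: a package is modeled as a plain `{path: bytes}` dictionary, and real dynamic deployment tooling works on actual APK contents and often diffs at a finer level than whole files.

```python
import hashlib


def file_digest(data):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(data).hexdigest()


def build_patch(old_files, new_files):
    """Compare two {path: bytes} packages.

    Returns the files that must be shipped (new or modified) and the
    paths that must be deleted on the client.
    """
    changed = {
        path: data
        for path, data in new_files.items()
        if path not in old_files or file_digest(old_files[path]) != file_digest(data)
    }
    removed = sorted(set(old_files) - set(new_files))
    return {"changed": changed, "removed": removed}


def apply_patch(old_files, patch):
    """Rebuild the new package on the client from the old one plus the patch."""
    result = {p: d for p, d in old_files.items() if p not in patch["removed"]}
    result.update(patch["changed"])
    return result
```

Only the entries in `changed` travel over the network, so a one-file fix no longer costs a full 30 MB or 40 MB download.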

Intelligent, human-controlled staged release means that when an app's user base is very large, the app cannot be released to everyone at once. We first release to a portion of users, check the indicators, and release further only if the system indicators are acceptable. It is a gradual gray release process.
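A gradual gray release of this kind can be sketched with stable hashing of a device identifier plus a gate on one indicator. The stage percentages, the crash-rate threshold, and the function names are assumptions for illustration, not Taobao's actual values.

```python
import hashlib

# Hypothetical rollout stages, in percent of devices.
STAGES = [1, 5, 20, 50, 100]


def in_rollout(device_id, percent):
    """Decide whether a device belongs to the first `percent` of the rollout.

    Hashing gives each device a stable bucket from 0 to 99, so a device
    that is included at 5% stays included at 20% and beyond.
    """
    digest = hashlib.md5(device_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent


def next_percent(current, crash_rate, max_crash_rate=0.002):
    """Advance to the next stage only if the indicator is acceptable."""
    if crash_rate > max_crash_rate:
        return current  # hold the rollout until the problem is understood
    for stage in STAGES:
        if stage > current:
            return stage
    return 100
```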

Finally comes the full rollout, which uses a combination of push and pull. Pull means the client polls an interface; push means a message is sent down a channel and the client receives it. High timeliness can also put heavy pressure on the server, so combining push and pull improves our delivery rate.
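One way to picture the push-and-pull combination is a client that accepts updates from either path and ignores anything it has already applied. The class below is a hypothetical sketch: `fetch_latest` stands in for the pull interface, and versioning by an increasing integer is an assumption.

```python
class UpdateClient:
    """Receives updates by push and falls back to pull for missed messages."""

    def __init__(self, fetch_latest):
        # fetch_latest is a caller-supplied function returning (version, payload).
        self.fetch_latest = fetch_latest
        self.version = 0
        self.payload = None

    def _apply(self, version, payload):
        """Apply an update only if it is newer than what we already have."""
        if version <= self.version:
            return False
        self.version = version
        self.payload = payload
        return True

    def on_push(self, version, payload):
        """Called when the push channel delivers a message."""
        return self._apply(version, payload)

    def poll(self):
        """Called periodically; catches updates whose push message was lost."""
        version, payload = self.fetch_latest()
        return self._apply(version, payload)
```

Because both paths go through `_apply`, a device that receives the same update by push and by pull applies it only once.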

To summarize: we have real-time alarms, offline analysis, user feedback analysis, real-time log analysis, analysis of server-side distributed call chains, and a complete set of fault-handling solutions. We will continue to optimize the offline quality assurance system, release strategy, release process, and so on. We have done some work at this stage, but it is not yet perfect, and we keep working to build a set of end-to-end operations and analysis tools.

That is all from me. Thank you, everyone!

This article is compiled from the talk "Mobile APP Quality Assurance Work" given by Yang Qiang, a senior technical expert at Alibaba, at the WOT2016 Internet Operation and Developer Conference hosted by 51CTO Media.

Lecture video: http://edu..com/lesson/id-100756.html

Lecturer Profile:

Yang Qiang joined Alibaba in 2012 and is responsible for R&D support for wireless products and for the wireless online monitoring and operations platform. He has provided R&D support and online monitoring and operations for Alibaba apps such as Taobao Mobile, Tmall, Juhuasuan, and DingTalk.
