Automated Test Input Generation for Android: Are We There Yet?

Citation: S. R. Choudhary, A. Gorla, and A. Orso. Automated Test Input Generation for Android: Are We There Yet? In 30th IEEE/ACM International Conference on Automated Software Engineering (ASE 2015), 2015.

Summary:

Like all software, mobile applications (apps) must be thoroughly tested to ensure that they behave and perform as their developers expect. In recent years, therefore, both researchers and practitioners have begun to study techniques for automatically testing mobile applications. In particular, due to Android's open-source nature and huge market share, there has been a large body of research on tools that generate test inputs (usually GUI events such as clicks, swipes, and text entries) for Android applications. Many such tools now exist, differing in their input generation methods, testing strategies, and specific heuristics. To better understand the strengths and weaknesses of these approaches, and how the tools could be made more effective, we conducted a comprehensive comparison of the main existing test input generation tools for Android. We evaluated the tools on four criteria: ease of use, multi-platform compatibility, achieved code coverage, and fault detection ability. Our results delineate the state of the art in input generation for Android applications and identify future research directions that, if suitably pursued, could lead to more effective and efficient testing tools for Android.

Introduction:

There are many existing automated test input generation techniques, with different input generation methods, testing strategies, and specific heuristics. However, it is unclear what the relative strengths and weaknesses of these different approaches are, how effective they are in general, and whether and how they can be improved.

To answer these questions, we present a comparative study of existing test input generation techniques for Android. The study has two goals. The first is to evaluate these techniques (and the corresponding tools), compare them, and assess for which testing contexts (e.g., which types of apps) each may be best suited. The second is to better understand the general tradeoffs involved in test input generation for Android, so as to identify existing techniques that can be improved and to inform the definition of new ones.

Tool Introduction:

Android test input generation tools can be divided into the following three categories, according to their testing strategies:

Random testing strategy:

The advantage of input generators based on a random testing strategy is that they can generate events very efficiently, which makes them particularly suitable for stress testing. Their main disadvantage is that a random strategy can hardly generate highly specific inputs (e.g., a valid login sequence). In addition, these tools do not know how much of the application's behavior they have already exercised, so they tend to generate redundant events. Finally, they have no stopping criterion for deciding when testing is complete and instead rely on a manually specified timeout. (A minimal sketch of this strategy follows the tool list below.)

Tools that use this strategy: Monkey, Dynodroid, Null intent fuzzer, Intent Fuzzer, DroidFuzzer.
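For concreteness, here is a minimal Java sketch of a Monkey-style random generator. It is our own illustration under stated assumptions: the UiEvent class and every other name are hypothetical placeholders, not part of Monkey's or any other tool's API. Note that the generator has no notion of GUI state and no stopping criterion beyond an external event budget.

import java.util.Random;

// A minimal sketch of a Monkey-style random input generator. The event
// representation below is a hypothetical placeholder, not a real tool's API.
public class RandomEventGenerator {

    // A flat description of a GUI event: a tap or a swipe at raw screen coordinates.
    static final class UiEvent {
        final String kind;  // "tap" or "swipe"
        final int x1, y1, x2, y2;
        UiEvent(String kind, int x1, int y1, int x2, int y2) {
            this.kind = kind; this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
        }
        @Override public String toString() {
            return kind + "(" + x1 + "," + y1 + " -> " + x2 + "," + y2 + ")";
        }
    }

    private final Random random;

    RandomEventGenerator(long seed) {
        this.random = new Random(seed);  // a fixed seed makes a random run reproducible
    }

    // Picks the next event uniformly at random, with no model of the GUI:
    // cheap to generate, but oblivious to what has already been covered.
    UiEvent next(int width, int height) {
        int x = random.nextInt(width), y = random.nextInt(height);
        if (random.nextInt(100) < 80) {  // mostly taps, occasionally a swipe
            return new UiEvent("tap", x, y, x, y);
        }
        return new UiEvent("swipe", x, y, random.nextInt(width), random.nextInt(height));
    }

    public static void main(String[] args) {
        RandomEventGenerator gen = new RandomEventGenerator(42);
        // An external event budget acts as the only "stopping criterion".
        for (int i = 0; i < 5; i++) {
            System.out.println(gen.next(1080, 1920));
        }
    }
}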

Model-based testing strategy:

Some Android testing tools build and use a model of the application's GUI to generate events and explore the application's behavior more systematically. These models are usually finite state machines, with the application's activities as states and GUI events as transitions. Some tools build more precise models by distinguishing different states of the same activity (for example, the same activity with a button enabled versus disabled is represented as two separate states). Most tools build such models dynamically and stop testing once every generated event leads to an already-visited state. (A sketch of this exploration loop follows the tool list below.)

Tools that use this strategy: GUIRipper, ORBIT, A3E-Depth-First, SwiftHand, PUMA
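The exploration loop behind these tools can be sketched in Java as follows. This is an illustrative skeleton under our own assumptions, not any tool's actual code: GUI states are abstracted to string keys, and the Device interface stands in for the instrumentation layer a real tool would provide.

import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// A minimal sketch of dynamic, model-based GUI exploration: build a finite
// state machine of (state, event) -> state transitions and stop when every
// fired event leads to an already-visited state. All names are illustrative.
public class ModelBasedExplorer {

    // Hypothetical hooks onto the app under test; a real tool implements
    // these on top of Android's instrumentation facilities.
    interface Device {
        String currentState();           // abstraction of the current GUI state
        List<String> availableEvents();  // events enabled in this state
        void fire(String event);         // execute one GUI event
        void reset(String state);        // return to a previously seen state
    }

    // The model: a finite state machine with GUI events as transitions.
    private final Map<String, Map<String, String>> model = new HashMap<>();
    private final Set<String> visited = new HashSet<>();

    // Depth-first exploration: fire each enabled event once per state and
    // recurse into states the model has not seen before.
    void explore(Device device) {
        String state = device.currentState();
        if (!visited.add(state)) {
            return;  // state already explored: nothing new to learn here
        }
        for (String event : device.availableEvents()) {
            device.fire(event);
            String next = device.currentState();
            model.computeIfAbsent(state, s -> new HashMap<>()).put(event, next);
            explore(device);
            // A real tool must now get back to `state` before trying the next
            // event, typically by restarting the app and replaying the event
            // prefix; GUIRipper does this eagerly (and pays for it in time),
            // while SwiftHand chooses exploration paths that minimize restarts.
            device.reset(state);
        }
    }
}

How currentState() abstracts the GUI determines the model's precision: keying on the activity name alone yields a coarse model, while also keying on widget properties (e.g., enabled versus disabled buttons) distinguishes more states at the cost of a larger model to explore.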

Systematic testing strategy:

Some parts of an application's behavior can be triggered only by specific inputs, which is why some Android testing tools use more sophisticated techniques, such as symbolic execution and evolutionary algorithms, to steer testing toward code that has not yet been covered. Systematic strategies have a clear advantage for behaviors that random strategies cannot trigger. However, these tools scale far less well than tools based on random strategies. (A sketch of one such search follows the tool list below.)

Tools that use this strategy: A3E-Targeted, EvoDroid, ACTEve, JPF-Android
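As one example of a systematic strategy, the following Java sketch shows coverage-guided evolutionary input generation in the spirit of EvoDroid, simplified here to a mutation-only hill climb rather than a full genetic algorithm. It is a sketch under assumptions, not EvoDroid's actual algorithm: event sequences are the individuals, achieved coverage is the fitness, and runAndMeasureCoverage is a stub standing in for executing the sequence on an instrumented app.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// A sketch of coverage-guided evolutionary input generation, simplified to a
// mutation-only hill climb over event sequences. Illustrative only.
public class EvolutionarySearch {

    static final int CHILDREN_PER_GENERATION = 20, GENERATIONS = 50, SEQ_LENGTH = 30;
    static final Random RAND = new Random(7);

    // Stub: a real tool would replay the event sequence on an instrumented
    // APK and read the achieved coverage (e.g., from Emma's output).
    static double runAndMeasureCoverage(List<Integer> events) {
        return RAND.nextDouble();
    }

    static List<Integer> randomSequence() {
        List<Integer> seq = new ArrayList<>();
        for (int i = 0; i < SEQ_LENGTH; i++) {
            seq.add(RAND.nextInt(100));  // each integer encodes one GUI event
        }
        return seq;
    }

    static List<Integer> mutate(List<Integer> parent) {
        List<Integer> child = new ArrayList<>(parent);
        child.set(RAND.nextInt(child.size()), RAND.nextInt(100));  // point mutation
        return child;
    }

    public static void main(String[] args) {
        List<Integer> best = randomSequence();
        double bestFitness = runAndMeasureCoverage(best);
        for (int g = 0; g < GENERATIONS; g++) {
            for (int i = 0; i < CHILDREN_PER_GENERATION; i++) {
                List<Integer> child = mutate(best);
                double fitness = runAndMeasureCoverage(child);
                if (fitness > bestFitness) {  // keep the fitter sequence
                    best = child;
                    bestFitness = fitness;
                }
            }
        }
        System.out.println("best coverage found: " + bestFitness);
    }
}

Each fitness evaluation requires a full run of the app, which illustrates why these tools scale so much worse than random generators: the search pays for its guidance with many expensive executions.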

Table 1. Overview of test input generation tools for Android Apps. The light grey rows represent the tools studied in this paper.

Empirical study:

The experiments evaluate each test input generation tool on four criteria: ease of use, multi-platform compatibility, code coverage, and fault detection ability. We selected a total of 68 Android applications and used VirtualBox to provide virtual machines for the experiments, each configured with 2 cores and 6GB of RAM. We set up virtual machines for three Android releases, corresponding to SDK versions 10 (Gingerbread), 16 (Jelly Bean), and 19 (KitKat). For each tool run on each application, we restored the virtual machine to a clean snapshot, repeated the run 10 times, and averaged the results.

For each run, we collected code coverage with Emma (http://emma.sourceforge.net/). We detected application failures by collecting the device log (logcat) during each test run on the virtual machine, and we manually reviewed the reported failures to confirm that they were genuine.
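The failure-collection step can be sketched as follows, assuming adb is on the PATH and an emulator or device is attached; the filter is deliberately coarse, and the surviving lines still require the manual review mentioned above.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// A minimal sketch of crash collection from the device log: dump the log
// with `adb logcat -d` and keep suspicious entries for manual review.
public class LogcatCrashCollector {

    static List<String> collectCrashLines() throws Exception {
        // The -d flag dumps the current log contents and exits instead of streaming.
        Process logcat = new ProcessBuilder("adb", "logcat", "-d").start();
        List<String> crashes = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(logcat.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Android's runtime tags uncaught exceptions with "FATAL EXCEPTION".
                if (line.contains("FATAL EXCEPTION") || line.contains("Exception:")) {
                    crashes.add(line);
                }
            }
        }
        logcat.waitFor();
        return crashes;
    }

    public static void main(String[] args) throws Exception {
        collectCrashLines().forEach(System.out::println);
    }
}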

1. Ease of use and multi-platform compatibility

Table 2 reports whether each tool worked out of the box (NO EFFORT), required some effort (LITTLE EFFORT), either to configure it correctly or to fix minor issues, or required significant effort (MAJOR EFFORT). Note that this simply reflects our experience in installing and running each tool.

Table 2. Ease of use and compatibility of each tool on common Android versions

2. Code coverage and fault detection capabilities

From Figure 1, we can see that, on average, Dynodroid and Monkey outperform the other tools, followed by ACTEve. The remaining three tools (A3E, GUIRipper, and PUMA) achieve considerably lower coverage. Nevertheless, even the tools with low average coverage can reach very high coverage (around 80%) on some applications; on manual inspection, these turned out to be the simplest applications.

Figure 1. Variation in coverage across 10 runs of each tool on each application
Figure 2. Coverage of each tool over time

Figure 2 shows that all tools reach their maximum coverage within a few minutes (between 5 and 10), with the sole exception of GUIRipper. A likely reason for this difference is that GUIRipper frequently restarts exploration from the app's initial state, which is a time-consuming operation. (This is precisely the problem SwiftHand addresses by implementing a testing strategy that limits the number of restarts.)

Figure 3. Distribution of faults triggered by each tool

Figure 3 shows that only a minority of the failures involve custom exceptions (i.e., exceptions declared in the application under test). The vast majority manifest as standard Java exceptions, the most common being NullPointerException. (A sketch of this custom-versus-standard classification follows below.)
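Distinguishing the two kinds of failures is straightforward once the crash's exception class name has been extracted from the log: an exception counts as custom if its class lives in the app's own package. The helper below is illustrative; the package name is passed as a parameter here, whereas a real pipeline would read it from the APK's manifest.

// An illustrative helper for classifying a crash as a custom exception
// (declared in the app under test) versus a standard one.
public class ExceptionClassifier {

    static boolean isCustomException(String exceptionClassName, String appPackage) {
        // e.g., "com.example.notes.SyncException" starts with "com.example.notes",
        // while "java.lang.NullPointerException" does not.
        return exceptionClassName.startsWith(appPackage);
    }

    public static void main(String[] args) {
        System.out.println(isCustomException("java.lang.NullPointerException", "com.example.notes"));  // false
        System.out.println(isCustomException("com.example.notes.SyncException", "com.example.notes"));  // true
    }
}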

Conclusion:

In this paper, we present a comparative study of the main existing test input generation tools (and the corresponding techniques) for Android. We evaluate these tools on four criteria: ease of use, Android framework compatibility, achieved code coverage, and fault detection ability. Based on the results of this comparison, we identify and discuss the strengths and weaknesses of the different techniques and highlight potential directions for future research in this area.

Acknowledgements

This article was translated and reprinted by Tian Yuanhan, a 2018 master's student at the School of Software, Nanjing University.
