Automated Test Input Generation for Android: Are We There Yet?

Citation: S. R. Choudhary, A. Gorla, and A. Orso. Automated Test Input Generation for Android: Are We There Yet? In 30th IEEE/ACM International Conference on Automated Software Engineering (ASE 2015), 2015.

Summary:

Like all software, mobile applications (apps) must be thoroughly tested to ensure that they behave and perform as their developers expect. In recent years, therefore, both researchers and practitioners have begun to study techniques for automatically testing mobile applications. In particular, due to Android's open-source nature and huge market share, there has been a large body of research on tools that generate test inputs (usually GUI events such as clicks, swipes, and text entries) for Android applications. Many such tools now exist, differing in their input generation methods, testing strategies, and specific heuristics. To better understand the strengths and weaknesses of these approaches, and how the tools could be made more effective, we conducted a comprehensive comparison of the main existing test input generation tools for Android. We evaluated the tools on four criteria: ease of use, multi-platform compatibility, achieved code coverage, and fault detection ability. Our results delineate the state of the art in input generation for Android applications and identify future research directions that, if suitably pursued, could lead to more effective and efficient testing tools for Android.

Introduction:

There are many existing automated test input generation techniques, with different input generation methods, testing strategies, and specific heuristics. However, it is unclear what the relative strengths and weaknesses of these different approaches are, how effective they are in general, and whether and how they can be improved.

To answer these questions, we present a comparative study of existing test input generation techniques for Android. The study has two goals. The first is to evaluate these techniques (and the corresponding tools), compare them, and assess for which testing contexts (e.g., which types of apps) each may be best suited. The second is to better understand the general tradeoffs involved in test input generation for Android, so as to identify existing techniques that can be improved and to inform the definition of new ones.

Tool Introduction:

Android test input generation tools can be divided into the following three categories, according to their testing strategies:

Random testing strategy:

The advantage of input generators based on a random testing strategy is that they can generate events very efficiently, which makes them particularly suitable for stress testing. Their main disadvantage is that a random strategy can hardly generate highly specific inputs (e.g., a valid login sequence). In addition, these tools do not know how much of the application's behavior they have already exercised, so they tend to generate redundant events. Finally, they have no stopping criterion for deciding when testing is complete and instead rely on a manually specified timeout. (A minimal sketch of this strategy follows the tool list below.)

Tools that use this strategy: Monkey, Dynodroid, Null intent fuzzer, Intent Fuzzer, DroidFuzzer.
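For concreteness, here is a minimal Java sketch of a Monkey-style random generator. It is our own illustration under stated assumptions: the UiEvent class and every other name are hypothetical placeholders, not part of Monkey's or any other tool's API. Note that the generator has no notion of GUI state and no stopping criterion beyond an external event budget.

import java.util.Random;

// A minimal sketch of a Monkey-style random input generator. The event
// representation below is a hypothetical placeholder, not a real tool's API.
public class RandomEventGenerator {

    // A flat description of a GUI event: a tap or a swipe at raw screen coordinates.
    static final class UiEvent {
        final String kind;  // "tap" or "swipe"
        final int x1, y1, x2, y2;
        UiEvent(String kind, int x1, int y1, int x2, int y2) {
            this.kind = kind; this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
        }
        @Override public String toString() {
            return kind + "(" + x1 + "," + y1 + " -> " + x2 + "," + y2 + ")";
        }
    }

    private final Random random;

    RandomEventGenerator(long seed) {
        this.random = new Random(seed);  // a fixed seed makes a random run reproducible
    }

    // Picks the next event uniformly at random, with no model of the GUI:
    // cheap to generate, but oblivious to what has already been covered.
    UiEvent next(int width, int height) {
        int x = random.nextInt(width), y = random.nextInt(height);
        if (random.nextInt(100) < 80) {  // mostly taps, occasionally a swipe
            return new UiEvent("tap", x, y, x, y);
        }
        return new UiEvent("swipe", x, y, random.nextInt(width), random.nextInt(height));
    }

    public static void main(String[] args) {
        RandomEventGenerator gen = new RandomEventGenerator(42);
        // An external event budget acts as the only "stopping criterion".
        for (int i = 0; i < 5; i++) {
            System.out.println(gen.next(1080, 1920));
        }
    }
}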

Model-based testing strategy:

Some Android testing tools build and use a model of the application's GUI to generate events and explore the application's behavior more systematically. These models are usually finite state machines, with the application's activities as states and GUI events as transitions. Some tools build more precise models by distinguishing different states of the same activity (for example, the same activity with a button enabled versus disabled is represented as two separate states). Most tools build such models dynamically and stop testing once every generated event leads to an already-visited state. (A sketch of this exploration loop follows the tool list below.)

Tools that use this strategy: GUIRipper, ORBIT, A3E-Depth-First, SwiftHand, PUMA
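The exploration loop behind these tools can be sketched in Java as follows. This is an illustrative skeleton under our own assumptions, not any tool's actual code: GUI states are abstracted to string keys, and the Device interface stands in for the instrumentation layer a real tool would provide.

import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// A minimal sketch of dynamic, model-based GUI exploration: build a finite
// state machine of (state, event) -> state transitions and stop when every
// fired event leads to an already-visited state. All names are illustrative.
public class ModelBasedExplorer {

    // Hypothetical hooks onto the app under test; a real tool implements
    // these on top of Android's instrumentation facilities.
    interface Device {
        String currentState();           // abstraction of the current GUI state
        List<String> availableEvents();  // events enabled in this state
        void fire(String event);         // execute one GUI event
        void reset(String state);        // return to a previously seen state
    }

    // The model: a finite state machine with GUI events as transitions.
    private final Map<String, Map<String, String>> model = new HashMap<>();
    private final Set<String> visited = new HashSet<>();

    // Depth-first exploration: fire each enabled event once per state and
    // recurse into states the model has not seen before.
    void explore(Device device) {
        String state = device.currentState();
        if (!visited.add(state)) {
            return;  // state already explored: nothing new to learn here
        }
        for (String event : device.availableEvents()) {
            device.fire(event);
            String next = device.currentState();
            model.computeIfAbsent(state, s -> new HashMap<>()).put(event, next);
            explore(device);
            // A real tool must now get back to `state` before trying the next
            // event, typically by restarting the app and replaying the event
            // prefix; GUIRipper does this eagerly (and pays for it in time),
            // while SwiftHand chooses exploration paths that minimize restarts.
            device.reset(state);
        }
    }
}

How currentState() abstracts the GUI determines the model's precision: keying on the activity name alone yields a coarse model, while also keying on widget properties (e.g., enabled versus disabled buttons) distinguishes more states at the cost of a larger model to explore.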

Systematic testing strategy:

Some parts of an application's behavior can be triggered only by specific inputs, which is why some Android testing tools use more sophisticated techniques, such as symbolic execution and evolutionary algorithms, to steer testing toward code that has not yet been covered. Systematic strategies have a clear advantage for behaviors that random strategies cannot trigger. However, these tools scale far less well than tools based on random strategies. (A sketch of one such search follows the tool list below.)

Tools that use this strategy: A3E-Targeted, EvoDroid, ACTEve, JPF-Android
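As one example of a systematic strategy, the following Java sketch shows coverage-guided evolutionary input generation in the spirit of EvoDroid, simplified here to a mutation-only hill climb rather than a full genetic algorithm. It is a sketch under assumptions, not EvoDroid's actual algorithm: event sequences are the individuals, achieved coverage is the fitness, and runAndMeasureCoverage is a stub standing in for executing the sequence on an instrumented app.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// A sketch of coverage-guided evolutionary input generation, simplified to a
// mutation-only hill climb over event sequences. Illustrative only.
public class EvolutionarySearch {

    static final int CHILDREN_PER_GENERATION = 20, GENERATIONS = 50, SEQ_LENGTH = 30;
    static final Random RAND = new Random(7);

    // Stub: a real tool would replay the event sequence on an instrumented
    // APK and read the achieved coverage (e.g., from Emma's output).
    static double runAndMeasureCoverage(List<Integer> events) {
        return RAND.nextDouble();
    }

    static List<Integer> randomSequence() {
        List<Integer> seq = new ArrayList<>();
        for (int i = 0; i < SEQ_LENGTH; i++) {
            seq.add(RAND.nextInt(100));  // each integer encodes one GUI event
        }
        return seq;
    }

    static List<Integer> mutate(List<Integer> parent) {
        List<Integer> child = new ArrayList<>(parent);
        child.set(RAND.nextInt(child.size()), RAND.nextInt(100));  // point mutation
        return child;
    }

    public static void main(String[] args) {
        List<Integer> best = randomSequence();
        double bestFitness = runAndMeasureCoverage(best);
        for (int g = 0; g < GENERATIONS; g++) {
            for (int i = 0; i < CHILDREN_PER_GENERATION; i++) {
                List<Integer> child = mutate(best);
                double fitness = runAndMeasureCoverage(child);
                if (fitness > bestFitness) {  // keep the fitter sequence
                    best = child;
                    bestFitness = fitness;
                }
            }
        }
        System.out.println("best coverage found: " + bestFitness);
    }
}

Each fitness evaluation requires a full run of the app, which illustrates why these tools scale so much worse than random generators: the search pays for its guidance with many expensive executions.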

Table 1. Overview of test input generation tools for Android Apps. The light grey rows represent the tools studied in this paper.

Empirical study:

The experiments evaluate each test input generation tool on four criteria: ease of use, multi-platform compatibility, code coverage, and fault detection ability. We selected a total of 68 Android applications and used VirtualBox to provide virtual machines for the experiments, each configured with 2 cores and 6GB of RAM. We set up virtual machines for three Android releases, corresponding to SDK versions 10 (Gingerbread), 16 (Jelly Bean), and 19 (KitKat). For each tool run on each application, we restored the virtual machine to a clean snapshot, repeated the run 10 times, and averaged the results.

For each run, we collected code coverage with Emma (http://emma.sourceforge.net/). We detected application failures by collecting the device log (logcat) during each test run on the virtual machine, and we manually reviewed the reported failures to confirm that they were genuine.
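The failure-collection step can be sketched as follows, assuming adb is on the PATH and an emulator or device is attached; the filter is deliberately coarse, and the surviving lines still require the manual review mentioned above.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// A minimal sketch of crash collection from the device log: dump the log
// with `adb logcat -d` and keep suspicious entries for manual review.
public class LogcatCrashCollector {

    static List<String> collectCrashLines() throws Exception {
        // The -d flag dumps the current log contents and exits instead of streaming.
        Process logcat = new ProcessBuilder("adb", "logcat", "-d").start();
        List<String> crashes = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(logcat.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Android's runtime tags uncaught exceptions with "FATAL EXCEPTION".
                if (line.contains("FATAL EXCEPTION") || line.contains("Exception:")) {
                    crashes.add(line);
                }
            }
        }
        logcat.waitFor();
        return crashes;
    }

    public static void main(String[] args) throws Exception {
        collectCrashLines().forEach(System.out::println);
    }
}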

1. Ease of use and multi-platform compatibility

Table 2 reports whether each tool worked out of the box (NO EFFORT), required some effort (LITTLE EFFORT), either to configure it correctly or to fix minor issues, or required significant effort (MAJOR EFFORT). Note that this simply reflects our experience in installing and running each tool.

Table 2. Ease of use and compatibility of each tool on common Android versions

2. Code coverage and fault detection capabilities

From Figure 1, we can see that, on average, Dynodroid and Monkey outperform the other tools, followed by ACTEve. The remaining three tools (A3E, GUIRipper, and PUMA) achieve considerably lower coverage. Nevertheless, even the tools with low average coverage can reach very high coverage (around 80%) on some applications; on manual inspection, these turned out to be the simplest applications.

Figure 1. Variation in coverage across 10 runs of each tool on each application
Figure 2. Coverage of each tool over time

Figure 2 shows that all tools reach their maximum coverage within a few minutes (between 5 and 10), with the sole exception of GUIRipper. A likely reason for this difference is that GUIRipper frequently restarts exploration from the app's initial state, which is a time-consuming operation. (This is precisely the problem SwiftHand addresses by implementing a testing strategy that limits the number of restarts.)

Figure 3. Distribution of faults triggered by each tool

Figure 3 shows that only a minority of the failures involve custom exceptions (i.e., exceptions declared in the application under test). The vast majority manifest as standard Java exceptions, the most common being NullPointerException. (A sketch of this custom-versus-standard classification follows below.)
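Distinguishing the two kinds of failures is straightforward once the crash's exception class name has been extracted from the log: an exception counts as custom if its class lives in the app's own package. The helper below is illustrative; the package name is passed as a parameter here, whereas a real pipeline would read it from the APK's manifest.

// An illustrative helper for classifying a crash as a custom exception
// (declared in the app under test) versus a standard one.
public class ExceptionClassifier {

    static boolean isCustomException(String exceptionClassName, String appPackage) {
        // e.g., "com.example.notes.SyncException" starts with "com.example.notes",
        // while "java.lang.NullPointerException" does not.
        return exceptionClassName.startsWith(appPackage);
    }

    public static void main(String[] args) {
        System.out.println(isCustomException("java.lang.NullPointerException", "com.example.notes"));  // false
        System.out.println(isCustomException("com.example.notes.SyncException", "com.example.notes"));  // true
    }
}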

Conclusion:

In this paper, we present a comparative study of the main existing test input generation tools (and the corresponding techniques) for Android. We evaluate these tools on four criteria: ease of use, Android framework compatibility, achieved code coverage, and fault detection ability. Based on the results of this comparison, we identify and discuss the strengths and weaknesses of the different techniques and highlight potential directions for future research in this area.

Acknowledgements

This article was translated and reprinted by Tian Yuanhan, a 2018 master's student at the School of Software, Nanjing University.
