What is Wrong with Statement Coverage

This paper presents an in-depth discussion of the risks and misconceptions of a commonly used code coverage metric.

Summary

Software developers and testers commonly use statement coverage because of its simplicity and availability in object code instrumentation technology. Of all the structural coverage criteria, statement coverage is the weakest, calling for the fewest test cases. Bugs can easily occur in the cases that statement coverage cannot see. The most significant shortcoming of statement coverage is that it fails to measure whether you test simple if statements with a false decision outcome. Experts generally recommend using statement coverage only if nothing else is available. Any other metric is better.

Introduction

Statement coverage is a code coverage metric that tells you whether the flow of control reached every executable statement of source code at least once.

Attaining coverage on every source statement seems like a good objective. But statement coverage does not adequately take into account the fact that many statements (and many bugs) involve branching and decision-making. Statement coverage's insensitivity to control structures tends to contradict the assumption of code coverage testing itself: thorough testing requires exercising many combinations of branches and conditions.

In particular, statement coverage does not call for testing the following:

Simple if statements
Logical operators (&&, ||, and ?:)
Consecutive switch labels
Loop termination decisions
Do-while loops

Statement coverage has three characteristics that make it seem like a good coverage metric. Upon close inspection, they all become questionable. Statement coverage is:

Simple and fundamental
Measurable by object code instrumentation
Sensitive to the size of the code

Experts agree. A number of software testing books and papers give descriptions of statement coverage that range from "the weakest measure" to "not nearly enough".

Line coverage, basic block coverage, and segment coverage are variations of statement coverage. They all have similar characteristics, and this document applies equally to all of them, except where noted.

Code Coverage Testing is Really Path Testing

The fundamental assumption of code coverage testing is that to expose bugs, you should exercise as many paths through your code as possible. The more paths you exercise, the more likely your testing is to expose bugs. A path is a sequence of branches (decisions), or conditions (logical predicates). A path corresponds to a test case, or a set of inputs. In code coverage testing, branches have more importance than the blocks they connect. Bugs are often sensitive to branches and conditions. For example, incorrectly writing a condition such as i<=n rather than i<n may cause a boundary error bug.
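To make the gap between paths and statements concrete, here is a minimal sketch. It assumes a hypothetical routine made of n consecutive, independent simple if statements; the function names are illustrative and do not come from any coverage tool.

```cpp
#include <cassert>
#include <cstdint>

// Number of distinct paths through n consecutive, independent simple if
// statements: each decision is either taken or skipped, so the number of
// paths doubles with every additional if statement.
std::uint64_t pathCount(unsigned n) {
    assert(n < 64);  // keep the shift within the width of the result
    return std::uint64_t{1} << n;
}

// Tests needed for full statement coverage of the same routine: a single
// test with every decision true reaches every statement.
unsigned testsForStatementCoverage(unsigned /*n*/) {
    return 1;
}
```

For ten such if statements, pathCount(10) is 1024, while testsForStatementCoverage(10) is still 1.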

Statement coverage encourages a view of source code as relatively important blocks of statements, incidentally connected by branches. When using statement coverage, you can easily focus on testing the blocks of code and forget about testing the logic that binds them. If you were testing a brick wall, you would focus on the mortar as much as the bricks.

Specific Issues

Simple If-Statements

Statement coverage does not call for testing simple if statements. A simple if statement has no else-clause. Attaining full statement coverage requires testing with the controlling decision true, but not with it false. No source code exists for the false outcome, so statement coverage cannot measure it.

If you only execute a simple if statement with the decision true, you are not testing the if statement itself. You could remove the if statement, leaving the body (that would otherwise execute conditionally), and your testing would give the same results.
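The following sketch illustrates this point under stated assumptions: capAtLimit is a hypothetical function under test, and capAtLimitNoIf is the same function with the if statement removed and only its body left in place.

```cpp
#include <cassert>

// Hypothetical function under test: caps value at limit.
int capAtLimit(int value, int limit) {
    if (value > limit) {
        value = limit;
    }
    return value;
}

// The same function with the if statement removed, leaving only its body.
int capAtLimitNoIf(int value, int limit) {
    value = limit;
    return value;
}

// A test with the decision true cannot tell the two versions apart;
// only a test with the decision false can.
void checkSimpleIf() {
    assert(capAtLimit(10, 5) == capAtLimitNoIf(10, 5));  // decision true: identical
    assert(capAtLimit(3, 5) != capAtLimitNoIf(3, 5));    // decision false: differs
}
```

The first assertion is the only kind of test that statement coverage asks for, and it passes for both versions.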

Since simple if statements occur frequently, this shortcoming presents a serious risk.

See Simple If-Statement Example.

Logical Operators

Statement coverage does not call for testing logical operators. In C++ and C these operators are &&, ||, and ?:. Statement coverage cannot distinguish the code separated by logical operators from the rest of the statement. Executing any part of the code in a statement causes statement coverage to declare the whole statement fully covered. When logical operators avoid unnecessary evaluation (by short circuit), statement coverage gives an inflated coverage measurement.

This problem often occurs even when logical operators occur on different source code lines. Some compilers, such as Microsoft C++, only provide one debug line number for a decision, even if it spans multiple source lines.
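A minimal sketch of the short-circuit effect follows. The counters and helper functions are hypothetical and exist only to make operand evaluation visible; a statement coverage analyzer records nothing of the kind.

```cpp
#include <cassert>

// Hypothetical counters recording how often each operand is evaluated.
struct OperandCounts {
    int left = 0;
    int right = 0;
};

bool leftOperand(bool value, OperandCounts& counts) {
    ++counts.left;
    return value;
}

bool rightOperand(bool value, OperandCounts& counts) {
    ++counts.right;
    return value;
}

// One statement containing a short-circuit logical-or.
bool decide(bool a, bool b, OperandCounts& counts) {
    return leftOperand(a, counts) || rightOperand(b, counts);
}

// With a true, the statement executes but the right operand never runs.
void checkShortCircuit() {
    OperandCounts counts;
    assert(decide(true, false, counts));
    assert(counts.left == 1 && counts.right == 0);
}
```

The single statement in decide counts as covered after this one call, although the code for the right operand has not executed.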

See Logical Operator Example.

Consecutive Switch Labels

Statement coverage does not show the need to test separate consecutive switch statement labels. Consecutive switch labels have no statements between them. Statement coverage only calls for testing the code following the labels.

This pitfall leads to incomplete testing because statement coverage assumes the value checking done by switch statements (the object code) is irrelevant. In fact, the different values in a switch controlling expression may reflect different test scenarios even if the values are handled by the same code.
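As a sketch of what a label-level view would add, the hypothetical function below records which switch values the tests actually supplied. The error code names and the labelsSeen parameter are assumptions made for illustration.

```cpp
#include <cassert>
#include <set>

// Hypothetical error codes, for illustration only.
enum ErrorCode { kAccess = 1, kNoDevice = 2, kNoEntry = 3 };

// Returns true for the three anticipated errors. labelsSeen records which
// switch values the tests supplied, a view that statement coverage lacks.
bool isAnticipatedError(int code, std::set<int>& labelsSeen) {
    switch (code) {
    case kAccess:
    case kNoDevice:
    case kNoEntry:
        labelsSeen.insert(code);  // one shared statement behind three labels
        return true;
    default:
        return false;
    }
}

// One call executes every statement behind the three labels,
// yet only one of the three values has been tested.
void checkLabelsSeen() {
    std::set<int> labelsSeen;
    assert(isAnticipatedError(kAccess, labelsSeen));
    assert(labelsSeen.size() == 1);  // kNoDevice and kNoEntry remain untested
}
```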

See Consecutive Switch Labels Example.

Loop Termination Decisions

Statement coverage does not call for testing loop termination decisions. Statement coverage only calls for executing loop bodies. In a loop that stops with a C++/C break statement, this deficiency hides test cases needed to expose bugs related to boundary checking and off-by-one mistakes.
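The sketch below makes the missing test case visible. It assumes a hypothetical bounded copy, copyBounded, whose loop condition is wrapped in a helper that counts true and false outcomes; the LoopStats structure is an illustration device, not something a statement coverage analyzer provides.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tally of the loop termination decision's outcomes.
struct LoopStats {
    int conditionTrue = 0;
    int conditionFalse = 0;
};

// Records one outcome of the decision and passes it through unchanged.
bool recordOutcome(bool outcome, LoopStats& stats) {
    if (outcome) {
        ++stats.conditionTrue;
    } else {
        ++stats.conditionFalse;
    }
    return outcome;
}

// Copies input to output, stopping at the terminator or at capacity.
void copyBounded(char* output, std::size_t capacity, const char* input,
                 LoopStats& stats) {
    assert(output != nullptr && input != nullptr);
    for (std::size_t i = 0; recordOutcome(i < capacity, stats); i++) {
        output[i] = input[i];
        if (input[i] == '\0') {
            break;  // short inputs always leave the loop here
        }
    }
}
```

With a short input such as "abc", conditionFalse stays at 0 although every statement in copyBounded has run. Only an input at least as long as capacity makes the termination decision false. In that case this sketch leaves output unterminated, since it only tracks decision outcomes.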

See Loop Termination Decision Example.

Do-While Loops

Statement coverage does not call for testing iteration of do-while loops. Since do-while loops always execute at least once, statement coverage sees them as fully covered whether or not they repeat.

If you only execute a do-while without repeating the loop, you are not testing the loop. You could remove the do-while, leaving the statements that would otherwise execute repetitively, and your testing would give the same results.
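To illustrate, here is a minimal sketch with two hypothetical functions: halveUntilAtMost contains a do-while loop, and halveOnce is the same code with the do-while removed and only the loop body left.

```cpp
#include <cassert>

// Hypothetical function under test: halves value until it is at most limit.
int halveUntilAtMost(int value, int limit) {
    assert(limit >= 0);  // guarantees termination, since value shrinks toward 0
    do {
        value /= 2;
    } while (value > limit);
    return value;
}

// The same function with the do-while removed, leaving only the loop body.
int halveOnce(int value, int /*limit*/) {
    value /= 2;
    return value;
}

// A test that never repeats the loop cannot tell the two versions apart.
void checkDoWhile() {
    assert(halveUntilAtMost(8, 10) == halveOnce(8, 10));      // one pass: 4 and 4
    assert(halveUntilAtMost(100, 10) != halveOnce(100, 10));  // repeats: 6 and 50
}
```

The first assertion gives full statement coverage of halveUntilAtMost without the loop ever repeating.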

See Do-While Loop Example.

Common Misconceptions

Simple and Fundamental

Statement coverage is the simplest structural coverage metric in that it calls for the least testing in order to achieve full coverage. Additionally, statement coverage is a fundamental metric in that most other structural coverage metrics include statement coverage. However, statement coverage is not the simplest metric to understand and statement coverage is not fundamental to good testing.

Some coverage metrics other than statement coverage are fairly simple. Condition/decision coverage calls for exercising all decisions and logical conditions with both true and false outcomes. This metric is simple to understand and leads to more complete testing than statement coverage.
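As a rough illustration of how little the metric asks you to understand, the sketch below checks whether a set of tests achieves condition/decision coverage of the hypothetical decision a && b. It is a simplified reading that assumes both conditions are always evaluated, so it ignores short-circuit evaluation; the type and function names are illustrative.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Each test supplies values for the two conditions, a and b, of a && b.
using TestInput = std::pair<bool, bool>;

// True if the decision and both conditions each took both outcomes.
bool achievesConditionDecisionCoverage(const std::vector<TestInput>& tests) {
    bool aTrue = false, aFalse = false;
    bool bTrue = false, bFalse = false;
    bool decisionTrue = false, decisionFalse = false;
    for (const TestInput& test : tests) {
        const bool a = test.first;
        const bool b = test.second;
        if (a) { aTrue = true; } else { aFalse = true; }
        if (b) { bTrue = true; } else { bFalse = true; }
        if (a && b) { decisionTrue = true; } else { decisionFalse = true; }
    }
    return aTrue && aFalse && bTrue && bFalse && decisionTrue && decisionFalse;
}

void checkConditionDecisionCoverage() {
    const std::vector<TestInput> oneTest = {{true, true}};
    const std::vector<TestInput> twoTests = {{true, true}, {false, false}};
    assert(!achievesConditionDecisionCoverage(oneTest));
    assert(achievesConditionDecisionCoverage(twoTests));
}
```

Under this simplified reading, two tests suffice for a && b, whereas a single test with both conditions true already gives full statement coverage of an if statement controlled by that decision.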

Testing experts often describe statement coverage as a basic or primary level of coverage. Most other structural coverage metrics subsume, or include, statement coverage. However, this only holds for full coverage, which rarely occurs in practice even with statement coverage. The difficulty of attaining additional coverage increases exponentially with all types of coverage. Rather than spend your time on the most difficult part of statement coverage, you make better progress using a more sensitive coverage metric that offers more test cases, some of which may require relatively little effort.

Even if you do achieve 100% statement coverage, you have not necessarily exercised all your object code even though it appears you have exercised all your source code. The object code corresponding to branches is still vulnerable.

Statement coverage may be the most basic metric, but it is not part of good testing.

Measurable By Object Code Instrumentation

Compared to source code instrumentation, object code instrumentation typically operates more quickly and supports multiple programming languages.

However, object code instrumentation coverage analyzers measure statement coverage because it is the only metric they can implement. Stronger coverage metrics require source code instrumentation.

A statement coverage analyzer usually results from leveraging an existing product line that is based on object code instrumentation. The instrumentation needed for statement coverage analysis shares similarities with the technology needed for profiling, debugging and run-time error checking. Rarely does anyone develop object code instrumentation for the sole purpose of making a coverage analyzer. Typically, a company develops other code analysis tools, and then applies the technology to coverage analysis later. Conversely, coverage analyzers that use source code instrumentation invariably support coverage metrics stronger than statement coverage.

Choosing statement coverage because your profiler supports it is like using locking pliers as a wrench. They will work, but if you are going to tighten more than a few nuts, you want to get a wrench.

Sensitivity To Basic Block Length

At first, sensitivity to basic block length might seem like a benefit. If you assume an even distribution of bugs through code, it makes sense to expect the percentage of statements covered to reflect the percentage of bugs discovered. See Sensitivity To Basic Block Length Example 1.

However, if you assume bugs occur more often due to interactions with control structures than in isolated computations, statement coverage's insensitivity to control structures is a drawback. Path testing fundamentally assumes that you must exercise many paths through your code to find bugs. It makes more sense to expect the number of tested branches and conditions to reflect the percentage of bugs discovered. See Sensitivity To Basic Block Length Example 2.

Sensitivity to basic block length is not beneficial since it comes at the expense of sensitivity to paths and test cases.

Basic block coverage is not sensitive to basic block length. Basic block coverage is the same as statement coverage except the unit of code measured is each sequence of non-branching statements. Segment coverage is another name for basic block coverage.
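The difference between the two measurements can be worked through with a small sketch. It assumes a hypothetical if-else with 99 statements in one branch and 1 in the other, counts only those two branch blocks, and uses illustrative names throughout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Block {
    int statements;  // number of statements in this basic block
    bool executed;   // whether any test reached the block
};

// Percentage of statements that belong to executed blocks.
double statementCoveragePercent(const std::vector<Block>& blocks) {
    int total = 0;
    int covered = 0;
    for (const Block& block : blocks) {
        assert(block.statements > 0);
        total += block.statements;
        if (block.executed) {
            covered += block.statements;
        }
    }
    return total == 0 ? 0.0 : 100.0 * covered / total;
}

// Percentage of blocks executed, regardless of their length.
double basicBlockCoveragePercent(const std::vector<Block>& blocks) {
    std::size_t covered = 0;
    for (const Block& block : blocks) {
        if (block.executed) {
            ++covered;
        }
    }
    return blocks.empty() ? 0.0 : 100.0 * covered / blocks.size();
}
```

With the 99-statement block executed and the 1-statement block not executed, statementCoveragePercent gives 99 and basicBlockCoveragePercent gives 50. Swapping which block is executed changes the first figure to 1 and leaves the second at 50.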

Code Examples

Simple If-Statement Example

The C++ code fragment below contains a simple if statement.

int* p = NULL;
if (condition) {
    p = &variable;
    *p = 1;
}
*p = 0; // Oops, possible null pointer dereference

Without a test case that causes condition to evaluate false, statement coverage declares this code fully covered. In fact, if condition ever evaluates false, this code dereferences a null pointer.

Logical Operator Example

The C++ function below contains a statement with a logical-or operator that may circumvent executing the rest of the statement.

void function(const char* string1, const char* string2 = NULL);
...
void function(const char* string1, const char* string2)
{
    if (condition || strcmp(string1, string2) == 0) // Oops, possible null pointer passed to strcmp
    ...
}

Statement coverage declares this code fragment fully covered when condition is true. With condition false, the call to strcmp gets an invalid argument, a null pointer.
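For comparison, here is a sketch of one possible guard, not necessarily the fix the original code calls for. The function name matches and its bool return value are assumptions made so the decision can be tested directly.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical variant of the decision above: string2 is checked for null
// before it reaches strcmp.
bool matches(bool condition, const char* string1,
             const char* string2 = nullptr) {
    assert(string1 != nullptr);
    return condition ||
           (string2 != nullptr && std::strcmp(string1, string2) == 0);
}
```

A test with condition false and string2 omitted, such as matches(false, "a"), is the case that statement coverage never asks for and that the unguarded version cannot survive.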

Consecutive Switch Labels Example

The C++ code fragment below uses a switch statement to convert error codes to strings.

message[EACCES] = "Permission denied";
message[ENODEV] = "No such device";
message[ENODEV] = "No such file or directory"; // Oops, should be ENOENT
...
switch (errno) {
case EACCES:
case ENODEV:
case ENOENT:
    printf("%s\n", message[errno]);
    break;
...

This program clearly anticipates three different errors. You can satisfy statement coverage with just one error, errno=EACCES. Statement coverage says that testing with this error is just as good as testing with any other. However, this code incorrectly initializes message for ENODEV twice, but does not initialize message for ENOENT. Testing with either of these errors exposes the problem, but statement coverage does not call for them.

Loop Termination Decision Example

The C++ code fragment below copies a string from one buffer to another.

char output[100];
for (int i = 0; i <= sizeof(output); i++) { // Oops, buffer overrun; comparison should be <
    output[i] = input[i];
    if (input[i] == '\0') {
        break;
    }
}

The main loop termination decision, i <= sizeof(output), is intended to prevent overflowing the output buffer. You can achieve full statement coverage without testing this condition. The decision ought to use operator < rather than operator <=. You get full statement coverage of this code with any input string of length less than 100, without exposing the bug.

Do-While Loop Example

Consider the C++ function below, which initializes a string buffer with an optional input string.

void initString(char* output, const char* input = "")
{
    int i = 0;
    do {
        output[i] = input[i];
    } while (input[i] != '\0'); // Oops, loop variable not incremented
}

You can achieve full statement coverage without repeating this loop. Testing with a zero-length input string is sufficient for statement coverage. The problem is that the programmer forgot to increment the index. Any input string of non-zero length causes an infinite loop.
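A corrected sketch follows, together with the kind of test that statement coverage does not call for. It assumes the caller supplies an output buffer large enough for the input and its terminator; the added assertion is an illustration, not part of the original function.

```cpp
#include <cassert>

// Copies input, including its terminator, into output.
void initString(char* output, const char* input = "")
{
    assert(output != nullptr && input != nullptr);
    int i = 0;
    do {
        output[i] = input[i];
    } while (input[i++] != '\0');  // test the character just copied, then advance
}
```

Calling initString(buffer, "abc") repeats the loop three times and exercises the increment, whereas initString(buffer) runs the body once and never does.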

Sensitivity To Basic Block Length Example 1

The C++ if-else statement below contains a lot of code in the then-clause, but very little in the else-clause.

if (condition) {
    // 99 statements
    statement1;
    statement2;
    ...
    statement99;
} else {
    // 1 statement
    statement100;
}

With condition true, you obtain 99% statement coverage. With a successful test, you can conclude that 99% of the code has no bugs. In the reverse scenario with condition false, you obtain just 1% statement coverage. Statement coverage seems to measure the relative importance of the two test cases proportionately.

Sensitivity To Basic Block Length Example 2

You can achieve 100% statement coverage of the C++ code fragment below with one test case, without exposing any bugs. The test case is {condition=true, errno=EACCES, input=""}. However, there are many other feasible paths through this code that expose one of the five bugs. Statement coverage indicates the number of bugs very poorly.

int* p = NULL;
if (condition) {
    p = &variable;
    *p = 1;
}
*p = 0; // Oops, possible null pointer dereference
const char* string2 = NULL;
if (condition || strcmp(string1, string2) == 0) // Oops, possible null pointer dereference
    statement;
message[EACCES] = "Permission denied";
message[ENODEV] = "No such device";
message[ENODEV] = "No such file or directory"; // Oops, should be ENOENT
switch (errno) {
case EACCES:
case ENODEV:
case ENOENT:
    printf("%s\n", message[errno]);
    break;
    ...
}
char output[100];
for (int i = 0; i <= sizeof(output); i++) { // Oops, buffer overrun; comparison should be <
    output[i] = input[i];
    if (input[i] == '\0') {
        break;
    }
}
int i = 0;
do {
    output[i] = input[i];
} while (input[i] != '\0'); // Oops, loop variable not incremented

What Others Say About Statement Coverage

Testing Computer Software by Cem Kaner, Hung Quoc Nguyen and Jack Falk (1999) compares statement coverage, branch coverage and condition coverage. The book says:

Line coverage is the weakest measure. ... Although line coverage is more than some programmers do, it is not nearly enough.

In the paper Software unit test coverage and adequacy (1997), the authors say:

... statement coverage is so weak that even some control transfers may be missed from an adequate test.

Managing the Software Process by Watts S. Humphrey (1989) says:

The simplest approach is to ensure that every statement is exercised at least once. A more stringent measure is to require coverage of every path within a program. ... A more practical measure is to exercise each condition for each decision statement at least once ...

The paper Software Negligence and Testing Coverage by Cem Kaner (1996) discusses statement coverage, branch coverage and path coverage. He says:

Line coverage measures the number / percentage of lines of code that have been executed. But some lines contain branches - the line tests a variable and does different things depending on the variable's value.

Software Testing Techniques by Boris Beizer (1996) discusses path coverage, statement coverage and branch coverage. He says:

[Statement coverage] is the weakest measure in the family [of structural coverage criteria]: testing less than this for new software is unconscionable ...

Brian Marick, a noted expert and author on software testing, said:

I'd rather use branch coverage, but if I can't - perhaps I don't have the source to instrument, ... line coverage is better than nothing.