jody@acm.org

Software Engineering
General Testing and Debugging Guidelines
CONTENTS

- Testing
  - The Testing Attitude
  - Black-Box Testing
  - White-Box Testing
  - Testing Guidelines
- Debugging
  - Definition of Debugging
  - Debugging by Brute Force
  - Debugging by Induction
  - Debugging by Deduction
  - Debugging by Backtracking
  - Debugging by Testing
  - Debugging Guidelines (Error Locating)
  - Debugging Guidelines (Error Repairing)
- Error Collection and Analysis
- Test Completion Criteria
  - Specific Test Case Design
  - Detecting a Specified Number of Errors
  - Using Error Detection Rate Charts
- Sources
TESTING

The Testing Attitude

As with any form of human behavior, software testing is dominated by the attitude of the tester toward the testing activity. This attitude toward testing can be summarized in a statement of the primary goals of testing. Some of the prevailing goals of testing are stated as follows:
While these goals, which aim to demonstrate the correctness of the software, are admirable, the underlying attitude tends to permeate the entire testing activity, so that test cases are selected and executed in ways that tend to show that no errors are present. One problem with these goals is that it is virtually impossible to remove all of the errors in a non-trivial program, so the goals are unrealistic from the start. Another problem is that even if a program performs all of its intended functions, it may still contain errors in that it also performs unintended functions.
A much more productive goal of testing is the following:
This goal assumes that errors are present in the software, an assumption that is true for virtually all software. It also reflects a much more productive attitude toward software testing: stressing the software to the fullest with the goal of finding its errors. Because this goal is much more conducive to finding errors, it is also much more likely to increase the reliability of the software. One way of detecting the prevailing attitude toward testing is to see how the words "successful" and "unsuccessful" are used in describing test case results. If a test case that uncovers no errors is considered successful, this is a sign that an unproductive attitude exists. Such a test case adds no reliability to the software and is largely a waste of time and energy. A successful test case should be one that uncovers errors; in fact, the more errors it uncovers, the better the test case.

Black-Box Testing

Two alternative and complementary approaches to testing are called black-box and white-box testing. Black-box testing is also called
data-driven (or input/output-driven) testing. In using this approach, the
tester views the program as a black box and is not concerned about the
internal behavior and structure of the program. The tester is only
interested in finding circumstances in which the program does not behave
according to its specifications. Test data are derived solely from the
specifications (i.e., without taking advantage of knowledge of the internal
structure of the program).

If one wishes to find all errors in the program using this approach, the criterion is exhaustive input testing: the use of every possible input condition as a test case. Since this is usually impossible, or at least economically impractical, exhaustive input testing is rarely used. To maximize the yield on the testing investment (i.e., to maximize the number of errors found by a finite number of test cases), the white-box approach is also used.

White-Box Testing

Another testing approach, white-box or logic-driven structural testing,
permits one to examine the internal structure of the program. In using
this strategy, the tester derives test data from an examination of the
program's logic and structure.

The analog to exhaustive input testing in the black-box approach is usually considered to be exhaustive path testing. That is, if one executes (via test cases) all possible paths of control flow through the program, then possibly the program can be said to be completely tested.

There are two flaws in this statement, however. One is that the number of unique logic paths through a program is astronomically large. The second is that every path in a program could be tested, yet the program might still be loaded with errors. There are three explanations for this. First, an exhaustive path test in no way guarantees that a program matches its specification. Second, a program may be incorrect because of missing paths, and exhaustive path testing, of course, would not detect the absence of necessary paths. Third, an exhaustive path test might not uncover data-sensitivity errors (for example, a comparison written as (a - b) < c when |a - b| < c was intended: both of its branches can be executed by test cases in which a > b without ever revealing the error).

Although exhaustive input testing is superior to exhaustive path testing, neither is a useful strategy because both are infeasible. Some way of combining elements of black-box and white-box testing to derive a reasonable, but not airtight, testing strategy is desirable.
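To give a sense of scale for why both exhaustive strategies are infeasible, the sketch below works through two back-of-the-envelope counts. The scenarios are illustrative assumptions, not figures from the source: a function whose input is two 32-bit integers, tested at an assumed one billion test cases per second, and a loop whose body has 5 distinct paths and may execute from 1 to 20 times.

```python
"""Back-of-the-envelope counts showing why exhaustive testing is infeasible.

All scenarios and rates below are illustrative assumptions.
"""

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def exhaustive_input_years(input_bits: int, tests_per_second: float) -> float:
    """Years needed to try every value of an input that is `input_bits` wide."""
    total_inputs = 2 ** input_bits
    return total_inputs / tests_per_second / SECONDS_PER_YEAR


def loop_path_count(paths_per_iteration: int, max_iterations: int) -> int:
    """Distinct paths through a loop executed 1..max_iterations times.

    Each iteration independently takes one of `paths_per_iteration` paths,
    so exactly k iterations yield paths_per_iteration ** k path sequences.
    """
    return sum(paths_per_iteration ** k for k in range(1, max_iterations + 1))


if __name__ == "__main__":
    # Two 32-bit integer arguments -> 64 bits of input space.
    years = exhaustive_input_years(input_bits=64, tests_per_second=1e9)
    print(f"Exhaustive input testing: about {years:,.0f} years")

    # Loop body with 5 paths, executed up to 20 times.
    paths = loop_path_count(paths_per_iteration=5, max_iterations=20)
    print(f"Exhaustive path testing: {paths:.3e} distinct paths")
```

Even under these generous assumptions the input count works out to centuries of machine time and the path count to roughly 10^14, which is why the guidelines that follow aim to get the most errors out of a limited number of test cases.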
Testing Guidelines

The following testing guidelines are suggested by Myers*. They are interesting in that most of them appear to be intuitively obvious, yet they are often overlooked.
If the expected result of a test case has not been predefined, chances
are that a plausible, but erroneous, result will be interpreted as a
correct result because there is a subconscious desire to see the correct
result. One way of combating this is to encourage a detailed examination
of all output by precisely spelling out, in advance, the expected output of
the program.
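A minimal sketch of this guideline: every input below is paired with an expected result decided in advance, so the check compares against a predefined value rather than against whatever the program happens to print. The `median` function is a hypothetical unit under test, not something from the original text, and the test is written as a pytest-style function (a plain function that can also be called directly).

```python
def median(values):
    """Hypothetical unit under test: median of a non-empty list of numbers."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Each input is paired with an expected result worked out by hand from the
# specification BEFORE the program is run; the test never "accepts" whatever
# the program happens to produce.
EXPECTED = [
    ([3, 1, 2], 2),
    ([4, 1, 3, 2], 2.5),
    ([7], 7),
    ([5, 5, 1, 9], 5),
]


def test_median_matches_predefined_results():
    for values, expected in EXPECTED:
        actual = median(values)
        assert actual == expected, f"median({values}) = {actual}, expected {expected}"
```

The expected values double as documentation of the specification, which also makes a detailed examination of the output much easier.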
It is extremely difficult, after a programmer has been constructive
while designing and coding a program, to suddenly, overnight, change his or
her perspective and attempt to form a completely destructive frame of mind
toward the program. In addition, the program may contain errors due to the
programmer's misunderstanding of the problem statement or specification.
If this is the case, it is likely that the programmer will have the same
misunderstanding when attempting to test his or her own program. This does
not mean that it is impossible for a programmer to test his or her own
program, because, of course, programmers have had some success in testing
their programs. Rather, it implies that testing is more effective and
successful if performed by another party. Note that this argument does not
apply to debugging (correcting known errors); debugging is more efficiently
performed by the original programmer.
This is particularly true in the latter stages of testing where the
program is verified against its objective. In most environments, a
programming organization or a project manager is largely measured on the
ability to produce a program by a given date and for a certain cost. One
reason for this is that it is easy to measure time and cost objectives, but
it is extremely difficult to quantify the reliability of a program.
Therefore it is difficult for a programming organization to be objective in
testing its own program, because the testing process, while increasing the
reliability of the program, may be viewed as decreasing the probability of
meeting the schedule and cost objectives.
This is probably the most obvious principle, but again, it is something
that is often overlooked. A significant percentage of errors that are
eventually found were actually made visible by earlier test cases, but
slipped by because of the failure to carefully inspect the results of those
earlier test cases.
There is a natural tendency, when testing a program, to concentrate on
the valid and expected input conditions, to the neglect of the invalid and
unexpected conditions. Hence many errors are suddenly discovered in
production programs when the program is used in some new or unexpected way.
Test cases representing unexpected and invalid input conditions seem to
have a higher error-detection yield than do test cases for valid input
conditions.
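A sketch of testing invalid and unexpected inputs alongside valid ones. It assumes a hypothetical `parse_age` function whose (also assumed) specification is: accept a string of ASCII decimal digits, optionally surrounded by whitespace, with a value from 0 to 150, and raise `ValueError` for anything else. Most of the cases probe the invalid side, and as written the sketch contains a defect that only an unexpected input exposes: Python's `str.isdigit()` also accepts non-ASCII digits such as Arabic-Indic numerals, which `int()` then happily converts.

```python
def parse_age(text: str) -> int:
    """Hypothetical unit under test.

    Assumed specification: accept ASCII decimal digits in the range 0..150
    (surrounding whitespace allowed) and raise ValueError otherwise.
    """
    stripped = text.strip()
    if not stripped.isdigit():  # defect: isdigit() is not limited to ASCII
        raise ValueError(f"not a non-negative integer: {text!r}")
    age = int(stripped)
    if age > 150:
        raise ValueError(f"age out of range: {age}")
    return age


VALID_CASES = [("0", 0), ("42", 42), (" 150 ", 150)]

# Invalid and unexpected inputs: empty, blank, signs, decimals, out of range,
# words, exponent notation, and non-ASCII (Arabic-Indic) digits for "12".
INVALID_CASES = ["", "   ", "-1", "+5", "3.5", "151", "forty", "1e3", "\u0661\u0662"]


def check_parse_age() -> list:
    """Run every case and return a list of failure descriptions."""
    failures = []
    for text, expected in VALID_CASES:
        try:
            if parse_age(text) != expected:
                failures.append(f"{text!r}: wrong result")
        except ValueError:
            failures.append(f"{text!r}: valid input rejected")
    for text in INVALID_CASES:
        try:
            parse_age(text)
            failures.append(f"{text!r}: invalid input accepted")
        except ValueError:
            pass  # rejected, as the specification requires
    return failures
```

Running `check_parse_age()` reports exactly one failure, for the Arabic-Indic digits. In the document's terms that is a successful test case. One possible repair is to also require `stripped.isascii()` (available since Python 3.7).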
Examining a program to see if it does not do what it is supposed to do
is only half the battle. The other half is seeing whether the program does
what it is not supposed to do. This is simply a corollary to the previous
principle. It also implies that programs must be examined for unwanted
side effects.
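One concrete way to check that a program does not do what it is not supposed to do is to test for unwanted side effects. The sketch assumes a hypothetical `top_scores` function whose (assumed) specification is to return the n highest scores without modifying its argument, plus a deliberately flawed variant. The check verifies both the intended result and the absence of the side effect.

```python
import copy


def top_scores(scores: list, n: int) -> list:
    """Hypothetical unit under test: return the n highest scores, descending.

    Assumed specification: the caller's list must not be modified.
    """
    return sorted(scores, reverse=True)[:n]


def top_scores_buggy(scores: list, n: int) -> list:
    """Same result, but sorts the caller's list in place (an unwanted side effect)."""
    scores.sort(reverse=True)
    return scores[:n]


def check_no_side_effects(func) -> bool:
    """True only if func returns the right answer AND leaves its input untouched."""
    scores = [70, 95, 82, 60]
    snapshot = copy.deepcopy(scores)
    result = func(scores, 2)
    does_what_it_should = result == [95, 82]
    does_nothing_else = scores == snapshot
    return does_what_it_should and does_nothing_else
```

Both functions return `[95, 82]`, so a test that only inspects the return value passes them both; only the second half of the check distinguishes the flawed variant.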
This problem is seen most often in the use of interactive systems to
test programs. A common practice is to sit at a terminal, invent test
cases on the fly, and then send these test cases through the program. The
major problem is that test cases represent a valuable investment that, in
this environment, disappears after the testing has been completed.
Whenever the program has to be tested again (e.g., after correcting an
error or making an improvement), the test cases have to be reinvented.
More often than not, since this reinvention requires a considerable amount
of work, people tend to avoid it. Therefore, the retest of the program is
rarely as rigorous as the original test, meaning that if the modification
causes a previously functional part of the program to fail, this error
often goes undetected.
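A minimal sketch of treating test cases as a reusable investment instead of inventing them at a terminal: the cases, with their predefined expected results, are saved to a file and rerun unchanged after every modification. The `discount` function, the JSON file format, and the file name are hypothetical choices for illustration.

```python
import json
import tempfile
from pathlib import Path


def discount(total: float) -> float:
    """Hypothetical unit under test: 10% off orders of 100 or more."""
    return round(total * 0.9, 2) if total >= 100 else total


def save_cases(path: Path, cases: list) -> None:
    """Persist test cases (input plus predefined expected output) to disk."""
    path.write_text(json.dumps(cases, indent=2))


def rerun_saved_cases(path: Path) -> list:
    """Reload every saved case and return the ones that no longer pass."""
    cases = json.loads(path.read_text())
    return [case for case in cases if discount(case["input"]) != case["expected"]]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cases_file = Path(tmp) / "discount_cases.json"  # hypothetical file name
        save_cases(cases_file, [
            {"input": 50.0, "expected": 50.0},
            {"input": 99.99, "expected": 99.99},
            {"input": 100.0, "expected": 90.0},
            {"input": 250.0, "expected": 225.0},
        ])
        # After every later modification, rerun the same cases unchanged.
        print("failing cases:", rerun_saved_cases(cases_file))
```

Because rerunning costs nothing, the retest after a correction can be exactly as rigorous as the original test. In practice the file would live under version control next to the program rather than in a temporary directory.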
This is a mistake often made by project managers and is a sign of the
use of the incorrect definition of testing, that is, the assumption that
testing is the process of showing that the program functions correctly.
This counter-intuitive phenomenon makes little sense at first glance,
but it has been observed in many programs. Errors seem to come in clusters,
and in the typical program, some sections seem to be much more error prone
than others. This phenomenon gives us insight into, or feedback on, the
testing process. If a particular section of a program seems to be much more
error prone than other sections, then in terms of yield on our testing
investment, additional testing efforts are best focused on this
error-prone section.
It is probably true that the creativity required in testing a large
program exceeds the creativity required in designing that program, since it
is impossible to test a program such that the absence of all errors can be
guaranteed.
DEBUGGING

Definition of Debugging

Debugging is the activity performed after executing a successful test case (that is, one that has uncovered an error). Debugging consists of determining the exact nature and location of the suspected error and fixing the error. Debugging is probably the most difficult activity in software development from a psychological point of view for the following reasons:
Of the two aspects of debugging, locating the error represents
about 95% of the activity. Hence, the rest of this section
concentrates on the process of finding the location of an error, given a
suspicion that an error exists, based on the results of a successful test
case.

Debugging by Brute Force

The most common and least effective method of program debugging is by
"brute force". It requires little thought and is the least mentally taxing
of all the methods. The brute-force methods are characterized by
debugging with a memory dump, scattering print statements throughout the
program, or debugging with automated debugging tools.

Using a memory dump to try to find errors suffers from the following
drawbacks:
Scattering print statements throughout the program, although often
superior to the use of a dump in that it displays the dynamics of a program
and allows one to examine information that is easier to read, is not much
better and exhibits the following shortcomings:
The biggest problem with the brute-force methods is that they ignore the most powerful debugging tool in existence: a well-trained and disciplined human brain. Myers suggests that experimental evidence, from both students and experienced programmers, shows:
Hence, the use of brute-force methods is recommended only when all other
methods fail or as a supplement to (not a substitute for) the thought
processes described in the subsequent sections.

Debugging by Induction

Many errors can be found by using a disciplined thought process without
ever going near the computer. One such thought process is induction, where
one proceeds from the particulars to the whole. By starting with the
symptoms of the error, possibly in the results of one or more test cases,
and looking for relationships among the symptoms, one can often uncover
the error.

The induction process is illustrated in Figure 1 and described by Myers
as follows:
Figure 1. Inductive Debugging Process

Debugging by Deduction

An alternate thought process, that of deduction, is a process of
proceeding from some general theories or premises, using the processes of
elimination and refinement, to arrive at a conclusion. This process is
illustrated in Figure 2 and also described by Myers as follows:
Figure 2. Deductive Debugging Process

Debugging by Backtracking

For small programs, the method of backtracking is often used effectively
in locating errors. To use this method, start at the place in the program
where an incorrect result was produced and go backwards in the program one
step at a time, mentally executing the program in reverse order, to derive
the state (or values of all variables) of the program at the previous step.
Continuing in this fashion, the error is localized between the point where
the state of the program was what was expected and the first point where
the state was not what was expected.

Debugging by Testing

The use of additional test cases is another very powerful debugging
method. It is often used in conjunction with the induction method, to
obtain the information needed to generate or prove a hypothesis, and with
the deduction method, to eliminate suspected causes, refine the remaining
hypothesis, or prove a hypothesis.

The test cases for debugging differ from those used for integration and
testing in that they are more specific and are designed to explore a
particular input domain or internal state of the program. Test cases for
integration and testing tend to cover many conditions in one test, whereas
test cases for debugging tend to cover only one or a very few conditions.
The former are designed to detect the error in the most efficient manner,
whereas the latter are designed to isolate the error most efficiently.
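The contrast can be sketched in code. A hypothetical `format_price` function (not from the source) carries a seeded defect. One broad test covering many conditions at once detects that something is wrong; narrow debugging tests, each covering a single condition, isolate which condition triggers the error.

```python
def format_price(cents: int) -> str:
    """Hypothetical unit under test: format an integer number of cents.

    Seeded defect: negative amounts under one dollar lose their sign.
    """
    dollars, remainder = divmod(abs(cents), 100)
    sign = "-" if cents < 0 and dollars > 0 else ""  # bug: should be `cents < 0`
    return f"{sign}${dollars}.{remainder:02d}"


def broad_test() -> bool:
    """Testing-style case: many conditions in one test, efficient at detecting."""
    inputs = [0, 5, 150, -150, -5, 100000]
    expected = ["$0.00", "$0.05", "$1.50", "-$1.50", "-$0.05", "$1000.00"]
    return [format_price(c) for c in inputs] == expected


def debugging_tests() -> dict:
    """Debugging-style cases: one condition each, efficient at isolating."""
    cases = {
        "zero": (0, "$0.00"),
        "positive, under a dollar": (5, "$0.05"),
        "positive, over a dollar": (150, "$1.50"),
        "negative, over a dollar": (-150, "-$1.50"),
        "negative, under a dollar": (-5, "-$0.05"),
    }
    return {name: format_price(c) == want for name, (c, want) in cases.items()}
```

`broad_test()` simply returns `False`, while `debugging_tests()` shows that only the "negative, under a dollar" condition fails. That narrows the hypothesis to the sign logic for amounts with zero whole dollars.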
Debugging Guidelines (Error Locating)

As was the case for the testing guidelines, many of these debugging
guidelines are intuitively obvious, yet they are often forgotten or overlooked.
The following guidelines are suggested by Myers to assist in locating
errors.
Debugging is a problem solving process. The most effective method of
debugging is a mental analysis of the information associated with the
error's symptoms. An efficient program debugger should be able to pinpoint
most errors without going near a computer.
The human subconscious is a potent problem-solver. What we often
refer to as inspiration is simply the subconscious mind working on a
problem when the conscious mind is working on something else, such as
eating, walking, or watching a movie. If you cannot locate an error in a
reasonable amount of time (perhaps 30 minutes for a small program, a few
hours for a large one), drop it and work on something else, since your
thinking efficiency is about to collapse anyway. After "forgetting" about
the problem for a while, either your subconscious mind will have solved the
problem, or your conscious mind will be clear for a fresh examination of
the symptoms.
By doing so, you will probably discover something new. In fact, it is
often the case that by simply describing the problem to a good listener,
you will suddenly see the solution without any assistance from the
listener.
And then, use them as an adjunct to, rather than as a substitute for,
thinking. As noted earlier in this section, debugging tools, such as dumps
and traces, represent a haphazard approach to debugging. Experiments show
that people who shun such tools, even when they are debugging problems that
are unfamiliar to them, tend to be more successful than people who use the
tools.
Use it only as a last resort. The most common mistake made by novice
debuggers is attempting to solve a problem by making experimental changes
to the program. This totally haphazard approach cannot even be considered
debugging; it represents an act of blind hope. Not only does it have a
minuscule chance of success, but it often compounds the problem by adding
new errors to the program.

Debugging Guidelines (Error Repairing)

The following guidelines for fixing or repairing the program after the
error is located are also suggested by Myers.
When one finds an error in a section of a program, the probability of
the existence of another error in that section is higher. When repairing
an error, examine its immediate vicinity for anything else that looks
suspicious.
Another common failing is repairing the symptoms of the error, or just
one instance of the error, rather than the error itself. If the proposed
correction does not match all the clues about the error, one may be fixing
only a part of the error.
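A small, hypothetical illustration of fixing a symptom versus fixing the error. Assume an `average` function was reported to crash on an empty list. Patching the one caller that crashed repairs a symptom. The error itself is that `average` has no defined behavior for empty input, and every other caller inherits it. All names, and the choice of returning `None`, are assumptions for the example.

```python
from typing import List, Optional


def average_before(values: List[float]) -> float:
    """Original code: the error lives here (no case for an empty list)."""
    return sum(values) / len(values)  # raises ZeroDivisionError on []


def summary_symptom_fix(values: List[float]) -> str:
    """Symptom fix: only the caller that crashed during testing is patched."""
    if not values:
        return "n/a"
    return f"{average_before(values):.1f}"


def weekly_total_estimate(values: List[float]) -> float:
    """Another, untested caller: still crashes on [] after the symptom fix."""
    return average_before(values) * 7


def average_after(values: List[float]) -> Optional[float]:
    """Error fix: the function itself defines the empty case (None here)."""
    if not values:
        return None
    return sum(values) / len(values)
```

Returning `None` forces each caller to decide what an empty input means. Raising a clearer `ValueError` would be an equally defensible design. Either way, the correction addresses the error in `average` rather than one instance of its symptoms, and the callers then have to be updated and retested against it.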
Tell this to someone, and of course he would agree, but tell it to
someone in the process of correcting an error, and one often gets a
different reaction (e.g., "Yes, in most cases, but this correction is so
minor that it just has to work"). Code that is added to a program to fix
an error can never be assumed correct. Statement for statement,
corrections are much more error prone than the original code in the
program. One implication is that error corrections must be tested, perhaps
more rigorously than the original program.
Experience has shown that the ratio of errors due to incorrect fixes
versus original errors increases in large programs. In one widely used
large program, one of every six new errors discovered was an error in a
prior correction to the program.
Not only does one have to worry about incorrect corrections, but one has
to worry about a seemingly valid correction having an undesirable side
effect, thus introducing a new error. Not only is there a probability that
a fix will be invalid, but there is also a real probability that a fix will
introduce a new error. One implication is that not only does the error
situation have to be tested after the correction is made, but one must also
perform regression testing to determine if a new error has been
introduced.
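A minimal sketch of the two checks described above, assuming a hypothetical `clamp` function that has just been corrected. First, a test reproduces the originally reported error situation. Second, the previously passing cases are rerun to catch any new error the fix may have introduced. The function, the defect, and the cases are all invented for illustration.

```python
def clamp(value: int, low: int, high: int) -> int:
    """Hypothetical function after a fix.

    Reported error: values above `high` were returned unchanged.
    """
    if value < low:
        return low
    if value > high:  # the correction
        return high
    return value


# Test that reproduces the originally reported error situation.
ERROR_CASE = ((15, 0, 10), 10)

# Previously passing cases, kept from earlier testing (the regression suite).
REGRESSION_CASES = [
    ((5, 0, 10), 5),
    ((-3, 0, 10), 0),
    ((0, 0, 10), 0),
    ((10, 0, 10), 10),
]


def verify_fix() -> dict:
    """Report whether the fix works and which old cases, if any, now fail."""
    args, expected = ERROR_CASE
    regressions = [(a, e) for a, e in REGRESSION_CASES if clamp(*a) != e]
    return {"error_fixed": clamp(*args) == expected, "regressions": regressions}
```

Once the fix is accepted, the error case would normally be appended to the regression suite, so that any later change that reintroduces the error is caught automatically.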
One should realize that error correction is a form of program design.
Given the error-prone nature of corrections, common sense says that
whatever procedures, methodologies, and formalisms were used in the design
process should also apply to the error-correction process. For instance,
if the project rationalized that code inspections were desirable, then it
must be doubly important that they be used after correcting an error.
When debugging large systems, particularly a system written in an
assembly language, occasionally there is the tendency to correct an error
by making an immediate change to the object code, with the intention of
changing the source program later. Two problems associated with this
approach are (1) it is usually a sign that "debugging by experimentation"
is being practiced, and (2) the object code and source program are now out
of synchronization, meaning that the error could easily surface again when
the program is recompiled or reassembled.
ERROR COLLECTION AND ANALYSIS

During each phase of software development, it is very important to
categorize and collect information about software errors. Later, this
information can be analyzed to provide valuable feedback for improving
future design and testing processes.

In addition to the simple summarization of the errors and calculations
of what percentage of the total errors are represented by a certain type, a
more detailed analysis is needed to answer the following very important
questions (also suggested by Myers):
TEST COMPLETION CRITERIA

If one is to formalize any type of activity, such as we are trying to do
with software testing, the criteria for completing the activity must be
defined. This is particularly important in software testing since, except
for small programs, there is virtually no way to tell when the last
remaining error has been detected.

Two commonly used criteria for the end of software testing are the
following:
The first criterion is useless because it can be satisfied by doing
nothing. The second criterion is also useless because it is independent of
the quality of the test cases. It also encourages one to write test cases
that have a low probability of detecting errors.

Three much more useful criteria for ending software testing are
discussed in the following paragraphs:
The best criterion is probably a combination of the three.

Specific Test Case Design

The first criterion is the use of specific test case design procedures.
For example, module testing might be completed when the test cases which
are derived from satisfying the multi-condition coverage criterion and a
boundary-value analysis of the module interface execute without errors.

On the other hand, function testing might be considered complete when the
test cases are derived from cause-effect graphing, boundary-value analysis,
and error guessing, and all resultant test cases are eventually
unsuccessful (that is, they no longer uncover errors).

This criterion is better than the two mentioned earlier; however, it is
not helpful in a test phase in which specific methodologies are not
available, such as the system test phase. Also, it is a subjective
measurement, since there is no way to guarantee that a person has used a
particular methodology (e.g., boundary-value analysis) properly and
rigorously.

Detecting a Specified Number of Errors

The second criterion is to state the test completion requirements in
terms of the detection of some specified number of errors. For example,
the completion criterion for a performance test might be defined to be the
detection of 90 errors or an elapsed time of 3 months, whichever comes
later. Using this criterion requires one to estimate:
In order to estimate the number of errors in a program, one can search
for an error model based upon historical data for similar programs. Myers
suggests that the number of errors that exist in typical programs at the
time that coding is complete (before a code walkthrough or inspection is
employed) is approximately 4-8 errors per 100 program statements.

An estimate of the percentage of errors that can be found is somewhat
arbitrary and depends upon the impact of the error.

Estimating when errors are likely to occur and be detected is even more
difficult. However, once this goal is established, historical data can be
collected and used to help predict the time of occurrence and detection of
the errors.

The real advantage of this criterion is its emphasis on detecting errors,
by establishing a goal and partitioning it among the phases of testing, as
opposed to an emphasis on simply running test cases.

Using Error Detection Rate Charts

Use of this criterion requires one to plot the number of errors detected
as a function of time for each phase of testing. Then by looking at
the shape of the error detection rate curve, one can decide whether or not
to continue with one phase or go on to the next phase. The main idea is to
continue a phase so long as the error detection rate is high or is
increasing. When the error detection rate is declining, however, more
efficiency in detection of errors may be obtained by proceeding to the next
phase, where the error detection rate will again start to increase.

The graphs shown in Figure 3 (from Myers) show first an increasing rate,
where the phase should be continued, and second, a decreasing rate, where
the phase should probably have been terminated 10% earlier.

Figure 3. Estimating Completion by Plotting Errors Detected Per Unit Time

Figure 4 is an illustration of what happens when one fails to plot the
number of errors being detected. The graph represents three testing phases
of an extremely large software system; it was drawn as part of a postmortem
study of the project. An obvious conclusion is that the project should not
have switched to a different testing phase after period 6. During period
6, the error-detection rate was good (to a tester, the higher the rate, the
better), but switching to a second phase at this point caused the
error-detection rate to drop significantly. Using the error detection rate charts in conjunction with either of the other two criteria for test completion is highly recommended. ![]() Figure 4. Post-Mortem Study of the Testing Processes of a Large Project SourcesA significant portion of this material was derived from notes for an unpublished H.A.C./A.F. document and The Art of Software Testing by Glenford J. Myers, (ISBN: 0471043281) John Wiley & Sons, 1979. |