As with any form of human behavior, software testing is dominated by the attitude of the tester toward the testing activity. This attitude can be summarized in a statement of the primary goals of testing. The prevailing goals are typically stated in terms of demonstrating the correctness of the software, for example, showing that the program performs its intended functions or that it contains no errors.
While these are admirable goals, the underlying attitude of demonstrating correctness tends to permeate the complete testing activity, resulting in the selection and execution of test cases which tend to show that no errors are present.
One problem with these goals is that it is virtually impossible to remove all of the errors in a non-trivial software program. Hence the goals are unrealistic to start with. Another problem is that although a program may perform all of its intended functions, it may still contain errors in that it also performs unintended functions.
A much more productive goal of testing is the following:
To discover what errors are present in the software
This goal assumes that errors are present in the software, an assumption which is true for virtually all software and one which exhibits a much more productive attitude towards software testing, that of stressing the software to the fullest, with the goal of finding the errors. Since this goal is much more conducive to finding errors, it is also much more likely to increase the reliability of the software.
One way of detecting the prevailing attitude toward testing is to see how the words "successful" and "unsuccessful" are used in describing test case results. If a test case which uncovers no errors is considered to be successful, this is a sign that an unproductive attitude exists. Such a test case adds no reliability to the software and is hence largely a waste of time and energy. A successful test case should be one that uncovers errors. In fact, the more errors uncovered, the better the test case.
Two alternate and complementary approaches to testing are called black-box and white-box testing. Black-box testing is also called data-driven (or input/output-driven) testing. In using this approach, the tester views the program as a black box and is not concerned about the internal behavior and structure of the program. The tester is only interested in finding circumstances in which the program does not behave according to its specifications. Test data are derived solely from the specifications (i.e., without taking advantage of knowledge of the internal structure of the program).
If one wishes to find all errors in the program, using this approach, the criterion is exhaustive input testing. Exhaustive input testing is the use of every possible input condition as a test case. Since this is usually impossible or impractical from an economic viewpoint, exhaustive input testing is rarely used. In order to maximize the yield on the testing investment (i.e., maximize the number of errors found by a finite number of test cases), the white-box approach is also used.
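A back-of-the-envelope calculation makes the infeasibility concrete. The numbers below are purely illustrative (a hypothetical function taking two 32-bit integer inputs), but they show the scale involved:

```python
# Hypothetical illustration: exhaustive input testing of a function that
# accepts two 32-bit integers would require 2**64 test cases.
total_cases = 2 ** 64

# Even at a (generous) rate of one billion test executions per second,
# running them all would take centuries.
seconds_required = total_cases / 1e9
years_required = seconds_required / (365 * 24 * 3600)
```

At a billion tests per second, the run would still take well over five hundred years, which is why exhaustive input testing is ruled out on economic grounds alone.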
Another testing approach, white-box or logic-driven structural testing, permits one to examine the internal structure of the program. In using this strategy, the tester derives test data from an examination of the program's logic and structure.
The analog to exhaustive input testing of the black-box approach is usually considered to be exhaustive path testing. That is, if one executes (via test cases) all possible paths of control flow through the program, then possibly the program can be said to be completely tested.
There are two flaws in this statement, however. One is that the number of unique logic paths through a program is astronomically large. The second flaw in the statement that exhaustive path testing means a complete test is that every path in a program could be tested, yet the program might still be loaded with errors. There are three explanations for this. The first is that an exhaustive path test in no way guarantees that a program matches its specification. Second, a program may be incorrect because of missing paths. Exhaustive path testing, of course, would not detect the absence of necessary paths. Third, an exhaustive path test might not uncover data-sensitivity errors.
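The "astronomically large" claim is easy to verify with a simple count. The sketch below uses illustrative numbers, not figures from any particular program:

```python
# Back-of-the-envelope path count: a straight-line program with n
# independent if/else decisions has 2**n distinct control-flow paths;
# a loop multiplies the count by its number of possible iteration counts.
def path_count(n_decisions: int) -> int:
    return 2 ** n_decisions

# Thirty decisions -- a modest program -- already exceed a billion paths.
billion_plus = path_count(30)
```

Since each added decision doubles the path count, even moderately sized programs place exhaustive path testing far beyond reach.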
Although exhaustive input testing is superior to exhaustive path testing, neither proves to be a useful strategy because both are infeasible. Some way of combining elements of both black-box and white-box testing to derive a reasonable, but not air-tight, testing strategy is desirable.
The following set of testing guidelines is suggested by Myers. They are interesting in that most of them appear to be intuitively obvious, yet they are often overlooked.
A necessary part of a test case is a definition of the expected output or result.
If the expected result of a test case has not been predefined, chances are that a plausible, but erroneous, result will be interpreted as a correct result because there is a subconscious desire to see the correct result. One way of combating this is to encourage a detailed examination of all output by precisely spelling out, in advance, the expected output of the program.
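As a sketch of this principle, a test case can carry its expected output alongside its input so that checking is mechanical rather than a matter of interpretation. The function and values here are hypothetical, and the tax computation works in integer cents to keep the example exact:

```python
# Hypothetical function under test: computes 5% tax on a price in cents.
def sales_tax_cents(amount_cents, rate_percent=5):
    return amount_cents * rate_percent // 100

# Each test case defines the expected output IN ADVANCE of running the test.
test_cases = [
    (10_000, 500),   # (input, expected output)
    (0, 0),
    (19_999, 999),
]

# Checking is then a mechanical comparison, leaving no room for the
# subconscious desire to see a correct result.
failures = [(inp, expected, sales_tax_cents(inp))
            for inp, expected in test_cases
            if sales_tax_cents(inp) != expected]
```

An empty `failures` list means every predefined expectation was met; any entry in it must be examined rather than rationalized away.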
A programmer should avoid attempting to test his or her own program.
It is extremely difficult, after a programmer has been constructive while designing and coding a program, to suddenly, overnight, change his or her perspective and attempt to form a completely destructive frame of mind toward the program. In addition, the program may contain errors due to the programmer's misunderstanding of the problem statement or specification. If this is the case, it is likely that the programmer will have the same misunderstanding when attempting to test his or her own program. This does not mean that it is impossible for a programmer to test his or her own program, because, of course, programmers have had some success in testing their programs. Rather, it implies that testing is more effective and successful if performed by another party. Note that this argument does not apply to debugging (correcting known errors); debugging is more efficiently performed by the original programmer.
A programming organization should not test its own programs.
This is particularly true in the latter stages of testing where the program is verified against its objective. In most environments, a programming organization or a project manager is largely measured on the ability to produce a program by a given date and for a certain cost. One reason for this is that it is easy to measure time and cost objectives, but it is extremely difficult to quantify the reliability of a program. Therefore it is difficult for a programming organization to be objective in testing its own program, because the testing process, while increasing the reliability of the program, may be viewed as decreasing the probability of meeting the schedule and cost objectives.
Thoroughly inspect the results of each test.
This is probably the most obvious principle, but again, it is something that is often overlooked. A significant percentage of errors that are eventually found were actually made visible by earlier test cases, but slipped by because of the failure to carefully inspect the results of those earlier test cases.
Test cases must be written for invalid and unexpected, as well as valid and expected, input conditions.
There is a natural tendency, when testing a program, to concentrate on the valid and expected input conditions, at the neglect of the invalid and unexpected conditions. Hence many errors are suddenly discovered in production programs when the program is used in some new or unexpected way. Test cases representing unexpected and invalid input conditions seem to have a higher error-detection yield than do test cases for valid input conditions.
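A sketch of this guideline, using a hypothetical input parser: the invalid and unexpected inputs get their own test cases alongside the valid ones.

```python
# Hypothetical function under test.
def parse_age(text):
    value = int(text)                 # raises ValueError on non-numeric input
    if not 0 <= value <= 150:
        raise ValueError("age out of range")
    return value

def rejects(text):
    """True if parse_age rejects the input with ValueError."""
    try:
        parse_age(text)
        return False
    except ValueError:
        return True

# Valid and expected input condition.
assert parse_age("42") == 42

# Invalid and unexpected input conditions deserve their own test cases.
assert rejects("")       # empty string
assert rejects("abc")    # non-numeric
assert rejects("-5")     # negative
assert rejects("999")    # out of range
```

Note that the invalid-input cases outnumber the valid one; in practice these are exactly the cases that surface errors once the program reaches production.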
A program must be tested to see if it does something it is not supposed to do.
Examining a program to see if it does what it is supposed to do is only half the battle. The other half is seeing whether the program does what it is not supposed to do. This is simply a corollary to the previous principle. It also implies that programs must be examined for unwanted side effects.
Avoid throw-away test cases unless the program is truly a throw-away program.
This problem is seen most often in the use of interactive systems to test programs. A common practice is to sit at a terminal, invent test cases on the fly, and then send these test cases through the program. The major problem is that test cases represent a valuable investment that, in this environment, disappears after the testing has been completed. Whenever the program has to be tested again (e.g., after correcting an error or making an improvement), the test cases have to be reinvented. More often than not, since this reinvention requires a considerable amount of work, people tend to avoid it. Therefore, the retest of the program is rarely as rigorous as the original test, meaning that if the modification causes a previously functional part of the program to fail, this error often goes undetected.
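One way to avoid throw-away test cases is to keep them in a runnable script. The sketch below uses Python's standard unittest module; `normalize` is a hypothetical function under test.

```python
import unittest

# Hypothetical function under test.
def normalize(s):
    """Collapse runs of whitespace and lowercase the text."""
    return " ".join(s.split()).lower()

class NormalizeTests(unittest.TestCase):
    def test_collapses_whitespace(self):
        self.assertEqual(normalize("Hello   World"), "hello world")

    def test_empty_input(self):
        self.assertEqual(normalize(""), "")
```

Because the test cases live in a file rather than in a terminal session, rerunning them (e.g., with `python -m unittest`) after a correction or improvement costs nothing, so the retest can be exactly as rigorous as the original test.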
Do not plan a testing effort under the tacit assumption that no errors will be found.
This is a mistake often made by project managers and is a sign of the use of the incorrect definition of testing, that is, the assumption that testing is the process of showing that the program functions correctly.
The probability of the existence of more errors in a section of a program is proportional to the number of errors already found in that section.
At first glance this phenomenon seems counter-intuitive, but it has been observed in many programs. Errors seem to come in clusters, and in the typical program, some sections seem to be much more error prone than others. This phenomenon gives us insight or feedback in the testing process. If a particular section of a program seems to be much more error prone than other sections, then in terms of yield on our testing investment, additional testing efforts are best focused on this error-prone section.
Testing is an extremely creative and intellectually challenging task.
It is probably true that the creativity required in testing a large program exceeds the creativity required in designing that program, since it is impossible to test a program such that the absence of all errors can be guaranteed.
Debugging is that activity which is performed after executing a successful test case. Debugging consists of determining the exact nature and location of the suspected error and fixing the error.
Debugging is probably the most difficult activity in software development from a psychological point of view for the following reasons:
Of the two aspects of debugging, locating the error represents about 95% of the activity. Hence, the rest of this section concentrates on the process of finding the location of an error, given a suspicion that an error exists, based on the results of a successful test case.
The most common and least effective method of program debugging is by "brute force". It requires little thought and is the least mentally taxing of all the methods. The brute-force methods are characterized by debugging with a memory dump, scattering print statements throughout the program, or debugging with automated debugging tools.
Using a memory dump to try to find errors suffers from the following drawbacks:
Scattering print statements throughout the program, although often superior to the use of a dump in that it displays the dynamics of a program and allows one to examine information that is easier to read, is not much better and exhibits the following shortcomings:
The biggest problem with the brute-force methods is that they ignore the most powerful debugging tool in existence, a well-trained and disciplined human brain. Myers suggests that experimental evidence, both from students and experienced programmers, shows:
Hence, the use of brute-force methods is recommended only when all other methods fail or as a supplement to (not a substitute for) the thought processes described in the subsequent sections.
Many errors can be found by using a disciplined thought process without ever going near the computer. One such thought process is induction, where one proceeds from the particulars to the whole. By starting with the symptoms of the error, possibly in the result of one or more test cases, and looking for relationships among the symptoms, the error is often uncovered.
The induction process is illustrated in Figure 1 and described by Myers as follows:
An alternate thought process, that of deduction, is a process of proceeding from some general theories or premises, using the processes of elimination and refinement, to arrive at a conclusion. This process is illustrated in Figure 2 and also described by Myers as follows:
For small programs, the method of backtracking is often used effectively in locating errors. To use this method, start at the place in the program where an incorrect result was produced and go backwards in the program one step at a time, mentally executing the program in reverse order, to derive the state (or values of all variables) of the program at the previous step. Continuing in this fashion, the error is localized between the point where the state of the program was what was expected and the first point where the state was not what was expected.
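The backtracking method can be illustrated on a tiny, deliberately buggy hypothetical program. The program is meant to compute the average of a list but returns the wrong value; the comments walk backwards from the incorrect result:

```python
# Deliberately buggy hypothetical program: intended to compute an average.
def average(xs):
    total = 0
    for x in xs:
        total = x            # BUG: should be `total += x`
    return total / len(xs)

# Backtracking from the wrong output of average([1, 2, 3]):
#   step 3: `return total / len(xs)` is wrong, so `total` was already wrong
#   step 2: just after the loop, total == xs[-1], not the sum  -> state wrong
#   step 1: just before the loop, total == 0                   -> state correct
# The error is therefore localized between steps 1 and 2: the loop body.
```

The error lies between the last point where the program state matched expectations (before the loop) and the first point where it did not (after the loop), exactly as the method prescribes.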
The use of additional test cases is another very powerful debugging method. It is often used in conjunction with the induction method, to obtain information needed to generate a hypothesis and/or to prove a hypothesis, and with the deduction method, to eliminate suspected causes, refine the remaining hypothesis, and/or prove a hypothesis.
The test cases for debugging differ from those used for integration and testing in that they are more specific and are designed to explore a particular input domain or internal state of the program. Test cases for integration and testing tend to cover many conditions in one test, whereas test cases for debugging tend to cover only one or a very few conditions. The former are designed to detect the error in the most efficient manner whereas the latter are designed to isolate the error most efficiently.
As was the case for the testing guidelines, many of these debugging guidelines are intuitively obvious, yet they are often forgotten or overlooked. The following guidelines are suggested by Myers to assist in locating errors.
Debugging is a problem-solving process. The most effective method of debugging is a mental analysis of the information associated with the error's symptoms. An efficient program debugger should be able to pinpoint most errors without going near a computer.
If you reach an impasse, sleep on it.
The human subconscious is a potent problem-solver. What we often refer to as inspiration is simply the subconscious mind working on a problem when the conscious mind is working on something else, such as eating, walking, or watching a movie. If you cannot locate an error in a reasonable amount of time (perhaps 30 minutes for a small program, a few hours for a large one), drop it and work on something else, since your thinking efficiency is about to collapse anyway. After "forgetting" about the problem for a while, either your subconscious mind will have solved the problem, or your conscious mind will be clear for a fresh examination of the symptoms.
If you reach an impasse, describe the problem to someone else.
By doing so, you will probably discover something new. In fact, it is often the case that by simply describing the problem to a good listener, you will suddenly see the solution without any assistance from the listener.
Use debugging tools only as a second resort.
And then, use them as an adjunct to, rather than as a substitute for, thinking. As noted earlier in this section, debugging tools, such as dumps and traces, represent a haphazard approach to debugging. Experiments show that people who shun such tools, even when they are debugging problems that are unfamiliar to them, tend to be more successful than people who use the tools.
Use experimentation only as a last resort. The most common mistake made by novice debuggers is attempting to solve a problem by making experimental changes to the program. This totally haphazard approach cannot even be considered debugging; it represents an act of blind hope. Not only does it have a minuscule chance of success, but it often compounds the problem by adding new errors to the program.
The following guidelines for fixing or repairing the program after the error is located are also suggested by Myers.
Where there is one bug, there is likely to be another.
When one finds an error in a section of a program, the probability of the existence of another error in that section is higher. When repairing an error, examine its immediate vicinity for anything else that looks suspicious.
Fix the error, not just a symptom of it.
Another common failing is repairing the symptoms of the error, or just one instance of the error, rather than the error itself. If the proposed correction does not match all the clues about the error, one may be fixing only a part of the error.
The probability of the fix being correct is not 100%.
Tell this to someone, and of course he would agree, but tell it to someone in the process of correcting an error, and one often gets a different reaction (e.g., "Yes, in most cases, but this correction is so minor that it just has to work"). Code that is added to a program to fix an error can never be assumed correct. Statement for statement, corrections are much more error prone than the original code in the program. One implication is that error corrections must be tested, perhaps more rigorously than the original program.
The probability of the fix being correct drops as the size of the program increases.
Experience has shown that the ratio of errors due to incorrect fixes versus original errors increases in large programs. In one widely used large program, one of every six new errors discovered was an error in a prior correction to the program.
Beware of the possibility that an error correction creates a new error.
Not only may a correction be incorrect in itself, but a seemingly valid correction may have an undesirable side effect, thus introducing a new error. One implication is that not only does the error situation have to be retested after the correction is made, but one must also perform regression testing to determine whether a new error has been introduced.
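Regression testing after a fix can be sketched as rerunning the entire saved suite, not just the case that originally failed. The function `clamp` and the suite here are hypothetical:

```python
# Hypothetical function that was just "fixed".
def clamp(x, lo, hi):
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(x, hi))

def run_regression(tests):
    """tests: (function, args, expected) triples; returns the failures."""
    return [(fn.__name__, args) for fn, args, expected in tests
            if fn(*args) != expected]

suite = [
    (clamp, (5, 0, 10), 5),     # the case the fix was made for
    (clamp, (-1, 0, 10), 0),    # previously passing cases are rerun too
    (clamp, (99, 0, 10), 10),
]
failures = run_regression(suite)
```

If the fix had broken a previously working case, it would appear in `failures` even though the originally reported error is gone, which is precisely what regression testing is meant to catch.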
The process of error repair should put one back temporarily in the design phase.
One should realize that error correction is a form of program design. Given the error-prone nature of corrections, common sense says that whatever procedures, methodologies, and formalism were used in the design process should also apply to the error-correction process. For instance, if the project rationalized that code inspections were desirable, then it must be doubly important that they be used after correcting an error.
Change the source code, not the object code.
When debugging large systems, particularly a system written in an assembly language, occasionally there is the tendency to correct an error by making an immediate change to the object code, with the intention of changing the source program later. Two problems associated with this approach are (1) it is usually a sign that "debugging by experimentation" is being practiced, and (2) the object code and source program are now out of synchronization, meaning that the error could easily surface again when the program is recompiled or reassembled.
During each phase of software development, it is very important to categorize and collect information about software errors. Then, later on, this information can be analyzed to provide valuable feedback in terms of improving future design and testing processes.
In addition to the simple summarization of the errors and calculations of what percentage of the total errors are represented by a certain type, a more detailed analysis is needed to answer the following very important questions (also suggested by Myers):
If one is to formalize any type of activity, such as we are trying to do with software testing, the criteria for completing the activity must be defined. This is particularly important in software testing since, except for small programs, there is virtually no way to tell when the last remaining error has been detected.
Two commonly used criteria for the end of software testing are the following:
The first criterion is useless because it can be satisfied by doing nothing. The second criterion is also useless because it is independent of the quality of the test cases. It also encourages one to write test cases that have a low probability of detecting errors.
Three much more useful criteria for ending software testing are discussed in the following paragraphs:
The best criterion is probably a combination of the three.
The first criterion is the use of specific test case design procedures. For example, module testing might be completed when the test cases which are derived from satisfying the multi-condition coverage criterion and a boundary-value analysis of the module interface execute without errors.
On the other hand, function testing might be completed when all test cases derived from cause-effect graphing, boundary-value analysis, and error guessing have been executed and are eventually unsuccessful (that is, they no longer uncover errors).
This criterion is better than the two mentioned earlier; however, it is not helpful in a test phase in which specific methodologies are not available, such as the system test phase. It is also a subjective measurement, since there is no way to guarantee that a person has used a particular methodology (e.g., boundary-value analysis) properly and rigorously.
The second criterion is to state the test completion requirements in terms of the detection of some specified number of errors. For example, the completion criterion for a performance test might be defined to be the detection of 90 errors or an elapsed time of 3 months, whichever comes later.
Using this criterion requires one to estimate:
In order to estimate the number of errors in a program, one can search for an error model based upon historical data for similar programs. Myers suggests that the number of errors that exist in typical programs at the time that coding is complete (before a code walkthrough or inspection is employed) is approximately 4-8 errors per 100 program statements.
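Applying the cited model to a hypothetical program makes the arithmetic concrete. The program size here is purely illustrative:

```python
# Applying the cited error model (4-8 errors per 100 statements at
# code-complete) to a hypothetical 10,000-statement program.
statements = 10_000
low_estimate = statements * 4 // 100     # 400 errors
high_estimate = statements * 8 // 100    # 800 errors
```

An estimate in the range of 400 to 800 latent errors then becomes the basis for setting the error-detection goals of each testing phase.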
An estimate of the percentage of errors that can be found is somewhat arbitrary and depends upon the impact of the error.
Estimating when errors are likely to occur and be detected is even more difficult. However, once this goal is established, historical data can be collected and used to help predict the time of occurrence and detection of the errors.
The real advantage to this criterion is the emphasis on detecting errors by establishing a goal and partitioning it into the phases of testing, as opposed to emphasis on the running of test cases.
Use of this criterion requires one to plot the number of errors detected as a function of time for each phase of the program. Then by looking at the shape of the error detection rate curve, one can decide whether or not to continue with one phase or go on to the next phase. The main idea is to continue a phase so long as the error detection rate is high or is increasing. When the error detection rate is declining, however, more efficiency in detection of errors may be obtained by proceeding to the next phase, where the error detection rate will again start to increase.
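The decision rule described above can be sketched as a simple comparison of the detection rate in successive periods. The threshold logic here is illustrative; in practice one would judge the shape of the whole curve, as in the figures:

```python
# Sketch of the phase-continuation decision: continue while the
# error-detection rate is steady or rising; once it declines, more errors
# per unit of effort may be found by moving to the next phase.
def phase_decision(errors_per_period):
    """errors_per_period: errors detected in successive equal time periods."""
    if len(errors_per_period) < 2:
        return "continue"                 # not enough data to see a trend
    if errors_per_period[-1] >= errors_per_period[-2]:
        return "continue"                 # rate steady or increasing
    return "consider next phase"          # rate declining
```

For example, a history of rising counts per period argues for staying in the phase, while a falling tail suggests the phase has passed its peak yield.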
Graphs shown in Figure 3 (from Myers) show first an increasing rate, where the phase should be continued, and second a decreasing rate, where the phase should probably have been terminated 10% earlier.
Figure 4 is an illustration of what happens when one fails to plot the number of errors being detected. The graph represents three testing phases of an extremely large software system; it was drawn as part of a postmortem study of the project. An obvious conclusion is that the project should not have switched to a different testing phase after period 6. During period 6, the error-detection rate was good (to a tester, the higher the rate, the better), but switching to a second phase at this point caused the error-detection rate to drop significantly.
Using the error detection rate charts in conjunction with either of the other two criteria for test completion is highly recommended.