Issue #2: You didn't really KNOW that the change was safe.
There being no instances of ">2 line" transactions in the data does not prove that there is no code that can produce them; it only shows that in that period of time none did. If the system imported transactions from another system, for example, it might be just a coincidence that they had not yet done "year end processing" or some other process that produces "N line" transactions. And after refactoring this functionality out of the system, what would you do on the day that the system breaks online, because you missed one of the cases?
Here's another reason to be nervous: None of the UnitTests failed. How well are you testing the functionality of the Transaction class, if changing some of its methods so they throw away data does not cause any of the tests to fail?
Hang on a minute. Has the airframe on the right undergone strict Fagan Inspection by a trained team of problem-domain competent Inspectors? Thanks, I'll take that one. Four people, skilled and experienced, trained in the Way of the Process, have inspected this airframe for defects and have found none. None. Zero. Zip. Zilch. Nada. Any defects that were found in previous inspections were corrected and the fixes themselves inspected in context for potential new defects introduced by the fix. This iterative process continued until no more defects showed up.
Gas 'er up.
Jeff, what alternative practice do you follow that provides even better certainty during refactoring?
We crossed this threshold while I was working on LifeTech. At first nobody could make a single move because they would break something else. (ExtremeFrustration!) After about four months of OnDemandUnitTesting? (see UnitTestTrial, I think), we changed from thinking "all the tests running just proves we don't have enough tests" to thinking "all the tests run, we can't think of anything that might break, so put it into production".
Terrific. We tested this implanted defibrillator. Well, we tested it until we ran out of things to test. Here, let's stick it in your chest...
Most times we were right, but occasionally we were wrong and we would put code into production that broke. This happened less and less frequently as we got more tests. Every time something broke in production we learned about another kind of test we needed to write. -- KentBeck
Coverage tools add comfort, for sure. The fact that coverage is complete also increases the number of "edge" cases that are checked. However, it's coverage in combination that really does the checking - this generally requires human thought: "OK, but what if he's at the savings match and gets a raise this month". Don't let your coverage tester discourage thinking and discussion of what to test.
The first is a misconception that comes from familiarity with BigBangTesting. That is, you sit down and just test and test your code all in one session. And you always, always miss several big bugs. Thus you find TestsCantProveTheAbsenceOfBugs. But this is not XP style. With XP you create tests before you code, while you code, and then even after you code. You create tests whenever a bug is found. You create tests when you look at someone else's code and see a testing hole. And you don't just throw those tests away after you believe you are done. You evolve and expand your test suite over time. The testing suite that results is (trust me) very different than what you might imagine.
The second misconception comes from a lack of YouArentGonnaNeedIt. That is you wonder how much of your system is not being exercised by your tests. The XP answer is none of it. You have been very careful not to include anything extra and you tested 100% of everything you did include. If you missed something someone else will add a test for it sooner than later.
After a while (and I am talking about only a few months of evolution) you will have a collection of tests which do in fact prove the absence of many bugs. Not all of course, but any relevant ones will be covered. -- DonWells
To illustrate, a story by EricUlevik (don't trust the details):
Doing forex trading, we found a case where our software would get a bad calculation. Once a week or so. Eventually, the problem turned out to be a failure in a CPU cache line refresh. This was a hardware design fault in the PC.
The test suite included running for two weeks at maximum update rate without error, so this bug was found.
If you need this level of quality, a good place to look is Intel's web site for processor errata.
The unit tests for the Transaction class are not sufficient to detect this loss of functionality.
My response would be to add more Transaction unit tests. -- JeffGrigg
ItDepends. If the functionality is required, i.e. you really do have a card for it, then test it and leave it in. The test was erroneously left out. On the other hand, if the functionality is not an immediate requirement, you should invoke YouArentGonnaNeedIt and remove the functionality rather than test it.
Either way, by paying attention to having tests and functionality aligned, our confidence in the system increases.
In refactoring, you are either changing implementation, or you are changing interface. It is bad to change both at once. If you are changing only implementation, then the tests do not require changing.
If you are changing interface, the tests need changing. Generally, of course, they are quite similar to the old ones.
It is in no way a problem: if you were implementing new functionality, you'd have to do tests, typically from scratch. New tests are not a problem. If you're refactoring interface, you still have to do tests, and they aren't even as hard as new tests, which aren't hard.
As always, with new code or refactoring, if a test breaks, there are two possibilities: the code is wrong, or the test is wrong. If the developer can't tell which, he should consider a job in Marketing. ;-> -- RonJeffries
Maybe a sufficient suite of unit (and functional) tests serve as a probabilistic proof that the system is correct.
It's refreshing to hear this point of view. People get hung up on extremes, either its provable or everything sucks. After all, just because a test can't definitively prove the absence of bugs in a system does not mean that we as engineers shouldn't test that system as rigorously as possible. It's funny how hung-up we get on extremes. So, I would agree, you don't have to convince everyone that software can be proven correct in order to show the value of testing. I think someone (maybe WattsHumphrey?) came up with the maxim that if you have found a bug, you can be certain there is another one. :) -- RobertDiFalco
That's a pretty safe conclusion even if you haven't found a bug yet. -- GarethMcCaughan
I think WattsHumphrey's view is that finding a bug in a particular area suggests that there are more bugs in that area. Example: code that has a lot of syntax errors will likely also have more serious errors. Not sure how that fits on this page... -- JasonYip
Yeah, that was it. Maybe I was mistaken, but it seemed relevant to the idea that you cannot prove the absence of bugs. In fact, finding a bug is often an indication that more serious bugs exist. So, while testing can not prove the absence of bugs but it can prove their existence -- and this is an important and good thing. This allows the engineer to remove those detected bugs and, hence, increase confidence in the system. Seems logical to me. Another great thing about UnitTesting is that (when you are rigorous about them and add them to your daily build) you can waste less of QA's time when they do integration testing. There is no excuse for giving QA a build that has errors in the implementation (un-integrated).
This will be slightly off-topic, but I have always been pretty religious about developing many small and atomic UnitTests that map to component interfaces. Actually, I use to create a single TestCase for every public method. Now I just make sure that each method works, write tests to simulate every fail-condition, and make sure that running everyone's UnitTests is an automatic part of the DailyBuild script. However, (while admittedly a subtle distinction) I don't really believe that I am removing any bugs (nor proving the absence of new bugs). Actually, I visualize my unit tests much like an extension to the traditional DesignByContract. The TestMethods? provide a harness for exercising these contracts. My general rule is that I shouldn't have behavior (for failure or success) that can't be asserted in some way. The more rigorous my assertions, the higher my confidence level is in the code's behavior. So, in this way, each TestCase is really just a harness or an artificial agitator for my Contracts.
Finally, just as I would never skip UnitTesting, I would also never be naive enough to think UnitTesting alone is sufficient. And so, I also make certain that QA performs black and white box functionality tests that don't test my implementation as much as they test the solutions my units of implementation provide when integrated together. -- RobertDiFalco
There is evidence of BadModules? where one finds 80% of a system's faults in 20% of its modules. This seems to be true, so far, of all systems whatever methodology or tools are used.
Another thought experiment. Two modules to run your pacemaker have undergone large amounts of testing, equivalent as far as you know. From module A no defects have ever been found. From module B, hundreds have been found so far. Your pacemaker has a JulY2K problem and you must choose a new module NOW, or die. Which one do you choose? Justify your answer. -- RonJeffries
I know what you are trying to prove, Ron - but your example has major holes. Obviously, it must depend on the kind of bugs found, how much actual time the two have been in use, the reputation of the manufacturers and many other factors. Otherwise, we would expect a bugridden OS like Windows to fail utterly compared with the much more stable Linux.
The observation that bugs have been found is certainly a good predictor that still more remain - but it is not a reliable predictor that people will not use the product -- RussellGold
All you know is that both modules have undergone extensive testing, equivalent as far as you know. One has never exhibited a defect, one has exhibited hundreds. Your heart is stopping. Pick one. Justify your answer. -- rj
Ron, if both underwent rigorous testing, I would have to select module A. Just by the fact that hundreds have been found in module B, I can pretty safely bet that many more are yet to be found. However, since so few (actually none) have been found in module A, chances are that none will be found. Is this where you were heading? Which would you pick? BTW, I think this was a useful thought experiment. -- RobertDiFalco
A. I can accept no errors in my pacemaker. Only one of the systems has exhibited no errors. End of consideration. For extra credit, consider pacemaker software C, which has shown a declining error curve. It started with lots, but no new defects have been discovered for a "long time" now. Do we prefer it to B? Do we prefer it to A? And why? Your heart is stopping. Pick one. ;->
Ron, I would still choose module A. I still prefer code which has never shown bugs in production testing to one that has radically reduced its number of bugs. For anyone who disagrees with the concept of UnitTesting here's another question. How can you achieve the success of module A without UnitTesting? Personally, I think the only answer is either serendipity or a team mate who did some unit testing behind your back. :) -- RobertDiFalco
This is the TheresTheBug principle.
My flippant remark appears to be being taken much more seriously than it deserved. Anyway, obviously the answer to the pacemaker question is that I pick module A rather than module B. If pacemakers actually contain non-trivial software (which I rather doubt, but what do I know?) then I'd guess A probably has bugs, but clearly any bugs it has show up less than the bugs in B. Do I prefer A or C? Depends on the amount of testing they've had. If the total is the same, then C's latest version, despite not having exhibited any bugs, hasn't been tested as much as A's (presumably only) version. If the total on C's latest version equals the total on A's only version, I might pick C. I don't think any of this is surprising, unless someone here is claiming that testing isn't useful. I doubt anyone is. -- GarethMcCaughan
I'm going to step out on a limb since I haven't actually worked on a pacemaker but from talking to biomedical engineers, they do consider pacemaker software to be trivial. Or at least, it is straight forward to formally verify the algorithms. Software tends not to be the main failure condition but rather rejection and hardware problems. -- JasonYip
The people wearing the pacemaker do not consider the software to be nontrivial. My father's pacemaker refused to speed up when appropriate. Programming? Configuration? Still an error. -- GeorgeCatlin?
I might pick C. Because when you do TDD, your module should fail the test at least once. Even if A has failed the test case once the first time, the fact the A has never failed the test makes me think that it is possible that the developer doesn't write the good test case at all (may be all his test case was an empty test method). While C has failed all the test first and gradually fixed to do what the test cases said. I'm quite sure, module C's developer spend quite time wrtitng his test case. Module B's test case may be as good as C's or as bad as A's. However module B doesn't pass the test so I'll not pick it.
Of course not. But proof is not a required condition for usefulness.
It is for liability.
Veering away from the absolute for a moment, while testing cannot prove the absence of all errors, it can demonstrate the absence of certain errors.
This seems to be a semantic quibble masquerading as some profound truth. Tests, particularly as done in TestDrivenDesign are a good indicator of correctness.
An individual test shows correct operation under a very specific set of conditions, some of which may not be obvious. It remains a matter of experience to determine how much one can extrapolate from a single test. It is not feasible to test every possible parameter value and every combination of parameter values for a function. Hence, one chooses a small representative set of parameters to verify operation of the function and assumes correct operation for the remaining cases.
A second scenario concerns differences between the operational scenario of a function and the test scenario. If the operational scenario requires the function to do something that was not covered in the test scenario, the result will be indeterminate. It may work, it may not. If one follows Test Driven Design, this can only result if the developer did not know what needed to be built. In an independent testing approach, this will result if the tester did not know what needed to be tested (we cannot make any assumptions about the knowledge of the developer in this case).
Testing, especially when using Test Driven Design is a powerful way to determine the conditions when a function will operate correctly. It takes some knowledge and experience to extrapolate whether these successful conditions indicate the function will work without fault in operation.
This page mirrored in ExtremeProgrammingRoadmap as of April 29, 2006