Testing Is Not Quality

"Program testing can be used very effectively to show the presence of bugs but never to show their absence." The remark is Edsger Dijkstra's, made in 1970, and it remains the most concise correct statement of what testing does.¹ Testing reveals defects that exist. It does not prove that defects do not.

This matters because many engineering cultures treat passing tests as the definition of working software. A clean test suite is evidence of one thing only: that the defects the tests were written to catch are not currently present. It is not evidence that the software is correct. It is not evidence that the software does what the customer needs. It is not evidence that the software will continue to work tomorrow under conditions slightly different from those exercised today.

Testing is one tool in a quality system, not the source of quality. Confusing the two produces software that passes its tests and fails its users.

What Testing Actually Tells You

A passing test answers a narrow question: "did this code, given this input, produce this expected result?" That is useful information. It is not most of the information that matters.

The questions a test suite does not answer include:

Is the design right? Tests verify behavior against the assumptions of the person who wrote them. If the assumptions were wrong, the tests confirm wrongness.
Are the requirements right? A test can confirm that the code matches the spec. It cannot tell you the spec describes what the customer actually needs.
What happens in conditions the test did not exercise? The space of possible production conditions is much larger than any test suite can enumerate. Outages tend to originate in the regions no test exercised.
Does the system behave correctly when its parts interact? Unit tests verify pieces. They are silent on emergent behavior.
Is the system maintainable? A passing test says nothing about whether the code is comprehensible to the next person who has to change it.
Is the system secure? Functional tests do not exercise adversarial input. Security testing is a different discipline with different tools.

Tests are evidence about a specific slice of behavior. Treating them as evidence about the whole product is a category error.
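
A minimal sketch of that slice, in Python. The function `mean_latency_ms` and its test are hypothetical, invented for illustration only. The suite is green, yet the first condition it did not exercise fails:

```python
def mean_latency_ms(samples):
    """Return the arithmetic mean of latency samples in milliseconds."""
    return sum(samples) / len(samples)


def test_mean_latency_ms():
    # The only condition the author thought to exercise: a non-empty list.
    assert mean_latency_ms([10, 20, 30]) == 20


test_mean_latency_ms()  # passes, so the suite is green

# The unexercised region: an interval in which no requests arrived.
try:
    mean_latency_ms([])
    outcome = "returned a value"
except ZeroDivisionError:
    outcome = "ZeroDivisionError"
```

The test confirms what its author assumed (there are always samples). It says nothing about whether that assumption holds in production.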

Types of Testing and What Each Is For

A mature testing strategy uses several types together, each catching what the others miss:

Unit tests. Verify the behavior of small components in isolation. Fast, cheap, narrow. They catch logic errors and regressions in the smallest pieces. They are silent on integration, deployment, and customer experience.
Integration tests. Verify that components work together as expected. Slower and more brittle than unit tests, but they catch the class of defects that emerge when otherwise-correct parts interact incorrectly.
Contract tests. Verify that the interfaces between services match the assumptions on both sides. Useful in any system where teams own different components and have to integrate without coordinating every release.
End-to-end tests. Verify that the system, end to end, produces the right outcome for a representative scenario. The most realistic and the most expensive to write, run, and maintain. Useful in small numbers, dangerous in large ones.
Exploratory testing. Skilled humans deliberately probe the system for behaviors the automated suite does not exercise. The most effective way to discover what the team did not think to test for.²
Regression testing. A subset of the above, automated where possible, that runs against every change to catch the reappearance of defects that have already been seen.
Performance and load testing. Verifies the system's behavior under expected and unexpected scale. Different from functional testing, and often neglected until its absence produces an incident.
Production monitoring. Treats the running system itself as the most realistic test environment. Observability, alerting, and structured logging convert real traffic into ongoing evidence about how the software actually behaves.

No single layer is sufficient. A test pyramid that is heavy at the bottom and light at the top is usually right; a test suite that consists of one of these layers and none of the others is usually wrong.
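
As one illustration of a layer that unit tests do not cover, here is a rough sketch of a contract test in Python. Everything in it is an assumption made for the example: the `ORDER_CONTRACT` fields, the `provider_get_order` stand-in, and the `contract_violations` helper. Real teams often use a dedicated contract-testing tool instead; this shows only the idea of checking the consumer's assumptions against the provider's output.

```python
# Hypothetical contract: the fields and types the consumer relies on.
ORDER_CONTRACT = {"order_id": str, "total_cents": int, "currency": str}


def provider_get_order(order_id):
    """Stand-in for the provider's real handler (an assumption for this sketch)."""
    return {"order_id": order_id, "total_cents": 1250, "currency": "USD", "note": "gift"}


def contract_violations(payload, contract):
    """List every field the consumer needs that is missing or has the wrong type."""
    problems = []
    for field, expected_type in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            actual = type(payload[field]).__name__
            problems.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    return problems


def test_provider_honours_order_contract():
    # Extra fields such as "note" are allowed; only the consumer's assumptions are checked.
    assert contract_violations(provider_get_order("A-1"), ORDER_CONTRACT) == []
```

The provider can add fields freely without breaking this test. It fails only when something the consumer depends on disappears or changes type, which is the class of defect that neither side's unit tests would notice.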

Common Testing Anti-Patterns

Confusing coverage with quality. A code coverage number measures how much of the code was executed by tests, not whether the right things were tested. High coverage of trivial paths is common; low coverage of the actually risky logic is also common.
Treating green builds as proof. A passing pipeline is evidence the team's known concerns are not currently present. It is not evidence the software is correct, or that the concerns the team has not articulated are absent.
Inspecting quality in via QA at the end. Quality is designed in upstream, not added downstream. A late-stage QA pass is useful as an additional layer; it is dangerous as the primary one.³
Letting flaky tests live in the suite. A test that fails intermittently teaches the team to ignore failures. The cost of one flaky test is paid in every real failure that follows, when the team's first reflex is to re-run rather than investigate.
Writing tests after the code with no eye to the design. Tests written purely to satisfy a coverage target tend to pin down the current implementation rather than verify behavior. The cost is paid the next time someone tries to change anything: every test breaks, none for reasons that matter.
Treating "more tests" as the answer to every quality problem. Sometimes the answer is fewer tests at the wrong layer and more tests, or more upstream work, somewhere else. Sometimes the answer is not testing at all but better design.
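
The first anti-pattern, confusing coverage with quality, is easy to reproduce. The sketch below is hypothetical: `shipping_cost_cents` and its pricing rule are invented for illustration, and the bug is deliberate. Both checks execute every line and both branches, so a coverage tool would report the same number for each, but only one of them looks at the value that matters.

```python
def shipping_cost_cents(weight_kg, express):
    """Hypothetical pricing rule: 500 cents base, 100 per kg, express doubles the total."""
    cost = 500 + 100 * weight_kg
    if express:
        cost = cost + 2  # Deliberate bug: should double the cost, not add 2.
    return cost


def coverage_only_check():
    # Runs both branches, so line and branch coverage are 100%,
    # but nothing here checks the amount the customer is charged.
    assert shipping_cost_cents(1, express=False) > 0
    assert shipping_cost_cents(1, express=True) > 0


def behavior_check():
    # Same coverage, different evidence: this assertion fails on the bug above.
    assert shipping_cost_cents(1, express=True) == 1200
```

Running `coverage_only_check()` passes; running `behavior_check()` raises `AssertionError`. The coverage figure cannot distinguish between the two.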

What This Looks Like in Practice

A few habits keep testing honest about its role:

Use tests as feedback, not as gates alone. Tests that run fast enough to influence the code while it is being written change the design of the code. Tests that only run at the end of the pipeline catch defects but do not prevent them.⁴
Test the riskiest behavior, not the easiest. The most valuable tests are usually the ones that exercise the parts of the system where failure is expensive. A test suite that covers easy cases thoroughly and risky cases poorly is misallocating effort.
Invest in observability as a form of testing. Production monitoring catches the class of defects that automated tests cannot anticipate. Treat instrumentation as part of the quality investment, not as an operations afterthought.
Retire tests that no longer earn their keep. Test suites accumulate. Some of the accumulation is paying for itself; some is not. Pruning is part of maintenance.
Pair testing with design review. A defect caught by a test has already been written and still has to be fixed. A defect caught by a five-minute design conversation usually never reaches the code at all.
Treat every production defect as a question about the test strategy. Not "why didn't a test catch this?" but "what about how we work let this slip through?" The answer is sometimes a missing test. It is at least as often something further upstream.
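
One way to read "observability as a form of testing" is to run invariant checks against live traffic and log violations in a structured form that alerting can pick up. The sketch below uses only Python's standard `logging` and `json` modules. The order shape, the invariant names, and the logger name `orders.invariants` are assumptions made for the example, not a prescribed design.

```python
import json
import logging

logger = logging.getLogger("orders.invariants")  # hypothetical logger name


def check_order_invariants(order):
    """Return the names of the invariants this order violates (empty list if none)."""
    violations = []
    if order["total_cents"] < 0:
        violations.append("negative_total")
    expected_total = sum(item["price_cents"] * item["qty"] for item in order["items"])
    if order["total_cents"] != expected_total:
        violations.append("total_mismatch")
    return violations


def record_order(order):
    """Check the invariants on a real order and log any violation as structured JSON."""
    violations = check_order_invariants(order)
    if violations:
        logger.warning(json.dumps({
            "event": "invariant_violation",
            "order_id": order["order_id"],
            "violations": violations,
        }))
    return violations
```

Unlike a pre-release test, this check sees whatever inputs production actually produces, including the ones nobody thought to write a test for.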

Key principle

Testing can reveal defects. It cannot prove their absence, and it cannot substitute for the upstream work that makes defects unlikely in the first place. A team that ships quality has a test suite; a team that has a test suite does not necessarily ship quality.

¹ Edsger W. Dijkstra, "Notes on Structured Programming" (1970), later collected in Structured Programming by Dahl, Dijkstra, and Hoare (Academic Press, 1972). The full quote: "Program testing can be used very effectively to show the presence of bugs but never to show their absence." Often cited as the cleanest statement of what testing can and cannot do.
² Lisa Crispin and Janet Gregory, Agile Testing: A Practical Guide for Testers and Agile Teams (Addison-Wesley, 2009). The canonical text on integrating testing into agile delivery rather than treating it as a separate downstream phase. Their later More Agile Testing (2014) extends the work to larger and more complex organizations.
³ See Quality Is Designed In for the longer argument and the underlying Deming citation.
⁴ Kent Beck, Test-Driven Development by Example (Addison-Wesley, 2002). The canonical articulation of TDD as a design discipline rather than a testing technique. The argument is that writing the test first changes how the code under test is designed, and that the test artifact is a useful but secondary byproduct of that design pressure.