Characterization Tests: How to Add Test Coverage to Legacy Code
How to write characterization tests to safely add coverage to legacy code before refactoring, using golden master and approval testing techniques.
In this article:
- The Legacy Code Testing Problem
- What Characterization Tests Are and How They Work
- Step-by-Step: Writing Your First Characterization Test
- Golden Master Testing and Approval Tests
- Conclusion
Characterization tests solve a specific problem: how do you safely refactor legacy code that has no tests? The answer is not “write tests that verify what the code should do.” In legacy systems, you often do not know what the code should do. You only know what it currently does. Characterization tests capture that current behavior, creating a safety net that tells you if your refactoring changed anything.
This technique is the standard starting point for any serious legacy modernization effort. If your team is working on systems that lack coverage and needs to improve them without breaking production, characterization tests are the prerequisite.
The Legacy Code Testing Problem
Legacy code often has high cyclomatic complexity, no dependency injection, global state, and database calls embedded in business logic. Writing traditional unit tests for such code requires either a working database, a complex mocking setup, or a full integration environment. None of these are fast to set up, and all of them require understanding the code before you have tested it.
The usual consequence is that teams either skip tests entirely (“we’ll add them after the refactor”) or spend weeks trying to build a test infrastructure before they can make any improvements. Both paths lead to the same place: code that does not get improved, or code that gets “improved” and then breaks production.
Characterization tests cut this knot by inverting the usual question. Instead of asking “what should this code do?” you ask “what does this code currently do?” The answer becomes the test.
What Characterization Tests Are and How They Work
A characterization test, as described by Michael Feathers in “Working Effectively with Legacy Code,” is a test that captures the current behavior of a piece of code without judging whether that behavior is correct.
The process is:
- Call the code under test with specific inputs.
- Observe the outputs (return values, side effects, database writes, log output).
- Write assertions against those observed outputs.
- The test passes when the code behaves exactly as it behaved when you wrote the test.
If the code has a bug, the characterization test will capture the buggy behavior. That is intentional. The goal is not to fix bugs yet; the goal is to create a regression detector so you can refactor without introducing new problems.
For example, if a function returns None when passed an empty string, your characterization test asserts exactly that, even if None seems wrong: function("") == None. When a later refactoring makes the function return an empty list instead, the test fails. You now know your refactoring changed behavior, which raises a useful question: was that change intentional or accidental?
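A minimal sketch of what this looks like in practice. The function parse_tags here is hypothetical, standing in for any legacy function; note that the empty-string test pins down behavior that may well be a bug:

```python
# Hypothetical legacy function: returns None for an empty string,
# which may well be a bug -- the characterization test captures it anyway.
def parse_tags(raw):
    if not raw:
        return None
    return [t.strip() for t in raw.split(",")]

def test_characterize_empty_string():
    # Assert the observed behavior, not the "correct" behavior.
    assert parse_tags("") is None

def test_characterize_normal_input():
    assert parse_tags("a, b,c") == ["a", "b", "c"]
```

If a refactoring later makes parse_tags("") return [], the first test fails, and you decide whether to keep the new behavior (and update the test) or restore the old one.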
Step-by-Step: Writing Your First Characterization Test
Step 1: Identify the boundary. Choose a method, function, or class to characterize. Start with the smallest unit you can isolate: a single method rather than a whole class, a single class rather than a whole module.
Step 2: Run the code and observe. Call the function with representative inputs. If the function requires database access or external services, run it against a test database or use a record-and-replay tool to capture the responses. The goal is to get actual outputs, not mocked ones.
Step 3: Write assertions on the observed outputs. Write a test that calls the function with the same inputs and asserts that the outputs match what you observed. If the function returns a complex object, serialize it to JSON or a string representation and assert against that.
Step 4: Run the test until it passes consistently. If the function has non-deterministic behavior (timestamps, random numbers, external calls), you need to control those before the test can be reliable.
Step 5: Expand coverage to edge cases. Once the main path is covered, add tests for edge cases you can identify: empty inputs, null values, boundary conditions. Each additional case makes your safety net stronger.
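Steps 3 and 4 can be sketched together. The function build_invoice below is hypothetical, and the now parameter is an assumption: a seam added so the test can pin the timestamp, since legacy code rarely accepts its clock as an argument. Serializing the result to JSON lets one assertion cover the whole structure:

```python
import json
from datetime import datetime

# Hypothetical legacy function with a timestamp. The `now` parameter is an
# assumed seam so the test can control non-determinism (Step 4).
def build_invoice(customer, items, now=None):
    now = now or datetime.utcnow()
    total = sum(price for _, price in items)
    return {
        "customer": customer,
        "lines": [{"desc": d, "price": p} for d, p in items],
        "total": total,
        "generated_at": now.isoformat(),
    }

def test_characterize_build_invoice():
    fixed = datetime(2024, 1, 15, 12, 0, 0)  # pin the clock
    result = build_invoice("ACME", [("widget", 9.5), ("gadget", 20.0)], now=fixed)
    # Step 3: serialize the complex object and assert against the whole thing.
    observed = json.dumps(result, sort_keys=True)
    expected = json.dumps({
        "customer": "ACME",
        "lines": [{"desc": "widget", "price": 9.5},
                  {"desc": "gadget", "price": 20.0}],
        "total": 29.5,
        "generated_at": "2024-01-15T12:00:00",
    }, sort_keys=True)
    assert observed == expected
```

If you cannot add a seam like now without risky edits, patching the clock at test time (for example with pytest's monkeypatch) achieves the same control.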
After this process, you have a set of tests that will detect any behavioral change in the code. Now you can refactor. See the legacy modernization guide for how to sequence refactoring after characterization.
Golden Master Testing and Approval Tests
Golden master testing is a variant of characterization testing that captures the entire output of a system rather than specific assertions. You run the system with a set of inputs, save the complete output to a file (the “golden master”), and future test runs compare the current output to the saved file.
This is particularly useful for functions that produce complex outputs: reports, HTML, XML, JSON structures with many fields. Instead of writing dozens of individual assertions, you capture the whole output once and assert that nothing changed.
The limitation of golden master testing is that the golden master can become outdated when legitimate behavior changes. A change to the output format requires updating the golden master file. Tools like ApprovalTests (available for Java, C#, Python, and other languages) automate this workflow: when a test fails because the output changed, you can review the diff and “approve” the new output, updating the golden master.
The workflow is: run the test, see the diff, decide if the change was intentional, approve if yes, fix the code if no. This makes golden master testing particularly useful for refactoring sessions where the goal is explicitly to preserve behavior.
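Libraries like ApprovalTests package this workflow, but the core mechanism fits in a few lines. Here is a minimal hand-rolled sketch; the tests/golden directory and the APPROVE environment variable are assumptions of this sketch, not part of any library:

```python
import json
import os
from pathlib import Path

GOLDEN_DIR = Path("tests/golden")  # assumed location for golden master files

def verify_against_golden(name, output):
    """Compare `output` to the saved golden master. A first run creates
    the master; re-running with APPROVE=1 approves a changed output."""
    golden_file = GOLDEN_DIR / f"{name}.golden.json"
    current = json.dumps(output, indent=2, sort_keys=True)
    if os.environ.get("APPROVE") == "1" or not golden_file.exists():
        golden_file.parent.mkdir(parents=True, exist_ok=True)
        golden_file.write_text(current)
        return
    approved = golden_file.read_text()
    assert current == approved, (
        f"Output changed for {name!r}; review the diff and "
        f"re-run with APPROVE=1 if the change was intentional."
    )

# Usage with a hypothetical report generator:
# verify_against_golden("monthly_report", generate_report(month=1))
```

Sorting keys and pretty-printing the JSON makes failures diff cleanly, which is most of the value: the review step is reading that diff and deciding whether to approve.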
For output-intensive legacy code like reporting modules, billing calculation engines, and data transformation pipelines, golden master testing often provides more coverage in less time than writing individual assertions.
Conclusion
Characterization tests are the practical answer to the question of how to refactor legacy code safely. They do not require understanding the code; they require running it and observing it. They do not require a clean architecture; they require a way to call the code under test and capture its output.
Applied consistently, characterization testing transforms a codebase with zero coverage into one with enough coverage to refactor safely. Teams that use this technique typically achieve 60-80% functional coverage of a legacy module within one to two days, enabling confident refactoring that was previously impossible.
Does your codebase have these problems? Let's talk about your system.