Test Doubles: Mocks, Stubs, Fakes, and Spies
Every developer has encountered the problem: you want to test a function, but it depends on a database, an external API, or some other component that makes the test slow, fragile, or impossible to run in isolation. Test doubles solve this by replacing real dependencies with controlled substitutes. But the term "mock" gets used as a catch-all for at least four distinct concepts, and confusing them leads to tests that are brittle, misleading, or testing the wrong thing entirely. Understanding test doubles properly is what separates a test suite that helps your team from one that slows it down.
What test doubles are and why they matter
A test double is any object that stands in for a real dependency during a test. The term comes from Gerard Meszaros, who borrowed the concept from stunt doubles in film. Just as a stunt double replaces an actor for a specific scene, a test double replaces a real component for a specific test scenario.
Test doubles matter because real dependencies introduce three problems in unit tests. First, they make tests slow. A test that hits a real database spends milliseconds on every query, which adds up to minutes across thousands of tests. Second, they make tests non-deterministic. If your test depends on an external API, it fails whenever that API is down, rate-limited, or returning different data. Third, they make tests hard to set up. Creating the exact conditions to trigger a specific error response from a real payment gateway requires significant effort compared to telling a test double to return an error.
The goal is always the same: isolate the code under test so you can verify its behavior without interference from external systems. But the way you achieve that isolation varies depending on what you need from the substitute. That is where the four main types of test doubles come in.
Stubs: providing canned answers
A stub is the simplest type of test double. It returns predefined responses to method calls and does nothing else. Stubs do not track whether they were called, how many times they were called, or with what arguments. They simply provide the data your code needs to continue executing.
Use a stub when you need to control what a dependency returns but do not care how the code interacts with that dependency. For example, if you are testing a function that calculates shipping costs based on the user's address, you need the address lookup to return a specific location. A stub that always returns a particular zip code lets you test the shipping calculation logic without involving a real geocoding service.
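The shipping example can be sketched in a few lines. The names here (`StubGeocoder`, `calculate_shipping`, the rate figures) are hypothetical, invented for illustration; the point is that the stub returns canned data and records nothing:

```python
class StubGeocoder:
    """Stub: always returns the same canned location, tracks nothing."""
    def lookup(self, address):
        return {"zip_code": "94103", "country": "US"}

def calculate_shipping(geocoder, address, weight_kg):
    """Hypothetical code under test: flat domestic base rate plus a weight charge."""
    location = geocoder.lookup(address)
    base = 5.00 if location["country"] == "US" else 15.00
    return base + 1.50 * weight_kg

# The stub pins the geocoder's output, so the test exercises only the pricing logic.
cost = calculate_shipping(StubGeocoder(), "123 Main St", weight_kg=2)
assert cost == 8.00  # 5.00 base + 1.50 * 2
```

Notice that the test never asserts that `lookup` was called. If a refactor later caches the location or batches lookups, this test still passes, which is exactly the resilience stubs provide.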
Stubs are low-risk test doubles because they do not introduce assertions about how the code works internally. They only influence the inputs. This means tests using stubs tend to be resilient to refactoring since changing the internal implementation does not break the stub as long as the same data flows through.
The common mistake with stubs is making them too specific. A stub that returns different values depending on the exact argument passed is drifting toward a more complex test double and should be reconsidered. If the test needs that level of control, a fake might be more appropriate.
Mocks: verifying interactions
A mock is a test double that records how it was used and allows you to make assertions about those interactions. Unlike a stub, which only provides data, a mock also verifies behavior. Did the code call this method? How many times? With what arguments? In what order?
Mocks are appropriate when the interaction itself is the behavior you are testing. For instance, if your feature sends an email when a user signs up, the important behavior is that the email-sending function was called with the correct recipient and template. You do not actually want to send an email during the test. A mock email service lets you verify the call happened without side effects.
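With Python's standard `unittest.mock` library, the signup example might look like the following sketch. `register_user` and the `welcome` template name are assumptions for illustration:

```python
from unittest.mock import Mock

def register_user(email, mailer):
    """Hypothetical signup flow: the side effect we care about is the email."""
    user = {"email": email, "active": True}
    mailer.send(to=email, template="welcome")
    return user

mailer = Mock()
register_user("ada@example.com", mailer)

# The assertion is about the interaction, not the return value:
# no email is actually sent, but we verify the call was made correctly.
mailer.send.assert_called_once_with(to="ada@example.com", template="welcome")
```

The mock both absorbs the side effect and records it, so the test can fail if the email is skipped, sent twice, or sent to the wrong recipient.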
The danger with mocks is overuse. When tests verify every interaction with every dependency, they become coupled to the implementation rather than the behavior. A refactor that achieves the same outcome through different internal calls breaks all the mock assertions even though nothing is actually wrong. This is the most common reason test suites become a maintenance burden, and it is why experienced developers treat mocks as a tool of last resort rather than a default.
A good guideline: use mocks for verifying side effects at architectural boundaries (sending an email, publishing an event, writing to a queue) and stubs for everything else. If you find yourself mocking internal collaborators between your own classes, that is usually a sign that the code needs restructuring rather than more mocks.
Fakes: lightweight implementations
A fake is a working implementation of a dependency that takes shortcuts to make it fast and easy to use in tests. The classic example is an in-memory database. It implements the same interface as your real database layer but stores data in a hash map instead of on disk. Queries work, inserts work, and deletes work, just without the overhead of a real database engine.
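A minimal in-memory fake can be nothing more than a class wrapping a dict. The `FakeUserRepository` interface below is hypothetical; the shape of a real one would mirror whatever your production data layer exposes:

```python
class FakeUserRepository:
    """Fake: a working implementation backed by a dict instead of a database."""
    def __init__(self):
        self._users = {}

    def save(self, user_id, user):
        self._users[user_id] = dict(user)  # copy to avoid aliasing test data

    def get(self, user_id):
        return self._users.get(user_id)

    def delete(self, user_id):
        self._users.pop(user_id, None)

# Tests interact with it through the same interface production code uses.
repo = FakeUserRepository()
repo.save(1, {"name": "Ada"})
assert repo.get(1) == {"name": "Ada"}
repo.delete(1)
assert repo.get(1) is None
```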
Fakes are the most powerful type of test double because they allow you to test realistic scenarios without any of the constraints of the real dependency. You can insert test data, simulate error conditions, and verify outcomes all through the same interface your production code uses. Because the fake implements real behavior (just simplified), your tests exercise more of the actual code path than they would with stubs or mocks.
The tradeoff is effort. Building and maintaining a fake requires more work than setting up a stub. The in-memory database needs to handle the queries your code actually uses, which means updating it as your data access patterns evolve. For core dependencies that many tests share (databases, caches, message queues), the investment usually pays off. For one-off dependencies, a stub is more practical.
Fakes also need their own tests. If your fake implements sorting differently from the real database, every test using the fake will pass while production breaks. This is a subtle failure mode that teams discover the hard way. The solution is contract tests: a shared test suite that runs against both the fake and the real implementation to verify they behave the same way.
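One common way to structure contract tests is a shared base class of assertions with one subclass per implementation. This is a sketch, assuming a hypothetical key-value store interface; the commented-out `PostgresStore` subclass stands in for the real implementation:

```python
import unittest

class InMemoryStore:
    """Fake key-value store backed by a dict (stands in for a real database)."""
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)

class StoreContract:
    """Shared contract: every store implementation must pass these tests."""
    def make_store(self):
        raise NotImplementedError

    def test_put_then_get_returns_value(self):
        store = self.make_store()
        store.put("k", "v")
        assert store.get("k") == "v"

    def test_get_missing_key_returns_none(self):
        store = self.make_store()
        assert store.get("missing") is None

class InMemoryStoreContractTest(StoreContract, unittest.TestCase):
    def make_store(self):
        return InMemoryStore()

# A second subclass would run the identical tests against the real backend:
# class PostgresStoreContractTest(StoreContract, unittest.TestCase):
#     def make_store(self):
#         return PostgresStore(connection)
```

If the fake and the real store ever diverge in behavior, one of the two subclasses fails, surfacing the drift before it silently invalidates every test that relies on the fake.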
Spies: recording without replacing
A spy wraps a real implementation and records information about how it was called while still delegating to the actual code. Think of it as a mock that also executes the real behavior. Spies are useful when you want to verify that an interaction happened but still need the real dependency to function.
A practical example: your application logs errors to a monitoring service. During testing, you want to verify that a specific error condition triggers a log entry, but you also want the logging code to execute normally so you catch any bugs in the logging logic itself. A spy on the logger records the call for assertion purposes while still invoking the real logger.
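`unittest.mock` supports this pattern directly through `Mock(wraps=...)`, which records calls while delegating to the wrapped object. The `ConsoleLogger` and `process` function below are hypothetical stand-ins for a real logger and the code under test:

```python
from unittest.mock import Mock

class ConsoleLogger:
    """Real logger: the behavior we still want to execute during the test."""
    def __init__(self):
        self.lines = []
    def error(self, message):
        self.lines.append(f"ERROR: {message}")

real_logger = ConsoleLogger()
spy = Mock(wraps=real_logger)  # spy: records calls, delegates to the real logger

def process(value, logger):
    """Hypothetical code under test: logs on invalid input."""
    if value < 0:
        logger.error("negative value")
        return None
    return value * 2

process(-1, spy)

spy.error.assert_called_once_with("negative value")    # interaction was recorded
assert real_logger.lines == ["ERROR: negative value"]  # real behavior still ran
```

The two assertions capture the dual nature of a spy: the first would belong to a mock, the second confirms the real implementation executed.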
Spies are less common than mocks and stubs because most testing scenarios fall cleanly into either "I need to control the output" (stub) or "I need to verify the interaction" (mock). Spies occupy the middle ground where you need both. Use them sparingly and with clear intent.
Choosing the right test double for each situation
The decision tree is straightforward once you know what each double does:
- Need to control what a dependency returns? Use a stub. It is the simplest option and introduces the least coupling.
- Need to verify a side effect happened? Use a mock. Reserve this for interactions that cross architectural boundaries.
- Need realistic behavior without the real infrastructure? Use a fake. Worth the investment for dependencies shared across many tests.
- Need to verify an interaction while keeping the real behavior? Use a spy. This is a niche use case, so reach for it only when the other options do not fit.
The most important principle across all test doubles is to use the least powerful option that gets the job done. Stubs are less risky than mocks because they do not create assertions about internal behavior. Fakes are more work than stubs but produce more realistic tests. Choose based on what the test actually needs, not on what the framework makes easiest.
Over-reliance on mocks is a code smell that often indicates design issues. If a class requires six mocks to test, it probably has too many dependencies. Refactoring toward smaller, more focused classes with fewer dependencies reduces the need for test doubles in the first place. For more on structuring tests effectively, the guide on why developers should not be your only testers explores how different perspectives catch different types of issues.
Test doubles are essential for writing fast, reliable unit tests. But even a well-structured unit test suite with perfect use of test doubles only validates individual components in isolation. The bugs that reach production often live in the interactions between components, which is something no amount of mocking can simulate. That is where integration testing and structured QA come in. If your team has solid unit tests but still sees bugs in production, the gap is likely in those higher layers. Take a look at how a managed QA service complements your existing automation by testing the system the way real users experience it.
Ready to level up your QA?
Book a free 30-minute call and see how Pinpoint plugs into your pipeline with zero overhead.