
Code Coverage: Why the Number Lies

Pinpoint Team · 8 min read

Code coverage is the number every engineering team eventually starts tracking and the number that almost immediately starts lying to them. The metric sounds intuitive: run your tests, measure how much of the codebase they touch, and treat the resulting percentage as a proxy for quality. Teams set targets of 80 percent, celebrate when they hit 90, and assume the gaps are under control. But code coverage tells you how much code was executed during tests, not how well that code was actually verified. That distinction matters more than most people think.

What code coverage actually measures

At its core, code coverage counts whether a line, branch, or function was executed while a test suite ran. If a test calls a function and the function returns without crashing, that function is "covered." The tool does not know whether the test checked the return value, validated edge cases, or even asserted anything meaningful. Execution is not verification.

Consider a function that calculates sales tax. A test might call it with a single input, check that the return type is a number, and move on. Coverage reports that function as 100 percent covered. But the test never checked whether the tax rate was correct, never tested negative amounts, and never verified the rounding behavior that will eventually cause a billing discrepancy worth investigating at 2 a.m. on a Sunday.
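To make the gap concrete, here is a minimal sketch in Python. The `calculate_sales_tax` function, its 8.25 percent rate, and the test names are all hypothetical; the point is that the shallow test and the behavior-focused tests produce nearly identical coverage numbers while verifying very different amounts.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical function under test.
def calculate_sales_tax(amount, rate=Decimal("0.0825")):
    if amount < 0:
        raise ValueError("amount must be non-negative")
    return (amount * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Shallow test: executes every line on the happy path, so coverage
# reports the function as covered, but verifies almost nothing.
def test_shallow():
    result = calculate_sales_tax(Decimal("100"))
    assert isinstance(result, Decimal)  # only checks the return type

# Behavior-focused tests: barely more coverage, far more verification.
def test_exact_amount():
    assert calculate_sales_tax(Decimal("100")) == Decimal("8.25")

def test_rounding():
    # 19.99 * 0.0825 = 1.649175, which should round half-up to 1.65
    assert calculate_sales_tax(Decimal("19.99")) == Decimal("1.65")

def test_negative_amount_rejected():
    try:
        calculate_sales_tax(Decimal("-1"))
        assert False, "expected ValueError"
    except ValueError:
        pass
```

Both versions light up the same lines in a coverage report. Only the second version would have caught the rounding discrepancy.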

This is not an edge case in how people write tests. It is the default behavior on teams that optimize for coverage numbers rather than test quality. When the metric becomes the target, people write tests that increase the number rather than tests that find bugs. Goodhart's Law applies to software engineering just as reliably as it applies everywhere else.

How teams inflate coverage without improving quality

Once a coverage target becomes a gate in CI, the incentive structure shifts. Developers need to hit a number to get their pull request merged. The fastest way to increase coverage is to write tests that execute code paths without deeply verifying behavior. Here are the patterns that show up most frequently:

  • Assertion-free tests that call functions and never check the output. The test passes because nothing threw an exception, and the coverage tool counts every line that ran.
  • Snapshot tests on trivial components that inflate line counts by rendering UI elements without validating any interactive behavior, accessibility, or state transitions.
  • Tests that duplicate happy paths with slightly different inputs. You get ten tests for the same sunny-day scenario while the error handling remains completely untouched.
  • Generated tests from coverage tools that produce syntactically valid test files designed to reach uncovered lines. These tests have no relationship to actual user behavior and will never catch a real bug.

A team running any of these patterns can report 90 percent coverage while catching fewer bugs than a team at 60 percent with well-written, behavior-focused tests. The number on the dashboard goes up while the actual quality impact stays flat or even declines, because every hour spent gaming coverage is an hour not spent writing meaningful tests.

The metrics that actually correlate with quality

If coverage alone is unreliable, what should teams track instead? The answer is not to abandon coverage entirely but to pair it with metrics that measure outcomes rather than activity. Several indicators do a better job of telling you whether your test suite is actually catching problems:

  • Escaped defect rate: the number of bugs that reach production per release cycle. This is the most direct measure of whether your testing process works. If the number is rising while coverage stays high, your tests are not testing the right things.
  • Mutation testing score: tools like Stryker or PIT introduce small changes (mutations) to your code and check whether your tests catch them. A high mutation score means your assertions are actually verifying behavior, not just executing lines.
  • Mean time to detect (MTTD): how long it takes from when a bug is introduced to when it is caught. A shorter MTTD means your feedback loops are tighter, whether that comes from automated tests, QA review, or monitoring.
  • Test failure signal-to-noise ratio: what percentage of test failures represent real bugs versus flaky tests or environment issues. A suite that cries wolf constantly trains developers to ignore failures.
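The mutation idea can be illustrated by hand. In this sketch, `is_adult` is a hypothetical function and the mutation (flipping `>=` to `>`) is applied manually; a real tool like Stryker or PIT generates mutants like this automatically and reports how many your suite kills.

```python
# Original function under test (hypothetical).
def is_adult(age):
    return age >= 18

# A mutant: the kind of small change a mutation tool would generate.
def mutated_is_adult(age):
    return age > 18  # >= flipped to >

def shallow_test(fn):
    # Executes the function (counts as coverage) but makes no
    # precise assertion, so it "passes" against original and mutant alike.
    fn(30)
    return True

def behavior_test(fn):
    # Checks the boundary value, so the mutant is detected.
    return fn(18) is True and fn(17) is False

print(shallow_test(mutated_is_adult))   # True  -> mutant survived (bad)
print(behavior_test(is_adult))          # True  -> original passes
print(behavior_test(mutated_is_adult))  # False -> mutant killed (good)
```

A surviving mutant means a line that was covered but never meaningfully asserted on, which is exactly the failure mode raw coverage hides.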

These metrics are harder to game because they measure the thing you actually care about: whether bugs get caught before users find them. For a deeper look at what engineering leaders should measure, our guide to the QA metrics leaders should track covers the full picture.

Where coverage still provides value

None of this means coverage is useless. It is a helpful signal when used correctly, which means treating it as a floor detector rather than a quality score. Coverage is most valuable when it tells you what is not tested rather than reassuring you about what is.

If a critical payment processing module shows zero percent coverage, that is a real finding worth acting on. If a utility function shows 100 percent coverage, that tells you very little about whether it works correctly. The information value is asymmetric: low coverage is a reliable warning, while high coverage is an unreliable reassurance.

The practical approach is to set a reasonable coverage floor, something in the 60 to 75 percent range, and then focus your attention on the quality of the tests themselves rather than pushing the number higher. Use coverage diffs on pull requests to make sure new code has some test coverage, but resist the urge to block merges based purely on a percentage threshold. A 70 percent coverage suite with strong assertions and good edge case coverage will outperform a 95 percent suite full of shallow tests every time.
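In a Python project, for example, a floor like this can live in configuration rather than in per-PR debate. This sketch assumes coverage.py with pytest-cov; the 70 percent value is illustrative:

```toml
# pyproject.toml (assumes coverage.py; the floor value is a team choice)
[tool.coverage.report]
fail_under = 70
```

The same floor can be applied ad hoc with pytest-cov's `--cov-fail-under=70` flag. Either way, the floor flags untested new code without turning the percentage itself into the goal.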

Why human testing catches what coverage cannot

Automated tests, regardless of how well they are written, can only verify what the author anticipated. They check known scenarios against expected outcomes. The bugs that cause the most damage in production are typically the ones nobody anticipated, which means they exist in the gaps between test cases.

This is where human testing provides a fundamentally different kind of coverage. An experienced tester approaching a feature with fresh eyes does not follow the same paths the developer considered. They try unusual input combinations, test workflows that cross feature boundaries, and notice visual or behavioral inconsistencies that no unit test would ever flag. This kind of exploratory testing is not a replacement for automation; it is the complement that addresses automation's structural blind spot.

The real cost of production bugs becomes visible when you tally what it takes to diagnose and fix an issue that an assertion-free test technically "covered" but never actually caught. The coverage report said the code was tested. The customer disagreed.

Building a testing strategy that does not depend on one number

The teams that ship reliably tend to have a layered approach to quality. They write focused unit tests with real assertions. They run integration tests that verify how components interact. They have automated regression suites that catch known failure modes. And they pair all of that with human testing that covers the unknown failure modes no one thought to automate.

Coverage plays a role in that stack, but a supporting one. It helps identify gaps in the automated layer. It does not tell you whether your overall quality strategy is working. For that, you need the outcome metrics: escaped defects, customer-reported bugs, incident frequency, and time to resolution.

If your team is chasing a coverage target right now, take a step back and ask a different question: how many bugs reached production last month? If the answer is uncomfortable, the problem probably is not that your coverage number is too low. It is that your tests are checking the wrong things, or that nobody is testing the scenarios your automation cannot reach.

A managed QA service can fill that gap without requiring you to hire or retool. Dedicated testers who understand your product and test the behaviors that matter, not just the lines that execute, are how you turn coverage from a vanity metric into part of a strategy that actually reduces defects. See how it works to understand the model.

Ready to level up your QA?

Book a free 30-minute call and see how Pinpoint plugs into your pipeline with zero overhead.