
Property-Based Testing: Beyond Example Tests

Pinpoint Team · 8 min read

Traditional example-based tests verify that a specific input produces a specific output. You write a test with one set of data, assert the result, and move on. This works well for documenting known cases, but it has a fundamental limitation: you only test the scenarios you think of. Property-based testing flips the approach. Instead of specifying individual examples, you describe the properties that should always hold true, and a framework generates hundreds or thousands of random inputs to verify those properties. The result is a testing approach that finds edge cases you never would have imagined.

What property-based testing actually means

A property is a statement about your code that should be true for every valid input, not just the five or ten examples you wrote by hand. For example, if you have a function that sorts a list, several properties hold regardless of the input: the output has the same length as the input, every element in the input appears in the output, and each element is less than or equal to the next. These properties do not specify a particular list. They describe what "correctly sorted" means in general terms.

A property-based testing framework takes these property definitions and generates random inputs to test them. If the framework finds an input that violates the property, it reports the failure. Better yet, most frameworks also perform "shrinking," which means they automatically simplify the failing input to the smallest case that still triggers the bug. Instead of telling you that a list of 847 random integers caused a failure, the framework distills it down to a list of 3 integers that reproduces the same issue.

The concept originated in Haskell with the QuickCheck library, but mature implementations now exist for virtually every language. Hypothesis covers Python. fast-check handles JavaScript and TypeScript. jqwik works for Java. ScalaCheck serves Scala. The principles are the same across all of them: define properties, generate inputs, find violations.
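As a concrete illustration, here is a dependency-free sketch of the loop these frameworks run, checking the three sorting properties described earlier. This is only the core idea; real frameworks like Hypothesis add smarter input generation, failure reporting, and automatic shrinking.

```python
import random
from collections import Counter

def check_sort_properties(sort_fn, trials=1000):
    """Generate random lists and verify three properties of a sort function."""
    for _ in range(trials):
        data = [random.randint(-1000, 1000) for _ in range(random.randint(0, 50))]
        result = sort_fn(data)
        # Property 1: the output has the same length as the input.
        assert len(result) == len(data), data
        # Property 2: the output is a permutation of the input.
        assert Counter(result) == Counter(data), data
        # Property 3: each element is less than or equal to the next.
        assert all(a <= b for a, b in zip(result, result[1:])), data

check_sort_properties(sorted)  # the built-in sort satisfies all three
```

Note that none of the assertions mention a specific list: they describe "correctly sorted" in general, and the random generator supplies the examples.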

Why example-based tests miss bugs

When you write an example-based test, you choose inputs based on your mental model of the code. If your mental model is incomplete (and it always is), the examples you choose will be incomplete too. You test with positive numbers but forget about zero. You test with short strings but miss the behavior with Unicode characters. You test the happy path because that is what you built the code to handle.

The numbers are striking. A function that takes two 32-bit integers as input has roughly 18 quintillion possible input combinations. Even a thorough example-based test suite might cover a few dozen of those. Property-based testing does not cover all 18 quintillion either, but running 1,000 random combinations per test still represents orders of magnitude more exploration than hand-picked examples.

More importantly, property-based testing explores inputs that a human would never choose. Empty strings, negative numbers, extremely large values, special characters, and combinations of these that seem unlikely but occur in production. The framework does not share your assumptions about what inputs are "normal," which is exactly why it finds the bugs you do not.

A real-world example: a team building a scheduling application tested their time-slot overlap detection with a handful of carefully chosen scenarios and felt confident it worked. When they added property-based tests with random time ranges, the framework found that two events with identical start and end times were not detected as overlapping. The boundary condition was invisible to the developers because they never generated that exact scenario by hand. For more on how edge cases compound into production incidents, the breakdown of production bug costs quantifies what these misses actually cost.

Writing your first property-based tests

The hardest part of property-based testing is not the tooling. It is identifying the right properties to test. There are several reliable categories of properties that apply to most code:

  • Round-trip properties: if you serialize and deserialize an object, you should get the original back. If you encrypt and decrypt a message, you should get the original back. Any pair of inverse operations gives you a round-trip property.
  • Invariant properties: something that should always be true after an operation. A sorted list should have each element less than or equal to the next. A balanced binary tree should have a height difference of at most one between subtrees. A bank transfer should not change the total amount across all accounts.
  • Idempotency properties: applying an operation twice should produce the same result as applying it once. Normalizing whitespace, deduplicating a list, or formatting a phone number should all be idempotent.
  • Comparison properties: when you have a new implementation replacing an old one, both should produce the same output for any input. This is sometimes called an "oracle" test and is extremely effective during refactoring.
  • Hard to compute, easy to verify: sorting is hard but checking that a list is sorted is easy. Finding the shortest path in a graph is hard but verifying that a path exists and calculating its length is easy. When the verification is simpler than the computation, property-based testing shines.
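To make one of these categories concrete, here is an idempotency property for a hypothetical `normalize_ws` helper (the function and its random-string generator are illustrative, not from any particular codebase):

```python
import random
import string

def normalize_ws(s):
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return " ".join(s.split())

# Idempotency property: normalizing twice equals normalizing once.
for _ in range(1000):
    s = "".join(random.choice(string.ascii_letters + " \t\n")
                for _ in range(random.randint(0, 40)))
    once = normalize_ws(s)
    assert normalize_ws(once) == once
```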

Start with round-trip properties because they are the easiest to identify and write. Any code that transforms data in a reversible way has an obvious round-trip test. From there, look for invariants in your business logic. "The total of line items should equal the order total" is an invariant. "No user should have a negative balance" is an invariant. These properties map directly to business rules that you already know should hold.
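A round-trip property can be this short. The sketch below uses the standard library's `json` module as the serializer and a small generator of random JSON-representable records; the record shape is invented for illustration.

```python
import json
import random
import string

def random_record():
    """Build a small random dict of JSON-representable values."""
    return {
        "".join(random.choices(string.ascii_lowercase, k=5)): random.choice([
            random.randint(-10**6, 10**6),
            random.random(),
            bool(random.getrandbits(1)),
            None,
            "".join(random.choices(string.printable, k=10)),
        ])
        for _ in range(random.randint(0, 5))
    }

# Round-trip property: decode(encode(x)) == x for any record.
for _ in range(500):
    record = random_record()
    assert json.loads(json.dumps(record)) == record
```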

Shrinking: why failing examples get small

When a property-based test finds a violation, the raw failing input is often large and hard to debug. A random string of 200 characters or a list of 50 elements does not immediately point to the root cause. This is where shrinking makes property-based testing practical.

Shrinking is an automated process where the framework tries smaller and simpler versions of the failing input to find the minimal case that still triggers the bug. If a list of 50 integers causes a failure, the framework tries lists of 25, then 12, then 6, narrowing down until it finds the smallest list that reproduces the issue. For strings, it tries shorter substrings and simpler characters.
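The halving-and-narrowing process can be sketched as a greedy loop. This is a toy shrinker, far simpler than what real frameworks do (they also simplify individual elements, not just list length), and the "bug" predicate here is hypothetical:

```python
def shrink_list(failing, still_fails):
    """Greedily shrink a failing list to a smaller input that still fails.

    `still_fails(candidate)` returns True when the candidate still
    triggers the bug.
    """
    current = failing
    progress = True
    while progress:
        progress = False
        n = len(current)
        # Try dropping each half, then each single element.
        candidates = [current[:n // 2], current[n // 2:]]
        candidates += [current[:i] + current[i + 1:] for i in range(n)]
        for cand in candidates:
            if len(cand) < len(current) and still_fails(cand):
                current = cand
                progress = True
                break
    return current

# Hypothetical bug: the code fails whenever two equal values are adjacent.
bug = lambda xs: any(a == b for a, b in zip(xs, xs[1:]))
print(shrink_list([5, 1, 9, 9, 2, 7, 3], bug))  # prints [9, 9]
```

The seven-element failing input shrinks to the two-element core of the bug, which is the form you actually want to debug.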

The result is a minimal reproduction case delivered automatically. This is something that would take a developer significant time to produce manually, especially for bugs involving complex input combinations. Shrinking turns "your code fails with this incomprehensible input" into "your code fails with these two specific values," which makes debugging straightforward.

Once you have the shrunk example, convert it into a regular example-based regression test. This gives you the best of both worlds: property-based testing discovers the bug, and an example-based test ensures it never comes back. Over time, your example-based test suite grows to include cases that no human would have thought to write.
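Pinning a shrunk example is just an ordinary test. The function and values below are illustrative; in practice you paste the exact minimal case the framework reported.

```python
def my_sort(xs):
    # Placeholder for the code under test.
    return sorted(xs)

def test_sort_handles_duplicates():
    # Shrunk failing input from a property run, pinned as a regression test.
    assert my_sort([2, 1, 1]) == [1, 1, 2]

test_sort_handles_duplicates()
```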

Common challenges and practical solutions

Generating valid inputs is the first challenge most teams encounter. If your function expects a valid email address, random strings will fail validation before reaching the interesting logic. Property-based testing frameworks solve this with custom generators that produce structurally valid inputs. You define a generator for email addresses that combines random local parts with random domains, ensuring the format is correct while the content is random.
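A custom generator in the spirit described above might look like this. The grammar is a deliberate simplification for illustration, not a full RFC 5322 email generator:

```python
import random
import string

def gen_email():
    """Generate a structurally valid email address with random content."""
    chars = string.ascii_lowercase + string.digits + "._"
    local = "".join(random.choices(chars, k=random.randint(1, 12)))
    local = local.strip("._") or "a"  # avoid leading/trailing dots and underscores
    domain = "".join(random.choices(string.ascii_lowercase, k=random.randint(2, 10)))
    tld = random.choice(["com", "org", "io", "dev"])
    return f"{local}@{domain}.{tld}"
```

Every value this produces passes basic format validation, so your property tests exercise the logic behind the validator instead of bouncing off it.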

Slow tests become an issue when each property test runs hundreds of iterations. If the function under test is itself slow (maybe it queries a database), running it 1,000 times per property is impractical. The solution is the same as with any unit test: isolate the logic from external dependencies. Property-based testing works best on pure functions and business logic that can execute quickly. For integration-level testing, reduce the iteration count and accept a smaller sample size.

Flaky seeds can cause confusion when a test passes locally but fails in CI, or vice versa. Most frameworks use a random seed that can be fixed for reproducibility. When a property test fails, the framework reports the seed so you can reproduce the exact same inputs. Always log or print the seed in CI so you can investigate failures deterministically.
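The mechanics are simple in any language. A stdlib-only sketch of the pattern, using a dedicated `random.Random` instance so the property test does not depend on global state:

```python
import random

seed = random.randrange(2**32)        # pick a fresh seed for this run
print(f"property-test seed: {seed}")  # log it so CI failures are reproducible
rng = random.Random(seed)             # dedicated generator for this test

# With the same seed, the generated inputs are identical run to run.
inputs = [rng.randint(0, 100) for _ in range(5)]
replay = random.Random(seed)
assert [replay.randint(0, 100) for _ in range(5)] == inputs
```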

Understanding how property-based testing fits alongside other approaches helps you allocate effort wisely. The comparison of manual testing versus automation clarifies which types of issues each approach is best at catching.

Where property-based testing fits in your strategy

Property-based testing is not a replacement for example-based tests. It is a complement. Example-based tests document specific scenarios and serve as executable specifications. Property-based tests explore the input space broadly and catch edge cases. Both belong in your test suite, covering different dimensions of correctness.

The ideal combination is example-based tests for known scenarios and business rules, property-based tests for data transformation and algorithmic correctness, integration tests for component interactions, and exploratory testing for the unexpected behaviors that no automated approach will find. Each layer catches what the others miss.

Property-based testing is particularly valuable during refactoring. If you are rewriting a module and want confidence that the new version behaves identically to the old one, a comparison property that runs both versions against thousands of random inputs is far more thorough than manually checking a few examples. This is where the "oracle" style of property testing becomes indispensable.
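An oracle test reduces to one property: old and new agree everywhere. The two mean implementations below are stand-ins for your old and rewritten modules; the tolerance handles ordinary floating-point drift between the two algorithms.

```python
import math
import random

def old_mean(xs):
    """Existing implementation, used as the oracle."""
    return sum(xs) / len(xs)

def new_mean(xs):
    """Rewritten implementation (here: an incremental running mean)."""
    m = 0.0
    for i, x in enumerate(xs, start=1):
        m += (x - m) / i
    return m

# Comparison ("oracle") property: both versions agree on any input.
for _ in range(1000):
    xs = [random.uniform(-1e6, 1e6) for _ in range(random.randint(1, 100))]
    assert math.isclose(old_mean(xs), new_mean(xs), rel_tol=1e-9, abs_tol=1e-6)
```

Once the rewrite ships and the old version is deleted, the oracle test goes with it; the invariant and round-trip properties stay.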

Your automated tests, including both example-based and property-based, handle the deterministic verification layer. But the bugs that frustrate users most often emerge from the interactions between components and the unexpected ways real people use your product. Those are the issues that automated generation cannot reach. If you want to combine strong automated testing with structured human QA that covers those gaps, see how a managed QA service works alongside engineering teams that already invest in their test suites.

Ready to level up your QA?

Book a free 30-minute call and see how Pinpoint plugs into your pipeline with zero overhead.