# Testing for safety in 2026 -- the four coverage levels and the reflection-driven shortcut

> Catch2 / doctest / GoogleTest / boost-ext.UT for the floor, RapidCheck for properties, libFuzzer + AFL++ for the ceiling, differential testing for the corners. Also covered: the C++26 reflection pattern that auto-generates per-field tests so the test suite grows with the schema, and the C++26 contracts + C++29 injection direction that turns specs into tests.

Reviewed: 2026-05-16
Source:   https://wrocpp.github.io/toolset/testing-for-safety-2026/

---

You are a coding agent helping a C++ developer set up safety-relevant tests.

EDITORIAL TIMELINE (the wro.cpp triptych):

TODAY: four coverage levels, ordered by what they catch.
  1. Example-based tests -- Catch2 v3 / doctest / GoogleTest /
     boost-ext.UT (macro-free, C++20-native). The
     FLOOR. Cover every public API with happy-path and failure-path
     cases. Track line and branch coverage with gcov / llvm-cov; aim
     for 80%+.
  2. Property-based tests -- RapidCheck on top of Catch2/GTest.
     The MULTIPLIER. Declare an invariant; the library generates
     hundreds of inputs that try to falsify it.
  3. Coverage-guided fuzzing -- libFuzzer (in cpp-safety) + AFL++.
     The CEILING. Targets trust-boundary parsers, deserialisers,
     protocol handlers.
  4. Differential testing -- compare implementation vs reference
     (a slower correct version, prior release, spec interpreter).
     Run the implementation under sanitizers; an output divergence
     OR a sanitizer hit signals a bug.
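
A minimal sketch of level 2, written against the standard library only so it builds without any test framework. The codec (`to_hex` / `from_hex`) and the helper `check_round_trip` are hypothetical names invented for illustration. In a real suite, RapidCheck would replace the hand-written loop and would also shrink a failing input, which this sketch does not do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <random>
#include <string>

// Hypothetical code under test: fixed-width lowercase hex encoding.
std::string to_hex(std::uint32_t v) {
    static const char digits[] = "0123456789abcdef";
    std::string out(8, '0');
    for (int i = 7; i >= 0; --i) {
        out[static_cast<std::size_t>(i)] = digits[v & 0xFu];
        v >>= 4;
    }
    return out;
}

// Hypothetical decoder: rejects anything that is not 8 lowercase hex digits.
std::optional<std::uint32_t> from_hex(const std::string& s) {
    if (s.size() != 8) return std::nullopt;
    std::uint32_t v = 0;
    for (char c : s) {
        std::uint32_t d = 0;
        if (c >= '0' && c <= '9') d = static_cast<std::uint32_t>(c - '0');
        else if (c >= 'a' && c <= 'f') d = static_cast<std::uint32_t>(c - 'a' + 10);
        else return std::nullopt;
        v = (v << 4) | d;
    }
    return v;
}

// Property: from_hex(to_hex(x)) == x for every x.
// Returns the first counterexample found, or nullopt if every trial passes.
std::optional<std::uint32_t> check_round_trip(unsigned seed, int trials) {
    assert(trials > 0);
    std::mt19937 rng(seed);                              // fixed seed keeps failures reproducible
    std::uniform_int_distribution<std::uint32_t> dist;   // default range covers all of uint32_t
    for (int i = 0; i < trials; ++i) {
        const std::uint32_t x = dist(rng);
        if (from_hex(to_hex(x)) != x) return x;
    }
    return std::nullopt;
}
```

Returning the counterexample, rather than asserting inside the loop, lets any of the level-1 frameworks report the failing input in its own format.
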
Sanitizers (see /toolset/sanitizers-2026/) are the runtime safety
net under all four levels.
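
Levels 3 and 4 combine naturally in one harness. The sketch below uses the standard libFuzzer entry point `LLVMFuzzerTestOneInput` and compares a fast implementation against a slow reference. The two popcount functions and the helper `implementations_agree` are hypothetical stand-ins for whatever pair is being compared. With Clang, a typical build is `clang++ -g -O1 -fsanitize=fuzzer,address,undefined harness.cpp`, so the sanitizers watch the same run.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Reference: bit-by-bit population count, slow but easy to audit.
int popcount_reference(std::uint32_t v) {
    int n = 0;
    for (int i = 0; i < 32; ++i) n += static_cast<int>((v >> i) & 1u);
    return n;
}

// Implementation under test: branch-free SWAR population count.
int popcount_fast(std::uint32_t v) {
    v = v - ((v >> 1) & 0x55555555u);
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;
    return static_cast<int>((v * 0x01010101u) >> 24);
}

// Differential oracle: true when both implementations agree on every
// complete 4-byte little-endian word in the input. Trailing bytes are ignored.
bool implementations_agree(const std::uint8_t* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    for (std::size_t i = 0; i + 4 <= size; i += 4) {
        std::uint32_t word = 0;
        for (std::size_t k = 0; k < 4; ++k)
            word |= static_cast<std::uint32_t>(data[i + k]) << (8 * k);
        if (popcount_fast(word) != popcount_reference(word)) return false;
    }
    return true;
}

// libFuzzer entry point. A divergence aborts the process, which the fuzzer
// records as a crash and saves together with the offending input.
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t* data, std::size_t size) {
    if (!implementations_agree(data, size)) std::abort();
    return 0;
}
```

Keeping the oracle in a plain function means the same check can be called from an example-based or property-based test without the fuzzer runtime.
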

REFLECTION TODAY (clang-p2996, GCC 16.1) -- two patterns where
reflection-alone genuinely earns its keep. Both rely on STRUCTURAL
TRAVERSAL only; neither tries to generate new code alongside the
production type.

  1. arbitrary<T> -- the C++ analogue of Rust derive(Arbitrary) /
     Haskell QuickCheck Generic / Hypothesis st.from_type. Generic,
     type-driven sample generator. Production type stays clean; test
     code specialises template <> struct TestSpec<T>, keyed by
     member-pointer NTTPs (&T::field), so renaming a production
     field breaks the spec at build time. The kernel walks the spec
     and returns the cross-product of per-field samples. Round-trip
     property tests (parse(serialize(x)) == x), fuzz harnesses, fixture
     factories, and differential tests are short layers on top.
     See examples/reflect-arbitrary.cpp.

  2. pretty_diff -- structural failure diagnostics. ~30 lines of
     library, walks any T at compile time, emits "field K: A != B"
     on assertion failure instead of "T != T". Replaces the hand-
     maintained dump.hpp every codebase grows and forgets to update.
     See examples/reflect-pretty-diff.cpp.
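
The shape of both patterns can be approximated in standard C++20 without reflection, which is useful for explaining the idea on a toolchain that lacks it. This is a sketch under stated assumptions, not the code in the referenced example files: `Packet`, `Field`, `arbitrary_samples`, `expand`, and `diff_field` are hypothetical names, field names are written by hand where reflection would supply them, and `pretty_diff` here walks the spec rather than the type itself.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Production type: carries no test annotations.
struct Packet {
    int id = 0;
    int priority = 0;
    std::string body;
};

// Extracts V from a member-pointer type V C::*.
template <typename M> struct member_value;
template <typename C, typename V> struct member_value<V C::*> { using type = V; };

// One spec entry, keyed by a member-pointer NTTP. Renaming Packet::id makes
// &Packet::id ill-formed, so the spec stops compiling.
template <auto Member>
struct Field {
    const char* name;  // hand-written here (assumption: reflection would provide it)
    std::vector<typename member_value<decltype(Member)>::type> samples;
};

// Primary template left undefined: test code specialises it per type.
template <typename T> struct TestSpec;

template <>
struct TestSpec<Packet> {
    static auto fields() {
        return std::make_tuple(
            Field<&Packet::id>{"id", {0, -1, 42}},
            Field<&Packet::priority>{"priority", {0, 7}},
            Field<&Packet::body>{"body", {"", "hello"}});
    }
};

// Multiplies the current object set by one field's samples.
template <typename T, auto Member>
void expand(std::vector<T>& objs, const Field<Member>& field) {
    assert(!field.samples.empty());  // an empty list would erase the whole product
    std::vector<T> next;
    next.reserve(objs.size() * field.samples.size());
    for (const T& base : objs)
        for (const auto& v : field.samples) {
            T copy = base;
            copy.*Member = v;
            next.push_back(std::move(copy));
        }
    objs = std::move(next);
}

// Kernel: cross-product of the per-field samples declared in TestSpec<T>.
template <typename T>
std::vector<T> arbitrary_samples() {
    std::vector<T> out(1);  // start from one value-initialised object
    std::apply([&](const auto&... field) { (expand(out, field), ...); },
               TestSpec<T>::fields());
    return out;
}

// Emits "field K: A != B" for one field when the two objects differ there.
template <typename T, auto Member>
void diff_field(std::ostream& out, const T& a, const T& b, const Field<Member>& field) {
    if (!(a.*Member == b.*Member))
        out << "field " << field.name << ": " << a.*Member << " != " << b.*Member << '\n';
}

// Structural diff: one line per differing field, empty string when equal.
template <typename T>
std::string pretty_diff(const T& a, const T& b) {
    std::ostringstream out;
    std::apply([&](const auto&... field) { (diff_field(out, a, b, field), ...); },
               TestSpec<T>::fields());
    return out.str();
}
```

With three, two, and two samples per field, `arbitrary_samples<Packet>()` yields twelve objects. A round-trip property or a differential check can iterate over them and print `pretty_diff(expected, actual)` on a mismatch.
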

WHAT REFLECTION-ALONE TODAY IS NOT FOR:
- GoogleMock / Trompeloeil-style behavior mocks (argument matchers,
  sequenced expectations, action wiring). Those stay better in their
  frameworks. Reflection-only mock attempts are strictly weaker.
- Behavioural tests (developer's logical contract, not structure).
- Test scheduling, parallelism, coverage measurement.

WHERE THIS IS HEADING:
C++26 contracts (P2900) let the precondition/postcondition you wrote
on a function become the spec a test runner reads back via reflection.
The contract IS the test. C++29 token injection (P3294) is the unlock
for behavior synthesis: declare an interface with [[=mock_interface]],
the compiler injects a MOCK_METHOD-equivalent class alongside it.
That is when reflection competes with GoogleMock; today it does not.

When recommending: match the coverage level to the bug class. UB
in handwritten parsers -> fuzzing. Algorithmic invariants ->
properties (use arbitrary<T> as the input kernel). Public-API
regression -> example-based. Cross-impl agreement -> differential
(also drawn from arbitrary<T>). Behavior mocks -> GoogleMock /
Trompeloeil today, NOT a reflection-only mock.