Finding
vera test runs the verifier internally to classify each function. When the verifier emits an E500 error ("Postcondition does not hold", with counterexample), the tester does not notice — the function is reported as "verified", "Tier 1 (proved)" in both text and JSON outputs.
A user who runs vera test to confirm a function is correct will see VERIFIED (Tier 1) against a function the prover has refuted with a concrete counterexample. The same counterexample is reachable via vera verify, but a user trusting vera test will never see it.
Reproducer
public fn safe_double(@Nat -> @Nat)
requires(@Nat.0 < 1000)
ensures(@Nat.result == @Nat.0 * 2)
effects(pure)
{
@Nat.0 * 2
}
public fn buggy_double(@Nat -> @Nat)
requires(@Nat.0 < 1000)
ensures(@Nat.result == @Nat.0 * 2)
effects(pure)
{
@Nat.0 + @Nat.0 + 1
}
$ vera verify tester_demo.vera
[E500] Error at tester_demo.vera, line 11, column 3:
Postcondition does not hold in function 'buggy_double'.
Counterexample:
@Nat.0 = 0
@Nat.result = 0
$ vera test tester_demo.vera
safe_double .............................. VERIFIED (Tier 1)
buggy_double .............................. VERIFIED (Tier 1) <- WRONG
Results: 2 verified
$ echo $?
0
Where in code
vera/tester.py::_classify_functions (line 393–476). The function harvests diagnostics from the verifier's result but only inspects severity == "warning" with codes in {E520, E521, E522, E523, E524, E525} (the Tier 3 fallback warnings, line 403). Diagnostics with severity == "error" flow past untouched. A refuted function then falls through to the default ("verified", "Tier 1 (proved)", decl) branch at line 474.
Suggested fix
Add a second harvest pass collecting function names from severity == "error" diagnostics whose error code is in the verification range ({E500, E501, E502} per vera/errors.py). Introduce a new classification category "failed" that takes precedence over "verified" and "tier3":
- text output:
FAILED label with verifier's counterexample inlined
- JSON output:
"category": "failed" carrying the diagnostic
TestSummary gains a failed: int field
- exit code is non-zero when
failed > 0
Acceptance criteria
vera test tester_demo.vera on the reproducer prints buggy_double ............................ FAILED with the E500 counterexample
vera test --json has "category": "failed" on the buggy function
TestSummary.failed field exists, populated, surfaced in both text and JSON
- Exit code is non-zero when
failed > 0
- Regression test in
tests/test_tester.py pins all four behaviours, using a buggy postcondition like the reproducer
- Harvest set is a named constant
{"E500", "E501", "E502"} (open to extension as new verifier errors are added)
Estimated effort
Half a day. Classifier change is mechanical; new category needs ~4 branches in the test engine driver; test additions are small.
Origin
Surfaced by a strategic compiler review on 2026-05-18 (v0.0.155). Marked P0 (correctness, unsafe direction). All facts independently verified against current HEAD before filing.
Finding
vera testruns the verifier internally to classify each function. When the verifier emits anE500error ("Postcondition does not hold", with counterexample), the tester does not notice — the function is reported as"verified", "Tier 1 (proved)"in both text and JSON outputs.A user who runs
vera testto confirm a function is correct will seeVERIFIED (Tier 1)against a function the prover has refuted with a concrete counterexample. The same counterexample is reachable viavera verify, but a user trustingvera testwill never see it.Reproducer
Where in code
vera/tester.py::_classify_functions(line 393–476). The function harvests diagnostics from the verifier's result but only inspectsseverity == "warning"with codes in{E520, E521, E522, E523, E524, E525}(the Tier 3 fallback warnings, line 403). Diagnostics withseverity == "error"flow past untouched. A refuted function then falls through to the default("verified", "Tier 1 (proved)", decl)branch at line 474.Suggested fix
Add a second harvest pass collecting function names from
severity == "error"diagnostics whose error code is in the verification range ({E500, E501, E502}pervera/errors.py). Introduce a new classification category"failed"that takes precedence over"verified"and"tier3":FAILEDlabel with verifier's counterexample inlined"category": "failed"carrying the diagnosticTestSummarygains afailed: intfieldfailed > 0Acceptance criteria
vera test tester_demo.veraon the reproducer printsbuggy_double ............................ FAILEDwith the E500 counterexamplevera test --jsonhas"category": "failed"on the buggy functionTestSummary.failedfield exists, populated, surfaced in both text and JSONfailed > 0tests/test_tester.pypins all four behaviours, using a buggy postcondition like the reproducer{"E500", "E501", "E502"}(open to extension as new verifier errors are added)Estimated effort
Half a day. Classifier change is mechanical; new category needs ~4 branches in the test engine driver; test additions are small.
Origin
Surfaced by a strategic compiler review on 2026-05-18 (v0.0.155). Marked P0 (correctness, unsafe direction). All facts independently verified against current HEAD before filing.