Skip to content

vera test reports buggy_double as VERIFIED when verifier emits E500 (correctness, unsafe direction) #681

@aallan

Description

@aallan

Finding

vera test runs the verifier internally to classify each function. When the verifier emits an E500 error ("Postcondition does not hold", with counterexample), the tester does not notice — the function is reported as "verified", "Tier 1 (proved)" in both text and JSON outputs.

A user who runs vera test to confirm a function is correct will see VERIFIED (Tier 1) against a function the prover has refuted with a concrete counterexample. The same counterexample is reachable via vera verify, but a user trusting vera test will never see it.

Reproducer

public fn safe_double(@Nat -> @Nat)
  requires(@Nat.0 < 1000)
  ensures(@Nat.result == @Nat.0 * 2)
  effects(pure)
{
  @Nat.0 * 2
}

public fn buggy_double(@Nat -> @Nat)
  requires(@Nat.0 < 1000)
  ensures(@Nat.result == @Nat.0 * 2)
  effects(pure)
{
  @Nat.0 + @Nat.0 + 1
}
$ vera verify tester_demo.vera
[E500] Error at tester_demo.vera, line 11, column 3:
  Postcondition does not hold in function 'buggy_double'.
    Counterexample:
        @Nat.0 = 0
        @Nat.result = 0

$ vera test tester_demo.vera
  safe_double  .............................. VERIFIED (Tier 1)
  buggy_double .............................. VERIFIED (Tier 1)   <- WRONG

Results: 2 verified
$ echo $?
0

Where in code

vera/tester.py::_classify_functions (line 393–476). The function harvests diagnostics from the verifier's result but only inspects severity == "warning" with codes in {E520, E521, E522, E523, E524, E525} (the Tier 3 fallback warnings, line 403). Diagnostics with severity == "error" flow past untouched. A refuted function then falls through to the default ("verified", "Tier 1 (proved)", decl) branch at line 474.

Suggested fix

Add a second harvest pass collecting function names from severity == "error" diagnostics whose error code is in the verification range ({E500, E501, E502} per vera/errors.py). Introduce a new classification category "failed" that takes precedence over "verified" and "tier3":

  • text output: FAILED label with verifier's counterexample inlined
  • JSON output: "category": "failed" carrying the diagnostic
  • TestSummary gains a failed: int field
  • exit code is non-zero when failed > 0

Acceptance criteria

  1. vera test tester_demo.vera on the reproducer prints buggy_double ............................ FAILED with the E500 counterexample
  2. vera test --json has "category": "failed" on the buggy function
  3. TestSummary.failed field exists, populated, surfaced in both text and JSON
  4. Exit code is non-zero when failed > 0
  5. Regression test in tests/test_tester.py pins all four behaviours, using a buggy postcondition like the reproducer
  6. Harvest set is a named constant {"E500", "E501", "E502"} (open to extension as new verifier errors are added)

Estimated effort

Half a day. Classifier change is mechanical; new category needs ~4 branches in the test engine driver; test additions are small.

Origin

Surfaced by a strategic compiler review on 2026-05-18 (v0.0.155). Marked P0 (correctness, unsafe direction). All facts independently verified against current HEAD before filing.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions