Skip to content

v0.0.4: Typed AST layer#1

Merged
aallan merged 1 commit into
mainfrom
ast-layer
Feb 23, 2026
Merged

v0.0.4: Typed AST layer#1
aallan merged 1 commit into
mainfrom
ast-layer

Conversation

@aallan

@aallan aallan commented Feb 23, 2026

Copy link
Copy Markdown
Owner

Summary

  • Typed AST layer: ~50 frozen dataclass node classes with source spans, covering all grammar constructs — the foundation for the type checker (Phase C3)
  • Lark→AST transformer: ~86 bottom-up transformer methods converting parse trees to typed AST nodes, with TransformError for unhandled rules
  • CLI: vera ast <file> (indented text) and vera ast --json <file> (JSON output)
  • 83 new tests: round-trip tests for all 13 examples, node-specific tests for every construct, span propagation tests, serialisation tests (193 total, up from 110)
  • Documentation: updated README (project status, CLI docs, project structure), SKILLS.md (toolchain section), CHANGELOG

Test plan

  • All 193 tests pass (110 parser + 83 AST)
  • All 13 example programs transform to AST without error
  • vera ast examples/factorial.vera prints readable indented text
  • vera ast --json examples/factorial.vera produces valid JSON
  • to_dict() and pretty() work on all node types
  • Spans propagate correctly from Lark parse tree to AST nodes

🤖 Generated with Claude Code

Add a typed AST layer (frozen dataclass nodes with source spans) and a
Lark Transformer that converts parse trees into AST. This is the
foundation for the type checker (Phase C3).

- vera/ast.py: ~50 node classes covering all grammar constructs
- vera/transform.py: ~86 transformer methods, bottom-up conversion
- vera/cli.py: `vera ast` and `vera ast --json` commands
- vera/parser.py: parse_to_ast() convenience function
- vera/errors.py: TransformError for transformation failures
- tests/test_ast.py: 83 tests (round-trip, node-specific, span, serialisation)
- README.md, SKILLS.md, CHANGELOG.md: updated documentation

193 tests pass (110 parser + 83 AST).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@aallan aallan merged commit f85904d into main Feb 23, 2026
3 checks passed
@aallan aallan deleted the ast-layer branch February 23, 2026 04:49
aallan added a commit that referenced this pull request Mar 31, 2026
Add a 'State handler with a loop helper' subsection showing the most
common State pattern: handle[State<Int>] wrapping a where-block helper
that has effects(<State<Int>>).

This pattern was the #1 undocumented syntax that LLMs fail to produce.
VeraBench testing shows:
- Without this example: Claude Sonnet 4 fails check on both VB-T5-004
  (State accumulator) and VB-T5-008 (IO print loop) — can't produce
  valid where-block + effect syntax
- With this example: both problems pass check (100%)

The example covers: pure outer function with handler, effectful where-
block helper with decreases clause, pure helper called from handler
body, with clause for state updates, and visibility rules for where
blocks.

Relates to: aallan/vera-bench#15

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
aallan added a commit that referenced this pull request Mar 31, 2026
* Add State handler + where block example to SKILL.md

Add a 'State handler with a loop helper' subsection showing the most
common State pattern: handle[State<Int>] wrapping a where-block helper
that has effects(<State<Int>>).

This pattern was the #1 undocumented syntax that LLMs fail to produce.
VeraBench testing shows:
- Without this example: Claude Sonnet 4 fails check on both VB-T5-004
  (State accumulator) and VB-T5-008 (IO print loop) — can't produce
  valid where-block + effect syntax
- With this example: both problems pass check (100%)

The example covers: pure outer function with handler, effectful where-
block helper with decreases clause, pure helper called from handler
body, with clause for state updates, and visibility rules for where
blocks.

Relates to: aallan/vera-bench#15

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: update allowlist and site assets for SKILL.md State handler section

The 51-line "State handler with a loop helper" insertion shifted all
code-block line numbers after line 1213, making 19 allowlist entries in
check_skill_examples.py stale. Update them to their new positions, remove
entries whose blocks now parse cleanly, and add the one new entry needed
(Correct: import syntax example at 1681).

Also clarify the "Pure helper functions" key point: add_value is called
from the `where` block helper (sum_loop), not from the handler body
directly.

Regenerate docs/ site assets to include the new SKILL.md section.

Co-Authored-By: Claude <noreply@anthropic.invalid>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.invalid>
aallan added a commit that referenced this pull request Apr 23, 2026
Both findings real, both fixed.

Predicate hoist (#2):
- Extracted the canonical ASCII whitespace predicate into a
  module-level helper emit_is_ascii_whitespace(byte_local, indent)
  in vera/wasm/helpers.py.  Single-source-of-truth for the
  branchless {space(32) | (byte - 9) < 5} encoding of Python's
  str.isspace() ASCII set {tab(9), LF(10), VT(11), FF(12), CR(13),
  space(32)}.  Replaced four open-coded copies with calls to the
  helper:
    1. _translate_is_whitespace (~10 LOC → 1 line)
    2. _is_ws_inline closure inside _translate_trim
    3. _translate_structural_split count pass (string_words)
    4. _translate_structural_split emit pass (string_words)
- Helper docstring documents the historical context (PR #510 round 2
  caught _translate_strip's divergent narrow predicate; round 4
  hoisted the helper) and warns against re-encoding the byte set.
- Direct application of the byte-literal-collision review rule (memory
  file feedback_review_heuristics.md): two functions sharing a
  predicate routed through one helper from day 1 prevents future
  divergence.

Browser regression (#1):
- New tests/test_browser.py::TestBrowserStringUtilities::
  test_string_strip_vt_ff: compiles and runs a program that calls
  string_strip("\\u{0B}\\u{0C}hi \\u{0B}") and asserts the output
  is "hi".  Pins the strip→trim delegation contract under the
  browser runtime — if a future refactor accidentally re-opens the
  old narrow predicate for strip, this assertion breaks.

All 3,487 tests pass; mypy clean; doc count check + site asset
check + spec/SKILL/conformance/example checks all green.

Co-Authored-By: Claude <noreply@anthropic.invalid>
aallan added a commit that referenced this pull request Apr 23, 2026
Two new issues surfaced by the same agent run:

- #515: $gc_collect itself faults with out-of-bounds memory access
  under sustained allocation pressure.  40×20 Conway's Game of Life
  over 200 generations reliably reproduces.  The collector walks
  past $heap_ptr to the linear-memory bound and traps — gc_collect
  is the top frame of the crashing stack, not the program.  This
  is distinct from #487 (alloc path) and #484 (sweeper 16-bit
  truncation).
- #516: Runtime traps bubble up as raw wasmtime stack traces; the
  CLI mis-labels every Trap/WasmtimeError as "Runtime contract
  violation" even when the actual cause is out-of-bounds memory
  access, integer overflow, unreachable, etc.  No source-line
  mapping, no actionable guidance — the exact opposite of the
  carefully crafted compile-time diagnostics.  Three-stage
  proposed scope: categorise trap reason, source-map the Vera
  function that trapped, specialise help for common trap classes.

KNOWN_ISSUES.md updated with rows for both.

ROADMAP.md implementation-order table re-sorted:
  1. #514 (nested closures + captured-scalar indirection)
  2. #515 (GC collect faults)
  3. #516 (runtime trap diagnostics)
  4. #475 (WASM translator bug cleanup — previously #1)
  5. #507 (Eq/Ord-dispatched array ops)

The top three are all surfaced by the single Game of Life agent
run — the empirical signal is the clearest prioritisation we've
had.  #475 was promoted to #1 after PR #511 merged; pushed to #4
as the new issues outrank it on agent-adoption impact.

Agent-noted quote, worth capturing as the motivation for #516
specifically:

  "the gap between 'the type system is happy' and 'the compiled
   artefact actually runs' is wider than you'd expect from a
   language with SMT-verified contracts. The verifier can prove
   your termination argument is sound while the codegen silently
   miscompiles your closure environment out from under you."

Also posted comment on #514 documenting the third shape the
agent is now isolating (closure body with i32/i64 type mismatch,
distinct from direct-nesting and captured-scalar-indirection).

No code changes.

Co-Authored-By: Claude <noreply@anthropic.invalid>
aallan added a commit that referenced this pull request Apr 23, 2026
CodeRabbit caught two copies of the same BROKEN/WORKING nested-closure
example pair — one labeled (1) and one unlabeled but otherwise identical.
Dropped the unlabeled duplicate; kept the (1)-labeled copy as the
canonical example for the 'Known limitation: nested closures and
captured-scalar indirection' subsection.

scripts/check_skill_examples.py: cleaned up ALLOWLIST (fix_allowlists
re-introduced duplicate keys per the memory'd bug; manually deduped
and re-verified 66 unique entries, 0 duplicates).

Co-Authored-By: Claude <noreply@anthropic.invalid>
aallan added a commit that referenced this pull request May 7, 2026
…nt-integration

Reframed "What's next" to acknowledge the stabilisation work uncovered
by the v0.0.119–v0.0.137 bug-killing campaign. Two patterns worth
closing out before declaring runtime-correctness floor raised:

1. Scale-dependent bugs slip past the standard test suite (#515,
   #570, #487, now #593 — all required real-world programs at scale
   to surface; none would have been caught by the existing 3,747
   small focused tests).

2. Walker-completeness gaps are silent failures of the same shape
   (#588 found _walk_free_vars missing 8 AST branches; we don't yet
   know if other walkers in the codebase have similar incompleteness).

The "Implementation order" table is now split into two tiers:

  - Stabilisation tier (priority): #593, #596, #597, #595
  - Agent-integration tier (resumes after): #222, #523, #370

#596 (stress-test harness) and #597 (walker-completeness audit) are
new issues filed in this PR — they convert "I think we're stabilising"
into evidence by building the infrastructure that prevents recurrence
of the bug classes #588 and #593 surfaced.

#593 retains the lead position because it's the largest open codegen
unknown — bisection narrows it to 12x30 + recursive run_loop with
allocating call argument, but the root cause hasn't been isolated.
Until then, "WASM codegen is stable at scale" is unverified.

#595 sits last in the stabilisation tier because it likely shares a
root cause with #593 (both surface together at the same scale); fixing
#593 may close it. The v0.0.137 KeyboardInterrupt guard already
prevents the Python-traceback half.

Order #5 (LSP) doesn't start until orders #1-#4 close (or are
explicitly deferred with an open follow-up). Tier separator is
explicit so future readers don't have to reverse-engineer the
ordering rationale.

Co-Authored-By: Claude <noreply@anthropic.invalid>
aallan added a commit that referenced this pull request May 7, 2026
1. HISTORY.md "137 tagged releases" -> 138 (line 315): the v0.0.138
   row was added but the rollup footer count still said 137.

2. ROADMAP.md stabilisation-gate sentence: "order #5 starts when
   #1-#4 are closed" -> "order #4 starts when #1-#3 are closed".
   The earlier deletion of the #593 row (now closed) shifted the
   stabilisation tier to three items; the agent-integration tier
   numbering is also renumbered 5/6/7 -> 4/5/6 to match.

3. New test: test_eager_gc_array_mapi_with_non_allocating_string_closure
   mirrors the existing array_map String eager-GC test for the
   array_mapi unwind path with a fn(@Bool, @nat -> @string) closure
   returning pick(@Bool.0).  The existing array_mapi ADT test
   (test_array_mapi_non_allocating_closure_emits_balanced_push)
   covered the i32 branch; this new test covers the i32-pair branch.

4. Tighten the structural #593 predicate.  Pre-fix it accepted any
   "global.set $gc_sp" + absent "call $alloc"; CodeRabbit observed
   that a bare "global.set $gc_sp" could in principle be a stack-
   restore rather than a push.  In practice the prologue's
   gc_sp_save/restore is gated on ctx.needs_alloc and the regex
   already excludes call $alloc, so a non-allocating body cannot
   contain a restore -- but the literal predicate was weaker than
   the intent.  Adds _has_gc_shadow_push helper that anchors on the
   canonical "i32.const 4 / i32.add / global.set $gc_sp" SP-increment
   sequence emitted by gc_shadow_push (vera/wasm/helpers.py).  Used
   by both the i32-pair and i32-ADT structural tests; a non-push
   artefact in the body can no longer satisfy the assertion.

Plus: HISTORY.md tooling section gains a v0.0.138 row for
VERA_EAGER_GC alongside the existing tooling milestones (cross-
linked to ENVIRONMENT.md, which already has the full reference
section for it).

Suite: 3,747 passed (+1 from new array_mapi String test), 14 skipped.

Co-Authored-By: Claude <noreply@anthropic.invalid>
aallan added a commit that referenced this pull request May 7, 2026
Two Conway's Life agent experiments on --target browser surfaced
five concrete blockers (issues #602/#603/#604/#609/#610) and an
explicit design memo (#608) mapping each obstacle to a concrete
runtime-only fix.  The current stabilisation framing covers
codegen reliability and walker completeness but treats browser-
target reliability as a separate concern.  The agent's diagnosis
makes that split harder to defend: "write once, run anywhere"
is currently true for pure computation and approximate-to-false
for anything with timing or screen output.  Two of the fixes
(#609 JSPI-driven IO.sleep, #610 ANSI subset interpreter) close
the timing and rendering halves of the seam without language
changes -- adding them to the stabilisation tier commits to
"browser-target Vera is something you'd actually use" before
the agent-integration push.

Changes:
- Expanded the campaign-pattern list from two patterns to three;
  third pattern documents the browser-target seam with links to
  the umbrella issue (#608) and the five concrete blockers.
- Added #609 (JSPI-driven IO.sleep) as item 3 in the stabilisation
  tier, with rationale + the WebAssembly JSPI / Asyncify mechanism.
- Added #610 (ANSI subset interpreter) as item 4 in the same tier,
  paired with #609 -- together they close the seam and let life.vera
  (terminal version) run unchanged on --target browser.
- Renumbered #595 from item 3 to item 5 (now last in stabilisation,
  since it's contingent on an upstream wasmtime-py release).
- Renumbered the agent-integration tier items 4/5/6 -> 6/7/8 to
  match.
- Updated the "What moves when" gate from "#4 starts when #1-#3
  are closed" to "#6 starts when #1-#5 are closed".

Co-Authored-By: Claude <noreply@anthropic.invalid>
aallan added a commit that referenced this pull request May 8, 2026
…ents + CodeRabbit

Four review agents (code-reviewer, pr-test-analyzer, comment-analyzer,
silent-failure-hunter) plus three CodeRabbit comments produced a
batch of cross-cutting findings.  This commit addresses them all
in one pass to avoid the iterative round-trip cost the bug class
itself was filed to eliminate.

**Critical fixes:**

1. **Closure body E615 harvest gap (silent-failure-hunter C1).**
   `_compile_lifted_closure` constructed its own WasmContext but
   never harvested `_interp_inference_failures` — closure bodies
   with E615-triggering interpolation segments were silently
   dropped, producing call_indirect→missing-table-entry WASM
   validation traps with no source-located diagnostic.  Extracted
   the harvest into `CodeGenerator._harvest_interp_inference_failures`
   on the codegen base and called it from both functions.py and
   closures.py.  Removed the now-disproved `# pragma: no cover —
   defensive` claim on closures.py's body-instrs-None path.  New
   regression test `test_e615_in_closure_body_emits_diagnostic`.

2. **`_canonical_named_type` type_args from terminal NamedType
   (CodeRabbit #3 + code-reviewer I1).**  The walker's
   `outer_type_args` capture rule (always read from first NamedType)
   lost type_args when alias_map substitution bound a generic
   param to a parameterised type.  Empirically reachable today via
   `type Mapper<A, B> = fn(A -> B); Mapper<Int, Array<Int>>` etc.
   Fix: drop the "outermost" rule, return type_args from the
   terminal NamedType reached.  All 21 existing #602-class
   regression tests still pass.  New regression test
   `test_canonical_named_type_terminal_args_propagation`.

3. **HISTORY.md footer (CodeRabbit #1).**  141 tagged releases →
   142 to match the v0.0.142 row added in PR #631.

**UX fix:**

4. **Multiple E615 per function (silent-failure-hunter H2).**
   Pre-this-fix `_translate_interpolated_string` returned None on
   the first failing segment, requiring N round-trips for N bad
   segments in one interpolation.  Now collects every failing
   segment via `had_failure` flag, then bails at the end.  New
   regression test `test_multiple_e615_in_one_interpolation`.

**Doc-quality fixes (comment-analyzer):**

5. **Trigger-count drift** in `operators.py:491-494` —
   "every canonicalisation gap (#602 and ten subsequent triggers)"
   implied 11; rewrote to "the ten triggers of the #602 bug class
   accumulated across PRs #627 + #629".

6. **`_format_named_type_canonical` docstring drift** — claim
   "matches the pre-#630 fallback shape" was empirically wrong
   (pre-#630 resolved through aliases, post-#630 fallback doesn't).
   Rewrote to acknowledge the deliberate behavioural change.

7. **"Future closure-arg shapes plug in here without further
   dispatch ladder"** — overstatement.  Adding a `FnCall`
   returning a closure still needs an `elif`; what's saved is
   canonicalisation logic, not dispatch logic.  `IfExpr` between
   closures has no single `return_type` field — example was
   misleading.  Rewrote in both `_infer_fncall_vera_type` and
   `_infer_apply_fn_return_type`.

8. **`_canonical_wasm_type` docstring** — added explicit `Unit →
   None` documentation; clarified that `None` (no WASM type) and
   `"i64"` (unreachable-NamedType default) are distinct return
   shapes that callers must not conflate.

**Tests added (besides the regressions tied to fixes):**

9. **`test_per_function_isolation_of_failures_list`** — pins
   that a clean function and a dirty function in the same source
   produce independent `_interp_inference_failures` lists.

10. **`test_e615_fires_on_result_in_interpolation`** — adjacent
    ADT shape (Result vs Option) so a future Option-handling
    broadening doesn't accidentally regress the parallel Result
    path.

11. **`test_array_map_refinement_returning_closure`** — pins the
    previously-unaudited `_infer_closure_return_vera_type` path
    in `calls_arrays.py`, which the #630 migration broadened to
    handle refinements.

**Test enhancements:**

12. **E615 ordering before matching E602** in the existing test
    (CodeRabbit #2) — pinned via `warnings.index(e615[0])` <
    `warnings.index(matching_e602)`.  Filtered to the E602 that
    mentions the offending function so prelude-skip E602s don't
    confound the assertion.

13. **E615 source-location attached** in the existing test —
    softened to "line > 0" (any location) because SlotRef-inside-
    InterpolatedString spans are unreliable today (#634, separate
    follow-up).  Tightening to "line points at the segment" is
    the natural acceptance test for that follow-up.

**Skipped with reasons:**

- **Cycle-guard regression on `_resolve_base_type_name`** —
  test-analyzer C1 / code-reviewer S4.  Verified pre-existing
  and dead-code-safe today (cycles crash an upstream resolver
  first).  Filed as #633.

- **`_canonical_wasm_type`'s `i64` default for unhandled
  apply_fn / call_indirect shapes** — silent-failure-hunter H1.
  Bigger refactor parallel to Tier 2 but for the WASM-side
  dispatch.  Filed as #632.

- **SlotRef-inside-InterpolatedString span fidelity** —
  surfaced when writing the E615 source-location assertion.
  Diagnostic-quality concern, separate from #630's
  canonicalisation scope.  Filed as #634.

- **`_canonical_named_type` worked-example docstring**
  (comment-analyzer I6) — superseded by fix #2 above (semantics
  changed; docstring rewritten with concrete examples).

- **Per-context isolation prose-vs-test** (comment-analyzer I4)
  — addressed via fix #9 (test added).

- **`_resolve_i32_pair_ret_te` 30-line upper docstring** —
  comment-analyzer style nit.  The prose is load-bearing
  context (#628 cross-module narrative); trimming would lose
  history without obvious gain.  Defer.

**Follow-ups:** #632 (apply_fn / call_indirect E616), #633
(_resolve_base_type_name cycle guard), #634 (interpolation
SlotRef source spans).  Added to KNOWN_ISSUES.md alongside #628.

Counts: 3,792 tests (3,778 passed, 14 skipped) — six new tests
under `TestE615LoudInterpolationFallthrough630`.  CHANGELOG +
KNOWN_ISSUES + ROADMAP + TESTING.md + HISTORY footer all
synced.  mypy clean.

Co-Authored-By: Claude <noreply@anthropic.invalid>
aallan added a commit that referenced this pull request May 11, 2026
CodeRabbit symmetry catch: the `CodegenInvariantError` catch-block
in `_compile_fn` emits the `[E699]` warning without first calling
`self._harvest_interp_inference_failures(ctx)`, whereas the
parallel `CodegenSkip` catch-block does call it before the
`[E602]` warning.

Empirically invariant violations fire early — before any
interpolation translation has populated
`ctx._interp_inference_failures` — so this is mostly insurance.
But keeping the two handlers structurally identical is the right
principle: any time we add a new failure side-channel above
either path, we want both to drain it.  Added the harvest call
plus a comment explaining the symmetry reasoning.

Refs #626 #658

Co-Authored-By: Claude <noreply@anthropic.invalid>
aallan added a commit that referenced this pull request May 11, 2026
Five findings from CodeRabbit's review of #659 — three 🟠 Major
correctness issues and two 🟡 Minor quality-of-life
improvements.  All five verified valid against current code and
addressed minimally.

CR-1 (`scripts/check_e602_clean.py:174`) — 🟠
`_extract_skips()` filtered only `("E602", "E604")` but the
CHANGELOG entry and the suppression pass in
`vera/codegen/core.py` both treat `[E605]` as part of the same
template-warning surface.  A future E605 silent skip would have
slipped past the gate.  Added `"E605"` to the filter tuple +
inline comment cross-referencing the suppression pass.

CR-2 (`tests/test_codegen.py:15880`) — 🟡
`test_mono_suffix_correct_for_slotref_fn_alias_arg` asserted WAT
suffix strings but didn't execute the compiled module — a
regression that produced correct WAT but broken indirect-call
wiring would slip past.  Added `execute(result, fn_name="main")`
+ `assert exec_result.value == 20` to pin the runtime contract.
Also added a sibling test
`test_parameterised_alias_substitutes_type_args` exercising the
CR-4/CR-5 fix with the same execute+assert shape (21 for
`Some(7) * 3`).

CR-3 (`vera/codegen/core.py:552`) — 🟠
Pre-fix the template-warning suppression checked `_fn_sigs` for
the existence of mono clones.  But `_fn_sigs` is populated in
Pass 1.5 (registration) *before* any mono body is compiled — so
a clone that `_compile_fn(mdecl)` later rejects with `None` would
still be counted as "compiled" and wrongly suppress its
template's diagnostic.  That hides the only pre-runtime signal
for a genuinely-broken generic.  Introduced
`compiled_mono_bases: set[str]` populated only inside the
`if fn_wat is not None:` branch of the mono-clone compile loop.
Suppression now uses `compiled_mono_bases & forall_decl_names`.

CR-4 (`vera/codegen/monomorphize.py::_resolve_arg_fn_shape`) — 🟠
For a *parameterised* FnType alias like
`type Mapper<T> = fn(T -> T)`, `@Mapper<Int>.0` resolved to the
raw alias body `fn(T -> T)` without substituting `T → Int`.
Downstream `_infer_fn_alias_type_args` then bound alias-local
names (`A → T`, `B → T`) instead of concrete ones, producing
mono suffixes like `option_map$T_T` rather than
`option_map$Int_Int`.  Fixed by substituting the SlotRef's
`type_args` into the alias body before returning.

CR-5 (`vera/wasm/calls.py::_resolve_arg_fn_shape_wasm`) — 🟠
Same gap on the WASM call-site rewriting side.  Same fix applied.

Shared substitution helper
Rather than duplicate the type-var substitution logic in two
codegen modules, extended the existing module-level
`substitute_type_vars` in `vera/wasm/inference.py` (used by
`InferenceMixin` canonicaliser + `CodeGenerator._type_expr_to_wasm_type`)
to also handle `FnType` (it previously handled only `NamedType`
and `RefinementType`).  Both `_resolve_arg_fn_shape` helpers now
import + use the canonical implementation — same semantics
across all four callers, no drift risk.

Validation
- mypy: clean (59 source files)
- pytest: 3,794 passed, 14 skipped (was 3,793 + 1 new test)
- e602 gate: 116 files clean, 6 allowlist entries matched, 0 stale
- All 86 conformance programs pass; all 34 examples pass
  `vera check` + `vera verify`

Refs #604 #655 #659

Co-Authored-By: Claude <noreply@anthropic.invalid>
aallan added a commit that referenced this pull request May 13, 2026
CR-1 (`CONTRIBUTING.md:90`) — commit-stage hook count drift.

The heading "**commit-stage** hooks (24, each gated to relevant
files) include:" disagreed with the immediately-preceding
sentence "25 are configured at the commit stage".  Updated to
match (24 -> 25).  Pure doc-consistency fix.

CR-2 (`scripts/check_walker_coverage.py:167`) — auto-discover
walker files instead of hardcoded list.

The hardcoded `WALKER_FILES` list silently skipped any new
walker file added without updating the list — replicating the
exact silent-skip class this script was written to close.
Verified by inspection: nothing forced contributors adding a
new walker file to also touch this list.

Replaced with `_discover_walker_files()` which globs
`vera/**/*.py` for files containing the `WALKER_COVERAGE:`
marker.  Returns a sorted `list[Path]` (same shape as the
hardcoded list it replaces) so downstream code referencing
`WALKER_FILES` is unchanged.

Validation
- `python scripts/check_walker_coverage.py` -> 9 walkers cover
  all 29 Expr subclasses (clean — same output as pre-fix)
- Discovered file list matches the pre-fix hardcoded list
  exactly (6 files, same paths)

This also addresses the #1 critical finding from the
pr-review-toolkit audit run before this commit.

Refs #597 #668

Co-Authored-By: Claude <noreply@anthropic.invalid>
aallan added a commit that referenced this pull request May 13, 2026
**Finding 1: substring-only WAT checks could false-pass** (valid).
The previous tests asserted constants like `0x80000000`, `i32.or`,
`0x7FFFFFFF`, and `i32.ge_u` in isolation.  But:
- `i32.const 0x80000000` appears in BOTH the wrap-site tag AND
  the heap-ceiling guard.  Stripping the wrap OR would still
  leave the substring (from the guard) and the test would pass.
- `i32.or` also appears in header-mark manipulation, unrelated
  to the wrap path.

Switched all three tests to boundary-safe adjacent-sequence
regex:
- `test_wrap_emits_tag_or`: `re.search(r"i32\.const 0x80000000\s+i32\.or", ...)`
- `test_unwrap_emits_mask_and`: `re.search(r"i32\.load\s+offset=4\s+i32\.const 0x7FFFFFFF\s+i32\.and", ...)`
- `test_alloc_emits_heap_ceiling_guard`: extract `$alloc` via
  `re.search(r"\(func \$alloc\b", ...)` (boundary-safe vs the
  previous `find("(func $alloc")` which could false-match
  `$alloc_xxx`), then assert the EXACT ordered 8-instruction
  guard sequence (`global.get $heap_ptr; local.get $total;
  i32.add; i32.const 0x80000000; i32.ge_u; if; unreachable;
  end`) via a single multi-line regex with `re.DOTALL`.

**Finding 2: _wrap_handle doesn't check raw_handle < 0x80000000**
(valid).  If a handle ever has bit 31 set, `raw_handle |
0x80000000` is a no-op and the unwrap mask `& 0x7FFFFFFF`
returns the WRONG handle — silent host-store-lookup corruption
that's exactly the latent failure class #578 itself sought to
eliminate.  Practically infeasible (counters bounded well below
2^31; a 2B-handle session is wall-clock-infeasible) but the
invariant deserves an explicit assertion per DESIGN.md principle
#1 (checkability).  Added `if raw_handle & 0x80000000: raise
RuntimeError(...)` immediately before the `_write_i32` /
`_call_register_wrapper` block in
`vera/codegen/api.py::_wrap_handle`.  Error includes
raw_handle, kind, body_ptr per CR's exact recommendation.

**Not addressed: CR's 3rd finding**
("Missing the long-run high-handle-counter regression that
#578 is meant to prevent" — labelled Major, Heavy lift).

Declined with reasoning:

1. **#578 was a RETENTION bug, not a correctness bug.**  Pre-fix
   the conservative scan could mark unrelated heap objects when
   a handle value matched the heap-range check.  In an
   IMMUTABLE language those objects can't corrupt state — only
   memory grows.  No output assertion can distinguish pre-fix
   from post-fix at modest scale.

2. **A true regression test requires heap-growth inspection.**
   Vera's test infra doesn't expose `$heap_ptr` post-run; adding
   that machinery is a separate infrastructure change.  An
   alternative — measure WASM `memory.size` deltas — needs the
   same instrumentation.

3. **The structural invariants are mathematically sufficient.**
   The 3 tightened tests + `$alloc` heap-ceiling guard prove
   by construction that no tagged value can EVER match the
   heap-range check: heap_ptr is hard-capped at 0x80000000, and
   every tagged value is >= 0x80000000.  Sets don't overlap.
   This is a stronger guarantee than a stochastic
   non-occurrence stress test could provide.

4. **A weaker "scale" test would be misleading.**  200K wrap
   allocations under eager GC complete fine pre-fix too (the
   retention is bounded and self-clearing once the falsely-
   retained object becomes legitimately unreachable in a later
   cycle).  Such a test would pass on broken code, providing
   false confidence.

If we add `$heap_ptr` inspection to the test harness in future
(a worthwhile general improvement), I'd be happy to file a
follow-up that includes a deterministic retention regression.
Tracked as a possible follow-up issue.

Validation: pytest 3868/3868, mypy clean, ruff S clean,
doc-counts consistent.

Co-Authored-By: Claude <noreply@anthropic.invalid>
aallan added a commit that referenced this pull request May 22, 2026
1. host_md_parse exception scope (#693 inline #1): the broad
   except Exception was swallowing host-side invariant
   violations (shadow-stack overflow from _ShadowGuard,
   ValueError from write_md_block exhaustive match, internal
   AssertionErrors).  Restructured to put only the parse step
   inside the try; shadow-stack work + write_md_block +
   Result.Ok alloc now sit outside, so invariant violations
   propagate as wasmtime traps rather than being repackaged as
   user-domain parse errors.  Mirrors the narrow-catch pattern
   in host_html_parse and host_json_parse.

2. Redundant guard.push in markdown.py walkers (#693 inline #2):
   the _write_inline_array / _write_block_array /
   _write_array_of_block_arrays / _write_table_data helpers
   already push their backing buffer onto the shadow stack
   internally.  Removed the duplicate guard.push(arr_ptr) calls
   immediately after these helpers in write_md_inline (MdEmph,
   MdStrong, MdLink) and write_md_block (MdParagraph,
   MdHeading, MdBlockQuote, MdList, MdTable, MdDocument).
   Functionally equivalent (the entry IS rooted, just twice)
   but halves shadow-stack consumption from those branches and
   removes the only correct way to overflow the guard on
   pathologically deep walks.  Pushes after alloc_string calls
   are KEPT - alloc_string does not push.

3. TESTING.md "435 tests" -> "440 tests" (outside-diff): the
   pytest-parametrized shell-block comment now matches the
   table entry and the new 88-program x 5-stage total.

4. Tracking: added #694 (Windows test_browser timeout flake) to
   KNOWN_ISSUES.md "Test coverage gaps" with the symptom,
   asymmetry-test diagnostic that proves it is a flake (3.11
   and 3.13 pass on the same Windows runner where 3.12 timed
   out), and the one-line fix (bump timeout=30 to timeout=60 at
   the two sites).  Per the PR-generated-new-issues-must-be-
   tracked-in-the-same-PR rule.

Validation: 3,815 pytest pass + 88 conformance + mypy clean +
all doc validators green + limitations sync consistent.

Co-Authored-By: Claude <noreply@anthropic.invalid>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant