v0.0.138: fix #593 closure-return shadow-push + ENVIRONMENT.md#598
Conversation
The bug: array_map / array_mapi always emit a per-iteration "$gc_sp -= 4" after each call_indirect when the element type is heap-pointer- like (b_needs_unwind path). But _compile_lifted_closure only emitted the matching gc_shadow_push of the return value when the closure body itself allocated (ctx.needs_alloc=True). A closure body like fn(@Bool -> @string) { render_cell(@Bool.0) } where render_cell returns String literals (data segment, no heap alloc) had needs_alloc=False, so its epilogue emitted no push -- but the array_map loop popped anyway, dropping $gc_sp BELOW the surrounding function's prologue baseline. Subsequent shadow-stack pushes overwrote slots holding still-live roots, so the next GC mark phase missed those roots and swept their referents -- manifested as silent string corruption (Conway's Life rendering -- the original issue's symptom: U+FFFD bytes interleaved with valid output) or call_indirect out-of-bounds table access trap at smaller scales. Fix: lift the return-value push out of the if-ctx.needs_alloc branch in _compile_lifted_closure. Always push when the return type is a heap pointer (i32_pair or i32 ADT). No gc_sp save/restore needed in the non-allocating branch -- nothing to clean up -- just intercept the return to publish the root. Diagnostic: added VERA_EAGER_GC=1 environment variable that prepends call $gc_collect to every $alloc invocation. Surfaced the rooting imbalance on the very first iteration rather than only at scale, which was the diagnostic that cracked the investigation. Permanent debug knob now -- documented in new top-level ENVIRONMENT.md catalogue. Validation: - 4 new tests in TestClosureReturnShadowPushBalance cover positive regression, eager-GC behavioural (flat array_map and recursive Life-shape), and structural WAT assertion of the return-value push - Original life_full_program.vera at 12x30 x 200 generations: exit 0, zero U+FFFD corruption (was 6,320 U+FFFD pre-fix) - Full test suite: 3,737 passed, 14 skipped (no regressions) - All 86 conformance programs pass, all 33 examples pass Documentation: - New ENVIRONMENT.md centralises the eight VERA_* environment variables (provider keys, runtime overrides, dev knobs) -- previously scattered across README, AGENTS, TESTING, CONTRIBUTING, SKILL, CLAUDE. Cross-linked from each. - KNOWN_ISSUES.md, ROADMAP.md, HISTORY.md updated to reflect #593 closure rather than carrying the open-bug entry. Closes #593 Co-Authored-By: Claude <noreply@anthropic.invalid>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #598 +/- ##
==========================================
+ Coverage 90.87% 90.92% +0.04%
==========================================
Files 59 59
Lines 22968 22997 +29
Branches 259 259
==========================================
+ Hits 20873 20910 +37
+ Misses 2088 2080 -8
Partials 7 7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis release v0.0.138 fixes a GC shadow-stack push/pop imbalance in lifted closures returning heap pointers when the closure body is non-allocating. The core fix centralises return-pointer detection and adds unconditional epilogue root-pushing for pointer-returning non-allocating closures. A compile-time VERA_EAGER_GC diagnostic knob forces GC on every allocation. A new ENVIRONMENT.md centralises all VERA_* variable documentation. ChangesGC shadow-stack closure-return fix and diagnostic infrastructure
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Microsoft Presidio Analyzer (2.2.362)TESTING.mdMicrosoft Presidio Analyzer failed to scan this file Comment |
CI lint job runs `uv lock --check` which requires the lockfile to match `pyproject.toml`. The version bump in the previous commit invalidated it. Co-Authored-By: Claude <noreply@anthropic.invalid>
Catches the failure mode that broke #598 CI: bumping the version in pyproject.toml + vera/__init__.py without regenerating uv.lock. The lint job's uv-lock --check flagged it, but the round-trip is avoidable if the pre-commit hook catches the drift locally. Anchors on the [[package]] / name = vera / version = X.Y.Z triple in uv.lock rather than a global vera regex, so a future transitive dep named vera (unlikely but possible) wouldn't false-match. Co-Authored-By: Claude <noreply@anthropic.invalid>
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
The site landing page's status block had drifted to '81-program conformance suite and 32 worked examples' while the live counts were 86 and 33. Static HTML, no automated check on it, so it had been silently stale for several releases. Updates the numbers to 86 / 33 and extends scripts/check_doc_counts.py with a new section 13 that pins both counts via regex against the live conformance manifest and examples directory — same approach as the existing checks for TESTING.md, README.md, ROADMAP.md, etc. Co-Authored-By: Claude <noreply@anthropic.invalid>
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@HISTORY.md`:
- Line 275: The HISTORY entry added v0.0.138 but left the roll-up footer count
at "137 tagged releases"; update the footer/summary counts that mention "137
tagged releases" (and any other staged-release totals in the HISTORY
footer/ROADMAP/Stage 11 sections) to "138 tagged releases" so the file is
internally consistent with the new v0.0.138 entry.
In `@ROADMAP.md`:
- Around line 38-40: Update the stabilisation-gate summary sentence that still
reads “#1–#4 are closed” to reflect the new ordered list (items 1–3 now defined
in the preceding section); locate the sentence that references the numbered
issues/gates and change it to reference the correct range (e.g., “#1–#3 are
closed” or rephrase to “the top three gates are closed”) so the ROADMAP text
matches the reordered list entries.
In `@tests/test_codegen_closures.py`:
- Around line 979-1095: The PR adds an eager-GC non-allocating pointer-return
regression test for array_map but misses the array_mapi unwind path; add a new
test function (e.g. test_eager_gc_array_mapi_with_non_allocating_string_closure)
that mirrors test_eager_gc_array_map_with_non_allocating_string_closure but uses
array_mapi calls and a closure signature fn(`@Bool`, `@Nat` -> `@String`) that returns
pick(`@Bool.0`); compile under with mock.patch.dict(os.environ, {"VERA_EAGER_GC":
"1"}), execute the result, and assert the same expected stdout ("...X....") to
cover the array_mapi edge case in pick, render_cell, and run_loop-related paths.
- Around line 1148-1158: The current predicate building non_alloc_with_push only
checks for "global.set $gc_sp" and absence of "call $alloc", which can be
satisfied by a plain stack-restore without an actual return-root shadow push;
update the predicate in the test to assert an explicit push sequence for the
return-root in candidate_bodies (e.g., require the closure body to contain the
push instruction sequence used by the runtime such as the explicit push-call or
i32.const + call that represents the shadow-push of the return root) instead of
relying solely on "global.set $gc_sp"; modify the comprehension that defines
non_alloc_with_push to look for that explicit push pattern (referencing the
variable non_alloc_with_push and the candidate_bodies list) so the test only
passes when a real return-root push is present.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2d293ea8-9510-4aec-a4bd-9f69e27f6c68
⛔ Files ignored due to path filters (6)
docs/SKILL.mdis excluded by!docs/**docs/index.htmlis excluded by!docs/**docs/index.mdis excluded by!docs/**docs/llms-full.txtis excluded by!docs/**docs/llms.txtis excluded by!docs/**uv.lockis excluded by!**/*.lock,!uv.lock
📒 Files selected for processing (18)
AGENTS.mdCHANGELOG.mdCLAUDE.mdCONTRIBUTING.mdENVIRONMENT.mdHISTORY.mdKNOWN_ISSUES.mdREADME.mdROADMAP.mdSKILL.mdTESTING.mdpyproject.tomlscripts/check_doc_counts.pyscripts/check_version_sync.pytests/test_codegen_closures.pyvera/__init__.pyvera/codegen/assembly.pyvera/codegen/closures.py
Comprehensive PR review (3 specialised agents, parallel)Three agents reviewed in parallel: code quality, test coverage, comment accuracy. Skipped type-design (no new types) and silent-failure-hunter (no error-handling changes). Posting findings + action plan here for the permanent record. Critical issues
Important issues
Suggestions
What's well done
Action planWill implement (1)–(6) plus the trivial mechanical fixes as a follow-up commit on this branch:
Skipping the ADT behavioural and eager-GC behavioural tests — the structural test is the higher-leverage pin (it would have failed pre-fix; behavioural tests can pass coincidentally at small scale). Recommendation from code-reviewer was approve-and-merge; the test-analyzer's ADT-branch gap is the only finding worth blocking on, and that's a 10-minute additional test, not a fix change. |
Mechanical fixes (review-comment-analyzer findings): - Drop file:line citations in closures.py + test docstrings; replace with symbol references (calls_arrays.py is 2,472 lines and changes routinely; line ranges would shift within 2-3 releases). - Fix _emit_alloc docstring: "prepend to body" -> "first instruction after local declarations" (WAT requires locals at the top so the call $gc_collect goes after them, not at the very start). - Trim TestClosureReturnShadowPushBalance class docstring's "three layers" claim to match the actual four test layers. - Tighten redundant per-test docstrings that re-explained the bug story already covered in the class docstring. - Lowercase VERA_EAGER_GC env-var comparison so True/YES/On/etc. are also accepted. New tests (review-pr-test-analyzer findings): - test_lifted_closure_emits_return_root_push_for_non_allocating_adt_return: structural assertion for the i32-ADT branch of the fix at closures.py:421-425. Pre-fix this branch was completely untested; exercised here via a non-allocating identity closure over a Box ADT through array_map. - test_array_mapi_non_allocating_closure_emits_balanced_push: pin _translate_array_mapi's pop site independently of array_map's so a future refactor that touches one but not both is caught. - test_vera_eager_gc_injects_gc_collect_into_alloc_body: pin the eager-GC diagnostic mechanism itself. Without this, a silent regression in the env-var-reading code would degenerate the eager-GC behavioural tests to the non-eager case (still passing, but no longer pinning the rooting balance). - test_vera_eager_gc_accepts_case_insensitive_truthy_values (parametrized over 1, true, TRUE, Yes, On, "true "): pin the case-insensitive allowlist behaviour. Suite: 3,746 passed (up from 3,737, +9 from new tests), 14 skipped. Doc counts and version sync are clean. Co-Authored-By: Claude <noreply@anthropic.invalid>
Review action plan: completeCommit Mechanical fixes:
New tests (all passing):
Suite: 3,746 passed (up from 3,737, +9 from the new tests + parametrized variants), 14 skipped. Doc counts and version sync are clean. Skipped: ADT behavioural + ADT eager-GC behavioural tests. The structural test is the higher-leverage pin — it would have failed pre-fix; behavioural tests can pass coincidentally at small scale. |
1. HISTORY.md "137 tagged releases" -> 138 (line 315): the v0.0.138 row was added but the rollup footer count still said 137. 2. ROADMAP.md stabilisation-gate sentence: "order #5 starts when #1-#4 are closed" -> "order #4 starts when #1-#3 are closed". The earlier deletion of the #593 row (now closed) shifted the stabilisation tier to three items; the agent-integration tier numbering is also renumbered 5/6/7 -> 4/5/6 to match. 3. New test: test_eager_gc_array_mapi_with_non_allocating_string_closure mirrors the existing array_map String eager-GC test for the array_mapi unwind path with a fn(@Bool, @nat -> @string) closure returning pick(@Bool.0). The existing array_mapi ADT test (test_array_mapi_non_allocating_closure_emits_balanced_push) covered the i32 branch; this new test covers the i32-pair branch. 4. Tighten the structural #593 predicate. Pre-fix it accepted any "global.set $gc_sp" + absent "call $alloc"; CodeRabbit observed that a bare "global.set $gc_sp" could in principle be a stack- restore rather than a push. In practice the prologue's gc_sp_save/restore is gated on ctx.needs_alloc and the regex already excludes call $alloc, so a non-allocating body cannot contain a restore -- but the literal predicate was weaker than the intent. Adds _has_gc_shadow_push helper that anchors on the canonical "i32.const 4 / i32.add / global.set $gc_sp" SP-increment sequence emitted by gc_shadow_push (vera/wasm/helpers.py). Used by both the i32-pair and i32-ADT structural tests; a non-push artefact in the body can no longer satisfy the assertion. Plus: HISTORY.md tooling section gains a v0.0.138 row for VERA_EAGER_GC alongside the existing tooling milestones (cross- linked to ENVIRONMENT.md, which already has the full reference section for it). Suite: 3,747 passed (+1 from new array_mapi String test), 14 skipped. Co-Authored-By: Claude <noreply@anthropic.invalid>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@TESTING.md`:
- Line 9: TESTING.md's overall test total (3,760) is out of sync with the
embedded conformance subtotal and the test_conformance.py listing; recompute the
canonical counts by running pytest collection (e.g., pytest --collect-only -q)
to get the authoritative total and per-file counts, then update the conformance
section number (the embedded "425 tests") and the test_conformance.py entry to
match the collected totals so all embedded totals in TESTING.md are consistent
with the pytest-collected counts.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 683131ae-f524-4600-92df-0a732d80acd5
📒 Files selected for processing (6)
README.mdROADMAP.mdTESTING.mdtests/test_codegen_closures.pyvera/codegen/assembly.pyvera/codegen/closures.py
CodeRabbit on #598 caught that line 190's "Via pytest (parametrized -- 425 tests)" disagrees with the table at line 79 (which correctly says 430) and with pytest --collect-only (which reports 430 tests collected for tests/test_conformance.py). This is the same fix CodeRabbit flagged on PR #594 just before that PR merged. It was applied via the Edit tool at the time but landed in a local stash rather than the branch (the merge happened before the push) so the fix was effectively lost. Stash dropped as part of this commit. scripts/check_doc_counts.py confirms documentation counts are consistent post-fix; the canonical 430 = 86 conformance programs * 5 check phases (parse/check/verify/run/format-idempotency). Co-Authored-By: Claude <noreply@anthropic.invalid>
Summary
_compile_lifted_closureonly emitted the closure's return-valuegc_shadow_pushwhen the body itself allocated, butarray_map/array_mapialways emit a per-iteration$gc_sp -= 4after eachcall_indirectwhen the element type is heap-pointer-like. Non-allocating closures returning Strings (likefn(@Bool -> @String) { render_cell(@Bool.0) }) had no push but the loop popped anyway, dropping$gc_spbelow the surrounding function's prologue baseline and corrupting earlier shadow-stack roots. Fix: lift the return-value push out of theif ctx.needs_alloc:branch so it always runs when the return is a heap pointer.VERA_EAGER_GC=1— diagnostic env var that fires$gc_collecton every allocation, surfacing latent missing-shadow-root bugs immediately. This was the diagnostic that cracked Conway's Life at 12x30 still corrupts strings from gen 1 onwards (additional trigger beyond #588) #593 (rebuilt minimum repro crashed at gen 0 under eager-GC rather than around gen 20 without it). Worth keeping permanently as a debugging aid.ENVIRONMENT.md— centralises the eightVERA_*environment variables (four Inference provider keys, two Inference selection knobs, the existingVERA_JS_COVERAGEbrowser-test knob, and the newVERA_EAGER_GCdebug knob). Previously scattered across README, AGENTS, TESTING, CONTRIBUTING, SKILL, CLAUDE — now all link to one canonical source.Test plan
TestClosureReturnShadowPushBalance— positive regression, eager-GC behavioural (flatarray_mapand recursive Life-shape), and structural WAT assertion of the return-value pushlife_full_program.veraat 12×30 × 200 generations: exit 0, 0 U+FFFD bytes (was 6,320 pre-fix)VERA_EAGER_GC=1: 0 U+FFFD, completed all printed generations🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Documentation