v0.0.122: bound conservative GC scan against $heap_ptr (#515) by aallan · Pull Request #542 · aallan/vera

aallan · 2026-04-27T15:01:08Z

Closes #515.

Summary

$gc_collect was faulting under sustained allocation pressure with gc_collect itself at the top of the stack — the collector, not the program, was the crashing frame. Symptom: memory fault at wasm address 0x... in linear memory of size 0x.... Reliably reproduced with a 470-line Conway implementation crashing at generation 56.

Root cause

The Phase 2 worklist-seeding code in _emit_gc_collect accepts a shadow-stack value as a heap pointer if it satisfies three guards:

in heap range (val >= gc_heap_start + 4)
properly aligned ((val - gc_heap_start) % 8 == 4)
below $heap_ptr

None of those guards prove the word at val - 4 is actually an object header. A non-pointer i32 in payload data (e.g. a bit-packed Nat row in Conway-style code, a hash, anything else with bits in heap range) can satisfy all three; Phase 2b then reads garbage as obj_size = header >> 1, sets the mark bit (corrupting a random heap word!), and walks obj_ptr + 0..obj_size past $heap_ptr and past the linear-memory boundary.

This is the classic conservative-GC false-positive failure mode.

Fix

Two layers of defence in _emit_gc_collect (vera/codegen/assembly.py):

Layer 2 (early skip) — before either marking or scanning a worklist entry, compute obj_size and verify obj_ptr + obj_size <= heap_ptr. If not, skip the entry entirely (no mark store, no scan loop). Catches the bug at the cheapest possible point and prevents the mark-bit corruption that would otherwise persist across the cycle.
Layer 1 (per-iter) — inside the conservative scan loop, also check obj_ptr + scan_ptr + 4 <= heap_ptr before each i32.load. Negligible cost relative to the load it guards, and protects any future caller that reaches this loop without the Layer-2 check.

Tests

Structural regression test: tests/test_codegen.py::TestGarbageCollection::test_gc_collect_bounds_check_against_heap_ptr asserts both bound checks survive in the emitted WAT. Behavioural reproducers for GC collect itself faults with out-of-bounds memory access under sustained allocation pressure #515 are heavily layout-sensitive (string-pool offsets, allocation order), so a structural assertion is the durable regression guard.
3,569 tests pass; mypy clean; 81 conformance programs pass; 33 examples pass.

Verification

470-line Conway implementation now runs cleanly through every generation (was crashing at gen 56 pre-fix).

Test plan

pytest tests/ -q — 3,555 passed, 14 skipped
mypy vera/ — clean
python scripts/check_conformance.py — all 81 pass
python scripts/check_examples.py — all 33 pass
python scripts/check_version_sync.py — version 0.0.122 consistent
All pre-commit hooks pass

Co-Authored-By: Claude noreply@anthropic.invalid

Summary by CodeRabbit

Bug Fixes
- Resolved a GC crash occurring under sustained allocation pressure by adding defensive heap-bounds checks during conservative GC scanning.
Tests
- Added a regression test to verify the conservative-scan bounds and prevent regressions.
Chores
- Bumped release version to v0.0.122 and updated package metadata.
Documentation
- Updated changelog, roadmap, README, testing metrics and module GC docs to reflect the fix and new release.

Closes #515. $gc_collect was faulting under sustained allocation pressure: the Phase 2 worklist-seeding code accepts a shadow-stack value as a heap pointer if it satisfies three guards (in heap range, properly aligned, below $heap_ptr). None of those guards prove the word at val-4 is actually an object header. A non-pointer i32 in payload data (e.g. a bit-packed Nat row in Conway-style code) can satisfy all three; Phase 2b then reads garbage as obj_size = header >> 1, sets the mark bit (corrupting a random heap word), and walks obj_ptr + 0..obj_size past heap_ptr and past the linear-memory boundary, trapping with "memory fault at wasm address 0x... in linear memory of size 0x..." — $gc_collect itself at the top of the stack. Two layers of defence in _emit_gc_collect (vera/codegen/assembly.py): * Layer 2 (early skip): before either marking or scanning a worklist entry, compute obj_size and verify obj_ptr + obj_size <= heap_ptr. If not, skip the entry — no mark store, no scan loop — preventing both the trap and the mark-bit corruption. * Layer 1 (per-iter): inside the conservative scan loop, also check obj_ptr + scan_ptr + 4 <= heap_ptr before each i32.load. Cheap relative to the load itself, and protects any future caller that reaches this loop without the Layer-2 check. Verified end-to-end with a 470-line Conway implementation that pre-fix crashed reliably at generation 56: now runs cleanly through every generation. Structural regression test in tests/test_codegen.py::TestGarbageCollection::test_gc_collect_bounds_check_against_heap_ptr asserts both bound checks survive in the emitted WAT. Behavioural reproducers for #515 are heavily layout-sensitive (string-pool offsets, allocation order, so a structural assertion is the durable regression guard. Doc sweep: KNOWN_ISSUES (remove #515 row); ROADMAP (drop #515 from queue, intro updated to ten remain); HISTORY (add v0.0.122 row); CHANGELOG (new section). Site assets regenerated. Co-Authored-By: Claude <noreply@anthropic.invalid> EOF )

coderabbitai · 2026-04-27T15:01:22Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b40fc42e-748c-406c-916a-6c905bfcadf2

📥 Commits

Reviewing files that changed from the base of the PR and between 3eae6bb and 75682a1.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock, !uv.lock

📒 Files selected for processing (4)

HISTORY.md
ROADMAP.md
TESTING.md
tests/test_codegen.py

📝 Walkthrough

Walkthrough

v0.0.122: adds heap-pointer bounds checks to the conservative GC mark/scan in $gc_collect (early-skip and per-iteration guards); updates release/version metadata, docs, and adds a regression test verifying the emitted WAT checks.

Changes

Cohort / File(s)	Summary
Version metadata `pyproject.toml`, `vera/__init__.py`	Bumped project/package version from `0.0.121` → `0.0.122`.
Changelog & release notes `CHANGELOG.md`, `HISTORY.md`	Added `0.0.122` release entry, updated `[Unreleased]` compare link and incremented tagged-release count.
Docs & README updates `README.md`, `vera/README.md`, `ROADMAP.md`, `KNOWN_ISSUES.md`, `TESTING.md`	Updated project status, test totals, roadmap priority/bug-campaign to reflect v0.0.122; removed the KNOWN_ISSUES entry for `#515` and tightened GC mark-phase documentation.
GC conservative-scan fix `vera/codegen/assembly.py`	Assembly emission: after popping a worklist entry compute `obj_size` and skip if `obj_ptr + obj_size > heap_ptr`; add per-iteration `heap_ptr` bound check inside the conservative i32 scan to avoid out-of-bounds `i32.load`.
Regression test `tests/test_codegen.py`	New unit test that compiles a small program, locates `func $gc_collect` in emitted WAT, and asserts presence/order of the Layer 2 (early-skip) and Layer 1 (per-iteration) `heap_ptr` bounds checks.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Widen GC header size field from 16 to 31 bits (closes #484) #488 — Modifies object-header size extraction used by $gc_collect; overlaps header/size logic changed here.
Fix GC shadow stack overflow causing silent array corruption #473 — Prior GC worklist/collector changes in vera/codegen/assembly.py; contextually related to the bounds/safety fixes.

Suggested labels

compiler, tests, docs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title accurately reflects the primary change: adding heap bounds checks to the conservative GC scan against $heap_ptr to fix issue `#515`, with a version bump to v0.0.122.
Linked Issues check	✅ Passed	All coding objectives from `#515` are met: Layer 2 bounds check (obj_ptr + obj_size ≤ heap_ptr) prevents marking/scanning invalid entries, Layer 1 per-iteration check guards loads against heap_ptr, regression test asserts opcode sequences, and documentation clarifies the fix.
Out of Scope Changes check	✅ Passed	All changes are in scope: core GC bounds-check implementation, version bumps (pyproject.toml, vera/init.py), documentation updates (CHANGELOG, HISTORY, README, ROADMAP, vera/README), test count increments, and regression test coverage. No unrelated refactoring.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch v0.0.122-gc-collect-bound-check

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-04-27T15:05:27Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.02%. Comparing base (40396b6) to head (75682a1).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #542   +/-   ##
=======================================
  Coverage   91.02%   91.02%           
=======================================
  Files          58       58           
  Lines       22033    22033           
  Branches      259      259           
=======================================
  Hits        20055    20055           
  Misses       1971     1971           
  Partials        7        7

Flag	Coverage Δ
javascript	`56.83% <ø> (ø)`
python	`94.97% <100.00%> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

HISTORY.md (1)
296-296: ⚠️ Potential issue | 🟡 Minor

Tagged releases count not incremented in footer.

The footer still says "121 tagged releases" but this PR adds v0.0.122, so it should say "122 tagged releases". Based on learnings, this count must also be updated in ROADMAP.md's footer section.
Proposed fix
-Total: **810+ commits, 121 tagged releases, 40 active development days.**
+Total: **810+ commits, 122 tagged releases, 40 active development days.**
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@HISTORY.md` at line 296, Update the footer string "121 tagged releases" to
"122 tagged releases" in HISTORY.md and make the same change in ROADMAP.md's
footer; search for the literal "121 tagged releases" (or the footer block
containing "Total: **... commits, 121 tagged releases, ...**") and replace the
count so both files reflect the new v0.0.122 release.
vera/codegen/assembly.py (1)
680-717: ⚠️ Potential issue | 🔴 Critical

This still marks interior payload words as though they were headers.

The new obj_ptr + obj_size <= heap_ptr gate only rejects false positives whose decoded size runs past the heap. A conservative root that numerically lands at header+12 of a live object still gets through if the payload word at header+8 happens to decode to a small in-bounds obj_size; we then set bit 0 at obj_ptr - 4, which is a payload word, not an object header. So the collector can still corrupt live data even though it no longer walks out of bounds. Please add a real object-start/header validation step before the mark store.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@vera/codegen/assembly.py` around lines 680 - 717, The code currently decodes
a size from whatever word sits at $header (which may be an interior payload) and
then sets the mark bit at $obj_ptr-4; instead, load and validate the actual
candidate header at potential_header_ptr = ($obj_ptr - 4) before setting the
mark: compute potential_header_ptr, i32.load that word into a new local (e.g.
reuse $header or a new local like $candidate_header), check its
header-format/tag (ensure low tag bit is 0 or matches your header-tag invariant)
and decode obj_size from that candidate header (local.get $candidate_header;
i32.const 1; i32.shr_u) and re-run the bounds check (potential_header_ptr +
obj_size <= global.get $heap_ptr) and already-marked check against the candidate
header; if any check fails br $m_loop, otherwise proceed to set the mark by
writing back to potential_header_ptr using the validated candidate header before
the i32.store that sets the mark bit.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ROADMAP.md`:
- Line 42: Update the ROADMAP entry for issue `#348` to reflect the current
worklist size: change the description that currently says a "fixed 1,024-entry
cap" to state the collector uses a 16 KiB worklist (4,096 i32 entries) and that
overflow at that size is the tracked limitation; reference the implementation
location vera/codegen/assembly.py and the GC worklist allocation/constant (the
worklist buffer size / worklist allocation constant) so readers know where the
limit is defined.

In `@tests/test_codegen.py`:
- Around line 1924-1927: The current assertions only check for comment markers
in the WebAssembly text (wat) but must assert the actual bound-check opcodes;
update the checks that reference the "Layer 2 (issue `#515`)" and "Layer 1 (issue
`#515`)" markers to also verify the emitted bound-check instructions are present
in wat (use the existing wat variable and the two marker strings to locate the
relevant region, then assert the opcode sequence for the bound check is present,
e.g., that the region contains the expected instruction like "i32.lt_u" (and the
following conditional branch such as "br_if") or the concrete opcode pattern
your compiler emits for bounds checking). Ensure both Layer 2 (header sanity
check) and Layer 1 (per-iteration scan) assertions are strengthened to look for
these opcodes adjacent to their marker comments rather than solely the comments.

---

Outside diff comments:
In `@HISTORY.md`:
- Line 296: Update the footer string "121 tagged releases" to "122 tagged
releases" in HISTORY.md and make the same change in ROADMAP.md's footer; search
for the literal "121 tagged releases" (or the footer block containing "Total:
**... commits, 121 tagged releases, ...**") and replace the count so both files
reflect the new v0.0.122 release.

In `@vera/codegen/assembly.py`:
- Around line 680-717: The code currently decodes a size from whatever word sits
at $header (which may be an interior payload) and then sets the mark bit at
$obj_ptr-4; instead, load and validate the actual candidate header at
potential_header_ptr = ($obj_ptr - 4) before setting the mark: compute
potential_header_ptr, i32.load that word into a new local (e.g. reuse $header or
a new local like $candidate_header), check its header-format/tag (ensure low tag
bit is 0 or matches your header-tag invariant) and decode obj_size from that
candidate header (local.get $candidate_header; i32.const 1; i32.shr_u) and
re-run the bounds check (potential_header_ptr + obj_size <= global.get
$heap_ptr) and already-marked check against the candidate header; if any check
fails br $m_loop, otherwise proceed to set the mark by writing back to
potential_header_ptr using the validated candidate header before the i32.store
that sets the mark bit.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 20dfd6f8-acd8-4bdc-8d0c-b47b5426001a

📥 Commits

Reviewing files that changed from the base of the PR and between 40396b6 and c61ed6d.

⛔ Files ignored due to path filters (4)

docs/index.html is excluded by !docs/**
docs/index.md is excluded by !docs/**
docs/llms-full.txt is excluded by !docs/**
docs/llms.txt is excluded by !docs/**

📒 Files selected for processing (10)

CHANGELOG.md
HISTORY.md
KNOWN_ISSUES.md
README.md
ROADMAP.md
TESTING.md
pyproject.toml
tests/test_codegen.py
vera/__init__.py
vera/codegen/assembly.py

💤 Files with no reviewable changes (1)

KNOWN_ISSUES.md

Two parallel updates I missed in the main #515 commit: * Module map: assembly.py grew 774 -> 856 lines with the Layer 1 + Layer 2 bound checks. * GC architecture description: Phase 2 ("Mark") now documents the two-layer bound check explicitly, and explains *why* it's needed (the seeding guards don't prove the word at val-4 is actually a header, so a payload word that satisfies the guards can lead the marker past heap_ptr. This is the single most subtle thing about the conservative GC and was previously elided in the README. Co-Authored-By: Claude <noreply@anthropic.invalid> EOF )

Three doc/test fixes from CodeRabbit on PR #542 plus the uv.lock refresh required by the version bump. Fourth CodeRabbit finding (assembly.py outside-diff suggestion to load the header from obj_ptr-4) verified as already-implemented; the existing header local IS the value loaded from obj_ptr-4 -- replying inline on the PR. Changes: * ROADMAP #348: "fixed 1024-entry cap" -> "16 KiB / 4,096-entry cap (gc_worklist_size = 16384 in vera/codegen/assembly.py)". CodeRabbit verified: assembly.py line 269 sets gc_worklist_size = 16384 (= 4096 entries x 4 bytes). * HISTORY.md footer: "121 tagged releases" -> "122 tagged releases". Stale after the v0.0.122 row landed in the Stage 9 release table. * tests/test_codegen.py: strengthen the #515 regression test to assert the actual opcode pattern adjacent to each marker comment rather than just the marker text. A refactor that left the comment in place but deleted the underlying bound check would silently pass the old version; the new version pins the exact opcode sequence. Fragility tradeoff: the test now fails if anyone reorders the bound check, but the assertion message shows the drift verbatim so the fix is one diff away. * uv.lock: regenerated via uv lock. Required by the version bump (pyproject.toml 0.0.121 -> 0.0.122) -- lint job failed on uv lock --check. Co-Authored-By: Claude <noreply@anthropic.invalid>

aallan · 2026-04-27T15:49:50Z

@coderabbitai thanks. Three of four findings actioned in 75682a1:

ROADMAP GC worklist overflow for deeply nested object graphs #348 worklist size — fixed: now reads "16 KiB / 4,096-entry cap (gc_worklist_size = 16384 in vera/codegen/assembly.py)". You were right that the existing "1024-entry cap" was stale.
HISTORY footer — fixed: "121 tagged releases" → "122 tagged releases" (the v0.0.122 row was already in the Stage 9 release table; the footer hadn't been bumped).
Test assertions on bound-check opcodes — fixed: the test now asserts the exact opcode pattern adjacent to each Layer N (issue #515) marker, not just the marker text. A refactor that left the comment in place but deleted the underlying check fails the test now. Marker comments stay as discoverability anchors; opcodes are the contract.

For the fourth (outside-diff in vera/codegen/assembly.py, "load and validate the actual candidate header at potential_header_ptr = $obj_ptr - 4"): not actioning because the code already does this. Lines 674-679 load $header from $obj_ptr - 4 directly:

local.get $obj_ptr
i32.const 4
i32.sub
i32.load
local.set $header

$header IS the value at $obj_ptr - 4 — it's the potential_header_ptr value the suggestion asks for, just under a different name. $obj_size is decoded from it (line 681-684), the bound check obj_ptr + obj_size <= heap_ptr runs against it (lines 695-702), the already-marked check runs against it (lines 704-708), and the mark store writes back to $obj_ptr - 4 with the validated value (lines 711-717). The header-tag invariant in this codebase is (size << 1) | mark with no separate "is-this-a-header" tag bit — the bound check on obj_size is the only validation possible without introducing a new tag scheme, and that's exactly what Layer 2 does. Happy to rename $header → $candidate_header if that would make the intent clearer, but the validation is structurally complete.

Also pushed uv.lock regenerated for the version bump (lint job was failing on uv lock --check).

coderabbitai · 2026-04-27T15:50:23Z

Tip

For best results, initiate chat on the files or code changes.

@aallan — thanks for the detailed walkthrough on all four items.

The three fixes are clear and the test improvement (asserting actual opcodes adjacent to the Layer N (issue #515) markers) is the right approach — marker comments as discoverability anchors, opcodes as the contract.

On the fourth: you're correct, and I was wrong to suggest it. $header already is the word at $obj_ptr - 4, and given the header-tag invariant (size << 1) | mark — with no dedicated "is-this-a-header" tag bit — there is no additional structural check possible beyond the obj_size bounds validation that Layer 2 already performs. Renaming to $candidate_header is optional flavour; the validation logic is complete as written.

✏️ Learnings added

Learnt from: aallan
URL: https://github.com/aallan/vera/pull/542

Timestamp: 2026-04-27T15:50:22.222Z
Learning: In aallan/vera `vera/codegen/assembly.py`, the GC object header encoding is `(size << 1) | mark` — there is NO dedicated "is-this-a-header" tag bit. When conservative-GC worklist seeding treats a shadow-stack word as a heap pointer at `$obj_ptr`, the value loaded from `$obj_ptr - 4` is already the candidate header (`$header` local). The only possible structural validation for a word with no tag bit is the Layer 2 bound check: `obj_ptr + obj_size <= heap_ptr`. Do NOT suggest loading a separate `potential_header_ptr` or a tag-bit check — neither exists in this encoding. The local is named `$header` but behaves as a candidate_header; suggesting a rename to `$candidate_header` is cosmetic only.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

🧠 Learnings used

Learnt from: aallan
Repo: aallan/vera PR: 485
File: vera/wasm/calls_arrays.py:622-626
Timestamp: 2026-04-17T10:13:59.841Z
Learning: In `vera/wasm/calls_arrays.py`, do NOT add per-site `arr_len * sizeof(B) > 65535` trap guards in individual array combinators (`array_map`, `array_range`, `array_concat`, `array_append`, `array_slice`). All of these can allocate >65535 bytes and are equally affected by the 16-bit GC object-header size field truncation bug. The correct, single-point fix is issue `#484`: widen the GC header size field from 16-bit to 31-bit in `vera/codegen/assembly.py` by updating the two `i32.const 65535 / i32.and` masks in `$alloc` and three in `$gc_collect`. Per-site guards would shift the landmine without defusing it.

Learnt from: aallan
Repo: aallan/vera PR: 473
File: vera/codegen/assembly.py:508-514
Timestamp: 2026-04-16T10:35:13.666Z
Learning: In vera/codegen/assembly.py `_emit_gc_collect`, the worklist push sites (Phase 2 seed loop and Phase 2b mark loop) intentionally silently skip enqueuing when the worklist is full (`wl_ptr >= wl_end`). This is deliberate bounded-marking-worklist design: overflow means some objects may not be transitively marked and get conservatively freed, which is acceptable. A trap/unreachable at that point would be incorrect because Phase 1 has already cleared all mark bits and Phase 3 has not yet run, leaving the heap in an unrecoverable inconsistent state. The worklist is 16 KiB (4,096 i32 entries), making overflow require >4,096 live pointers at a single marking depth. The known limitation for deeply nested object graphs is tracked in issue `#348`. The shadow stack overflow guard (added in this PR) is the critical correctness defence.

Learnt from: aallan
Repo: aallan/vera PR: 536
File: HISTORY.md:258-258
Timestamp: 2026-04-27T12:21:59.379Z
Learning: In `HISTORY.md` and `ROADMAP.md` for aallan/vera, the "tagged releases" total appears in **three separate locations**: (1) the per-row version entry in the Stage 11 table in `HISTORY.md`, (2) the bottom-of-file roll-up footer in `HISTORY.md` (e.g. "810+ commits, 121 tagged releases, 40 active development days"), and (3) the "Where we are" / footer section in `ROADMAP.md`. All three must be incremented together on each release. aallan confirmed in PR `#536` (commit 7de39ff) that this count should be added to the gated list in issue `#528` (auto-generate or gate doc counts via `check_doc_counts.py`). Flag any PR where only a subset of the three locations is updated.

Learnt from: aallan
Repo: aallan/vera PR: 510
File: vera/wasm/calls_strings.py:2877-2962
Timestamp: 2026-04-23T10:22:01.626Z
Learning: In `vera/wasm/calls_strings.py` (and WASM translation mixins generally): a DRY refactor that consolidates duplicate code into a shared helper (e.g. `_translate_strip` delegating to `_translate_trim`) does NOT automatically propagate GC-root correctness from the old duplicates to the new helper. After any such consolidation, the surviving helper must be independently audited for GC shadow-stack rooting: every heap pointer that is still live across a subsequent `call $alloc` must appear in a `gc_shadow_push` before that alloc. The `_translate_trim` fix (PR `#510`, commit fb4d0b6) is the canonical example: the strip→trim DRY refactor correctly unified the whitespace predicate (rule 1) but missed the missing `gc_shadow_push(ptr)` before the destination alloc (rule 2). These two correctness properties are independent and must be checked separately.

Learnt from: aallan
Repo: aallan/vera PR: 489
File: vera/wasm/calls_arrays.py:1072-1082
Timestamp: 2026-04-17T17:27:34.889Z
Learning: In `vera/wasm/calls_arrays.py` (PR `#489`), the `u_is_adt` / `u_needs_root` heuristic in `_translate_array_fold` deliberately over-roots any non-Bool/Byte `i32` accumulator, including opaque host handles (Map, Set, Decimal, Regex). This is intentional safe over-caution: the conservative GC's alignment/range check rejects out-of-range small integers (e.g. `5` from `host_map_new`) so no spurious marking occurs, while under-rooting would be a silent-GC bug (class `#464`). The correct fix — a positive `is_gc_managed(type)` predicate applied cross-cutting across all rooting-decision sites in every mixin — is tracked in issue `#490` and is out of scope for PR `#489`. Do NOT flag the `u_is_adt` condition as incorrect or missing a positive-predicate check; it is a deliberate bias-to-root default.

Learnt from: aallan
Repo: aallan/vera PR: 511
File: tests/test_browser.py:1272-1379
Timestamp: 2026-04-23T13:38:11.484Z
Learning: Repo aallan/vera: When a test in tests/test_browser.py or tests/test_codegen.py needs to verify that json_as_int returns Some(INT64_MIN) for the input -2^63 (i.e. JNumber(0.0 - 9223372036854775808.0)), do NOT call int_to_string(Int.0) directly — there is a pre-existing bug (`#475`, queued for v0.0.120) where int_to_string(INT64_MIN) triggers a negation overflow inside the WAT code. The correct workaround is to probe the value indirectly, e.g. via `bool_to_string(Int.0 < 0)`, and assert the result is "true". Add an inline test-body comment documenting the `#475` bug to prevent future reviewers from replacing the workaround with a direct print.

Learnt from: aallan
Repo: aallan/vera PR: 511
File: tests/test_codegen.py:12294-12682
Timestamp: 2026-04-23T13:18:52.667Z
Learning: Repo aallan/vera PR `#511`: In tests/test_codegen.py::TestJsonTypedAccessors, accessor import-gating is now covered by:
- test_accessors_do_not_force_json_imports and test_layer2_accessors_do_not_force_json_imports (ensure json_as_*/json_get_* are pure prelude and do NOT emit '(import "vera" "json_parse")' or '(import "vera" "json_stringify")' in WAT),
- test_json_parse_does_force_its_host_import (ensures json_parse DOES emit its host import).
This prevents false positives about JSON imports when only typed accessors are used.

Learnt from: aallan
Repo: aallan/vera PR: 511
File: tests/test_codegen.py:12473-12510
Timestamp: 2026-04-23T13:51:17.615Z
Learning: Repo aallan/vera — tests/test_codegen.py and tests/test_browser.py: When verifying json_as_int at -2^63 (JNumber(-9223372036854775808.0)), use two indirect probes to avoid the known int_to_string(INT64_MIN) negation-overflow bug (`#475`): (a) assert bool_to_string(Int.0 < 0) == "true"; and (b) assert int_to_string(Int.0 + 1) == "-9223372036854775807". Include an inline "#475" comment. Implemented in commit 3cf5a13 (PR `#511`).

Learnt from: aallan
Repo: aallan/vera PR: 0
File: :0-0
Timestamp: 2026-03-29T10:15:51.538Z
Learning: In aallan/vera, general housekeeping PRs (CI hardening, documentation, tooling) may acceptably bundle incidental code fixes (e.g. adding `vera --version` CLI support, HTTP Content-Type headers, urlopen timeouts) without requiring them to be split into separate PRs. Do not flag such bundling as a scope concern in this repository.

Learnt from: aallan
Repo: aallan/vera PR: 357
File: vera/wasm/calls.py:353-358
Timestamp: 2026-03-26T09:40:40.644Z
Learning: Repo aallan/vera: The canonical allocator-needed flag is self.needs_alloc (no underscore). For host-import call sites (e.g., Http/Json/Markdown/Regex/Map/Set/Decimal) in vera/wasm/calls.py and related mixins, set self.needs_alloc = True to ensure allocator wiring is kept.

Learnt from: aallan
Repo: aallan/vera PR: 526
File: scripts/build_site.py:0-0
Timestamp: 2026-04-24T20:41:50.975Z
Learning: In aallan/vera `scripts/build_site.py` and `docs/index.html`, the VeraBench section currently shows results from **v0.0.7** of the benchmark, at which point the suite had **50 problems**. The current VeraBench suite has **60 problems**. When reviewing any prose referencing the VeraBench problem count, check which benchmark release is cited (`Results from VeraBench vX.Y.Z`): the problem count must match the suite size at *that release*, not the current size. As of the fix in PR `#526` commit cac8c3c, all four occurrences (two in `docs/index.html`, two in the `build_index_md` f-string) correctly say 50.

Learnt from: aallan
Repo: aallan/vera PR: 510
File: vera/README.md:94-94
Timestamp: 2026-04-23T10:21:57.603Z
Learning: In `vera/README.md`, the `wasm/` package line count appears in two places: the module table (e.g. line 94) and the "Code Generation" section prose (e.g. line 486, phrase "wasm/ (N lines across 18 modules)"). Both must be kept in sync whenever `wasm/` grows. aallan noted in PR `#510` that ideally a validation script would derive both from a single source (similar to `check_conformance.py`). When reviewing PRs that add files to `vera/wasm/`, flag if only one of the two occurrences is updated.

Learnt from: aallan
Repo: aallan/vera PR: 473
File: tests/test_codegen.py:1657-1658
Timestamp: 2026-04-16T10:42:15.189Z
Learning: In aallan/vera (tests/test_codegen.py), $heap_ptr and $gc_heap_start are intentionally initialised to the same value: the start of the heap equals the end of the GC regions (data_end + gc_stack_size + gc_worklist_size). Tests like TestHeapAllocation.test_heap_ptr_starts_after_strings and test_heap_ptr_zero_without_strings should validate the computed constant rather than try to distinguish between $heap_ptr and $gc_heap_start.

Learnt from: aallan
Repo: aallan/vera PR: 332
File: vera/codegen/api.py:826-836
Timestamp: 2026-03-23T15:53:25.789Z
Learning: In vera/codegen/api.py, the `_map_store` Python dict (used to back Map<K,V> host imports) stores raw WASM heap addresses as Python integers. The Vera GC (bump allocator + mark-sweep) operates exclusively on WASM linear memory and the WASM stack — it has no visibility into Python-side structures. Therefore, ADT pointers stored as i32 values inside `_map_store` are NOT at risk of premature GC collection by the Vera GC. Raising concerns about dangling pointers due to GC collecting map-stored ADTs is incorrect for this architecture.

Learnt from: aallan
Repo: aallan/vera PR: 536
File: tests/test_codegen_closures.py:610-679
Timestamp: 2026-04-27T12:58:29.788Z
Learning: In tests/test_codegen_closures.py::TestNestedClosures (PR `#536`, commit f382a8a), `test_two_top_level_fns_with_nested_closures` is the dedicated cross-function shared-state regression test for issue `#514`. It defines two separate `public fn`s, each with one outer + one inner closure (4 lifted closures total), and asserts: `len(re.findall(r'\(func \$anon_\d+', wat)) >= 4` (count), `len(set(anon_funcs)) == len(anon_funcs)` (no duplicate IDs), `(table N funcref)` size >= 4 (all closures invokable), plus `_run` of both functions (WASM validation). This test catches regressions where `_next_closure_id` or `_closure_sigs` are re-initialised per top-level function instead of shared at module level — a failure mode that the single-fn `test_nested_closure_emits_anon_for_inner` test would miss entirely (all existing tests would still pass while the module fails at link time with duplicate `$anon_0` identifiers).

Learnt from: aallan
Repo: aallan/vera PR: 486
File: vera/wasm/calls_arrays.py:832-841
Timestamp: 2026-04-17T14:46:14.693Z
Learning: In `vera/wasm/calls.py` (post-PR `#486`), `_infer_concat_elem_type` was extended to detect `array_map(arr, closure)` call nodes and delegate to `_infer_closure_return_vera_type(closure_arg)` to return the *output* element type B rather than the input element type A. This fix ensures that `array_filter(array_map(...), pred)`, `array_concat(array_map(...), ...)`, `array_slice(array_map(...), ...)` and similar compositions resolve the correct element size, load/store ops, and closure signatures. Do NOT flag `_infer_concat_elem_type` as missing `array_map` handling — it is already present.

Three fixes from the post-rebase CodeRabbit pass on #544: * HISTORY.md footer: "122 tagged releases" -> "123 tagged releases". CodeRabbit reported it as "121 -> 123" but the prior fix in #542 had already taken it to 122; bumping it the rest of the way for this release. * tests/test_runtime_traps.py: add captured.err == "" assertion in test_json_mode_does_not_tee_to_stdout. Mirrors the JSON-mode invariant that TestStdoutOnTrap522 already pins on its four JSON tests -- in JSON mode the entire output is the envelope on stdout and stderr must stay empty so downstream consumers parsing our output see exactly one machine-readable document. * uv.lock: regenerated for the 0.0.122 -> 0.0.123 version bump. Lint job was failing on uv lock --check. Three other CodeRabbit findings verified as either incorrect or out of scope (replying inline on the PR): * HISTORY.md "collapse multi-line entries" -- the v0.0.122 / v0.0.123 rows are already single rows; CodeRabbit was reading wrapped display as multi-line content. * ROADMAP.md "33 examples -> 32 examples" -- ls examples/*.vera reports 33; check_doc_counts.py and the live count agree. CodeRabbit's expected count is wrong. * tests/test_codegen.py opcode assertions -- already strengthened in #542 (commit 75682a1 "PR #542 review fixes"). Markers stay as discoverability anchors; opcode patterns are the contract. * vera/cli.py isatty() gating on the trailing newline -- pre-existing behaviour from before #543, unchanged by this PR. Out of scope: changing the trailing-newline policy is a separate concern with its own backwards-compat surface. * vera/codegen/assembly.py header-validity check -- same finding as the one already discussed and resolved on #542. No separate HEADER_TAG invariant exists in this codebase (header is (size << 1) | mark with no marker bit beyond mark itself). The obj_ptr + obj_size <= heap_ptr bound check is the strongest validation possible without introducing a new tag scheme. Co-Authored-By: Claude <noreply@anthropic.invalid>

coderabbitai Bot reviewed Apr 27, 2026

View reviewed changes

Comment thread ROADMAP.md Outdated

Comment thread tests/test_codegen.py Outdated

aallan mentioned this pull request Apr 27, 2026

v0.0.123: tee IO.print writes live to sys.stdout (#543) #544

Merged

7 tasks

aallan merged commit d101fe7 into main Apr 27, 2026
20 checks passed

aallan deleted the v0.0.122-gc-collect-bound-check branch April 27, 2026 16:04

This was referenced May 5, 2026

v0.0.131 — GC infrastructure batch (closes #487 and #348) #571

Merged

v0.0.132 — Opaque-handle GC-rooting hygiene (closes #347 and #490; #346 deferred to #573) #572

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.0.122: bound conservative GC scan against $heap_ptr (#515)#542

v0.0.122: bound conservative GC scan against $heap_ptr (#515)#542
aallan merged 3 commits into
mainfrom
v0.0.122-gc-collect-bound-check

aallan commented Apr 27, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 27, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

❌ Failed checks (1 warning)

Uh oh!

codecov Bot commented Apr 27, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

aallan commented Apr 27, 2026

Uh oh!

coderabbitai Bot commented Apr 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

aallan commented Apr 27, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root cause

Fix

Tests

Verification

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

❌ Failed checks (1 warning)

Uh oh!

codecov Bot commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

aallan commented Apr 27, 2026

Uh oh!

coderabbitai Bot commented Apr 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

aallan commented Apr 27, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 27, 2026 •

edited

Loading

codecov Bot commented Apr 27, 2026 •

edited

Loading