Summary
After #588 fixed the captured-Array<T>-indexing-in-closure bug, the BUG_REPORT.md repro_min.vera and repro_nested.vera minimum reproducers run cleanly. However, the full Conway's Game of Life implementation (12×30 grid, 200 generations, recursive run_loop with IO.print and IO.sleep between frames) still produces silent string corruption from generation 1 onwards.
The bug report itself acknowledged this gap:
However, scaling this up to a real Conway's Game of Life implementation (12×30 grid, eight neighbour reads through a wrap-aware cell_at, array_mapi-of-array_mapi for the next-generation step, recursive run_loop with IO.print and IO.sleep between frames) still produces the trap or — more often — silent corruption where render_cell is bypassed and raw Bool bytes (�/�) leak into IO.print, which then chokes with UnicodeDecodeError. The minimum repros do not reproduce that variant. Either there is an additional trigger I haven't isolated or there are multiple closely-related codegen bugs.
#588's fix removes the UnicodeDecodeError (the v0.0.136 errors="replace" defensive layer surfaces invalid bytes as U+FFFD characters in output rather than crashing Python — see #589), but the underlying memory-corruption trigger remains.
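For context, the difference between a crash and a visible U+FFFD can be illustrated with Python's standard bytes.decode. This is a minimal sketch of the decoding behaviour only: the byte string is made up for illustration, and the actual host-runtime call site is not shown in this report.

```python
# Hypothetical corrupted frame: a stray 0xFF byte sits between two valid
# UTF-8 full-block characters (U+2588). The bytes are illustrative only.
corrupted = "\u2588".encode("utf-8") + b"\xff" + "\u2588".encode("utf-8")

# Strict decoding (the default) raises on the invalid byte.
try:
    corrupted.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="replace" substitutes U+FFFD for the invalid byte instead of raising.
lenient = corrupted.decode("utf-8", errors="replace")
```

The lenient path hides the traceback but not the fault: the invalid byte is still present in the buffer that was decoded.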
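Current symptom (post-#588 fix)
Running vera run life_full_program.vera with the BUG_REPORT.md attachment:
ANSI escape sequences (\u{1B}[2J\u{1B}[H) appear partially stripped: the escape character (0x1B) is replaced or lost while the rest ([2J[H) leaks into stdout as visible text.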
Bisection so far
I built a series of progressively-larger Life subsets and confirmed each works post-#588:
step alone (one array_mapi-of-array_mapi next-grid step over a 3×3 grid, no rendering, no recursion) — works.
render_grid (Array<Array> → String, no captures of grids) — works.
step + render_grid + one print step at 3×3 — works.
step + render_grid + one print step at 12×30 with full Life infrastructure (cell_at, count_neighbors, next_cell, make_initial) — works.
Mini Life with recursive run_loop at 3×3 over 2 generations — works.
Full Life at 12×30 with recursive run_loop over 200 generations — fails from generation 1+.
The bisection narrows the trigger to the combination of: 12×30 grid scale + recursive run_loop with allocating next_grid argument across the recursive call boundary. Smaller scales (~5×5) or non-recursive sweeps work cleanly.
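To compare generations against a known-good result while bisecting, an independent reference step could be kept alongside the Vera program. The following Python sketch reuses the helper names mentioned in this report (cell_at, count_neighbors, next_cell, step), but it is not a translation of life_full_program.vera: it assumes the standard B3/S23 rules and assumes that "wrap-aware" means a toroidal grid.

```python
from typing import List

Grid = List[List[bool]]


def cell_at(grid: Grid, row: int, col: int) -> bool:
    """Wrap-aware read: indices wrap around both edges (assumed toroidal)."""
    return grid[row % len(grid)][col % len(grid[0])]


def count_neighbors(grid: Grid, row: int, col: int) -> int:
    """Count live cells among the eight surrounding positions."""
    return sum(
        cell_at(grid, row + dr, col + dc)
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )


def next_cell(alive: bool, neighbors: int) -> bool:
    """Standard Life rule (assumed): birth on 3, survival on 2 or 3."""
    return neighbors == 3 or (alive and neighbors == 2)


def step(grid: Grid) -> Grid:
    """Build a fresh next-generation grid; the input grid is not mutated."""
    return [
        [next_cell(grid[r][c], count_neighbors(grid, r, c))
         for c in range(len(grid[0]))]
        for r in range(len(grid))
    ]
```

Diffing each rendered Vera frame against a frame derived from this reference would distinguish a wrong next-generation grid from a correct grid that is corrupted only during rendering.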
Plausible hypothesis
The corruption pattern (full-block characters intact + U+FFFD between them + missing escape bytes) suggests per-byte string corruption inside the rendered output buffer rather than wholesale heap reuse. Plausible mechanisms:
1. GC reclamation of in-flight strings during string_join / string_concat while the destination buffer is still being filled. The rendered grid is built up via nested array_map over Strings; intermediate Array<String> values held only on the WASM operand stack across a string_join call could be swept if the operand stack isn't fully shadow-stack-rooted.
2. Shadow-stack overflow at scale: 12×30 = 360 inner-closure invocations × 8 neighbour calls × roughly 5 contract checks per generation. If shadow-stack pushes for in-flight String results aren't paired with pops, the stack could overflow and wrap, corrupting later rooting.
3. Closure-env corruption across an allocating recursive call: run_loop(next_grid(@Array<Array<Bool>>.0), …) allocates a fresh grid as the first arg; if the closures inside next_grid capture pointers that get invalidated by the very allocation that's producing the new grid, captures could see freed memory.
Hypothesis 1 (GC during string_join) is the most likely candidate based on the corruption shape (per-byte rather than per-pointer).
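The failure mode behind hypothesis 1 can be modelled with a toy collector. This is a deliberately simplified Python sketch, not the Vera runtime's collector: ToyHeap, join_rows and demo are hypothetical names, the "mark" phase only checks the shadow stack without tracing references, and a swept value is modelled as U+FFFD rather than as arbitrary reused bytes.

```python
class ToyHeap:
    """Tiny sweep-only model of a heap with a shadow stack of roots."""

    def __init__(self):
        self.objects = {}        # handle -> payload
        self.shadow_stack = []   # handles treated as GC roots
        self.next_handle = 0

    def alloc(self, payload):
        handle = self.next_handle
        self.next_handle += 1
        self.objects[handle] = payload
        return handle

    def collect(self):
        """Drop every object whose handle is not on the shadow stack."""
        live = set(self.shadow_stack)
        self.objects = {h: p for h, p in self.objects.items() if h in live}

    def read(self, handle):
        # A swept handle models a dangling pointer; return a marker for it.
        return self.objects.get(handle, "\ufffd")


def join_rows(heap, row_handles, root_inputs):
    """Join row strings; a collection runs before each row is copied."""
    mark = len(heap.shadow_stack)
    if root_inputs:
        heap.shadow_stack.extend(row_handles)   # root the in-flight inputs
    parts = []
    for handle in row_handles:
        heap.collect()   # models a GC triggered while the buffer grows
        parts.append(heap.read(handle))
    del heap.shadow_stack[mark:]                # pop exactly what was pushed
    return "\n".join(parts)


def demo(root_inputs):
    heap = ToyHeap()
    rows = [heap.alloc("\u2588\u2588"), heap.alloc("\u2588\u2588")]
    return join_rows(heap, rows, root_inputs)
```

In this model, demo(False) yields replacement markers in place of the rows, while demo(True) yields the intact rows. The paired push and pop around the join is also the invariant that hypothesis 2 suspects is being violated.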
Reproducer
life_full_program.vera from the original BUG_REPORT.md is attached. The bisection scripts lived in /tmp during the #588 investigation and were not preserved. To reproduce:
vera run life_full_program.vera   # interrupt after a few seconds
# Generation 0 renders cleanly; from Generation 1+, output is corrupt.
Acceptance
Full Life program runs to completion across 200 generations with a clean rendered grid at every step.
A new conformance test that reliably reproduces the residual trigger at the smallest possible scale.
Any GC-rooting / shadow-stack / closure-capture invariant that needed strengthening is documented in the codegen comments.
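A conformance test needs a mechanical definition of "clean". One possible check, sketched in Python under the assumption that the test harness can capture each decoded frame as a string, looks for the two signatures described in this report: U+FFFD characters, and a [2J or [H tail that is no longer preceded by the escape character. The function name and the sign labels are hypothetical.

```python
import re

# An ANSI CSI tail such as "[2J" or "[H" that is NOT preceded by ESC (0x1B).
ORPHAN_CSI = re.compile(r"(?<!\x1b)\[(?:2J|H)")


def corruption_signs(frame: str) -> list:
    """Return labels for the corruption signatures found in one frame."""
    signs = []
    if "\ufffd" in frame:
        signs.append("replacement-character")
    if ORPHAN_CSI.search(frame):
        signs.append("orphaned-ansi-sequence")
    return signs
```

A frame passes when the returned list is empty. The check would report a false positive if rendered cells could legitimately contain the literal text [H or [2J, which does not appear to be the case for a grid of block characters.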
Related
#588 — captured-Array<T>-indexing-in-closure (closed in v0.0.137).
#589 — host-runtime UTF-8 hygiene (closed in v0.0.136). Provides the errors="replace" defensive layer that prevents the residual corruption from escaping as a Python traceback.
#570 — iterative-builder shadow-stack overflow (closed in v0.0.133). Different shape but adjacent to hypothesis 2.