Skip to content

Non-contiguous capture of outer Int slots in closure miscompiles (trap or silent wrong result) #615

@aallan

Description

@aallan

Vera version: vera 0.0.138 (current main, commit ec46cf8)
Symptom class: WASM compile-time trap (2a) or silent miscomputation (2b) — vera check is silent in both cases.
Found by: Agent attempting to write terminal Tetris on Vera 0.0.138.

Summary

If a closure body references a captured outer @Int.k while leaving some @Int.j (with 0 < j < k, where index 0 is the closure's own parameter) unreferenced, the compiler emits malformed WASM. Two distinct manifestations show up depending on the body shape:

  • 2a: if the skip happens in a "tail-shaped" expression (no further let pushes after the offending reference), the WASM module fails to compile and vera run traps at module load.
  • 2b: if the skip happens before further let @Int = ... bindings push more slots on top, the module compiles, runs, and silently produces the wrong value.

Hypothesised same root cause: an off-by-one in the closure-lifter's env-struct layout when only a non-contiguous subset of outer same-typed slots is referenced. Filed as one issue with two repros.

Bug 2a — skip causes WASM compile-time trap

Minimum reproducer

public fn main(@Unit -> @Unit)
  requires(true) ensures(true) effects(<IO>)
{
  let @Array<Int> = [99, 99, 99, 99];
  let @Array<Int> = [10, 20, 30, 40];
  let @Int = 3;
  let @Int = 0;

  -- Closure references @Int.0 (its own param) and @Int.2 (outer),
  -- but NOT @Int.1 (the other outer).
  let @Array<Int> = array_map(array_range(0, 4), fn(@Int -> @Int) effects(pure) {
    let @Nat = match int_to_nat(@Int.0) {
      Some(@Nat) -> @Nat.0,
      None -> 0
    };
    @Array<Int>.0[@Nat.0] + @Int.2 + @Int.0
  });
  IO.print("result: [\(@Array<Int>.0[0]), \(@Array<Int>.0[1]), \(@Array<Int>.0[2]), \(@Array<Int>.0[3])]\n")
}

Observed

$ vera check t.vera
OK: t.vera
$ vera run t.vera
Error: Unhandled WASM trap: failed to compile: wasm[0]::function[3]::main

Caused by:
    0: WebAssembly translation error
    1: Invalid input WebAssembly code at offset 1385: unknown table 0: table index out of bounds

Confirmed reproducing on current main (commit ec46cf8).

Expected

result: [13, 24, 35, 46] (i.e. arr[i] + 3 + i).

Confirming the trigger

If the same closure body is changed to @Array<Int>.0[@Nat.0] + @Int.2 + @Int.1 + @Int.0 — touching every outer Int slot — the trap goes away and the program runs correctly. Touching @Int.1 with a no-op (e.g. + (@Int.1 * 0)) also makes it compile and run. This is consistent with the lifter only looking at the set of referenced indices when deciding what to serialise, rather than the full prefix {0, 1, …, k} actually in scope.

Bug 2b — same skip pattern, silent wrong result

Minimum reproducer (essence of fits from a Tetris collision check)

private fn empty_board(@Unit -> @Array<Int>)
  requires(true) ensures(true) effects(pure)
{
  array_map(array_range(0, 200), fn(@Int -> @Int) effects(pure) { 0 })
}

private fn cell_at(@Array<Int>, @Int, @Int -> @Int)
  requires(true) ensures(true) effects(pure)
{
  if @Int.1 < 0 then { 1 } else {
    if @Int.1 >= 10 then { 1 } else {
      if @Int.0 >= 20 then { 1 } else {
        if @Int.0 < 0 then { 0 } else {
          let @Int = @Int.0 * 10 + @Int.1;
          match int_to_nat(@Int.0) {
            Some(@Nat) -> @Array<Int>.0[@Nat.0],
            None -> 0
          }
        }
      }
    }
  }
}

private fn fits(@Array<Int>, @Int, @Int -> @Bool)
  -- args: board, px, py
  requires(true) ensures(true) effects(pure)
{
  -- @Array<Int>.0=board, @Int.0=py, @Int.1=px
  let @Array<Int> = [0, 1, 1, 1, 2, 1, 3, 1];   -- I-piece offsets
  array_all(array_range(0, 4), fn(@Int -> @Bool) effects(pure) {
    -- closure: @Int.0=cell idx (param), @Int.1=py, @Int.2=px
    let @Nat = match int_to_nat(@Int.0 * 2) {
      Some(@Nat) -> @Nat.0, None -> 0
    };
    let @Nat = match int_to_nat(@Int.0 * 2 + 1) {
      Some(@Nat) -> @Nat.0, None -> 0
    };
    -- Skips @Int.1 (py) — only references @Int.0 and @Int.2.
    let @Int = @Array<Int>.0[@Nat.1] + @Int.2;
    -- After the let, slot indices renumber; the skipped outer is now reachable
    -- but the damage is done.
    let @Int = @Array<Int>.0[@Nat.0] + @Int.2;
    cell_at(@Array<Int>.1, @Int.1, @Int.0) == 0
  })
}

public fn main(@Unit -> @Unit)
  requires(true) ensures(true) effects(<IO>)
{
  let @Array<Int> = empty_board(());
  let @Bool = fits(@Array<Int>.0, 3, 0);
  IO.print("fits = \(@Bool.0)\n")
}

Observed

$ vera check t.vera
OK: t.vera
$ vera run t.vera
fits = false

Confirmed reproducing on current main (commit ec46cf8).

Expected

fits = true — an I-tetromino at (px=3, py=0) on an empty board fits trivially.

Why this is hypothesised to be the same bug

The closure body's first non-let-bound expression that touches outer Ints is @Array<Int>.0[@Nat.1] + @Int.2, which references @Int.2 (px) but not @Int.1 (py). That's the same skip pattern as 2a. The only structural difference is that the closure does more work after the skip — additional let @Int = ... bindings push fresh Int slots, after which the body references @Int.0 and @Int.1 (the let-bound abs_x / abs_y).

The hypothesis: the extra let pushes happen to keep the malformed offsets within the bounds of some table after the lift pass renumbers, so WASM compilation succeeds — but the captured value being read is the wrong one, hence the silent wrong answer.

Workaround

Hoist the closure body into a top-level helper that takes all outer values as ordinary arguments, so the closure references its captured slots strictly contiguously (@Int.0, @Int.1, …, @Int.k with no gaps), and the helper's own slot-arithmetic happens in a fresh top-level frame:

private fn cell_fits(@Array<Int>, @Array<Int>, @Int, @Int, @Int -> @Bool)
  -- args: board, offsets, i, px, py
  requires(true) ensures(true) effects(pure)
{ ... }

private fn fits(@Array<Int>, @Int, @Int -> @Bool)
  ...
{
  let @Array<Int> = [0, 1, 1, 1, 2, 1, 3, 1];
  array_all(array_range(0, 4), fn(@Int -> @Bool) effects(pure) {
    cell_fits(@Array<Int>.1, @Array<Int>.0, @Int.0, @Int.2, @Int.1)
  })
}

This pattern was applied to every closure in the Tetris implementation that captured more than one outer Int. Mechanical and ugly; should not be necessary.

Diagnostic suggestion

When the closure-lifting pass computes the env-struct layout, assert that the set of referenced outer slots for each type forms a contiguous prefix {0, 1, …, n-1} of the captured-slot range, and emit a clear compile-time error otherwise. That would convert both 2a and 2b into a "slot k captured but slot j (j<k) is unreferenced — currently miscompiled, please reference all captured slots up to k" message.

A regression test pulling the Tetris collision-check shape (array_all over array_range(0, 4), closure body uses two outer Ints non-contiguously) would prevent the silent-miscompute manifestation from coming back.

Hypothesised umbrella

Bug A (#614) (closure capturing ADT and passing it to a function call) is suspected to share a root cause in the closure-lifting pass, since both bugs share the symptom signature unknown table 0: table index out of bounds at the closure boundary. Worth investigating whether a single fix in the lifter's env-struct serialisation closes both — or whether they're independent and both happen to surface as malformed call_indirect table indices.

Origin

Discovered while writing terminal Tetris in Vera; hit consistently on collision-check shapes that capture more than one outer Int non-contiguously inside array_all closures.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions