SKILL.md: tell sandbox-running agents they can install Vera#582
Conversation
Observed in the wild on 2026-05-06: a Claude.ai sandboxed instance asked to write Conway's Game of Life in Vera read SKILL.md's Installation section, concluded "Vera isn't available in this sandbox", and fell back to "I'll write a correct implementation that the user can run locally" — without trying the install steps. When called out it managed to install Vera fine. The Installation section currently describes the standard `git clone + pip install -e .` flow that DOES work in any sandbox with Python + git + pip + outbound network (i.e. basically all of them). The failure mode is the agent defaulting to "I can't run this" before testing, not the instructions being wrong. Add an explicit note as a blockquote at the end of the Installation section directly addressing this. Two parts: 1. "You CAN install in a sandbox" — direct rebuttal to the default-to-no-toolchain reflex. Lists the typical sandbox capabilities and explicitly says "try it before concluding Vera isn't available." 2. "Don't `pip install vera` from PyPI" — that package name resolves to an unrelated ERAV citizen-science library (verified via `curl pypi.org/pypi/vera/json`). Always install from the GitHub source clone. CHANGELOG entry under [Unreleased]; not bundling into v0.0.134 via retag because we just did that yesterday and another retag within hours for a one-paragraph fix is the kind of churn worth avoiding. Auto-regenerated docs/SKILL.md + docs/llms-full.txt + line- shifted allowlist fix in scripts/check_skill_examples.py. Co-Authored-By: Claude <noreply@anthropic.invalid>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #582 +/- ##
=======================================
Coverage 90.98% 90.98%
=======================================
Files 59 59
Lines 22884 22884
Branches 259 259
=======================================
Hits 20821 20821
Misses 2056 2056
Partials 7 7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: ⛔ Files ignored due to path filters (2)
📒 Files selected for processing (2)
📝 WalkthroughWalkthroughAdds sandbox installation guidance and runtime-trap notes to documentation (SKILL.md, CHANGELOG.md, KNOWN_ISSUES.md). Updates ChangesDocumentation and Checker Alignment
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
…Conway debugging session The sandboxed Claude that prompted PR #582's install-affordance note then went on to write Conway's Game of Life and hit a real compiler bug — type aliases over Array<T> compile cleanly through 'vera check' but break at WASM codegen. The agent isolated the root cause through stepwise narrowing. Two confirmed failure modes (both reproduced locally): * Alias as a parameter type ('fn foo(@Row -> ...)' where 'type Row = Array<Bool>') emits structurally invalid WAT; 'vera run' fails at WASM instantiation with 'type mismatch: expected a type but nothing on stack'. * Alias in a let binding ('let @Row = [...]') silently triggers an E602 skip of the enclosing function. The function disappears from the compiled output entirely; 'vera run' then reports 'no exported functions' or 'unknown func' with no link back to the compile-time warning. Filed as #583 with both reproducers, root-cause sketch, and acceptance criteria. Doc changes in this commit: * KNOWN_ISSUES.md — new Bugs row for #583 with the workaround (use the underlying type directly). * SKILL.md type-aliases section — new blockquote inline with the alias examples flagging the pair-type-alias footgun. * SKILL.md Known Bugs and Workarounds table — new row matching the KNOWN_ISSUES entry, in the format agents already scan. * docs/SKILL.md and docs/llms-full.txt regenerated. * scripts/check_skill_examples.py allowlist auto-fixed for the SKILL.md line shifts. Note for the underlying fix: probably one or both of (a) the type-name-to-WASM-type lookup in vera/wasm/helpers.py needs to follow alias resolution, or (b) the slot-env type inference in vera/wasm/inference.py needs to dereference aliases before checking pair-typed-ness. Out of scope for this PR (which is doc-only); tracked in #583. Co-Authored-By: Claude <noreply@anthropic.invalid>
Second bug from the same Conway debugging session that surfaced #583. The agent's writeup arrived as a structured markdown report covering both bugs; bug 2 (this one) had a particularly clean narrowing matrix that made root-cause isolation a 30-second job rather than a 30-minute one. The bug shape: a block-statement sequence like 'helper(()); next_expr' where 'helper' is a user-defined function declared with @Unit return emits structurally invalid WAT. Type-checks fine, then 'vera run' fails at WASM instantiation with 'type mismatch: expected a type but nothing on stack'. Narrowing matrix (verified locally): user helper -> Unit, mid-block FAILS three IO.print mid-block OK user helper -> Int, mid-block OK user helper -> Unit, tail position OK user helper -> Unit, mid (no leading) FAILS So the trigger is specifically: user-defined function returning @Unit, called in a non-tail statement position within a block. Built-in IO.* operations work; non-Unit returns work; Unit in tail position works. Root cause located: vera/wasm/context.py::_is_void_expr at line 456 determines whether an expression produces a stack value. It correctly handles QualifiedCall (IO.print etc.), UnitLit, FnCall to effect ops, AssertExpr/AssumeExpr, and compound expressions recursively. It does NOT handle FnCall to a user-defined function whose declared return type is @Unit. Falls through to 'return False', caller in translate_block emits a 'drop', WASM validation fails because the call left nothing on the stack. Fix is a one-clause omission, not a deep design issue. The function-return-type registry (self._fn_ret_types, populated in vera/codegen/functions.py:76-80) currently filters OUT Unit-returning functions ('if ret_wt and ret_wt != "unsupported"'); the registry needs to either include them explicitly or be supplemented with a separate _unit_returning_fns: set[str]. ~5-10 lines. Out of scope for this PR (which is doc-only); detailed fix sketch in #584. Doc changes in this commit: * KNOWN_ISSUES.md — new Bugs row for #584 with the workaround (keep user @Unit-returning calls in tail position). * SKILL.md Known Bugs and Workarounds table — new row in the format agents already scan, with concrete pointer to the Iteration with IO section as the place agents will hit it. * SKILL.md Iteration with IO section — inline footgun note right after the FizzBuzz example pointing at #584. The example's IO.print mid-block works because it's a qualified call to a built-in effect op; the agent will hit the bug the moment they factor that into a user helper. * docs/SKILL.md and docs/llms-full.txt regenerated. Why no test caught this: every Vera example uses IO.* for IO and _is_void_expr covers QualifiedCall correctly. The shape that breaks (user helper wrapping IO.print, called mid-block) is a pattern that appears the moment programs grow past trivial size, but the test suite favors flat structures or terminal-position calls. Conway's render-then-sleep-then- recurse loop was the first test of this shape in the codebase. Co-Authored-By: Claude <noreply@anthropic.invalid>
Summary
Observed in the wild today: a Claude.ai sandboxed instance asked to write Conway's Game of Life in Vera read
SKILL.md's Installation section, concluded "Since Vera isn't available in this sandbox, I'll focus on writing a correct implementation that the user can run locally on their machine", and fell back to writing untested code. When called out it installed Vera fine.The instructions weren't wrong — the standard
git clone + pip install -e .flow works in any sandbox with Python + git + pip + outbound network (i.e. nearly all of them). The failure was the agent defaulting to "I can't run this" before trying. Direct rebuttal added as a blockquote at the end of the Installation section:The PyPI footgun is real: I verified
curl pypi.org/pypi/vera/jsonresolves to "vera 1.0.0: Reference implementation of the ERAV data model for citizen science" — completely unrelated to us.Test plan
python scripts/check_skill_examples.py— 55 Vera blocks parse, 62 allowlisted, 0 failures (line-shifted allowlist auto-fixed viafix_allowlists.py --fix)python scripts/check_doc_counts.py— counts consistentpython scripts/check_version_sync.py— 5 files consistentpython scripts/build_site.py && python scripts/check_site_assets.py—docs/SKILL.md+docs/llms-full.txtregenerated, site assets up-to-dateNotes for reviewers
Doc-only. No code changes, no version bump. CHANGELOG entry under
[Unreleased]rather than bundled into v0.0.134 via another retag — we just did one of those yesterday for the broader consistency sweep, and another retag within hours for a one-paragraph fix is the kind of churn the project's release-workflow memory now flags as something to avoid.Audience consideration: CLAUDE.md is for Claude Code (already-local dev env) and doesn't need updating. AGENTS.md "agents writing Vera code" section already points at SKILL.md for installation, so the cascade works.
🤖 Generated with Claude Code
Summary by CodeRabbit
Documentation
Known Issues
Chores