SKILL.md: tell sandbox-running agents they can install Vera by aallan · Pull Request #582 · aallan/vera

aallan · 2026-05-06T10:59:51Z

Summary

Observed in the wild today: a Claude.ai sandboxed instance asked to write Conway's Game of Life in Vera read SKILL.md's Installation section, concluded "Since Vera isn't available in this sandbox, I'll focus on writing a correct implementation that the user can run locally on their machine", and fell back to writing untested code. When called out it installed Vera fine.

The instructions weren't wrong — the standard git clone + pip install -e . flow works in any sandbox with Python + git + pip + outbound network (i.e. nearly all of them). The failure was the agent defaulting to "I can't run this" before trying. Direct rebuttal added as a blockquote at the end of the Installation section:

> **For agents running in a sandbox** (Claude.ai, Code Interpreter,
> container-based execution environments, etc.): the steps above
> work. Sandboxes typically have Python, `git`, `pip`, and outbound
> network access — that's all Vera needs. **Run the install commands
> and verify with `vera run examples/hello_world.vera` before
> concluding that Vera "isn't available." Don't assume the sandbox
> lacks the toolchain — try it.**
>
> One caveat: `pip install vera` (no source) installs a *different*
> package from PyPI (an ERAV citizen-science library — unrelated to
> this Vera). Always install from the GitHub source clone shown above.

The PyPI footgun is real: I verified curl pypi.org/pypi/vera/json resolves to "vera 1.0.0: Reference implementation of the ERAV data model for citizen science" — completely unrelated to us.

Test plan

python scripts/check_skill_examples.py — 55 Vera blocks parse, 62 allowlisted, 0 failures (line-shifted allowlist auto-fixed via fix_allowlists.py --fix)
python scripts/check_doc_counts.py — counts consistent
python scripts/check_version_sync.py — 5 files consistent
python scripts/build_site.py && python scripts/check_site_assets.py — docs/SKILL.md + docs/llms-full.txt regenerated, site assets up-to-date
All pre-commit hooks pass

Notes for reviewers

Doc-only. No code changes, no version bump. CHANGELOG entry under [Unreleased] rather than bundled into v0.0.134 via another retag — we just did one of those yesterday for the broader consistency sweep, and another retag within hours for a one-paragraph fix is the kind of churn the project's release-workflow memory now flags as something to avoid.

Audience consideration: CLAUDE.md is for Claude Code (already-local dev env) and doesn't need updating. AGENTS.md "agents writing Vera code" section already points at SKILL.md for installation, so the cascade works.

🤖 Generated with Claude Code

Summary by CodeRabbit

Documentation
- Added sandbox installation guidance and GitHub-source install/verification notes; warned that pip install vera from PyPI is a different package
- Expanded guidance on type-aliases over Array, failure modes, workarounds, and a codegen footgun note about non-tail-position Unit-returning calls
- Improved runtime-trap diagnostics with labelled kinds, per-kind fixes and source backtraces
Known Issues
- Added entries for Array codegen break and non-tail Unit-returning call WAT generation
Chores
- Updated examples allowlist to align documentation and diagnostics changes

Observed in the wild on 2026-05-06: a Claude.ai sandboxed instance asked to write Conway's Game of Life in Vera read SKILL.md's Installation section, concluded "Vera isn't available in this sandbox", and fell back to "I'll write a correct implementation that the user can run locally" — without trying the install steps. When called out it managed to install Vera fine. The Installation section currently describes the standard `git clone + pip install -e .` flow that DOES work in any sandbox with Python + git + pip + outbound network (i.e. basically all of them). The failure mode is the agent defaulting to "I can't run this" before testing, not the instructions being wrong. Add an explicit note as a blockquote at the end of the Installation section directly addressing this. Two parts: 1. "You CAN install in a sandbox" — direct rebuttal to the default-to-no-toolchain reflex. Lists the typical sandbox capabilities and explicitly says "try it before concluding Vera isn't available." 2. "Don't `pip install vera` from PyPI" — that package name resolves to an unrelated ERAV citizen-science library (verified via `curl pypi.org/pypi/vera/json`). Always install from the GitHub source clone. CHANGELOG entry under [Unreleased]; not bundling into v0.0.134 via retag because we just did that yesterday and another retag within hours for a one-paragraph fix is the kind of churn worth avoiding. Auto-regenerated docs/SKILL.md + docs/llms-full.txt + line- shifted allowlist fix in scripts/check_skill_examples.py. Co-Authored-By: Claude <noreply@anthropic.invalid>

codecov · 2026-05-06T11:01:50Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.98%. Comparing base (92a2e69) to head (773f364).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #582   +/-   ##
=======================================
  Coverage   90.98%   90.98%           
=======================================
  Files          59       59           
  Lines       22884    22884           
  Branches      259      259           
=======================================
  Hits        20821    20821           
  Misses       2056     2056           
  Partials        7        7

Flag	Coverage Δ
javascript	`57.36% <ø> (ø)`
python	`94.82% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

coderabbitai · 2026-05-06T11:05:12Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5ca93a8e-5a8c-4c21-9adb-adda22322241

📥 Commits

Reviewing files that changed from the base of the PR and between 93b72ec and 773f364.

⛔ Files ignored due to path filters (2)

docs/SKILL.md is excluded by !docs/**
docs/llms-full.txt is excluded by !docs/**

📒 Files selected for processing (2)

KNOWN_ISSUES.md
SKILL.md

📝 Walkthrough

Walkthrough

Adds sandbox installation guidance and runtime-trap notes to documentation (SKILL.md, CHANGELOG.md, KNOWN_ISSUES.md). Updates scripts/check_skill_examples.py ALLOWLIST to match SKILL.md line shifts and new fragments; no code/APIs changed.

Changes

Documentation and Checker Alignment

Layer / File(s)	Summary
Documentation Content `SKILL.md`	Adds sandbox installation guidance (git clone + `pip install -e .`), warns that `pip install vera` on PyPI is a different package, adds notes about type aliases over `Array<T>` breaking WASM codegen (two failure modes), mid-block IO.print footgun, and enhanced runtime-trap diagnostics (kind + Fix + source backtrace).
Known Issues `KNOWN_ISSUES.md`	Inserts bug entries for type-alias over `Array<T>` breaking WASM codegen (issue `#583`) and `@Unit-returning` non-tail calls emitting invalid WAT (issue `#584`).
Changelog `CHANGELOG.md`	Adds a Documentation subsection under Unreleased documenting the sandbox installation guidance and the PyPI caveat.
Checker Configuration `scripts/check_skill_examples.py`	Reworks `ALLOWLIST` FRAGMENT coordinates (many entries moved/added), adds a new FRAGMENT for Option/Result combinator bare calls, and moves the `MISMATCH` placeholder from line 127 to 131 to align with SKILL.md edits.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

aallan/vera#581: Overlapping SKILL.md edits and corresponding scripts/check_skill_examples.py ALLOWLIST updates.

Suggested labels

docs, ci

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarises the primary objective: documenting sandbox installation guidance for Vera in SKILL.md. It is specific, concise, and reflects the main purpose of this documentation-only PR.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch claude/skill-md-sandbox-install-affordance

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

@Row

…Conway debugging session The sandboxed Claude that prompted PR #582's install-affordance note then went on to write Conway's Game of Life and hit a real compiler bug — type aliases over Array<T> compile cleanly through 'vera check' but break at WASM codegen. The agent isolated the root cause through stepwise narrowing. Two confirmed failure modes (both reproduced locally): * Alias as a parameter type ('fn foo(@Row -> ...)' where 'type Row = Array<Bool>') emits structurally invalid WAT; 'vera run' fails at WASM instantiation with 'type mismatch: expected a type but nothing on stack'. * Alias in a let binding ('let @Row = [...]') silently triggers an E602 skip of the enclosing function. The function disappears from the compiled output entirely; 'vera run' then reports 'no exported functions' or 'unknown func' with no link back to the compile-time warning. Filed as #583 with both reproducers, root-cause sketch, and acceptance criteria. Doc changes in this commit: * KNOWN_ISSUES.md — new Bugs row for #583 with the workaround (use the underlying type directly). * SKILL.md type-aliases section — new blockquote inline with the alias examples flagging the pair-type-alias footgun. * SKILL.md Known Bugs and Workarounds table — new row matching the KNOWN_ISSUES entry, in the format agents already scan. * docs/SKILL.md and docs/llms-full.txt regenerated. * scripts/check_skill_examples.py allowlist auto-fixed for the SKILL.md line shifts. Note for the underlying fix: probably one or both of (a) the type-name-to-WASM-type lookup in vera/wasm/helpers.py needs to follow alias resolution, or (b) the slot-env type inference in vera/wasm/inference.py needs to dereference aliases before checking pair-typed-ness. Out of scope for this PR (which is doc-only); tracked in #583. Co-Authored-By: Claude <noreply@anthropic.invalid>

Second bug from the same Conway debugging session that surfaced #583. The agent's writeup arrived as a structured markdown report covering both bugs; bug 2 (this one) had a particularly clean narrowing matrix that made root-cause isolation a 30-second job rather than a 30-minute one. The bug shape: a block-statement sequence like 'helper(()); next_expr' where 'helper' is a user-defined function declared with @Unit return emits structurally invalid WAT. Type-checks fine, then 'vera run' fails at WASM instantiation with 'type mismatch: expected a type but nothing on stack'. Narrowing matrix (verified locally): user helper -> Unit, mid-block FAILS three IO.print mid-block OK user helper -> Int, mid-block OK user helper -> Unit, tail position OK user helper -> Unit, mid (no leading) FAILS So the trigger is specifically: user-defined function returning @Unit, called in a non-tail statement position within a block. Built-in IO.* operations work; non-Unit returns work; Unit in tail position works. Root cause located: vera/wasm/context.py::_is_void_expr at line 456 determines whether an expression produces a stack value. It correctly handles QualifiedCall (IO.print etc.), UnitLit, FnCall to effect ops, AssertExpr/AssumeExpr, and compound expressions recursively. It does NOT handle FnCall to a user-defined function whose declared return type is @Unit. Falls through to 'return False', caller in translate_block emits a 'drop', WASM validation fails because the call left nothing on the stack. Fix is a one-clause omission, not a deep design issue. The function-return-type registry (self._fn_ret_types, populated in vera/codegen/functions.py:76-80) currently filters OUT Unit-returning functions ('if ret_wt and ret_wt != "unsupported"'); the registry needs to either include them explicitly or be supplemented with a separate _unit_returning_fns: set[str]. ~5-10 lines. Out of scope for this PR (which is doc-only); detailed fix sketch in #584. Doc changes in this commit: * KNOWN_ISSUES.md — new Bugs row for #584 with the workaround (keep user @Unit-returning calls in tail position). * SKILL.md Known Bugs and Workarounds table — new row in the format agents already scan, with concrete pointer to the Iteration with IO section as the place agents will hit it. * SKILL.md Iteration with IO section — inline footgun note right after the FizzBuzz example pointing at #584. The example's IO.print mid-block works because it's a qualified call to a built-in effect op; the agent will hit the bug the moment they factor that into a user helper. * docs/SKILL.md and docs/llms-full.txt regenerated. Why no test caught this: every Vera example uses IO.* for IO and _is_void_expr covers QualifiedCall correctly. The shape that breaks (user helper wrapping IO.print, called mid-block) is a pattern that appears the moment programs grow past trivial size, but the test suite favors flat structures or terminal-position calls. Conway's render-then-sleep-then- recurse loop was the first test of this shape in the codebase. Co-Authored-By: Claude <noreply@anthropic.invalid>

aallan and others added 2 commits May 6, 2026 12:05

aallan merged commit 0dfe47c into main May 6, 2026
19 checks passed

aallan deleted the claude/skill-md-sandbox-install-affordance branch May 6, 2026 13:14

aallan mentioned this pull request May 6, 2026

v0.0.135: fix #584 (Unit fn non-tail), #583 (Array<T> aliases), #568 (url_parse leading colon) #585

Merged

8 tasks

coderabbitai Bot mentioned this pull request May 7, 2026

SKILL doc fixes + ROADMAP browser-target tier from Conway's Life agent experiments #601

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SKILL.md: tell sandbox-running agents they can install Vera#582

SKILL.md: tell sandbox-running agents they can install Vera#582
aallan merged 3 commits into
mainfrom
claude/skill-md-sandbox-install-affordance

aallan commented May 6, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

codecov Bot commented May 6, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 6, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

aallan commented May 6, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Notes for reviewers

Summary by CodeRabbit

Uh oh!

codecov Bot commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

aallan commented May 6, 2026 •

edited by coderabbitai Bot

Loading

codecov Bot commented May 6, 2026 •

edited

Loading

coderabbitai Bot commented May 6, 2026 •

edited

Loading