Skip to content

SKILL.md: tell sandbox-running agents they can install Vera#582

Merged
aallan merged 3 commits into
mainfrom
claude/skill-md-sandbox-install-affordance
May 6, 2026
Merged

SKILL.md: tell sandbox-running agents they can install Vera#582
aallan merged 3 commits into
mainfrom
claude/skill-md-sandbox-install-affordance

Conversation

@aallan

@aallan aallan commented May 6, 2026

Copy link
Copy Markdown
Owner

Summary

Observed in the wild today: a Claude.ai sandboxed instance asked to write Conway's Game of Life in Vera read SKILL.md's Installation section, concluded "Since Vera isn't available in this sandbox, I'll focus on writing a correct implementation that the user can run locally on their machine", and fell back to writing untested code. When called out it installed Vera fine.

The instructions weren't wrong — the standard git clone + pip install -e . flow works in any sandbox with Python + git + pip + outbound network (i.e. nearly all of them). The failure was the agent defaulting to "I can't run this" before trying. Direct rebuttal added as a blockquote at the end of the Installation section:

> **For agents running in a sandbox** (Claude.ai, Code Interpreter,
> container-based execution environments, etc.): the steps above
> work. Sandboxes typically have Python, `git`, `pip`, and outbound
> network access — that's all Vera needs. **Run the install commands
> and verify with `vera run examples/hello_world.vera` before
> concluding that Vera "isn't available." Don't assume the sandbox
> lacks the toolchain — try it.**
>
> One caveat: `pip install vera` (no source) installs a *different*
> package from PyPI (an ERAV citizen-science library — unrelated to
> this Vera). Always install from the GitHub source clone shown above.

The PyPI footgun is real: I verified curl pypi.org/pypi/vera/json resolves to "vera 1.0.0: Reference implementation of the ERAV data model for citizen science" — completely unrelated to us.

Test plan

  • python scripts/check_skill_examples.py — 55 Vera blocks parse, 62 allowlisted, 0 failures (line-shifted allowlist auto-fixed via fix_allowlists.py --fix)
  • python scripts/check_doc_counts.py — counts consistent
  • python scripts/check_version_sync.py — 5 files consistent
  • python scripts/build_site.py && python scripts/check_site_assets.pydocs/SKILL.md + docs/llms-full.txt regenerated, site assets up-to-date
  • All pre-commit hooks pass

Notes for reviewers

Doc-only. No code changes, no version bump. CHANGELOG entry under [Unreleased] rather than bundled into v0.0.134 via another retag — we just did one of those yesterday for the broader consistency sweep, and another retag within hours for a one-paragraph fix is the kind of churn the project's release-workflow memory now flags as something to avoid.

Audience consideration: CLAUDE.md is for Claude Code (already-local dev env) and doesn't need updating. AGENTS.md "agents writing Vera code" section already points at SKILL.md for installation, so the cascade works.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation

    • Added sandbox installation guidance and GitHub-source install/verification notes; warned that pip install vera from PyPI is a different package
    • Expanded guidance on type-aliases over Array, failure modes, workarounds, and a codegen footgun note about non-tail-position Unit-returning calls
    • Improved runtime-trap diagnostics with labelled kinds, per-kind fixes and source backtraces
  • Known Issues

    • Added entries for Array codegen break and non-tail Unit-returning call WAT generation
  • Chores

    • Updated examples allowlist to align documentation and diagnostics changes

Observed in the wild on 2026-05-06: a Claude.ai sandboxed
instance asked to write Conway's Game of Life in Vera read
SKILL.md's Installation section, concluded "Vera isn't
available in this sandbox", and fell back to "I'll write a
correct implementation that the user can run locally" —
without trying the install steps.  When called out it
managed to install Vera fine.

The Installation section currently describes the standard
`git clone + pip install -e .` flow that DOES work in any
sandbox with Python + git + pip + outbound network (i.e.
basically all of them).  The failure mode is the agent
defaulting to "I can't run this" before testing, not the
instructions being wrong.

Add an explicit note as a blockquote at the end of the
Installation section directly addressing this.  Two parts:

1. "You CAN install in a sandbox" — direct rebuttal to the
   default-to-no-toolchain reflex.  Lists the typical
   sandbox capabilities and explicitly says "try it before
   concluding Vera isn't available."

2. "Don't `pip install vera` from PyPI" — that package
   name resolves to an unrelated ERAV citizen-science
   library (verified via `curl pypi.org/pypi/vera/json`).
   Always install from the GitHub source clone.

CHANGELOG entry under [Unreleased]; not bundling into v0.0.134
via retag because we just did that yesterday and another retag
within hours for a one-paragraph fix is the kind of churn
worth avoiding.

Auto-regenerated docs/SKILL.md + docs/llms-full.txt + line-
shifted allowlist fix in scripts/check_skill_examples.py.

Co-Authored-By: Claude <noreply@anthropic.invalid>
@codecov

codecov Bot commented May 6, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.98%. Comparing base (92a2e69) to head (773f364).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #582   +/-   ##
=======================================
  Coverage   90.98%   90.98%           
=======================================
  Files          59       59           
  Lines       22884    22884           
  Branches      259      259           
=======================================
  Hits        20821    20821           
  Misses       2056     2056           
  Partials        7        7           
Flag Coverage Δ
javascript 57.36% <ø> (ø)
python 94.82% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@coderabbitai

coderabbitai Bot commented May 6, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5ca93a8e-5a8c-4c21-9adb-adda22322241

📥 Commits

Reviewing files that changed from the base of the PR and between 93b72ec and 773f364.

⛔ Files ignored due to path filters (2)
  • docs/SKILL.md is excluded by !docs/**
  • docs/llms-full.txt is excluded by !docs/**
📒 Files selected for processing (2)
  • KNOWN_ISSUES.md
  • SKILL.md

📝 Walkthrough

Walkthrough

Adds sandbox installation guidance and runtime-trap notes to documentation (SKILL.md, CHANGELOG.md, KNOWN_ISSUES.md). Updates scripts/check_skill_examples.py ALLOWLIST to match SKILL.md line shifts and new fragments; no code/APIs changed.

Changes

Documentation and Checker Alignment

Layer / File(s) Summary
Documentation Content
SKILL.md
Adds sandbox installation guidance (git clone + pip install -e .), warns that pip install vera on PyPI is a different package, adds notes about type aliases over Array<T> breaking WASM codegen (two failure modes), mid-block IO.print footgun, and enhanced runtime-trap diagnostics (kind + Fix + source backtrace).
Known Issues
KNOWN_ISSUES.md
Inserts bug entries for type-alias over Array<T> breaking WASM codegen (issue #583) and @Unit-returning non-tail calls emitting invalid WAT (issue #584).
Changelog
CHANGELOG.md
Adds a Documentation subsection under Unreleased documenting the sandbox installation guidance and the PyPI caveat.
Checker Configuration
scripts/check_skill_examples.py
Reworks ALLOWLIST FRAGMENT coordinates (many entries moved/added), adds a new FRAGMENT for Option/Result combinator bare calls, and moves the MISMATCH placeholder from line 127 to 131 to align with SKILL.md edits.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • aallan/vera#581: Overlapping SKILL.md edits and corresponding scripts/check_skill_examples.py ALLOWLIST updates.

Suggested labels

docs, ci

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarises the primary objective: documenting sandbox installation guidance for Vera in SKILL.md. It is specific, concise, and reflects the main purpose of this documentation-only PR.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/skill-md-sandbox-install-affordance

Comment @coderabbitai help to get the list of available commands and usage tips.

aallan and others added 2 commits May 6, 2026 12:05
…Conway debugging session

The sandboxed Claude that prompted PR #582's install-affordance
note then went on to write Conway's Game of Life and hit a real
compiler bug — type aliases over Array<T> compile cleanly
through 'vera check' but break at WASM codegen.  The agent
isolated the root cause through stepwise narrowing.

Two confirmed failure modes (both reproduced locally):

* Alias as a parameter type ('fn foo(@Row -> ...)' where
  'type Row = Array<Bool>') emits structurally invalid WAT;
  'vera run' fails at WASM instantiation with 'type mismatch:
  expected a type but nothing on stack'.
* Alias in a let binding ('let @Row = [...]') silently
  triggers an E602 skip of the enclosing function.  The
  function disappears from the compiled output entirely;
  'vera run' then reports 'no exported functions' or
  'unknown func' with no link back to the compile-time
  warning.

Filed as #583 with both reproducers, root-cause sketch, and
acceptance criteria.

Doc changes in this commit:

* KNOWN_ISSUES.md — new Bugs row for #583 with the workaround
  (use the underlying type directly).
* SKILL.md type-aliases section — new blockquote inline with
  the alias examples flagging the pair-type-alias footgun.
* SKILL.md Known Bugs and Workarounds table — new row
  matching the KNOWN_ISSUES entry, in the format agents
  already scan.
* docs/SKILL.md and docs/llms-full.txt regenerated.
* scripts/check_skill_examples.py allowlist auto-fixed for
  the SKILL.md line shifts.

Note for the underlying fix: probably one or both of (a) the
type-name-to-WASM-type lookup in vera/wasm/helpers.py needs
to follow alias resolution, or (b) the slot-env type
inference in vera/wasm/inference.py needs to dereference
aliases before checking pair-typed-ness.  Out of scope for
this PR (which is doc-only); tracked in #583.

Co-Authored-By: Claude <noreply@anthropic.invalid>
Second bug from the same Conway debugging session that surfaced
#583.  The agent's writeup arrived as a structured markdown
report covering both bugs; bug 2 (this one) had a particularly
clean narrowing matrix that made root-cause isolation a
30-second job rather than a 30-minute one.

The bug shape: a block-statement sequence like 'helper(());
next_expr' where 'helper' is a user-defined function declared
with @Unit return emits structurally invalid WAT.  Type-checks
fine, then 'vera run' fails at WASM instantiation with
'type mismatch: expected a type but nothing on stack'.

Narrowing matrix (verified locally):

  user helper -> Unit, mid-block        FAILS
  three IO.print mid-block              OK
  user helper -> Int, mid-block         OK
  user helper -> Unit, tail position    OK
  user helper -> Unit, mid (no leading) FAILS

So the trigger is specifically: user-defined function returning
@Unit, called in a non-tail statement position within a block.
Built-in IO.* operations work; non-Unit returns work; Unit in
tail position works.

Root cause located: vera/wasm/context.py::_is_void_expr at
line 456 determines whether an expression produces a stack
value.  It correctly handles QualifiedCall (IO.print etc.),
UnitLit, FnCall to effect ops, AssertExpr/AssumeExpr, and
compound expressions recursively.  It does NOT handle FnCall
to a user-defined function whose declared return type is
@Unit.  Falls through to 'return False', caller in
translate_block emits a 'drop', WASM validation fails because
the call left nothing on the stack.

Fix is a one-clause omission, not a deep design issue.  The
function-return-type registry (self._fn_ret_types, populated
in vera/codegen/functions.py:76-80) currently filters OUT
Unit-returning functions ('if ret_wt and ret_wt !=
"unsupported"'); the registry needs to either include them
explicitly or be supplemented with a separate
_unit_returning_fns: set[str].  ~5-10 lines.  Out of scope
for this PR (which is doc-only); detailed fix sketch in #584.

Doc changes in this commit:

* KNOWN_ISSUES.md — new Bugs row for #584 with the workaround
  (keep user @Unit-returning calls in tail position).
* SKILL.md Known Bugs and Workarounds table — new row in the
  format agents already scan, with concrete pointer to the
  Iteration with IO section as the place agents will hit it.
* SKILL.md Iteration with IO section — inline footgun note
  right after the FizzBuzz example pointing at #584.  The
  example's IO.print mid-block works because it's a qualified
  call to a built-in effect op; the agent will hit the bug
  the moment they factor that into a user helper.
* docs/SKILL.md and docs/llms-full.txt regenerated.

Why no test caught this: every Vera example uses IO.* for IO
and _is_void_expr covers QualifiedCall correctly.  The shape
that breaks (user helper wrapping IO.print, called mid-block)
is a pattern that appears the moment programs grow past
trivial size, but the test suite favors flat structures or
terminal-position calls.  Conway's render-then-sleep-then-
recurse loop was the first test of this shape in the codebase.

Co-Authored-By: Claude <noreply@anthropic.invalid>
@aallan aallan merged commit 0dfe47c into main May 6, 2026
19 checks passed
@aallan aallan deleted the claude/skill-md-sandbox-install-affordance branch May 6, 2026 13:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant