SKILL doc fixes + ROADMAP browser-target tier from Conway's Life agent experiments#601
Conversation
A second agent wrote Conway's Life cleanly on current main (works first time, all 200 generations clean, 0 corruption signatures, all contracts Tier 1). Their think-aloud transcript surfaced three concrete SKILL.md gaps: 1. array_length: returns Int per the registration but flows freely into Nat positions via the bidirectional Int <: Nat permitted by the type checker (verifier-enforced via Z3). The old comment said "returns Int (always >= 0)" -- correct but agents read this and reach for int_to_nat unnecessarily before discovering empirically that they don't need it. Updated to "returns Nat (the array length, flows to either Nat or Int positions)" matching user- visible behaviour. 2. array_fold closure shape: the SKILL pinned the De Bruijn slot mapping (@Int.0 = element, @Int.1 = accumulator) but never spelled out the resulting signature shape (fn(@acc, @elem -> @acc)). Agents have to derive it via the rightmost = .0 rule, and one wrote a probe to confirm before using array_fold for population counting. Three-line comment added making the shape explicit. 3. Tuple construction: SKILL mentioned @tuple<Int, String> as a type exactly once (line 388) with no construction or destructuring example. An agent considered using array_fold over a tuple table of (dr, dc) offset pairs for neighbour counting, hunted for tuple literal syntax, didn't find any, and fell back to a verbose sequential let chain. Tuples DO work -- the constructor is just the upper-case Tuple name -- but it has to be discovered by guess or by reading the spec grammar. Added a brief subsection in the "Composite types" area showing Tuple(...) construction and match destructuring. Plus a side-effect: scripts/check_skill_examples.py allowlist line numbers re-anchored to match the new SKILL.md line offsets. Both the array_fold edit and the Tuple subsection bumped subsequent line numbers; fix_allowlists.py auto-shifted most entries but missed two which were patched manually. Validation: - All 56 SKILL.md vera blocks parse, 62 allowlisted, 0 failed - pytest tests/ -q: 3,747 passed, 14 skipped - check_doc_counts / check_limitations_sync / check_site_assets clean Co-Authored-By: Claude <noreply@anthropic.invalid>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #601 +/- ##
=======================================
Coverage 90.92% 90.92%
=======================================
Files 59 59
Lines 22997 22997
Branches 259 259
=======================================
Hits 20910 20910
Misses 2080 2080
Partials 7 7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: ⛔ Files ignored due to path filters (2)
📒 Files selected for processing (3)
📝 WalkthroughWalkthroughSKILL.md adds a Tuples subsection and changes ChangesType Documentation and Validation Synchronisation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/check_skill_examples.py`:
- Around line 105-109: Remove the mis-anchored allowlist tuple that maps a
SKILL.md block to ("FRAGMENT", "Wrong: bare `@Int` + `@Int` without indices") — that
entry is suppressing a full-function example (“add effects”); either delete that
tuple or re-anchor it to the actual fragment location (or replace it with the
correct tuple for the full-function example). You can find this entry adjacent
to the other allowlist entries with messages "Wrong: missing contracts" and
"Correct: bare `@Int` + `@Int` (common mistakes)"; update the allowlist so the
fragment-only allowlist no longer points at the full-function example.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4c9b4a82-6305-437f-8fd7-fc8463045d5c
⛔ Files ignored due to path filters (2)
docs/SKILL.mdis excluded by!docs/**docs/llms-full.txtis excluded by!docs/**
📒 Files selected for processing (2)
SKILL.mdscripts/check_skill_examples.py
CI fix: - Added a [Unreleased] CHANGELOG entry covering the three SKILL fixes + the allowlist re-anchoring tooling change. Previous push failed the lint job because the changelog hook was unsatisfied. CodeRabbit fix (verified against current code): - The allowlist tuple at line 1905 was suppressing a parseable example (CORRECT add-with-effects full function) -- removed. The description "Wrong: bare @int + @int without indices" that lived there was actually the correct description for the bare-add code at line 1942 (where the entry's previous label "Common mistake example, bare if/else" was just wrong -- the block is bare add, not if/else). Net change: drop the misplaced entry, fix the description on its rightful neighbour. Bonus low-hanging fix folded in for #605: - SKILL.md "decreases (termination)" subsection opening rewritten to lead with the pure-vs-effectful rule prominently rather than saying "Required on recursive functions" (which was misleading for effectful recursion). Two recent agents writing Conway's Life had to triangulate this rule between FizzBuzz and the State example; the explicit one-liner lets future agents answer it on first read. #605 stays open for the broader doc-prominence work (this is the ~5-line subset that was cheap to fold in). Validation: - 56 SKILL.md vera blocks parse, 62 allowlisted, 0 failed - pytest tests/ -q: green - check_doc_counts / check_limitations_sync / check_site_assets clean Co-Authored-By: Claude <noreply@anthropic.invalid>
…terp) Two concrete fix proposals from the second Conway's Life agent's design memo, closing the timing and rendering halves of the terminal- vs-browser IO seam without language changes. Updated the #608 row to point at both sub-issues so anyone reading KNOWN_ISSUES sees the full fix path rather than just the framing. Co-Authored-By: Claude <noreply@anthropic.invalid>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@KNOWN_ISSUES.md`:
- Line 46: Update the KNOWN_ISSUES.md wording so it no longer calls array_length
“formally `@Int`”; change the phrase to match the updated SKILL contract (e.g.,
say array_length “returns `Nat`” or is “annotated `@Nat`”) and remove any
wording that implies it is an `@Int`; ensure the entry still notes the
bidirectional `Int <: Nat` subtyping behavior and verifier-enforced refinement
while aligning the array_length description with the new SKILL phrasing.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
Two Conway's Life agent experiments on --target browser surfaced five concrete blockers (issues #602/#603/#604/#609/#610) and an explicit design memo (#608) mapping each obstacle to a concrete runtime-only fix. The current stabilisation framing covers codegen reliability and walker completeness but treats browser- target reliability as a separate concern. The agent's diagnosis makes that split harder to defend: "write once, run anywhere" is currently true for pure computation and approximate-to-false for anything with timing or screen output. Two of the fixes (#609 JSPI-driven IO.sleep, #610 ANSI subset interpreter) close the timing and rendering halves of the seam without language changes -- adding them to the stabilisation tier commits to "browser-target Vera is something you'd actually use" before the agent-integration push. Changes: - Expanded the campaign-pattern list from two patterns to three; third pattern documents the browser-target seam with links to the umbrella issue (#608) and the five concrete blockers. - Added #609 (JSPI-driven IO.sleep) as item 3 in the stabilisation tier, with rationale + the WebAssembly JSPI / Asyncify mechanism. - Added #610 (ANSI subset interpreter) as item 4 in the same tier, paired with #609 -- together they close the seam and let life.vera (terminal version) run unchanged on --target browser. - Renumbered #595 from item 3 to item 5 (now last in stabilisation, since it's contingent on an upstream wasmtime-py release). - Renumbered the agent-integration tier items 4/5/6 -> 6/7/8 to match. - Updated the "What moves when" gate from "#4 starts when #1-#3 are closed" to "#6 starts when #1-#5 are closed". Co-Authored-By: Claude <noreply@anthropic.invalid>
Audit pass requested before merging #601: walk every limitation list across SKILL.md, vera/README.md, spec/, and KNOWN_ISSUES.md to (a) remove anything we've closed off and (b) add anything we've newly flagged. Result of audit: - No closed issues found stale-listed in any open-limitation table. All references to closed issues (#588, #593, #520, #589) are referential prose ('X turned out to be...', 'see #520's pattern') rather than open-issue listings. Nothing to remove. - Five limitation entries were missing across three docs: * SKILL.md Known Limitations: browser-target IO seam (#608). * SKILL.md Known Bugs and Workarounds: #602 (string interp invalid WASM with workaround) and #604 (prelude combinators skipped on browser target). * vera/README.md Current Limitations: browser-target IO seam. * spec/12-runtime.md "## 12.8 Limitations" replaced 'No open limitations.' with the two browser-runtime gaps (#609 JSPI sleep, #610 ANSI subset interpreter). - spec/11-compilation.md left as 'No open limitations' deliberately; #602 and #604 are bugs that belong in KNOWN_ISSUES, not enduring compilation-model limitations of the spec. Validation: SKILL examples parse (56 OK / 62 allowlisted / 0 failed), doc counts consistent, limitations sync clean (36 in KNOWN_ISSUES, 18 in vera/README, 4 across spec chapters). Co-Authored-By: Claude <noreply@anthropic.invalid>
Summary
Single PR for everything that came out of two Conway's Life agent experiments on current main — splits into a SKILL.md doc-fixes half and a ROADMAP stabilisation-tier half. Originally proposed as two separate PRs (#611, since closed); folded together because both originate from the same session, touch independent files, and the merge-coordination cost of splitting consistently exceeds the review-clarity benefit.
SKILL.md doc fixes (commits
1ad3a75,0d0231a,5581235)Three friction points captured by an agent writing Conway's Life from scratch (program worked first-try — 200 generations clean, 0 corruption, 32 contracts Tier 1 — but the agent's think-aloud transcript surfaced these gaps):
array_lengthreturn type clarified. The internal registration saysInt, but the type checker permitsInt <: Nat(verifier-enforced via Z3) so the result flows freely intoNatpositions. Old SKILL comment said "returns Int (always >= 0)" which is correct in registration terms but leads agents to reach forint_to_natunnecessarily. Updated to "returns Nat (the array length, flows to either Nat or Int positions)".array_foldclosure shape made explicit. SKILL pinned the De Bruijn slot mapping (@Int.0 = element, @Int.1 = accumulator) but never spelled out the resulting signature shapefn(@Acc, @Elem -> @Acc). One agent wrote a probe to confirm the parameter order. Three-line comment added.Tuplessubsection. SKILL mentioned@Tuple<Int, String>as a type once with no construction example. An agent considered using array_fold over a tuple table of(dr, dc)offset pairs, couldn't find tuple syntax, fell back to a verbose sequentialletchain. Tuples do work —Tuple(...)constructor — but it has to be discovered. Added a brief subsection in the "Composite types" area.Plus a
decreasesrule prominence fix folded in for #605 (the subsection opening previously said "Required on recursive functions" which is misleading — only required for pure recursive functions).ROADMAP browser-target tier addition (commit
757435e)Two of the bugs surfaced by a second Conway's Life agent experiment on
--target browserare concrete enough to belong in the stabilisation tier — the agent's design memo (posted on #608) pinpointed both as runtime-only fixes that close the terminal-vs-browser seam without language changes:IO.sleepvia JSPI (timing half)Together: the same
life.verasource would run unchanged on both targets.Stabilisation tier is now 5 items (was 3). #595 demoted to last because it's contingent on an upstream wasmtime-py PyPI release. Agent-integration tier renumbered 4/5/6 → 6/7/8.
Side-effect: allowlist re-anchoring (commit
1ad3a75)scripts/check_skill_examples.py's allowlist line numbers re-anchored after the SKILL line offsets shifted; one stale redundant entry pruned (Non-exhaustive Match section had three allowlist entries but only two actual code blocks); one mis-anchored entry corrected at #1905 (a "bare@Int + @Int" allowlist entry was suppressing a parseable full-function example) — caught by CodeRabbit on review and addressed in0d0231a.Issues filed during this experimental session (9 new)
decreasesrule prominencefix_allowlists.pyheuristic fragilityInt <: Natsubtyping undocumentedIO.sleepWhat's NOT in this PR (deferred)
decreasesrule) — only the prominent-rewording subset folded inTest plan
verablocks parse, 62 allowlisted, 0 failedpytest tests/ -q— 3,747 passed, 14 skippedcheck_doc_counts/check_limitations_sync/check_site_assetsall cleanTuple(42, "hi")round-trip: type-checks ✓Summary by CodeRabbit
Documentation
Tooling
Known Issues
Roadmap Updates
Closes #605