Skip to content

Release v0.0.118: string utilities + character classification (#470, #471)#510

Merged
aallan merged 6 commits into
mainfrom
string-and-char-builtins
Apr 23, 2026
Merged

Release v0.0.118: string utilities + character classification (#470, #471)#510
aallan merged 6 commits into
mainfrom
string-and-char-builtins

Conversation

@aallan

@aallan aallan commented Apr 23, 2026

Copy link
Copy Markdown
Owner

Summary

Sixteen new built-in functions for string utilities (#470) and character classification (#471), shipped together as a single PR. All inline WAT — no host imports, bit-identical Python (wasmtime) and browser (Node.js) output.

String utilities (8, #470):

  • string_chars / string_lines / string_words — bridge from String to Array<String>. Lines follows Python splitlines() (trailing terminator does not add empty segment); words follows Python split() with no args (whitespace runs collapse, empty segments discarded).
  • string_reverse / string_trim_start / string_trim_end
  • string_pad_start / string_pad_end — JavaScript padStart / padEnd semantics: fill cycles left-to-right, truncated to padding length; empty fill is a no-op; target shorter than input returns input unchanged.

Character classification (8, #471):

  • is_digit / is_alpha / is_alphanumeric / is_whitespace / is_upper / is_lower — first-byte ASCII predicates; all return false for the empty string.
  • char_to_upper / char_to_lower — first byte only.

ASCII-only by design. Phase 2 (Unicode-aware variants — string_codepoints, string_graphemes, whole-string case conversion, etc.) tracked separately in #509.

Implementation

  • vera/wasm/calls_strings.py grew from 2,588 → 3,979 lines with sixteen new translators. Shared helpers factor out the predicate scaffolding (_translate_classifier), case conversion (_translate_char_case), trim variants (_translate_trim), pad variants (_translate_pad), and structural splits (_translate_structural_split covers both string_lines and string_words via a two-pass count-then-emit shape mirroring the existing string_split).
  • ASCII range tricks reused throughout: (byte - 48) < 10 for digit, ((byte | 0x20) - 97) < 26 for case-folded alpha. Branchless and idiomatic.
  • vera/wasm/inference.py extended for the new return-type shapes (i32_pair for string/array returns, i32 for bool returns).
  • vera/wasm/calls.py dispatch table extended with sixteen new branches; _infer_concat_elem_type now returns "String" for the three split operations.
  • vera/codegen/modules.py known-names allowlist extended.
  • vera/environment.py registers all sixteen FunctionInfo entries.

Tests

  • 33 unit tests across TestCharClassification and TestStringUtilities in tests/test_codegen.py.
  • 9 browser parity tests across TestBrowserStringUtilities and TestBrowserCharClassification in tests/test_browser.py.
  • New conformance program tests/conformance/ch09_string_char_utilities.vera (level: run) with 14 test functions covering classifier, transform, pad, split, and empty-string edge cases.
  • Tier-count test updated: 184/23/207 → 213/26/239 reflects the new examples/string_utilities.vera contributing 29 T1 + 3 T3 + 32 contracts.

Example

examples/string_utilities.vera exercises all sixteen built-ins in a coherent log-line-processor scenario — line counting, word splitting, byte iteration via string_chars, classifier-based token validation, title-casing, trim, pad, palindrome check via string_reverse, plus comprehensive empty-string round-trip coverage. Verified clean (29 Tier 1, 3 Tier 3).

Documentation

Comprehensive docs sweep across both human- and agent-facing surfaces:

  • spec/09-standard-library.md — new §9.6.22 with full signatures, semantics, ASCII range tables, and verification notes.
  • SKILL.md — new sub-section under string ops; naming-convention section mentions the sixteen new names; bumped conformance count to 79.
  • AGENTS.md / CLAUDE.md / README.md / vera/README.md / FAQ.md / TESTING.md / EXAMPLES.md — count bumps (78 → 79 conformance, 31 → 32 examples, 137 → 153 built-ins, 3,437 → 3,489 tests), file-size table refresh.
  • CHANGELOG.md — v0.0.118 entry with separate Added / Compiler / Tests sections.
  • HISTORY.md — Stage 11 row added.
  • ROADMAP.mdAdd string utility built-ins (reverse, pad, chars, trim variants) #470 / Add character classification built-ins (is_digit, is_alpha, etc.) #471 removed from short-term queue (now shipped); String + character built-ins: Unicode-aware phase 2 #509 Unicode phase-2 added to Phase 4c.
  • examples/README.mdstring_utilities.vera row added.
  • docs/ site assets regenerated via scripts/build_site.py.

Release prep

  • Version bumped 0.0.117 → 0.0.118 across vera/__init__.py, pyproject.toml, docs/index.html, README.md, uv.lock.
  • Tag v0.0.118 placed on the feature branch; will be moved to main HEAD after merge so the CHANGELOG-extraction release flow can run.

Test plan

  • 33 new unit tests pass (pytest tests/test_codegen.py::TestCharClassification tests/test_codegen.py::TestStringUtilities)
  • 9 new browser parity tests pass (pytest tests/test_browser.py::TestBrowserStringUtilities tests/test_browser.py::TestBrowserCharClassification)
  • All 79 conformance programs pass (python scripts/check_conformance.py)
  • All 32 examples pass check + verify (python scripts/check_examples.py)
  • Full pytest suite green (3,475 passed, 14 skipped)
  • mypy clean
  • All 14 doc validation scripts green (conformance, examples, README, spec, SKILL, EXAMPLES, FAQ, HTML, site assets, version sync, doc counts, licenses, limitations, examples README)

Closes #470
Closes #471

Summary by CodeRabbit

  • New Features

    • Added 16 built-in string and character utilities (chars/lines/words, reverse, trim start/end, pad start/end, ASCII first-byte classifiers and case conversions).
  • Tests

    • Expanded unit and browser-parity tests and added a new conformance chapter covering the string/char utilities; verification totals updated.
  • Documentation

    • Updated changelog, READMEs, roadmap, specs and other docs to v0.0.118 and revised test/conformance/example counts.
  • Chores

    • Bumped package version to v0.0.118.

Sixteen new built-in functions implemented as inline WAT - no host
imports, bit-identical Python/browser output. Closes #470 and #471.

String utilities (eight, #470):
- string_chars / string_lines / string_words - bridge from String to
  Array<String>; lines follows Python splitlines, words follows
  Python split with no args
- string_reverse / string_trim_start / string_trim_end
- string_pad_start / string_pad_end - JS padStart/padEnd cycling
  semantics: fill cycles left-to-right, truncated to padding length

Character classification (eight, #471):
- is_digit, is_alpha, is_alphanumeric, is_whitespace, is_upper,
  is_lower - first-byte ASCII predicates, false for empty string
- char_to_upper, char_to_lower - first byte only

Tests added: 33 unit tests (TestCharClassification, TestStringUtilities),
9 browser parity tests, conformance ch09_string_char_utilities.vera
with 14 test functions covering classifier, transform, pad, split,
and empty-string edge cases.

Example added: examples/string_utilities.vera exercises all sixteen
built-ins in a coherent log-line-processor scenario.

Phase 2 (Unicode-aware variants - string_codepoints, string_graphemes,
whole-string string_to_upper/lower, etc.) tracked separately in #509.

Co-Authored-By: Claude <noreply@anthropic.invalid>
@codecov

codecov Bot commented Apr 23, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 98.30339% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.59%. Comparing base (d18d7cd) to head (407772b).
⚠️ Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
vera/wasm/calls_strings.py 99.04% 9 Missing ⚠️
vera/wasm/inference.py 50.00% 6 Missing ⚠️
vera/wasm/calls.py 94.11% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #510      +/-   ##
==========================================
+ Coverage   90.27%   90.59%   +0.31%     
==========================================
  Files          58       58              
  Lines       21087    21958     +871     
  Branches      236      237       +1     
==========================================
+ Hits        19036    19892     +856     
- Misses       2047     2062      +15     
  Partials        4        4              
Flag Coverage Δ
javascript 52.62% <ø> (+0.04%) ⬆️
python 94.99% <98.30%> (+0.14%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@coderabbitai

coderabbitai Bot commented Apr 23, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds 16 ASCII-first-byte built‑ins (8 string utilities, 8 character ops), registers them in the type environment, extends WASM/Vera inference and lowering, expands cross‑module allowlists, adds unit/browser/conformance tests, updates specs/docs, and bumps package version to v0.0.118.

Changes

Cohort / File(s) Summary
Version & metadata
pyproject.toml, vera/__init__.py
Bumped package version 0.0.1170.0.118.
Release notes & docs
CHANGELOG.md, HISTORY.md, README.md, ROADMAP.md, AGENTS.md, CLAUDE.md, FAQ.md, TESTING.md, examples/README.md, vera/README.md
Added v0.0.118 entries; updated counts for builtins, tests, conformance programs and examples; roadmap/priorities adjusted (Unicode phase‑2 #509).
Specification & skill docs
spec/09-standard-library.md, SKILL.md
Added spec and SKILL docs for string_*, is_*, char_to_* with ASCII-first-byte semantics and verification tier notes.
Type registration / prelude
vera/environment.py
Registered new pure built‑ins (string_chars/lines/words, reverse, trim_start/end, pad_start/end, classifiers, char case converters).
WASM dispatch & call handling
vera/wasm/calls.py, vera/codegen/modules.py
Added dispatch routing and expanded known-functions allowlist; array-returning string ops now infer element type String.
WASM lowering (string impl)
vera/wasm/calls_strings.py
Implemented lowering for string utilities and classifiers/case converters: allocation/copy loops, two‑pass structural splits, pad cycling, unified trim helper.
WASM/Vera inference
vera/wasm/inference.py
Mapped new builtins to WASM/Vera return shapes so calls propagate correct types (i32/i32_pair/i32).
Tests & conformance
tests/test_codegen.py, tests/test_browser.py, tests/test_verifier.py, tests/conformance/manifest.json
Added unit and browser‑parity tests for new builtins; updated verifier totals; added conformance manifest entry ch09_string_char_utilities.
Spec/example allowlists & scripts
scripts/check_skill_examples.py, scripts/check_spec_examples.py
Updated ALLOWLIST mappings and fragment line numbers to accommodate new spec/docs and example additions.
Other minor docs
AGENTS.md, CLAUDE.md, FAQ.md, TESTING.md, examples/README.md
Synchronized various docs to the new program/example/test counts and example additions.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Dev as Developer/PR
  participant Env as TypeEnv\n(vera/environment.py)
  participant Infer as Inference\n(vera/wasm/inference.py)
  participant Lower as Wasm Lowering\n(vera/wasm/calls_strings.py)
  participant WasmMod as Wasm Module
  participant Runtime as Runtime\n(Node.js / wasmtime)
  participant Tests as Test Harness

  Dev->>Env: register new builtins (signatures)
  Env->>Infer: expose builtin types to inference
  Infer->>Lower: infer call return shapes (i32, i32_pair)
  Lower->>WasmMod: emit WASM (alloc/copy/loops/splits)
  WasmMod->>Runtime: execute in Node/wasmtime
  Runtime->>Tests: produce stdout/results
  Tests->>Dev: assert parity & conformance
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related issues

Possibly related PRs

  • PR #319 — Parallel changes to environment builtin registration, wasm dispatch and modules allowlist.
  • PR #485 — Related updates to built-in dispatch/allowlist logic in codegen/modules and calls.
  • PR #454 — Related changes to WASM/Vera function-call type inference for builtin return shapes.

Suggested labels

compiler, tests, spec, ci, docs

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately captures the primary objective: release v0.0.118 adds sixteen string utility and character classification built-ins (#470, #471) as the main feature.
Linked Issues check ✅ Passed The PR fully implements all requirements from #470 (string utilities: chars, lines, words, reverse, trim variants, pad variants) and #471 (character classifiers and case converters), with correct semantics, type signatures, and SMT/verification mappings throughout the implementation and documentation.
Out of Scope Changes check ✅ Passed All changes are scoped to v0.0.118 release, inline WAT implementation, and associated documentation/testing updates for the sixteen built-ins; no unrelated refactors or feature creep detected.
Docstring Coverage ✅ Passed Docstring coverage is 98.88% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch string-and-char-builtins

Comment @coderabbitai help to get the list of available commands and usage tips.

The five signature-only code blocks in the new section 9.6.22 (string
utilities + character classification, #470 + #471) are bare function
signatures with no body. They follow the same pattern already
allowlisted for string_index_of, string_replace, string_join, and
string_repeat at lines 1108, 1190, 1221, 1250 of the same file.

CI lint job's check_spec_examples.py was failing on these five new
blocks. Locally I'd seen the script exit OK earlier in the session
because I read only the tail of the output and missed the PARSE
FAILED line at the top.

Co-Authored-By: Claude <noreply@anthropic.invalid>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@HISTORY.md`:
- Line 255: The release row for v0.0.118 contains unescaped pipe characters
("||") that break the Markdown table (MD056) and the tagged-release total in the
footer wasn't incremented; fix by editing the table cell that mentions string
utilities/character classification (the row starting with "v0.0.118") to remove
or escape the literal "||" (e.g., replace with "or", "&vert;&vert;", or wrap the
phrase in a code span) so pipes no longer act as column separators, and update
the tagged-release total in the footer from 113 to 114 to reflect the new
release.

In `@scripts/check_skill_examples.py`:
- Around line 154-155: The allowlist entry for the FRAGMENT case in
scripts/check_skill_examples.py is pointing at the wrong line so the code fence
isn't skipped; update the allowlist mapping for ("FRAGMENT", "String utility /
classifier signatures, bare function calls") to use the actual line number of
the opening "```vera" fence (i.e., change the key from 791 to the line where the
opening ```vera appears) so check_skill_examples.py will skip this fragment
correctly.

In `@spec/09-standard-library.md`:
- Line 1774: The sentence incorrectly claims whole-string versions of
string_upper and string_lower are future work; update the sentence so it does
not assert that whole-string string_upper/string_lower are pending—either remove
the sentence or rephrase it to say only "Unicode-aware variants of string_upper
and string_lower" are future work, referencing the already-specified
whole-string functions in §9.6.14 (symbols: string_upper, string_lower) to make
the distinction clear.

In `@tests/test_browser.py`:
- Around line 1116-1137: The test_classifiers test currently doesn't cover both
true/false paths for every predicate; update the source payload used in
test_classifiers so each classifier (is_digit, is_alpha, is_alphanumeric,
is_whitespace, is_upper, is_lower) is called twice with one value that should
return true and one that should return false (e.g., change the is_alphanumeric
call to use an alphanumeric and a non-alphanumeric input, add failing cases for
is_whitespace and failing cases for is_upper/is_lower), add explicit \v and \f
inputs to fully exercise ASCII-whitespace, and then update the expected stdout
string in the assertion to match the new sequence of bool_to_string outputs;
reference the existing symbols is_alphanumeric, is_whitespace, is_upper,
is_lower, is_digit, is_alpha, bool_to_string and the test function name
test_classifiers when making changes.

In `@tests/test_codegen.py`:
- Around line 11784-11793: The test test_is_whitespace is claiming CR (\"\\r\")
is considered whitespace but the input matrix for is_whitespace omits that case;
update the matrix in tests/test_codegen.py inside test_is_whitespace to include
the CR example (e.g., add ("\\r", 1)) so the function is_whitespace is exercised
for carriage return and the assertion verifies it returns 1 just like space,
tab, and LF.
- Around line 11997-12040: Add missing edge-case tests for string_lines: create
tests in tests/test_codegen.py alongside test_string_lines_* that cover (1)
trailing "\r" input (e.g. "a\r\nb\r" or just "a\r") and trailing "\r\n"
specifically, and (2) interior blank-line preservation (e.g. "a\n\nb" and
"a\r\rb" variants) to assert array_length/string_join behavior using the
existing helpers (_run and _run_io) and the same structure as
test_string_lines_crlf/test_string_lines_trailing_newline so splitlines
semantics are fully covered for string_lines.
- Around line 11740-12086: The inline-WAT built-ins tests
(TestCharClassification and TestStringUtilities covering symbols like
char_to_upper, char_to_lower, is_digit/is_alpha/is_whitespace, string_reverse,
string_trim_start/trim_end, string_pad_*, string_chars, string_lines,
string_words, etc.) must assert the "no host imports" invariant up front; update
the test harness so every test that emits result.wat for these inline built-ins
validates the import table contains no host imports (e.g. add a call to the
existing import-gating helper or a new assert_no_host_imports(result.wat) before
running the WAT), so regressions that introduce host imports are caught early.

In `@vera/README.md`:
- Line 107: Recompute and update the wasm/ aggregate line count in
vera/README.md so the package total reflects the new sum: replace the stale
total that assumed calls_strings.py was smaller with the corrected subtotal
(update the wasm/ submodule rows so their sum is 16,098) and then update the
package total entry in the table to match the recomputed sum; specifically
adjust the row referencing `calls_strings.py` and the package total cell so the
table sums consistently.

In `@vera/wasm/calls_strings.py`:
- Around line 3696-3704: The string splitting code only checks ASCII space (32),
tab (9), LF (10) and CR (13); add checks for VT (11) and FF (12) so inputs like
"a\vb" and "a\fb" are treated as whitespace. Update the comparisons built into
string_words (the block that appends local.get ... i32.const 32/9/10/13 ...
i32.eq / i32.or) to also compare against 11 and 12, and make the identical
change in the is_whitespace routine and the trim logic near the other occurrence
(the same sequence around lines ~3902-3910) so all three places use the full set
{9,10,11,12,13,32}.
- Around line 3446-3454: The code currently stores interior pointers (data +
offset) into the resulting Array<String> by writing `local.get {data}` +
`local.get {idx}` + offset math into the slice pointer (variables outer, idx,
data), which can produce dangling interior roots; change this to allocate a
fresh buffer for each slice (like _translate_split() does), copy the slice bytes
into that new allocation, and store the new allocation's pointer into the
Array<String> instead of `data + offset`; apply the same change to the other
similar blocks referenced (around lines with the same pattern).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4f40cbe5-7f08-40d6-ac1c-127ece676561

📥 Commits

Reviewing files that changed from the base of the PR and between d18d7cd and e8f886e.

⛔ Files ignored due to path filters (8)
  • docs/SKILL.md is excluded by !docs/**
  • docs/index.html is excluded by !docs/**
  • docs/index.md is excluded by !docs/**
  • docs/llms-full.txt is excluded by !docs/**
  • docs/llms.txt is excluded by !docs/**
  • examples/string_utilities.vera is excluded by !**/*.vera
  • tests/conformance/ch09_string_char_utilities.vera is excluded by !**/*.vera
  • uv.lock is excluded by !**/*.lock, !uv.lock
📒 Files selected for processing (25)
  • AGENTS.md
  • CHANGELOG.md
  • CLAUDE.md
  • FAQ.md
  • HISTORY.md
  • README.md
  • ROADMAP.md
  • SKILL.md
  • TESTING.md
  • examples/README.md
  • pyproject.toml
  • scripts/check_skill_examples.py
  • scripts/check_spec_examples.py
  • spec/09-standard-library.md
  • tests/conformance/manifest.json
  • tests/test_browser.py
  • tests/test_codegen.py
  • tests/test_verifier.py
  • vera/README.md
  • vera/__init__.py
  • vera/codegen/modules.py
  • vera/environment.py
  • vera/wasm/calls.py
  • vera/wasm/calls_strings.py
  • vera/wasm/inference.py

Comment thread HISTORY.md Outdated
Comment thread scripts/check_skill_examples.py
Comment thread spec/09-standard-library.md Outdated
Comment thread tests/test_browser.py
Comment thread tests/test_codegen.py
Comment thread tests/test_codegen.py
Comment thread tests/test_codegen.py
Comment thread vera/README.md Outdated
Comment thread vera/wasm/calls_strings.py
Comment thread vera/wasm/calls_strings.py Outdated
CR identified ten findings; verified each against the code, fixed the
five that were real issues plus one nice-to-have (no-host-imports
invariant test).

Real bugs fixed:
- HISTORY.md: literal "||" in v0.0.118 row broke the markdown table;
  changed to "or" in the prose example.  Stale "113 tagged releases"
  total updated to 118.
- spec/09-standard-library.md: the closing sentence of section 9.6.22
  incorrectly claimed whole-string string_upper/string_lower were
  future work; they already exist in section 9.6.14.  Reworded to
  reference 9.6.14 and clarify only Unicode-aware variants are
  pending.
- vera/wasm/calls_strings.py: string_chars / string_lines /
  string_words stored interior pointers (data + offset) into the
  result Array<String>.  The GC mark phase's alignment check
  ((val - gc_heap_start) % 8 == 4 in vera/codegen/assembly.py)
  rejects interior pointers, so the shared backing buffer would
  not stay marked across a collection triggered after the function
  returned, leaving the array elements pointing at freed memory.
  Switched to per-slice $alloc + copy + store pattern (matches the
  existing _translate_split implementation).  Added a Python
  closure helper _emit_slice() shared by all five emit sites in
  _translate_structural_split.
- vera/wasm/calls_strings.py: extended is_whitespace, _translate_trim,
  and _translate_structural_split to recognize the full Python
  str.isspace() ASCII set: tab(9), LF(10), VT(11), FF(12), CR(13),
  space(32).  Previously omitted VT and FF.  9..=13 collapses to a
  single (byte - 9) < 5 unsigned range check for branchless emit.
  Spec, SKILL, CHANGELOG updated to document the full set.
- vera/README.md: stale wasm/ aggregate (12,998) bumped to live
  count 16,608.

Tests improved:
- tests/test_codegen.py test_is_whitespace: extended cases to cover
  CR, VT, FF.  Used \\u{0B} / \\u{0C} unicode escapes since Vera's
  lexer doesn't recognise \\v or \\f as simple escapes.
- tests/test_codegen.py: added five string_lines edge-case tests
  (trailing \\r, trailing \\r\\n, interior blank LF, interior blank
  CR) to cover Python splitlines semantics more thoroughly.
- tests/test_codegen.py: new
  test_no_host_imports_for_inline_builtins compiles a program using
  every #470/#471 built-in and asserts the WAT contains no
  (import "vera" "<name>") line for any of the 16 names.  Catches
  a regression where someone routes one through a host import,
  contradicting the design contract.
- tests/test_browser.py test_classifiers: extended source program to
  exercise both true and false paths for every classifier, plus the
  full whitespace set including \\u{0B} / \\u{0C} and the
  empty-string rejection shared by all six.

CR findings dismissed after verification:
- check_skill_examples.py allowlist key 791: the key IS the actual
  line of the opening ```vera fence; the script passes locally.
- HISTORY footer: CR proposed 113 -> 114 but the actual count is
  118 (matching ROADMAP); used the live number rather than CR's
  +1.

Spec example allowlist: also added entries for the 5 signature-only
code blocks I added in section 9.6.22 (one per subsection).  The CI
lint job's check_spec_examples.py was failing on those 5 blocks
because the existing FRAGMENT category doesn't auto-detect
no-body signatures.

All 3480 tests pass; doc count check, spec/SKILL/site asset checks
all green; mypy clean.

Co-Authored-By: Claude <noreply@anthropic.invalid>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

♻️ Duplicate comments (1)
vera/README.md (1)

94-107: ⚠️ Potential issue | 🟡 Minor

Recompute the wasm/ totals from the current sources.

vera/wasm/calls_strings.py in this PR runs through Line 4018, so the 3,977 figure at Line 107 is already stale, and the 16,608 subtotal at Line 94 no longer matches the submodule rows either. Please refresh both numbers from the current file sizes; the later Code Generation summary still needs the same update as a downstream fix.

Suggested doc fix
-| `wasm/` | 16,608 | Compile | WASM translation layer (package) | `WasmContext`, `WasmSlotEnv`, `StringPool` |
+| `wasm/` | 16,137 | Compile | WASM translation layer (package) | `WasmContext`, `WasmSlotEnv`, `StringPool` |
@@
-| ` ├ calls_strings.py` | 3,977 | | All string ops (length, concat, slice, search, transform, split, join, chars/lines/words, reverse, trim_start/end, pad_start/end, char_to_upper/lower, classifiers) + to-string conversions | |
+| ` ├ calls_strings.py` | 4,018 | | All string ops (length, concat, slice, search, transform, split, join, chars/lines/words, reverse, trim_start/end, pad_start/end, char_to_upper/lower, classifiers) + to-string conversions | |

As per coding guidelines, "Review Markdown files for factual accuracy against the codebase, broken links, and outdated information."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@vera/README.md` around lines 94 - 107, The README's wasm/ subtotal and the
calls_strings.py line count are out of date; update the two figures (the wasm/
subtotal `16,608` and the calls_strings.py entry `3,977`) to reflect the current
file sizes (e.g., recalc lines for vera/wasm/*.py and replace the stale numbers)
and also update the downstream Code Generation summary that references the same
totals; locate the entries by searching for the `wasm/` table, the
`calls_strings.py` row, and the Code Generation summary in vera/README.md and
replace the numeric totals with the recomputed values.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CHANGELOG.md`:
- Line 14: The changelog incorrectly states that
string_chars/string_lines/string_words return slices that share a single backing
buffer; update the entry to reflect the GC-safety fix: mention that each split
now performs a per-slice allocation+copy (fixing the interior-pointer issue) so
slices do not rely on a single shared buffer, and adjust the sentence around
"The three split operations return Array<String> whose elements share a single
backing data buffer" to describe the new per-slice allocation behavior and
reference the interior-pointer/GC-safety fix.

In `@HISTORY.md`:
- Line 255: Update the v0.0.118 release note for string_chars to reflect the GC
fix: change the clause that says “each 1-byte element shares a single backing
data buffer” to state that the implementation now allocates and copies each
1-byte slice separately (no shared interior pointers) to avoid the
interior-pointer GC bug; keep the rest of the description (behavior and purpose
of string_chars) intact and reference the shipped fix for correctness.

In `@spec/09-standard-library.md`:
- Line 1707: The sentence claiming that the three functions return a fresh
`@Array<String>` whose elements "share a single backing data buffer" is
outdated; update the wording to reflect that `string_chars`, `string_lines`, and
`string_words` now allocate and copy each slice individually (per-slice
allocation + copy) so elements do not share a single backing buffer, and ensure
the description of `@Array<String>` behavior matches the current implementation.
- Around line 1682-1691: The code fences around the signature-only spec
fragments for functions like string_chars, string_lines, and string_words are
incorrectly tagged as `vera`; change those fenced blocks back to untagged
(remove the `vera` language tag) so they remain signature-only and non-runnable,
and do the same for the other similar signature fragments noted in this chapter
(the additional signature blocks referenced in the comment). Locate the blocks
by the function signatures (e.g., "public fn string_chars(`@String` ->
`@Array`<String>)", "public fn string_lines(...)", "public fn string_words(...)")
and remove the language tag from their triple-backtick fences to restore the
spec convention.

In `@tests/test_browser.py`:
- Around line 1021-1043: The test_string_pad test doesn't exercise the
empty-fill no-op branch for string_pad_start/string_pad_end; add cases in
test_string_pad that call string_pad_start("x", N, "") and string_pad_end("x",
N, "") (and a longer input like string_pad_start("hello", 10, "")) to verify
that when fill == "" the original string is returned unchanged, and update the
expected node["stdout"] string to include the outputs of these no-op calls;
locate the assertions and IO.print calls in test_string_pad and append the new
IO.print invocations and corresponding expected values for stdout accordingly.
- Around line 1005-1103: Update the browser tests to include vertical tab (VT,
\x0B) and form feed (FF, \x0C) as whitespace edge-cases: modify test_string_trim
(string_trim_start / string_trim_end) to call string_trim_start(" \x0B\x0Chi  ")
and string_trim_end("  hi\x0B\x0C ") (or similar placements) and assert the
expected trimmed results, and modify test_string_words to include strings
containing VT/FF (e.g. " \x0Bfoo\x0Cbar ") and assert they are treated as
whitespace producing "foo|bar" and zero-length-only inputs with VT/FF yield 0;
refer to test function names test_string_trim and test_string_words to locate
and update the test sources and their corresponding stdout assertions.

In `@tests/test_codegen.py`:
- Around line 11922-12049: Add mirrored unit tests for char_to_lower and
string_pad_end to cover the missing edge cases: create a
test_char_to_lower_empty that asserts char_to_lower("") returns "" (mirroring
test_char_to_upper_empty), and add tests for string_pad_end that mirror
string_pad_start's cases — e.g., test_string_pad_end with multi-char fill
cycling, test_string_pad_no_change_when_longer for string_pad_end (input length
>= target returns input), and test_string_pad_empty_fill for string_pad_end
(empty fill returns original). Locate and follow the style of existing tests
around test_char_to_lower_first_only and string_pad_start when adding the new
test functions.
- Around line 11968-12222: Add VT (vertical tab \x0b) and FF (form feed \x0c)
whitespace cases to the existing trim and words tests: for
string_trim_start/string_trim_end/string_trim_all_whitespace add inputs like "
\x0b\x0chi\x0b " and assert _run_io/_run results match the same behavior as
ASCII whitespace (trim_start → "hi\x0b ", trim_end → " \x0bhi", all-whitespace →
length 0), and for string_words add cases such as "foo\x0b\x0cbar  " and " 
\x0b\x0c  " to assert splitting/empty-word behavior (use test_string_words_runs,
test_string_words_only_whitespace, and test_string_words_content_via_join
patterns); ensure you reuse the same helper calls (_run/_run_io) and test names
(e.g., test_string_trim_start, test_string_trim_end,
test_string_trim_all_whitespace, test_string_words_*) so the VT/FF coverage
exercises the shared is_whitespace logic.

In `@vera/wasm/calls_strings.py`:
- Around line 3355-3370: The block comment describing the structural-split
algorithm is out of date: it claims slices share a single "data" buffer, but the
implementation of _emit_slice() now allocates and copies each segment to avoid
the GC bug; update the commentary for string_chars, string_lines, string_words
to state that each slice is separately allocated/copied (no shared data buffer),
remove the old shared-buffer allocation description and any claims about O(slen
+ count*slen) vs O(slen + count), and ensure the note references the rationale
(GC avoidance via per-slice allocation) so future maintainers won't reintroduce
the old shared-data approach.
- Around line 2993-3038: The strip implementation diverges from the new
whitespace predicate used by _translate_trim, causing inconsistent behavior
between string_strip and trimming via _translate_trim; fix this by routing
string_strip's implementation to reuse the shared helper: call
_translate_trim(...) with trim_start=True and trim_end=True (or factor out the
_is_ws_inline predicate into a shared method and use it from both
_translate_trim and _translate_strip) so both string_strip and the trim
functions use the exact same whitespace test and code path.

---

Duplicate comments:
In `@vera/README.md`:
- Around line 94-107: The README's wasm/ subtotal and the calls_strings.py line
count are out of date; update the two figures (the wasm/ subtotal `16,608` and
the calls_strings.py entry `3,977`) to reflect the current file sizes (e.g.,
recalc lines for vera/wasm/*.py and replace the stale numbers) and also update
the downstream Code Generation summary that references the same totals; locate
the entries by searching for the `wasm/` table, the `calls_strings.py` row, and
the Code Generation summary in vera/README.md and replace the numeric totals
with the recomputed values.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 52db25b2-eb50-41b3-a238-8fbd4df6f139

📥 Commits

Reviewing files that changed from the base of the PR and between 42a26f4 and 4100790.

⛔ Files ignored due to path filters (2)
  • docs/SKILL.md is excluded by !docs/**
  • docs/llms-full.txt is excluded by !docs/**
📒 Files selected for processing (10)
  • CHANGELOG.md
  • HISTORY.md
  • ROADMAP.md
  • SKILL.md
  • TESTING.md
  • spec/09-standard-library.md
  • tests/test_browser.py
  • tests/test_codegen.py
  • vera/README.md
  • vera/wasm/calls_strings.py

Comment thread CHANGELOG.md Outdated
Comment thread HISTORY.md Outdated
Comment thread spec/09-standard-library.md
Comment thread spec/09-standard-library.md Outdated
Comment thread tests/test_browser.py Outdated
Comment thread tests/test_browser.py
Comment thread tests/test_codegen.py
Comment thread tests/test_codegen.py
Comment thread vera/wasm/calls_strings.py
Comment thread vera/wasm/calls_strings.py
Verified each finding against the current code; fixed the seven that
were real, dismissed two as already-correct, treated one as a
duplicate.

Stale-claim fixes (the per-slice-allocation refactor in commit
4100790 made the "shared backing data buffer" wording incorrect
in three docs):
- CHANGELOG.md line 14: rewrote the Array<String> sentence to
  describe the per-slice alloc+copy and reference the GC mark-phase
  alignment check.
- HISTORY.md line 255: updated the string_chars description to say
  each 1-byte element is independently $alloc-ed.
- spec/09-standard-library.md line 1707: rewrote the section
  9.6.22 description with the same clarification + a parenthetical
  pointing to _emit_gc_collect.
- vera/wasm/calls_strings.py block comment around #470 splits:
  rewrote the algorithm summary to state per-slice allocation is
  required for GC correctness, removed the stale O(slen) claim,
  and pointed at the alignment check in vera/codegen/assembly.py.
- tests/test_codegen.py TestStringUtilities docstring: same.

Test coverage gaps filled:
- tests/test_codegen.py: added test_char_to_lower_empty mirror of
  test_char_to_upper_empty; added test_string_pad_end_no_change_
  when_longer + test_string_pad_end_empty_fill mirrors of the
  pad_start versions; added test_string_trim_vt_ff_full_set to
  exercise both trim ends with a string mixing space/\\t/\\n/VT/FF/CR;
  added test_string_words_vt_ff_separators and
  test_string_words_only_vt_ff exercising VT/FF as word separators
  per Python str.split() semantics.
- tests/test_browser.py test_string_pad: extended source program to
  cover the empty-fill no-op for both pad_start and pad_end (CR
  was right that this branch was missing).
- tests/test_browser.py test_string_trim: extended to mix VT/FF
  with regular whitespace at both ends.
- tests/test_browser.py test_string_words: extended to use VT/FF
  as separators and to cover an all-VT/FF input yielding zero
  words.
- VT (0x0B) and FF (0x0C) are spelled with \\u{0B} / \\u{0C} since
  Vera's lexer doesn't recognise \\v / \\f as simple escapes.

Real bug fixed:
- vera/wasm/calls_strings.py _translate_strip: was open-coding a
  narrower whitespace predicate {tab(9), LF(10), CR(13), space(32)}
  that diverged from the new _translate_trim Python isspace() set
  {tab, LF, VT(11), FF(12), CR, space}.  Refactored
  _translate_strip to delegate to _translate_trim with both flags
  set — eliminates ~160 lines of duplicate code and guarantees the
  predicate stays in sync.  The existing test_string_strip case
  uses ASCII spaces only so the narrower-set behaviour was masked.

Findings dismissed after verification:
- spec/09-standard-library.md 1682-1691: CR suggested untagging the
  ```vera fences for signature-only blocks.  Existing chapter-9
  convention is `vera`-tagged + FRAGMENT allowlist (matches
  string_index_of, string_replace, string_join, string_repeat at
  lines 1108/1190/1221/1250).  My five new blocks already follow
  the convention and are allowlisted in scripts/check_spec_examples.py.
- vera/README.md 94-107: duplicate of the previous round's
  comment 3129576089 — the wasm/ aggregate was already updated.
  Refreshed the calls_strings.py figure and the package total to
  reflect the post-strip-refactor sizes (now 3,890 lines and
  16,521 total).

Doc-count rolling counts: now 3,500 tests / 949 codegen tests /
12,389 codegen lines / 1,585 browser lines.

Co-Authored-By: Claude <noreply@anthropic.invalid>
@aallan

aallan commented Apr 23, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai Round 2 addressed in commit 98f3a74. Verified each finding against the current code; fixed seven, dismissed two as already-correct, treated one as a duplicate of the previous round.

Fixed:

  • Stale "shared backing data buffer" claim corrected in CHANGELOG.md, HISTORY.md, spec/09-standard-library.md §9.6.22, the calls_strings.py block comment, and the TestStringUtilities docstring (the per-slice-allocation refactor in 4100790 made all five wordings stale).
  • Added test_char_to_lower_empty mirror, test_string_pad_end_no_change_when_longer, test_string_pad_end_empty_fill, test_string_trim_vt_ff_full_set, test_string_words_vt_ff_separators, test_string_words_only_vt_ff to test_codegen.py. VT/FF spelled with \u{0B} / \u{0C} since Vera's lexer doesn't accept \v / \f simple escapes.
  • Extended browser test_string_pad (empty-fill no-op for both sides), test_string_trim (VT/FF mixed in), and test_string_words (VT/FF separators + all-VT/FF input).
  • Real bug: _translate_strip was open-coding a narrower whitespace predicate {tab, LF, CR, space} that diverged from _translate_trim after the Add string utility built-ins (reverse, pad, chars, trim variants) #470 series added VT and FF. Refactored _translate_strip to delegate to _translate_trim(trim_start=True, trim_end=True) — eliminates ~160 lines of duplicate code and locks the predicate in sync.
  • vera/README.md aggregate refreshed (was duplicate of previous-round comment but the strip refactor changed sizes again — calls_strings.py now 3,890 lines, wasm/ total 16,521).

Dismissed:

  • spec/09 fence convention: existing chapter-9 convention is vera-tagged + FRAGMENT allowlist (matches string_index_of/replace/join/repeat at lines 1108/1190/1221/1250). My five new blocks already follow it.

3,486 tests pass, mypy clean, all 14 doc validation scripts green.

@coderabbitai

coderabbitai Bot commented Apr 23, 2026

Copy link
Copy Markdown

Tip

For best results, initiate chat on the files or code changes.

@aallan Thanks for the thorough round-2 summary. Everything looks well-resolved:

  • The stale "shared backing buffer" wording sweep across all five locations is correct — the per-slice allocation refactor was the right semantic change and the doc/comment language needed to follow.
  • The six new test_codegen.py cases (especially test_string_trim_vt_ff_full_set, test_string_words_vt_ff_separators, test_string_words_only_vt_ff) and the three extended browser tests give good coverage of the VT/FF edge that triggered the real bug.
  • The _translate_strip_translate_trim delegation is the standout fix: collapsing ~160 lines of open-coded duplicate logic into a single delegation call both fixes the whitespace predicate divergence and makes future predicate changes automatically consistent across strip and trim. That's a clean correctness + DRY win.
  • The spec/09 fence-convention dismissal is correct — the existing vera-tagged + FRAGMENT-allowlist pattern is the established chapter-9 convention and the new blocks already comply.

3,486 tests / mypy clean / all 14 doc scripts green is a solid green state. Happy to see this land.


✏️ Learnings added
Learnt from: aallan
URL: https://github.com/aallan/vera/pull/510

Timestamp: 2026-04-23T10:03:25.976Z
Learning: In `vera/wasm/calls_strings.py`, `_translate_strip` (used for `string_strip`) must delegate to `_translate_trim(trim_start=True, trim_end=True)` rather than open-coding its own whitespace predicate. Before PR `#510` commit 98f3a74, `_translate_strip` used the narrower set {tab(0x09), LF(0x0A), CR(0x0D), space(0x20)} while `_translate_trim` was extended in the `#470` series to include VT(0x0B) and FF(0x0C). The fix eliminated ~160 lines of duplicate code and locks both paths to a single whitespace predicate. Do NOT re-introduce a separate predicate in `_translate_strip`.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: aallan
Repo: aallan/vera PR: 0
File: :0-0
Timestamp: 2026-03-29T10:15:51.538Z
Learning: In aallan/vera, general housekeeping PRs (CI hardening, documentation, tooling) may acceptably bundle incidental code fixes (e.g. adding `vera --version` CLI support, HTTP Content-Type headers, urlopen timeouts) without requiring them to be split into separate PRs. Do not flag such bundling as a scope concern in this repository.

Learnt from: aallan
Repo: aallan/vera PR: 350
File: tests/test_codegen.py:8630-8633
Timestamp: 2026-03-25T16:32:11.316Z
Learning: Repo aallan/vera PR `#350`: tests/test_codegen.py includes explicit WAT import‑gating coverage for JSON host imports — tests test_json_parse_wat_import, test_json_stringify_wat_import, and test_json_no_imports_when_unused verify that "vera.json_parse"/"vera.json_stringify" imports are emitted only when referenced and absent when unused.

Learnt from: aallan
Repo: aallan/vera PR: 357
File: vera/wasm/calls.py:353-358
Timestamp: 2026-03-26T09:40:40.644Z
Learning: Repo aallan/vera: The canonical allocator-needed flag is self.needs_alloc (no underscore). For host-import call sites (e.g., Http/Json/Markdown/Regex/Map/Set/Decimal) in vera/wasm/calls.py and related mixins, set self.needs_alloc = True to ensure allocator wiring is kept.

Learnt from: aallan
Repo: aallan/vera PR: 474
File: vera/wasm/calls_parsing.py:80-90
Timestamp: 2026-04-16T19:47:56.756Z
Learning: In `vera/wasm/calls_parsing.py` (originally in `vera/wasm/calls.py`), the `_translate_parse_nat` and `_translate_parse_int` state-machine loops skip ASCII space (byte 32) unconditionally at every iteration, not just during leading/trailing whitespace. This means embedded spaces (e.g. `"12 34"`) are silently accepted. This is known pre-existing behaviour, intentionally left unchanged in PR `#474` (pure code-motion refactor) and tracked for fixing in issue `#475`.

Learnt from: aallan
Repo: aallan/vera PR: 485
File: vera/wasm/calls_arrays.py:622-626
Timestamp: 2026-04-17T10:13:59.841Z
Learning: In `vera/wasm/calls_arrays.py`, do NOT add per-site `arr_len * sizeof(B) > 65535` trap guards in individual array combinators (`array_map`, `array_range`, `array_concat`, `array_append`, `array_slice`). All of these can allocate >65535 bytes and are equally affected by the 16-bit GC object-header size field truncation bug. The correct, single-point fix is issue `#484`: widen the GC header size field from 16-bit to 31-bit in `vera/codegen/assembly.py` by updating the two `i32.const 65535 / i32.and` masks in `$alloc` and three in `$gc_collect`. Per-site guards would shift the landmine without defusing it.

Learnt from: aallan
Repo: aallan/vera PR: 508
File: vera/README.md:100-100
Timestamp: 2026-04-22T21:32:22.539Z
Learning: In `vera/README.md`, the wasm/ module sub-table uses the pattern `` ` ├ filename.py` `` (leading space inside backticks) consistently across all ~19 rows to produce visual tree-indent alignment in the rendered table. This triggers markdownlint MD038 but the pattern is intentional and pre-existing. Do NOT flag individual rows for MD038 in isolation — fixing a single row would visually misalign it from its siblings. A file-wide markdownlint cleanup pass (stripping leading spaces from all 19 rows, or re-encoding the indent outside the backticks) belongs in a dedicated PR, not bundled into feature PRs.

Learnt from: aallan
Repo: aallan/vera PR: 414
File: codecov.yml:23-26
Timestamp: 2026-03-28T22:44:04.224Z
Learning: In the `codecov.yml` for `aallan/vera`, the Python **project** check uses `target: auto` (compares to base commit) with `threshold: 0.5%` as the hard enforcement floor — this is the real CI gate for coverage regressions. The Python **patch** check is deliberately set to `informational: true` because `coverage.py` does not mark multi-line expression continuation lines as executable; Codecov counts every `+` diff line (including continuation lines) as needing coverage, which systematically under-reports patch coverage on any PR that reformats code into multi-line calls. Enforcing a hard patch target would cause spurious CI failures on every refactor.

Learnt from: aallan
Repo: aallan/vera PR: 350
File: vera/wasm/calls.py:4456-4463
Timestamp: 2026-03-25T14:53:24.176Z
Learning: Repo aallan/vera: In vera/wasm/calls.py, JSON host-import helpers must record usage via self._json_ops_used.add("json_parse") and self._json_ops_used.add("json_stringify") inside CallsMixin._translate_json_parse/_translate_json_stringify so codegen propagates result.json_ops_used and conditional imports are injected in assembly/api; otherwise WAT may reference $vera.json_parse/$vera.json_stringify without matching imports.

Learnt from: aallan
Repo: aallan/vera PR: 489
File: vera/wasm/calls_arrays.py:1072-1082
Timestamp: 2026-04-17T17:27:34.889Z
Learning: In `vera/wasm/calls_arrays.py` (PR `#489`), the `u_is_adt` / `u_needs_root` heuristic in `_translate_array_fold` deliberately over-roots any non-Bool/Byte `i32` accumulator, including opaque host handles (Map, Set, Decimal, Regex). This is intentional safe over-caution: the conservative GC's alignment/range check rejects out-of-range small integers (e.g. `5` from `host_map_new`) so no spurious marking occurs, while under-rooting would be a silent-GC bug (class `#464`). The correct fix — a positive `is_gc_managed(type)` predicate applied cross-cutting across all rooting-decision sites in every mixin — is tracked in issue `#490` and is out of scope for PR `#489`. Do NOT flag the `u_is_adt` condition as incorrect or missing a positive-predicate check; it is a deliberate bias-to-root default.

Learnt from: aallan
Repo: aallan/vera PR: 498
File: vera/wasm/inference.py:328-338
Timestamp: 2026-04-20T10:50:48.202Z
Learning: In `vera/wasm/inference.py`, `_infer_fncall_wasm_type` and `_infer_fncall_vera_type` must always be kept in sync for every built-in function. When a new built-in is added to `_infer_fncall_wasm_type` (returning e.g. `"f64"` or `"i64"`), a matching branch must be added to `_infer_fncall_vera_type` (returning `"Float64"` or `"Int"` respectively). Without this, nesting a built-in call inside an expression whose Vera type is consumed upstream (generics inference, `show`, `hash`) falls back to `None` and causes mis-compilation. This was confirmed by aallan in PR `#498` for the math built-ins added in issue `#467` (log/log2/log10/sin/cos/tan/asin/acos/atan/atan2/pi/e/float_clamp → Float64; sign/clamp → Int).

Learnt from: aallan
Repo: aallan/vera PR: 414
File: vera/codegen/core.py:258-265
Timestamp: 2026-03-28T22:06:21.085Z
Learning: In `vera/codegen/core.py` and `vera/codegen/api.py` (PR `#414`), `CompileResult.fn_param_types` carries erased WASM-level tags (`i64`, `f64`, `i32`, `i32_pair`) rather than Vera-level kinds. This is intentional for the current CLI: `i32_pair` = String only (Array CLI not yet supported), `i64` = Int/Nat, `f64` = Float64, `i32` = Bool/Byte (identical CLI parsing). Carrying Vera-level parameter kinds through CompileResult to distinguish Nat vs Int (negative rejection) and Bool vs Byte is a future-proofing refactor explicitly deferred from PR `#414` and filed as a follow-up note by aallan.

Learnt from: aallan
Repo: aallan/vera PR: 486
File: vera/wasm/calls_arrays.py:832-841
Timestamp: 2026-04-17T14:46:14.693Z
Learning: In `vera/wasm/calls.py` (post-PR `#486`), `_infer_concat_elem_type` was extended to detect `array_map(arr, closure)` call nodes and delegate to `_infer_closure_return_vera_type(closure_arg)` to return the *output* element type B rather than the input element type A. This fix ensures that `array_filter(array_map(...), pred)`, `array_concat(array_map(...), ...)`, `array_slice(array_map(...), ...)` and similar compositions resolve the correct element size, load/store ops, and closure signatures. Do NOT flag `_infer_concat_elem_type` as missing `array_map` handling — it is already present.

@coderabbitai

coderabbitai Bot commented Apr 23, 2026

Copy link
Copy Markdown

Tip

For best results, initiate chat on the files or code changes.

@aallan: Round 2 is fully resolved — thank you for the thorough write-up.

The _translate_strip refactor is the standout fix: open-coding a 4-character whitespace set against a 6-character _translate_trim predicate is exactly the kind of silent divergence that accumulates maintenance debt over time. Delegating to _translate_trim(trim_start=True, trim_end=True) makes the invariant structurally impossible to violate and removes ~160 lines of duplicated logic — a clear correctness win.

The VT/FF unit-test coverage and the \u{0B}/\u{0C} escape note are good to have on record. All other items look correct. No further concerns on my end.


✏️ Learnings added
Learnt from: aallan
URL: https://github.com/aallan/vera/pull/510

Timestamp: 2026-04-23T10:03:48.910Z
Learning: In `vera/wasm/calls_strings.py`, `_translate_strip` (used by the structural-split helpers) must delegate to `_translate_trim(trim_start=True, trim_end=True)` rather than open-coding its own whitespace predicate. Open-coding caused a silent divergence from `_translate_trim` when VT (0x0B) and FF (0x0C) were added in PR `#510`. The canonical 6-byte ASCII whitespace set (space 0x20, tab 0x09, LF 0x0A, VT 0x0B, FF 0x0C, CR 0x0D) lives exclusively in `_translate_trim`; all other trim/strip logic must call it.

Learnt from: aallan
URL: https://github.com/aallan/vera/pull/510

Timestamp: 2026-04-23T10:03:48.910Z
Learning: In Vera string literals (vera/wasm/calls_strings.py tests and test_codegen.py), VT and FF must be spelled as `\u{0B}` and `\u{0C}` respectively. Vera's lexer does not accept `\v` or `\f` as simple escape sequences. This was confirmed in PR `#510` when adding VT/FF whitespace coverage tests.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: aallan
Repo: aallan/vera PR: 474
File: vera/wasm/calls_parsing.py:80-90
Timestamp: 2026-04-16T19:47:56.756Z
Learning: In `vera/wasm/calls_parsing.py` (originally in `vera/wasm/calls.py`), the `_translate_parse_nat` and `_translate_parse_int` state-machine loops skip ASCII space (byte 32) unconditionally at every iteration, not just during leading/trailing whitespace. This means embedded spaces (e.g. `"12 34"`) are silently accepted. This is known pre-existing behaviour, intentionally left unchanged in PR `#474` (pure code-motion refactor) and tracked for fixing in issue `#475`.

Learnt from: aallan
Repo: aallan/vera PR: 350
File: tests/test_codegen.py:8630-8633
Timestamp: 2026-03-25T16:32:11.316Z
Learning: Repo aallan/vera PR `#350`: tests/test_codegen.py includes explicit WAT import‑gating coverage for JSON host imports — tests test_json_parse_wat_import, test_json_stringify_wat_import, and test_json_no_imports_when_unused verify that "vera.json_parse"/"vera.json_stringify" imports are emitted only when referenced and absent when unused.

Learnt from: aallan
Repo: aallan/vera PR: 0
File: :0-0
Timestamp: 2026-03-29T10:15:51.538Z
Learning: In aallan/vera, general housekeeping PRs (CI hardening, documentation, tooling) may acceptably bundle incidental code fixes (e.g. adding `vera --version` CLI support, HTTP Content-Type headers, urlopen timeouts) without requiring them to be split into separate PRs. Do not flag such bundling as a scope concern in this repository.

Learnt from: aallan
Repo: aallan/vera PR: 508
File: vera/README.md:100-100
Timestamp: 2026-04-22T21:32:22.539Z
Learning: In `vera/README.md`, the wasm/ module sub-table uses the pattern `` ` ├ filename.py` `` (leading space inside backticks) consistently across all ~19 rows to produce visual tree-indent alignment in the rendered table. This triggers markdownlint MD038 but the pattern is intentional and pre-existing. Do NOT flag individual rows for MD038 in isolation — fixing a single row would visually misalign it from its siblings. A file-wide markdownlint cleanup pass (stripping leading spaces from all 19 rows, or re-encoding the indent outside the backticks) belongs in a dedicated PR, not bundled into feature PRs.

Learnt from: aallan
Repo: aallan/vera PR: 485
File: vera/wasm/calls_arrays.py:622-626
Timestamp: 2026-04-17T10:13:59.841Z
Learning: In `vera/wasm/calls_arrays.py`, do NOT add per-site `arr_len * sizeof(B) > 65535` trap guards in individual array combinators (`array_map`, `array_range`, `array_concat`, `array_append`, `array_slice`). All of these can allocate >65535 bytes and are equally affected by the 16-bit GC object-header size field truncation bug. The correct, single-point fix is issue `#484`: widen the GC header size field from 16-bit to 31-bit in `vera/codegen/assembly.py` by updating the two `i32.const 65535 / i32.and` masks in `$alloc` and three in `$gc_collect`. Per-site guards would shift the landmine without defusing it.

Learnt from: aallan
Repo: aallan/vera PR: 357
File: vera/wasm/calls.py:353-358
Timestamp: 2026-03-26T09:40:40.644Z
Learning: Repo aallan/vera: The canonical allocator-needed flag is self.needs_alloc (no underscore). For host-import call sites (e.g., Http/Json/Markdown/Regex/Map/Set/Decimal) in vera/wasm/calls.py and related mixins, set self.needs_alloc = True to ensure allocator wiring is kept.

Learnt from: aallan
Repo: aallan/vera PR: 498
File: tests/test_codegen.py:0-0
Timestamp: 2026-04-20T10:50:37.891Z
Learning: Repo aallan/vera — tests/test_codegen.py: When asserting presence of a specific $vera.<op> import in WAT where similarly prefixed ops exist (e.g., log vs log2/log10), use a boundary-safe regex like re.search(r'\$vera\.log(?!\d)', result.wat) to avoid substring false-positives.

Learnt from: aallan
Repo: aallan/vera PR: 414
File: vera/codegen/core.py:258-265
Timestamp: 2026-03-28T22:06:21.085Z
Learning: In `vera/codegen/core.py` and `vera/codegen/api.py` (PR `#414`), `CompileResult.fn_param_types` carries erased WASM-level tags (`i64`, `f64`, `i32`, `i32_pair`) rather than Vera-level kinds. This is intentional for the current CLI: `i32_pair` = String only (Array CLI not yet supported), `i64` = Int/Nat, `f64` = Float64, `i32` = Bool/Byte (identical CLI parsing). Carrying Vera-level parameter kinds through CompileResult to distinguish Nat vs Int (negative rejection) and Bool vs Byte is a future-proofing refactor explicitly deferred from PR `#414` and filed as a follow-up note by aallan.

Learnt from: aallan
Repo: aallan/vera PR: 482
File: vera/README.md:94-94
Timestamp: 2026-04-16T22:33:48.063Z
Learning: In `vera/README.md`, the `wasm/` module table and the `Files:` summary line in the "Code Generation" section both intentionally exclude `__init__.py`. As of PR `#482`, the documented count is 18 modules (19 actual files including `__init__.py` which is 6 lines). Do not flag the difference between documented count and `ls vera/wasm/*.py | wc -l` as an inconsistency — `__init__.py` is always omitted by design.

Learnt from: aallan
Repo: aallan/vera PR: 492
File: tests/test_codegen.py:10188-10316
Timestamp: 2026-04-20T08:57:59.362Z
Learning: Repo aallan/vera: In tests/test_codegen.py::TestRandomEffect, each Random op test asserts per-operation WAT import-gating — the used "$vera.random_{op}" import is present while the other Random imports are absent. This guards regressions in _random_ops_used.

Learnt from: aallan
Repo: aallan/vera PR: 350
File: vera/wasm/calls.py:4456-4463
Timestamp: 2026-03-25T14:53:24.176Z
Learning: Repo aallan/vera: In vera/wasm/calls.py, JSON host-import helpers must record usage via self._json_ops_used.add("json_parse") and self._json_ops_used.add("json_stringify") inside CallsMixin._translate_json_parse/_translate_json_stringify so codegen propagates result.json_ops_used and conditional imports are injected in assembly/api; otherwise WAT may reference $vera.json_parse/$vera.json_stringify without matching imports.

Learnt from: aallan
Repo: aallan/vera PR: 489
File: vera/wasm/calls_arrays.py:1072-1082
Timestamp: 2026-04-17T17:27:34.889Z
Learning: In `vera/wasm/calls_arrays.py` (PR `#489`), the `u_is_adt` / `u_needs_root` heuristic in `_translate_array_fold` deliberately over-roots any non-Bool/Byte `i32` accumulator, including opaque host handles (Map, Set, Decimal, Regex). This is intentional safe over-caution: the conservative GC's alignment/range check rejects out-of-range small integers (e.g. `5` from `host_map_new`) so no spurious marking occurs, while under-rooting would be a silent-GC bug (class `#464`). The correct fix — a positive `is_gc_managed(type)` predicate applied cross-cutting across all rooting-decision sites in every mixin — is tracked in issue `#490` and is out of scope for PR `#489`. Do NOT flag the `u_is_adt` condition as incorrect or missing a positive-predicate check; it is a deliberate bias-to-root default.

Learnt from: aallan
Repo: aallan/vera PR: 498
File: vera/wasm/inference.py:328-338
Timestamp: 2026-04-20T10:50:48.202Z
Learning: In `vera/wasm/inference.py`, `_infer_fncall_wasm_type` and `_infer_fncall_vera_type` must always be kept in sync for every built-in function. When a new built-in is added to `_infer_fncall_wasm_type` (returning e.g. `"f64"` or `"i64"`), a matching branch must be added to `_infer_fncall_vera_type` (returning `"Float64"` or `"Int"` respectively). Without this, nesting a built-in call inside an expression whose Vera type is consumed upstream (generics inference, `show`, `hash`) falls back to `None` and causes mis-compilation. This was confirmed by aallan in PR `#498` for the math built-ins added in issue `#467` (log/log2/log10/sin/cos/tan/asin/acos/atan/atan2/pi/e/float_clamp → Float64; sign/clamp → Int).

Learnt from: aallan
Repo: aallan/vera PR: 486
File: vera/wasm/calls_arrays.py:832-841
Timestamp: 2026-04-17T14:46:14.693Z
Learning: In `vera/wasm/calls.py` (post-PR `#486`), `_infer_concat_elem_type` was extended to detect `array_map(arr, closure)` call nodes and delegate to `_infer_closure_return_vera_type(closure_arg)` to return the *output* element type B rather than the input element type A. This fix ensures that `array_filter(array_map(...), pred)`, `array_concat(array_map(...), ...)`, `array_slice(array_map(...), ...)` and similar compositions resolve the correct element size, load/store ops, and closure signatures. Do NOT flag `_infer_concat_elem_type` as missing `array_map` handling — it is already present.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
spec/09-standard-library.md (1)

1682-1691: 🧹 Nitpick | 🔵 Trivial

Use untagged fences for signature-only blocks

These signature fragments should use untagged code fences rather than ```vera, consistent with the established convention for spec chapters. Signature blocks without bodies are not complete runnable programs and belong in the parser-example allowlist rather than validated as full Vera code.

Based on learnings, "In spec/09-standard-library.md … function signature code blocks are intentionally left with untagged fences … and should not be tagged as vera because they are not complete, runnable programs."

Also applies to: 1711-1720, 1734-1740, 1757-1763, 1778-1785

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spec/09-standard-library.md` around lines 1682 - 1691, The fenced code blocks
showing only function signatures (e.g., public fn string_chars, public fn
string_lines, public fn string_words) are tagged as ```vera but are
signature-only and not runnable; change those fenced blocks to use untagged code
fences (remove the "vera" language tag) so they are treated as signature-only
examples consistent with the spec convention; apply the same fix to the other
signature-only blocks mentioned (the blocks around lines with the other
signature fragments referenced in the comment).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@TESTING.md`:
- Line 12: Update remaining references that still say "all 30 examples" to
reflect the new count "32" in TESTING.md; search for the literal phrase "all 30
examples" (occurrences near the sections mentioning example programs and
validation) and replace each with "all 32 examples" so the document is
internally consistent with the changed header line that now reads "32, all
validated through `vera check` + `vera verify`".

In `@tests/test_browser.py`:
- Around line 1091-1107: Extend the test_string_lines test to exercise the
empty-input case for string_lines("") so the alloc(0) path in
vera/wasm/calls_strings.py::_translate_structural_split is covered: add a call
that evaluates array_length(string_lines("")) (or otherwise inspects
string_lines("") == [] via printing its length or serialized form) in the WASM
source emitted by test_string_lines and update the final stdout assertion (in
test_string_lines) to include the expected value (empty list => length 0 or
appropriate serialized output) so the test asserts the empty-input behavior.

In `@vera/README.md`:
- Line 94: The README has inconsistent line-counts for the wasm/ package (table
shows 16,521 while the "Code Generation" summary shows 12,998); open the README
and update the "Code Generation" section (search for the "wasm/" entry and the
phrase "Code Generation") so the wasm/ line count matches the canonical value
(verify the actual line count from the repo and replace 12,998 or 16,521
accordingly), and scan for any other occurrences of the wasm/ line-count
elsewhere in the file to keep all references consistent.

In `@vera/wasm/calls_strings.py`:
- Around line 2877-2953: The trimming path can allocate and collect the source
because `ptr` is still read after `call $alloc` but never rooted; before the
`call $alloc` in the trimming sequence (the block that computes `new_len` and
then does `call $alloc` / `local.set {dst}`), push the source pointer onto the
GC shadow stack (use the existing `gc_shadow_push`/`gc_shadow_pop` helpers
referencing the same `ptr` local) so the original string is rooted across the
allocation, and pop it after the copy is complete; update the trim helper that
`string_strip()` delegates to (the code around `local.get {end}`/`local.get
{start}` -> `local.set {new_len}` -> `call $alloc`) to include the push before
`call $alloc` and corresponding pop after the newly allocated data is populated.

---

Duplicate comments:
In `@spec/09-standard-library.md`:
- Around line 1682-1691: The fenced code blocks showing only function signatures
(e.g., public fn string_chars, public fn string_lines, public fn string_words)
are tagged as ```vera but are signature-only and not runnable; change those
fenced blocks to use untagged code fences (remove the "vera" language tag) so
they are treated as signature-only examples consistent with the spec convention;
apply the same fix to the other signature-only blocks mentioned (the blocks
around lines with the other signature fragments referenced in the comment).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8c527c88-53f4-4e0a-a2fd-4104ea3062b5

📥 Commits

Reviewing files that changed from the base of the PR and between 4100790 and 98f3a74.

📒 Files selected for processing (9)
  • CHANGELOG.md
  • HISTORY.md
  • ROADMAP.md
  • TESTING.md
  • spec/09-standard-library.md
  • tests/test_browser.py
  • tests/test_codegen.py
  • vera/README.md
  • vera/wasm/calls_strings.py

Comment thread TESTING.md
Comment thread tests/test_browser.py Outdated
Comment thread vera/README.md
Comment thread vera/wasm/calls_strings.py
Verified each finding against the current code; fixed all four real
ones; treated the spec fence-tag suggestion as a duplicate of the
already-dismissed round-2 finding.

Real bug fixed:
- vera/wasm/calls_strings.py _translate_trim: the source `ptr` local
  was being read inside the post-alloc copy loop without being
  rooted on the shadow stack across the destination $alloc.  WASM
  locals are NOT GC roots — only the shadow stack is — so a
  heap-allocated input string (e.g. the result of string_concat)
  could be collected during the alloc, leaving the copy reading
  from a freed buffer.  Pushed `ptr` via gc_shadow_push() before
  the alloc.  Verified _translate_string_chars already roots its
  ptr_s correctly; the pattern was missing only here.  This fix
  also covers _translate_strip (delegates to _translate_trim) and
  the two one-sided trim functions.

Stale doc fixes:
- TESTING.md line 274: "all 30 examples" → "all 32 examples" in the
  Ch 3 row of the chapter coverage table.
- vera/README.md line 486: "Code Generation" summary still showed
  "wasm/ (12,998 lines across 18 modules)"; bumped to 16,521 to
  match the wasm/ table at line 94.

Test coverage gap filled:
- tests/test_browser.py test_string_lines: extended source program
  to cover the empty-input path (string_lines("") → length 0) so
  the alloc(0) branch in _translate_structural_split is exercised
  under the browser runtime.  Updated expected stdout accordingly.

Findings dismissed:
- spec/09 fence convention (duplicate): same suggestion CR made in
  the previous two rounds — chapter-9 convention is `vera`-tagged
  + FRAGMENT allowlist (matches string_index_of/replace/join/repeat
  at lines 1108/1190/1221/1250).  My five new blocks already
  follow the convention.

Memory: also saved a [Compiler review heuristics](feedback_review_heuristics.md)
file capturing the three mechanical CR-derived rules (byte-literal
collision = candidate for delegation; gc_shadow_push before any
$alloc; inline-WAT-no-host-imports invariant) so future PRs apply
them at write-time rather than discovering them at review-time.

Doc-count rolling counts: 3,500 tests / 949 codegen tests / 12,389
codegen lines / 1,595 browser lines.

Co-Authored-By: Claude <noreply@anthropic.invalid>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_browser.py`:
- Around line 1005-1029: Add a browser regression asserting that string_strip
handles VT/FF like the trim functions: add a case in tests/test_browser.py
(e.g., in or alongside test_string_trim or new test_string_strip) that
compiles/runs a snippet calling string_strip("\u{0B}\u{0C}hi \u{0B}") and
asserts the output equals "hi"; also ensure the runtime translation path
(_translate_strip) delegates to the canonical ASCII whitespace helper so VT/FF
are treated the same as string_trim_start/string_trim_end (reference symbols:
string_strip, _translate_strip, string_trim_start, string_trim_end).

In `@vera/wasm/calls_strings.py`:
- Around line 2896-2906: Extract the ASCII-whitespace emitter currently defined
as the local function _is_ws_inline() inside _translate_trim into a single
shared helper (e.g., make_is_ws_inline or emit_is_ascii_whitespace) that returns
the canonical instruction sequence checking byte == 32 OR (byte-9)<5 (the
branchless 6-byte set for tab/LF/VT/FF/CR/space); then replace the inline
_is_ws_inline definition in _translate_trim with a call to that shared helper
and update other translators (is_whitespace(), string_words(), and any
string_strip-related translators) to delegate to the same helper instead of
re-encoding the 32 / 9..13 predicate so all passes reuse the single canonical
emitter.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: dde721a3-1584-4f23-b9d6-a480f95924a2

📥 Commits

Reviewing files that changed from the base of the PR and between 98f3a74 and fb4d0b6.

📒 Files selected for processing (4)
  • TESTING.md
  • tests/test_browser.py
  • vera/README.md
  • vera/wasm/calls_strings.py

Comment thread tests/test_browser.py
Comment thread vera/wasm/calls_strings.py
Both findings real, both fixed.

Predicate hoist (#2):
- Extracted the canonical ASCII whitespace predicate into a
  module-level helper emit_is_ascii_whitespace(byte_local, indent)
  in vera/wasm/helpers.py.  Single-source-of-truth for the
  branchless {space(32) | (byte - 9) < 5} encoding of Python's
  str.isspace() ASCII set {tab(9), LF(10), VT(11), FF(12), CR(13),
  space(32)}.  Replaced four open-coded copies with calls to the
  helper:
    1. _translate_is_whitespace (~10 LOC → 1 line)
    2. _is_ws_inline closure inside _translate_trim
    3. _translate_structural_split count pass (string_words)
    4. _translate_structural_split emit pass (string_words)
- Helper docstring documents the historical context (PR #510 round 2
  caught _translate_strip's divergent narrow predicate; round 4
  hoisted the helper) and warns against re-encoding the byte set.
- Direct application of the byte-literal-collision review rule (memory
  file feedback_review_heuristics.md): two functions sharing a
  predicate routed through one helper from day 1 prevents future
  divergence.

Browser regression (#1):
- New tests/test_browser.py::TestBrowserStringUtilities::
  test_string_strip_vt_ff: compiles and runs a program that calls
  string_strip("\\u{0B}\\u{0C}hi \\u{0B}") and asserts the output
  is "hi".  Pins the strip→trim delegation contract under the
  browser runtime — if a future refactor accidentally re-opens the
  old narrow predicate for strip, this assertion breaks.

All 3,487 tests pass; mypy clean; doc count check + site asset
check + spec/SKILL/conformance/example checks all green.

Co-Authored-By: Claude <noreply@anthropic.invalid>
@aallan aallan merged commit e5d97e2 into main Apr 23, 2026
20 checks passed
@aallan aallan deleted the string-and-char-builtins branch April 23, 2026 11:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add character classification built-ins (is_digit, is_alpha, etc.) Add string utility built-ins (reverse, pad, chars, trim variants)

1 participant