Release v0.0.118: string utilities + character classification (#470, #471)#510
Conversation
Sixteen new built-in functions implemented as inline WAT - no host imports, bit-identical Python/browser output. Closes #470 and #471. String utilities (eight, #470): - string_chars / string_lines / string_words - bridge from String to Array<String>; lines follows Python splitlines, words follows Python split with no args - string_reverse / string_trim_start / string_trim_end - string_pad_start / string_pad_end - JS padStart/padEnd cycling semantics: fill cycles left-to-right, truncated to padding length Character classification (eight, #471): - is_digit, is_alpha, is_alphanumeric, is_whitespace, is_upper, is_lower - first-byte ASCII predicates, false for empty string - char_to_upper, char_to_lower - first byte only Tests added: 33 unit tests (TestCharClassification, TestStringUtilities), 9 browser parity tests, conformance ch09_string_char_utilities.vera with 14 test functions covering classifier, transform, pad, split, and empty-string edge cases. Example added: examples/string_utilities.vera exercises all sixteen built-ins in a coherent log-line-processor scenario. Phase 2 (Unicode-aware variants - string_codepoints, string_graphemes, whole-string string_to_upper/lower, etc.) tracked separately in #509. Co-Authored-By: Claude <noreply@anthropic.invalid>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #510 +/- ##
==========================================
+ Coverage 90.27% 90.59% +0.31%
==========================================
Files 58 58
Lines 21087 21958 +871
Branches 236 237 +1
==========================================
+ Hits 19036 19892 +856
- Misses 2047 2062 +15
Partials 4 4
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds 16 ASCII-first-byte built‑ins (8 string utilities, 8 character ops), registers them in the type environment, extends WASM/Vera inference and lowering, expands cross‑module allowlists, adds unit/browser/conformance tests, updates specs/docs, and bumps package version to v0.0.118. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Dev as Developer/PR
participant Env as TypeEnv\n(vera/environment.py)
participant Infer as Inference\n(vera/wasm/inference.py)
participant Lower as Wasm Lowering\n(vera/wasm/calls_strings.py)
participant WasmMod as Wasm Module
participant Runtime as Runtime\n(Node.js / wasmtime)
participant Tests as Test Harness
Dev->>Env: register new builtins (signatures)
Env->>Infer: expose builtin types to inference
Infer->>Lower: infer call return shapes (i32, i32_pair)
Lower->>WasmMod: emit WASM (alloc/copy/loops/splits)
WasmMod->>Runtime: execute in Node/wasmtime
Runtime->>Tests: produce stdout/results
Tests->>Dev: assert parity & conformance
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related issues
Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
The five signature-only code blocks in the new section 9.6.22 (string utilities + character classification, #470 + #471) are bare function signatures with no body. They follow the same pattern already allowlisted for string_index_of, string_replace, string_join, and string_repeat at lines 1108, 1190, 1221, 1250 of the same file. CI lint job's check_spec_examples.py was failing on these five new blocks. Locally I'd seen the script exit OK earlier in the session because I read only the tail of the output and missed the PARSE FAILED line at the top. Co-Authored-By: Claude <noreply@anthropic.invalid>
There was a problem hiding this comment.
Actionable comments posted: 10
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@HISTORY.md`:
- Line 255: The release row for v0.0.118 contains unescaped pipe characters
("||") that break the Markdown table (MD056) and the tagged-release total in the
footer wasn't incremented; fix by editing the table cell that mentions string
utilities/character classification (the row starting with "v0.0.118") to remove
or escape the literal "||" (e.g., replace with "or", "||", or wrap the
phrase in a code span) so pipes no longer act as column separators, and update
the tagged-release total in the footer from 113 to 114 to reflect the new
release.
In `@scripts/check_skill_examples.py`:
- Around line 154-155: The allowlist entry for the FRAGMENT case in
scripts/check_skill_examples.py is pointing at the wrong line so the code fence
isn't skipped; update the allowlist mapping for ("FRAGMENT", "String utility /
classifier signatures, bare function calls") to use the actual line number of
the opening "```vera" fence (i.e., change the key from 791 to the line where the
opening ```vera appears) so check_skill_examples.py will skip this fragment
correctly.
In `@spec/09-standard-library.md`:
- Line 1774: The sentence incorrectly claims whole-string versions of
string_upper and string_lower are future work; update the sentence so it does
not assert that whole-string string_upper/string_lower are pending—either remove
the sentence or rephrase it to say only "Unicode-aware variants of string_upper
and string_lower" are future work, referencing the already-specified
whole-string functions in §9.6.14 (symbols: string_upper, string_lower) to make
the distinction clear.
In `@tests/test_browser.py`:
- Around line 1116-1137: The test_classifiers test currently doesn't cover both
true/false paths for every predicate; update the source payload used in
test_classifiers so each classifier (is_digit, is_alpha, is_alphanumeric,
is_whitespace, is_upper, is_lower) is called twice with one value that should
return true and one that should return false (e.g., change the is_alphanumeric
call to use an alphanumeric and a non-alphanumeric input, add failing cases for
is_whitespace and failing cases for is_upper/is_lower), add explicit \v and \f
inputs to fully exercise ASCII-whitespace, and then update the expected stdout
string in the assertion to match the new sequence of bool_to_string outputs;
reference the existing symbols is_alphanumeric, is_whitespace, is_upper,
is_lower, is_digit, is_alpha, bool_to_string and the test function name
test_classifiers when making changes.
In `@tests/test_codegen.py`:
- Around line 11784-11793: The test test_is_whitespace is claiming CR (\"\\r\")
is considered whitespace but the input matrix for is_whitespace omits that case;
update the matrix in tests/test_codegen.py inside test_is_whitespace to include
the CR example (e.g., add ("\\r", 1)) so the function is_whitespace is exercised
for carriage return and the assertion verifies it returns 1 just like space,
tab, and LF.
- Around line 11997-12040: Add missing edge-case tests for string_lines: create
tests in tests/test_codegen.py alongside test_string_lines_* that cover (1)
trailing "\r" input (e.g. "a\r\nb\r" or just "a\r") and trailing "\r\n"
specifically, and (2) interior blank-line preservation (e.g. "a\n\nb" and
"a\r\rb" variants) to assert array_length/string_join behavior using the
existing helpers (_run and _run_io) and the same structure as
test_string_lines_crlf/test_string_lines_trailing_newline so splitlines
semantics are fully covered for string_lines.
- Around line 11740-12086: The inline-WAT built-ins tests
(TestCharClassification and TestStringUtilities covering symbols like
char_to_upper, char_to_lower, is_digit/is_alpha/is_whitespace, string_reverse,
string_trim_start/trim_end, string_pad_*, string_chars, string_lines,
string_words, etc.) must assert the "no host imports" invariant up front; update
the test harness so every test that emits result.wat for these inline built-ins
validates the import table contains no host imports (e.g. add a call to the
existing import-gating helper or a new assert_no_host_imports(result.wat) before
running the WAT), so regressions that introduce host imports are caught early.
In `@vera/README.md`:
- Line 107: Recompute and update the wasm/ aggregate line count in
vera/README.md so the package total reflects the new sum: replace the stale
total that assumed calls_strings.py was smaller with the corrected subtotal
(update the wasm/ submodule rows so their sum is 16,098) and then update the
package total entry in the table to match the recomputed sum; specifically
adjust the row referencing `calls_strings.py` and the package total cell so the
table sums consistently.
In `@vera/wasm/calls_strings.py`:
- Around line 3696-3704: The string splitting code only checks ASCII space (32),
tab (9), LF (10) and CR (13); add checks for VT (11) and FF (12) so inputs like
"a\vb" and "a\fb" are treated as whitespace. Update the comparisons built into
string_words (the block that appends local.get ... i32.const 32/9/10/13 ...
i32.eq / i32.or) to also compare against 11 and 12, and make the identical
change in the is_whitespace routine and the trim logic near the other occurrence
(the same sequence around lines ~3902-3910) so all three places use the full set
{9,10,11,12,13,32}.
- Around line 3446-3454: The code currently stores interior pointers (data +
offset) into the resulting Array<String> by writing `local.get {data}` +
`local.get {idx}` + offset math into the slice pointer (variables outer, idx,
data), which can produce dangling interior roots; change this to allocate a
fresh buffer for each slice (like _translate_split() does), copy the slice bytes
into that new allocation, and store the new allocation's pointer into the
Array<String> instead of `data + offset`; apply the same change to the other
similar blocks referenced (around lines with the same pattern).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4f40cbe5-7f08-40d6-ac1c-127ece676561
⛔ Files ignored due to path filters (8)
docs/SKILL.mdis excluded by!docs/**docs/index.htmlis excluded by!docs/**docs/index.mdis excluded by!docs/**docs/llms-full.txtis excluded by!docs/**docs/llms.txtis excluded by!docs/**examples/string_utilities.verais excluded by!**/*.veratests/conformance/ch09_string_char_utilities.verais excluded by!**/*.verauv.lockis excluded by!**/*.lock,!uv.lock
📒 Files selected for processing (25)
AGENTS.mdCHANGELOG.mdCLAUDE.mdFAQ.mdHISTORY.mdREADME.mdROADMAP.mdSKILL.mdTESTING.mdexamples/README.mdpyproject.tomlscripts/check_skill_examples.pyscripts/check_spec_examples.pyspec/09-standard-library.mdtests/conformance/manifest.jsontests/test_browser.pytests/test_codegen.pytests/test_verifier.pyvera/README.mdvera/__init__.pyvera/codegen/modules.pyvera/environment.pyvera/wasm/calls.pyvera/wasm/calls_strings.pyvera/wasm/inference.py
CR identified ten findings; verified each against the code, fixed the
five that were real issues plus one nice-to-have (no-host-imports
invariant test).
Real bugs fixed:
- HISTORY.md: literal "||" in v0.0.118 row broke the markdown table;
changed to "or" in the prose example. Stale "113 tagged releases"
total updated to 118.
- spec/09-standard-library.md: the closing sentence of section 9.6.22
incorrectly claimed whole-string string_upper/string_lower were
future work; they already exist in section 9.6.14. Reworded to
reference 9.6.14 and clarify only Unicode-aware variants are
pending.
- vera/wasm/calls_strings.py: string_chars / string_lines /
string_words stored interior pointers (data + offset) into the
result Array<String>. The GC mark phase's alignment check
((val - gc_heap_start) % 8 == 4 in vera/codegen/assembly.py)
rejects interior pointers, so the shared backing buffer would
not stay marked across a collection triggered after the function
returned, leaving the array elements pointing at freed memory.
Switched to per-slice $alloc + copy + store pattern (matches the
existing _translate_split implementation). Added a Python
closure helper _emit_slice() shared by all five emit sites in
_translate_structural_split.
- vera/wasm/calls_strings.py: extended is_whitespace, _translate_trim,
and _translate_structural_split to recognize the full Python
str.isspace() ASCII set: tab(9), LF(10), VT(11), FF(12), CR(13),
space(32). Previously omitted VT and FF. 9..=13 collapses to a
single (byte - 9) < 5 unsigned range check for branchless emit.
Spec, SKILL, CHANGELOG updated to document the full set.
- vera/README.md: stale wasm/ aggregate (12,998) bumped to live
count 16,608.
Tests improved:
- tests/test_codegen.py test_is_whitespace: extended cases to cover
CR, VT, FF. Used \\u{0B} / \\u{0C} unicode escapes since Vera's
lexer doesn't recognise \\v or \\f as simple escapes.
- tests/test_codegen.py: added five string_lines edge-case tests
(trailing \\r, trailing \\r\\n, interior blank LF, interior blank
CR) to cover Python splitlines semantics more thoroughly.
- tests/test_codegen.py: new
test_no_host_imports_for_inline_builtins compiles a program using
every #470/#471 built-in and asserts the WAT contains no
(import "vera" "<name>") line for any of the 16 names. Catches
a regression where someone routes one through a host import,
contradicting the design contract.
- tests/test_browser.py test_classifiers: extended source program to
exercise both true and false paths for every classifier, plus the
full whitespace set including \\u{0B} / \\u{0C} and the
empty-string rejection shared by all six.
CR findings dismissed after verification:
- check_skill_examples.py allowlist key 791: the key IS the actual
line of the opening ```vera fence; the script passes locally.
- HISTORY footer: CR proposed 113 -> 114 but the actual count is
118 (matching ROADMAP); used the live number rather than CR's
+1.
Spec example allowlist: also added entries for the 5 signature-only
code blocks I added in section 9.6.22 (one per subsection). The CI
lint job's check_spec_examples.py was failing on those 5 blocks
because the existing FRAGMENT category doesn't auto-detect
no-body signatures.
All 3480 tests pass; doc count check, spec/SKILL/site asset checks
all green; mypy clean.
Co-Authored-By: Claude <noreply@anthropic.invalid>
There was a problem hiding this comment.
Actionable comments posted: 10
♻️ Duplicate comments (1)
vera/README.md (1)
94-107:⚠️ Potential issue | 🟡 MinorRecompute the
wasm/totals from the current sources.
vera/wasm/calls_strings.pyin this PR runs through Line 4018, so the3,977figure at Line 107 is already stale, and the16,608subtotal at Line 94 no longer matches the submodule rows either. Please refresh both numbers from the current file sizes; the later Code Generation summary still needs the same update as a downstream fix.Suggested doc fix
-| `wasm/` | 16,608 | Compile | WASM translation layer (package) | `WasmContext`, `WasmSlotEnv`, `StringPool` | +| `wasm/` | 16,137 | Compile | WASM translation layer (package) | `WasmContext`, `WasmSlotEnv`, `StringPool` | @@ -| ` ├ calls_strings.py` | 3,977 | | All string ops (length, concat, slice, search, transform, split, join, chars/lines/words, reverse, trim_start/end, pad_start/end, char_to_upper/lower, classifiers) + to-string conversions | | +| ` ├ calls_strings.py` | 4,018 | | All string ops (length, concat, slice, search, transform, split, join, chars/lines/words, reverse, trim_start/end, pad_start/end, char_to_upper/lower, classifiers) + to-string conversions | |As per coding guidelines, "Review Markdown files for factual accuracy against the codebase, broken links, and outdated information."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@vera/README.md` around lines 94 - 107, The README's wasm/ subtotal and the calls_strings.py line count are out of date; update the two figures (the wasm/ subtotal `16,608` and the calls_strings.py entry `3,977`) to reflect the current file sizes (e.g., recalc lines for vera/wasm/*.py and replace the stale numbers) and also update the downstream Code Generation summary that references the same totals; locate the entries by searching for the `wasm/` table, the `calls_strings.py` row, and the Code Generation summary in vera/README.md and replace the numeric totals with the recomputed values.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CHANGELOG.md`:
- Line 14: The changelog incorrectly states that
string_chars/string_lines/string_words return slices that share a single backing
buffer; update the entry to reflect the GC-safety fix: mention that each split
now performs a per-slice allocation+copy (fixing the interior-pointer issue) so
slices do not rely on a single shared buffer, and adjust the sentence around
"The three split operations return Array<String> whose elements share a single
backing data buffer" to describe the new per-slice allocation behavior and
reference the interior-pointer/GC-safety fix.
In `@HISTORY.md`:
- Line 255: Update the v0.0.118 release note for string_chars to reflect the GC
fix: change the clause that says “each 1-byte element shares a single backing
data buffer” to state that the implementation now allocates and copies each
1-byte slice separately (no shared interior pointers) to avoid the
interior-pointer GC bug; keep the rest of the description (behavior and purpose
of string_chars) intact and reference the shipped fix for correctness.
In `@spec/09-standard-library.md`:
- Line 1707: The sentence claiming that the three functions return a fresh
`@Array<String>` whose elements "share a single backing data buffer" is
outdated; update the wording to reflect that `string_chars`, `string_lines`, and
`string_words` now allocate and copy each slice individually (per-slice
allocation + copy) so elements do not share a single backing buffer, and ensure
the description of `@Array<String>` behavior matches the current implementation.
- Around line 1682-1691: The code fences around the signature-only spec
fragments for functions like string_chars, string_lines, and string_words are
incorrectly tagged as `vera`; change those fenced blocks back to untagged
(remove the `vera` language tag) so they remain signature-only and non-runnable,
and do the same for the other similar signature fragments noted in this chapter
(the additional signature blocks referenced in the comment). Locate the blocks
by the function signatures (e.g., "public fn string_chars(`@String` ->
`@Array`<String>)", "public fn string_lines(...)", "public fn string_words(...)")
and remove the language tag from their triple-backtick fences to restore the
spec convention.
In `@tests/test_browser.py`:
- Around line 1021-1043: The test_string_pad test doesn't exercise the
empty-fill no-op branch for string_pad_start/string_pad_end; add cases in
test_string_pad that call string_pad_start("x", N, "") and string_pad_end("x",
N, "") (and a longer input like string_pad_start("hello", 10, "")) to verify
that when fill == "" the original string is returned unchanged, and update the
expected node["stdout"] string to include the outputs of these no-op calls;
locate the assertions and IO.print calls in test_string_pad and append the new
IO.print invocations and corresponding expected values for stdout accordingly.
- Around line 1005-1103: Update the browser tests to include vertical tab (VT,
\x0B) and form feed (FF, \x0C) as whitespace edge-cases: modify test_string_trim
(string_trim_start / string_trim_end) to call string_trim_start(" \x0B\x0Chi ")
and string_trim_end(" hi\x0B\x0C ") (or similar placements) and assert the
expected trimmed results, and modify test_string_words to include strings
containing VT/FF (e.g. " \x0Bfoo\x0Cbar ") and assert they are treated as
whitespace producing "foo|bar" and zero-length-only inputs with VT/FF yield 0;
refer to test function names test_string_trim and test_string_words to locate
and update the test sources and their corresponding stdout assertions.
In `@tests/test_codegen.py`:
- Around line 11922-12049: Add mirrored unit tests for char_to_lower and
string_pad_end to cover the missing edge cases: create a
test_char_to_lower_empty that asserts char_to_lower("") returns "" (mirroring
test_char_to_upper_empty), and add tests for string_pad_end that mirror
string_pad_start's cases — e.g., test_string_pad_end with multi-char fill
cycling, test_string_pad_no_change_when_longer for string_pad_end (input length
>= target returns input), and test_string_pad_empty_fill for string_pad_end
(empty fill returns original). Locate and follow the style of existing tests
around test_char_to_lower_first_only and string_pad_start when adding the new
test functions.
- Around line 11968-12222: Add VT (vertical tab \x0b) and FF (form feed \x0c)
whitespace cases to the existing trim and words tests: for
string_trim_start/string_trim_end/string_trim_all_whitespace add inputs like "
\x0b\x0chi\x0b " and assert _run_io/_run results match the same behavior as
ASCII whitespace (trim_start → "hi\x0b ", trim_end → " \x0bhi", all-whitespace →
length 0), and for string_words add cases such as "foo\x0b\x0cbar " and "
\x0b\x0c " to assert splitting/empty-word behavior (use test_string_words_runs,
test_string_words_only_whitespace, and test_string_words_content_via_join
patterns); ensure you reuse the same helper calls (_run/_run_io) and test names
(e.g., test_string_trim_start, test_string_trim_end,
test_string_trim_all_whitespace, test_string_words_*) so the VT/FF coverage
exercises the shared is_whitespace logic.
In `@vera/wasm/calls_strings.py`:
- Around line 3355-3370: The block comment describing the structural-split
algorithm is out of date: it claims slices share a single "data" buffer, but the
implementation of _emit_slice() now allocates and copies each segment to avoid
the GC bug; update the commentary for string_chars, string_lines, string_words
to state that each slice is separately allocated/copied (no shared data buffer),
remove the old shared-buffer allocation description and any claims about O(slen
+ count*slen) vs O(slen + count), and ensure the note references the rationale
(GC avoidance via per-slice allocation) so future maintainers won't reintroduce
the old shared-data approach.
- Around line 2993-3038: The strip implementation diverges from the new
whitespace predicate used by _translate_trim, causing inconsistent behavior
between string_strip and trimming via _translate_trim; fix this by routing
string_strip's implementation to reuse the shared helper: call
_translate_trim(...) with trim_start=True and trim_end=True (or factor out the
_is_ws_inline predicate into a shared method and use it from both
_translate_trim and _translate_strip) so both string_strip and the trim
functions use the exact same whitespace test and code path.
---
Duplicate comments:
In `@vera/README.md`:
- Around line 94-107: The README's wasm/ subtotal and the calls_strings.py line
count are out of date; update the two figures (the wasm/ subtotal `16,608` and
the calls_strings.py entry `3,977`) to reflect the current file sizes (e.g.,
recalc lines for vera/wasm/*.py and replace the stale numbers) and also update
the downstream Code Generation summary that references the same totals; locate
the entries by searching for the `wasm/` table, the `calls_strings.py` row, and
the Code Generation summary in vera/README.md and replace the numeric totals
with the recomputed values.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 52db25b2-eb50-41b3-a238-8fbd4df6f139
⛔ Files ignored due to path filters (2)
docs/SKILL.mdis excluded by!docs/**docs/llms-full.txtis excluded by!docs/**
📒 Files selected for processing (10)
CHANGELOG.mdHISTORY.mdROADMAP.mdSKILL.mdTESTING.mdspec/09-standard-library.mdtests/test_browser.pytests/test_codegen.pyvera/README.mdvera/wasm/calls_strings.py
Verified each finding against the current code; fixed the seven that were real, dismissed two as already-correct, treated one as a duplicate. Stale-claim fixes (the per-slice-allocation refactor in commit 4100790 made the "shared backing data buffer" wording incorrect in three docs): - CHANGELOG.md line 14: rewrote the Array<String> sentence to describe the per-slice alloc+copy and reference the GC mark-phase alignment check. - HISTORY.md line 255: updated the string_chars description to say each 1-byte element is independently $alloc-ed. - spec/09-standard-library.md line 1707: rewrote the section 9.6.22 description with the same clarification + a parenthetical pointing to _emit_gc_collect. - vera/wasm/calls_strings.py block comment around #470 splits: rewrote the algorithm summary to state per-slice allocation is required for GC correctness, removed the stale O(slen) claim, and pointed at the alignment check in vera/codegen/assembly.py. - tests/test_codegen.py TestStringUtilities docstring: same. Test coverage gaps filled: - tests/test_codegen.py: added test_char_to_lower_empty mirror of test_char_to_upper_empty; added test_string_pad_end_no_change_ when_longer + test_string_pad_end_empty_fill mirrors of the pad_start versions; added test_string_trim_vt_ff_full_set to exercise both trim ends with a string mixing space/\\t/\\n/VT/FF/CR; added test_string_words_vt_ff_separators and test_string_words_only_vt_ff exercising VT/FF as word separators per Python str.split() semantics. - tests/test_browser.py test_string_pad: extended source program to cover the empty-fill no-op for both pad_start and pad_end (CR was right that this branch was missing). - tests/test_browser.py test_string_trim: extended to mix VT/FF with regular whitespace at both ends. - tests/test_browser.py test_string_words: extended to use VT/FF as separators and to cover an all-VT/FF input yielding zero words. - VT (0x0B) and FF (0x0C) are spelled with \\u{0B} / \\u{0C} since Vera's lexer doesn't recognise \\v / \\f as simple escapes. Real bug fixed: - vera/wasm/calls_strings.py _translate_strip: was open-coding a narrower whitespace predicate {tab(9), LF(10), CR(13), space(32)} that diverged from the new _translate_trim Python isspace() set {tab, LF, VT(11), FF(12), CR, space}. Refactored _translate_strip to delegate to _translate_trim with both flags set — eliminates ~160 lines of duplicate code and guarantees the predicate stays in sync. The existing test_string_strip case uses ASCII spaces only so the narrower-set behaviour was masked. Findings dismissed after verification: - spec/09-standard-library.md 1682-1691: CR suggested untagging the ```vera fences for signature-only blocks. Existing chapter-9 convention is `vera`-tagged + FRAGMENT allowlist (matches string_index_of, string_replace, string_join, string_repeat at lines 1108/1190/1221/1250). My five new blocks already follow the convention and are allowlisted in scripts/check_spec_examples.py. - vera/README.md 94-107: duplicate of the previous round's comment 3129576089 — the wasm/ aggregate was already updated. Refreshed the calls_strings.py figure and the package total to reflect the post-strip-refactor sizes (now 3,890 lines and 16,521 total). Doc-count rolling counts: now 3,500 tests / 949 codegen tests / 12,389 codegen lines / 1,585 browser lines. Co-Authored-By: Claude <noreply@anthropic.invalid>
|
@coderabbitai Round 2 addressed in commit 98f3a74. Verified each finding against the current code; fixed seven, dismissed two as already-correct, treated one as a duplicate of the previous round. Fixed:
Dismissed:
3,486 tests pass, mypy clean, all 14 doc validation scripts green. |
|
Tip For best results, initiate chat on the files or code changes.
3,486 tests / mypy clean / all 14 doc scripts green is a solid green state. Happy to see this land. ✏️ Learnings added
🧠 Learnings used |
|
Tip For best results, initiate chat on the files or code changes.
The The VT/FF unit-test coverage and the ✏️ Learnings added
🧠 Learnings used |
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (1)
spec/09-standard-library.md (1)
1682-1691: 🧹 Nitpick | 🔵 TrivialUse untagged fences for signature-only blocks
These signature fragments should use untagged code fences rather than
```vera, consistent with the established convention for spec chapters. Signature blocks without bodies are not complete runnable programs and belong in the parser-example allowlist rather than validated as full Vera code.Based on learnings, "In
spec/09-standard-library.md… function signature code blocks are intentionally left with untagged fences … and should not be tagged asverabecause they are not complete, runnable programs."Also applies to: 1711-1720, 1734-1740, 1757-1763, 1778-1785
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@spec/09-standard-library.md` around lines 1682 - 1691, The fenced code blocks showing only function signatures (e.g., public fn string_chars, public fn string_lines, public fn string_words) are tagged as ```vera but are signature-only and not runnable; change those fenced blocks to use untagged code fences (remove the "vera" language tag) so they are treated as signature-only examples consistent with the spec convention; apply the same fix to the other signature-only blocks mentioned (the blocks around lines with the other signature fragments referenced in the comment).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@TESTING.md`:
- Line 12: Update remaining references that still say "all 30 examples" to
reflect the new count "32" in TESTING.md; search for the literal phrase "all 30
examples" (occurrences near the sections mentioning example programs and
validation) and replace each with "all 32 examples" so the document is
internally consistent with the changed header line that now reads "32, all
validated through `vera check` + `vera verify`".
In `@tests/test_browser.py`:
- Around line 1091-1107: Extend the test_string_lines test to exercise the
empty-input case for string_lines("") so the alloc(0) path in
vera/wasm/calls_strings.py::_translate_structural_split is covered: add a call
that evaluates array_length(string_lines("")) (or otherwise inspects
string_lines("") == [] via printing its length or serialized form) in the WASM
source emitted by test_string_lines and update the final stdout assertion (in
test_string_lines) to include the expected value (empty list => length 0 or
appropriate serialized output) so the test asserts the empty-input behavior.
In `@vera/README.md`:
- Line 94: The README has inconsistent line-counts for the wasm/ package (table
shows 16,521 while the "Code Generation" summary shows 12,998); open the README
and update the "Code Generation" section (search for the "wasm/" entry and the
phrase "Code Generation") so the wasm/ line count matches the canonical value
(verify the actual line count from the repo and replace 12,998 or 16,521
accordingly), and scan for any other occurrences of the wasm/ line-count
elsewhere in the file to keep all references consistent.
In `@vera/wasm/calls_strings.py`:
- Around line 2877-2953: The trimming path can allocate and collect the source
because `ptr` is still read after `call $alloc` but never rooted; before the
`call $alloc` in the trimming sequence (the block that computes `new_len` and
then does `call $alloc` / `local.set {dst}`), push the source pointer onto the
GC shadow stack (use the existing `gc_shadow_push`/`gc_shadow_pop` helpers
referencing the same `ptr` local) so the original string is rooted across the
allocation, and pop it after the copy is complete; update the trim helper that
`string_strip()` delegates to (the code around `local.get {end}`/`local.get
{start}` -> `local.set {new_len}` -> `call $alloc`) to include the push before
`call $alloc` and corresponding pop after the newly allocated data is populated.
---
Duplicate comments:
In `@spec/09-standard-library.md`:
- Around line 1682-1691: The fenced code blocks showing only function signatures
(e.g., public fn string_chars, public fn string_lines, public fn string_words)
are tagged as ```vera but are signature-only and not runnable; change those
fenced blocks to use untagged code fences (remove the "vera" language tag) so
they are treated as signature-only examples consistent with the spec convention;
apply the same fix to the other signature-only blocks mentioned (the blocks
around lines with the other signature fragments referenced in the comment).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8c527c88-53f4-4e0a-a2fd-4104ea3062b5
📒 Files selected for processing (9)
CHANGELOG.mdHISTORY.mdROADMAP.mdTESTING.mdspec/09-standard-library.mdtests/test_browser.pytests/test_codegen.pyvera/README.mdvera/wasm/calls_strings.py
Verified each finding against the current code; fixed all four real
ones; treated the spec fence-tag suggestion as a duplicate of the
already-dismissed round-2 finding.
Real bug fixed:
- vera/wasm/calls_strings.py _translate_trim: the source `ptr` local
was being read inside the post-alloc copy loop without being
rooted on the shadow stack across the destination $alloc. WASM
locals are NOT GC roots — only the shadow stack is — so a
heap-allocated input string (e.g. the result of string_concat)
could be collected during the alloc, leaving the copy reading
from a freed buffer. Pushed `ptr` via gc_shadow_push() before
the alloc. Verified _translate_string_chars already roots its
ptr_s correctly; the pattern was missing only here. This fix
also covers _translate_strip (delegates to _translate_trim) and
the two one-sided trim functions.
Stale doc fixes:
- TESTING.md line 274: "all 30 examples" → "all 32 examples" in the
Ch 3 row of the chapter coverage table.
- vera/README.md line 486: "Code Generation" summary still showed
"wasm/ (12,998 lines across 18 modules)"; bumped to 16,521 to
match the wasm/ table at line 94.
Test coverage gap filled:
- tests/test_browser.py test_string_lines: extended source program
to cover the empty-input path (string_lines("") → length 0) so
the alloc(0) branch in _translate_structural_split is exercised
under the browser runtime. Updated expected stdout accordingly.
Findings dismissed:
- spec/09 fence convention (duplicate): same suggestion CR made in
the previous two rounds — chapter-9 convention is `vera`-tagged
+ FRAGMENT allowlist (matches string_index_of/replace/join/repeat
at lines 1108/1190/1221/1250). My five new blocks already
follow the convention.
Memory: also saved a [Compiler review heuristics](feedback_review_heuristics.md)
file capturing the three mechanical CR-derived rules (byte-literal
collision = candidate for delegation; gc_shadow_push before any
$alloc; inline-WAT-no-host-imports invariant) so future PRs apply
them at write-time rather than discovering them at review-time.
Doc-count rolling counts: 3,500 tests / 949 codegen tests / 12,389
codegen lines / 1,595 browser lines.
Co-Authored-By: Claude <noreply@anthropic.invalid>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_browser.py`:
- Around line 1005-1029: Add a browser regression asserting that string_strip
handles VT/FF like the trim functions: add a case in tests/test_browser.py
(e.g., in or alongside test_string_trim or new test_string_strip) that
compiles/runs a snippet calling string_strip("\u{0B}\u{0C}hi \u{0B}") and
asserts the output equals "hi"; also ensure the runtime translation path
(_translate_strip) delegates to the canonical ASCII whitespace helper so VT/FF
are treated the same as string_trim_start/string_trim_end (reference symbols:
string_strip, _translate_strip, string_trim_start, string_trim_end).
In `@vera/wasm/calls_strings.py`:
- Around line 2896-2906: Extract the ASCII-whitespace emitter currently defined
as the local function _is_ws_inline() inside _translate_trim into a single
shared helper (e.g., make_is_ws_inline or emit_is_ascii_whitespace) that returns
the canonical instruction sequence checking byte == 32 OR (byte-9)<5 (the
branchless 6-byte set for tab/LF/VT/FF/CR/space); then replace the inline
_is_ws_inline definition in _translate_trim with a call to that shared helper
and update other translators (is_whitespace(), string_words(), and any
string_strip-related translators) to delegate to the same helper instead of
re-encoding the 32 / 9..13 predicate so all passes reuse the single canonical
emitter.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: dde721a3-1584-4f23-b9d6-a480f95924a2
📒 Files selected for processing (4)
TESTING.mdtests/test_browser.pyvera/README.mdvera/wasm/calls_strings.py
Both findings real, both fixed. Predicate hoist (#2): - Extracted the canonical ASCII whitespace predicate into a module-level helper emit_is_ascii_whitespace(byte_local, indent) in vera/wasm/helpers.py. Single-source-of-truth for the branchless {space(32) | (byte - 9) < 5} encoding of Python's str.isspace() ASCII set {tab(9), LF(10), VT(11), FF(12), CR(13), space(32)}. Replaced four open-coded copies with calls to the helper: 1. _translate_is_whitespace (~10 LOC → 1 line) 2. _is_ws_inline closure inside _translate_trim 3. _translate_structural_split count pass (string_words) 4. _translate_structural_split emit pass (string_words) - Helper docstring documents the historical context (PR #510 round 2 caught _translate_strip's divergent narrow predicate; round 4 hoisted the helper) and warns against re-encoding the byte set. - Direct application of the byte-literal-collision review rule (memory file feedback_review_heuristics.md): two functions sharing a predicate routed through one helper from day 1 prevents future divergence. Browser regression (#1): - New tests/test_browser.py::TestBrowserStringUtilities:: test_string_strip_vt_ff: compiles and runs a program that calls string_strip("\\u{0B}\\u{0C}hi \\u{0B}") and asserts the output is "hi". Pins the strip→trim delegation contract under the browser runtime — if a future refactor accidentally re-opens the old narrow predicate for strip, this assertion breaks. All 3,487 tests pass; mypy clean; doc count check + site asset check + spec/SKILL/conformance/example checks all green. Co-Authored-By: Claude <noreply@anthropic.invalid>
Summary
Sixteen new built-in functions for string utilities (#470) and character classification (#471), shipped together as a single PR. All inline WAT — no host imports, bit-identical Python (wasmtime) and browser (Node.js) output.
String utilities (8, #470):
string_chars/string_lines/string_words— bridge fromStringtoArray<String>. Lines follows Pythonsplitlines()(trailing terminator does not add empty segment); words follows Pythonsplit()with no args (whitespace runs collapse, empty segments discarded).string_reverse/string_trim_start/string_trim_endstring_pad_start/string_pad_end— JavaScriptpadStart/padEndsemantics: fill cycles left-to-right, truncated to padding length; empty fill is a no-op; target shorter than input returns input unchanged.Character classification (8, #471):
is_digit/is_alpha/is_alphanumeric/is_whitespace/is_upper/is_lower— first-byte ASCII predicates; all returnfalsefor the empty string.char_to_upper/char_to_lower— first byte only.ASCII-only by design. Phase 2 (Unicode-aware variants —
string_codepoints,string_graphemes, whole-string case conversion, etc.) tracked separately in #509.Implementation
vera/wasm/calls_strings.pygrew from 2,588 → 3,979 lines with sixteen new translators. Shared helpers factor out the predicate scaffolding (_translate_classifier), case conversion (_translate_char_case), trim variants (_translate_trim), pad variants (_translate_pad), and structural splits (_translate_structural_splitcovers bothstring_linesandstring_wordsvia a two-pass count-then-emit shape mirroring the existingstring_split).(byte - 48) < 10for digit,((byte | 0x20) - 97) < 26for case-folded alpha. Branchless and idiomatic.vera/wasm/inference.pyextended for the new return-type shapes (i32_pairfor string/array returns,i32for bool returns).vera/wasm/calls.pydispatch table extended with sixteen new branches;_infer_concat_elem_typenow returns"String"for the three split operations.vera/codegen/modules.pyknown-names allowlist extended.vera/environment.pyregisters all sixteenFunctionInfoentries.Tests
TestCharClassificationandTestStringUtilitiesintests/test_codegen.py.TestBrowserStringUtilitiesandTestBrowserCharClassificationintests/test_browser.py.tests/conformance/ch09_string_char_utilities.vera(level:run) with 14 test functions covering classifier, transform, pad, split, and empty-string edge cases.examples/string_utilities.veracontributing 29 T1 + 3 T3 + 32 contracts.Example
examples/string_utilities.veraexercises all sixteen built-ins in a coherent log-line-processor scenario — line counting, word splitting, byte iteration viastring_chars, classifier-based token validation, title-casing, trim, pad, palindrome check viastring_reverse, plus comprehensive empty-string round-trip coverage. Verified clean (29 Tier 1, 3 Tier 3).Documentation
Comprehensive docs sweep across both human- and agent-facing surfaces:
string_utilities.verarow added.scripts/build_site.py.Release prep
vera/__init__.py,pyproject.toml,docs/index.html,README.md,uv.lock.v0.0.118placed on the feature branch; will be moved to main HEAD after merge so the CHANGELOG-extraction release flow can run.Test plan
pytest tests/test_codegen.py::TestCharClassification tests/test_codegen.py::TestStringUtilities)pytest tests/test_browser.py::TestBrowserStringUtilities tests/test_browser.py::TestBrowserCharClassification)python scripts/check_conformance.py)python scripts/check_examples.py)Closes #470
Closes #471
Summary by CodeRabbit
New Features
Tests
Documentation
Chores