
Implement eight string/conversion built-in operations #173

Merged: aallan merged 3 commits into aallan:main from rlseaman:feature/string-operations on Mar 2, 2026

rlseaman commented Mar 2, 2026

Contributor

Summary

Implements all eight string/conversion built-in operations needed for string-processing Vera programs. These are the first dynamic string operations in the compiler — prior to this, only string constants (literals) were supported.

Operations added

| Function | Signature | Description |
| --- | --- | --- |
| string_length | String -> Int | Returns byte length of string |
| string_concat | String, String -> String | Concatenates two strings |
| string_slice | String, Int, Int -> String | Extracts substring [start, end) |
| char_code | String, Int -> Nat | Returns byte value at index |
| parse_nat | String -> Nat | Parses decimal string to natural number |
| parse_float64 | String -> Float64 | Parses decimal string to float (sign, decimal point) |
| to_string | Int -> String | Converts integer to decimal string |
| strip | String -> String | Trims leading/trailing whitespace |

Uses the existing bump allocator ($alloc) for heap allocation. No GC dependency.
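For readers unfamiliar with the approach, here is a minimal Python sketch of a bump allocator backing a byte-copy `string_concat`. It is a model only, not the generated WASM: the `LinearMemory` class, the `(ptr, len)` string representation, and the absence of alignment or headers are all assumptions made for illustration.

```python
from typing import Tuple

StrRef = Tuple[int, int]  # hypothetical (pointer, byte length) string handle


class LinearMemory:
    """Toy stand-in for WASM linear memory with a bump allocator."""

    def __init__(self, size: int = 1024) -> None:
        self.data = bytearray(size)
        self.heap_ptr = 0  # next free byte; only ever moves forward

    def alloc(self, nbytes: int) -> int:
        """Reserve nbytes and return the start offset (no free, no GC)."""
        if self.heap_ptr + nbytes > len(self.data):
            raise MemoryError("out of linear memory")
        ptr = self.heap_ptr
        self.heap_ptr += nbytes
        return ptr

    def store(self, payload: bytes) -> StrRef:
        """Copy a Python bytes object into memory, as a literal would be."""
        ptr = self.alloc(len(payload))
        self.data[ptr:ptr + len(payload)] = payload
        return ptr, len(payload)

    def load(self, ref: StrRef) -> bytes:
        ptr, length = ref
        return bytes(self.data[ptr:ptr + length])


def string_concat(mem: LinearMemory, a: StrRef, b: StrRef) -> StrRef:
    """Allocate len(a) + len(b) bytes, then copy both operands byte by byte."""
    (a_ptr, a_len), (b_ptr, b_len) = a, b
    out = mem.alloc(a_len + b_len)
    for i in range(a_len):
        mem.data[out + i] = mem.data[a_ptr + i]
    for i in range(b_len):
        mem.data[out + a_len + i] = mem.data[b_ptr + i]
    return out, a_len + b_len
```

Because the allocator never frees, every concatenation consumes fresh memory; that is the trade-off behind "No GC dependency".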

Notable design choices

  • strip returns a zero-copy view into the original string (no allocation)
  • parse_float64 handles optional sign, integer part, and fractional part
  • to_string handles negative numbers; uses a 20-byte reverse-digit buffer
  • parse_nat skips leading spaces for compatibility with fixed-width format parsing
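The zero-copy `strip` can be sketched as follows, assuming strings are passed around as a (pointer, length) pair into a shared byte buffer. That representation and the name `strip_view` are assumptions for illustration; the whitespace set (space, tab, CR, LF) is taken from the commit message for this PR.

```python
from typing import Tuple

# ASCII whitespace bytes trimmed by strip: space, tab, CR, LF.
WHITESPACE = frozenset(b" \t\r\n")


def strip_view(memory: bytes, ptr: int, length: int) -> Tuple[int, int]:
    """Return a (ptr, len) view of the trimmed string without copying bytes."""
    start, end = ptr, ptr + length
    # Advance past leading whitespace.
    while start < end and memory[start] in WHITESPACE:
        start += 1
    # Retreat past trailing whitespace.
    while end > start and memory[end - 1] in WHITESPACE:
        end -= 1
    return start, end - start
```

The result aliases the input, so it is only safe while the original bytes are never mutated or reclaimed. With a bump allocator and no GC that holds trivially.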

Files changed

| File | Change |
| --- | --- |
| vera/environment.py | Register eight built-in functions with correct type signatures |
| vera/wasm/calls.py | WASM translation for all eight operations (~820 lines) |
| vera/wasm/inference.py | Return type inference for both WASM and Vera type systems |
| vera/codegen/modules.py | Add to known-functions whitelist |
| tests/test_checker.py | 10 new type-checker tests |
| tests/test_codegen.py | 42 new codegen tests |
| examples/string_ops.vera | Example demonstrating all eight operations |

Related Issues

Partial progress on #52 and #134.

Type of Change

  • Specification change
  • Compiler implementation
  • Bug fix
  • Tests
  • Documentation

Checklist

  • I have read CONTRIBUTING.md
  • My changes follow the project's coding standards
  • I have added/updated tests as appropriate
  • I have updated relevant documentation
  • All tests pass locally

Local validation

  • 1267 tests pass (1215 existing + 52 new)
  • All 15 examples pass vera check and vera verify
  • mypy clean on vera/
  • All pre-commit hooks pass
  • Composition works: parse_nat(to_string(123)) round-trips correctly
  • strip(string_slice(...)) chains correctly for fixed-width field extraction
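To make the two composition checks concrete, here is a rough Python model of the relevant built-ins applied to a fixed-width record. The Python functions only mirror the documented behaviour (half-open slices, ASCII whitespace trimming, leading-space skipping). The record layout is invented for the example, and the handling of non-digit input in `parse_nat` (stop at the first non-digit) is an assumption, not something this PR states.

```python
def string_slice(s: bytes, start: int, end: int) -> bytes:
    """Half-open substring [start, end)."""
    return s[start:end]


def strip(s: bytes) -> bytes:
    """Trim leading/trailing space, tab, CR, LF."""
    return s.strip(b" \t\r\n")


def parse_nat(s: bytes) -> int:
    """Skip leading spaces, then accumulate decimal digits."""
    i = 0
    while i < len(s) and s[i] == 0x20:
        i += 1
    value = 0
    # Assumption: stop quietly at the first non-digit byte.
    while i < len(s) and 0x30 <= s[i] <= 0x39:
        value = value * 10 + (s[i] - 0x30)
        i += 1
    return value


def to_string(n: int) -> bytes:
    """Decimal representation of an integer."""
    return str(n).encode("ascii")


# Hypothetical record: a 10-column name field, then a 4-column count field.
record = b"M31".ljust(10) + b"42".rjust(4)

name = strip(string_slice(record, 0, 10))        # padded name, trimmed
count = parse_nat(string_slice(record, 10, 14))  # right-aligned number
round_trip = parse_nat(to_string(123))
```

The right-aligned count field is where the leading-space skipping in `parse_nat` pays off: the slice can be parsed directly without calling `strip` first.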

🤖 Generated with Claude Code

Add runtime support for the three string operations specified in
Chapter 4, Section 4.13 of the language spec. These are the first
dynamic string operations in the compiler — prior to this, only
string constants were supported.

Implementation:
- Register string_length, string_concat, string_slice as built-in
  functions in the type checker environment
- Add WASM codegen for all three operations using the existing bump
  allocator ($alloc) and byte-copy loops for string_concat/string_slice
- Add return type inference for string operations in both WASM and
  Vera type inference paths
- Add string built-ins to the known-functions whitelist in the
  cross-module call scanner

Tests: 18 new tests (5 checker + 13 codegen), all passing.
Example: examples/string_ops.vera demonstrates all three operations.

Partial progress on aallan#52 and aallan#134.

Co-Authored-By: Claude <noreply@anthropic.invalid>
@rlseaman rlseaman requested a review from aallan as a code owner March 2, 2026 04:58
…uilt-ins

Add five more string/conversion built-in functions to complete the set
needed for string-processing Vera programs:

- char_code(String, Int -> Nat): returns byte value at given index
- parse_nat(String -> Nat): parses decimal string to natural number,
  skipping leading spaces
- parse_float64(String -> Float64): parses decimal string with optional
  sign and decimal point to 64-bit float
- to_string(Int -> String): converts integer to decimal string
  representation (handles negatives)
- strip(String -> String): trims leading/trailing ASCII whitespace
  (space, tab, CR, LF); returns a view without allocation
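As an illustration of the parse_float64 description above, the following Python sketch parses an optional sign, an integer part, and a fractional part using only float arithmetic. It is not a transcription of the WASM: accepting a leading "+", accumulating the fraction as digits divided by a power of ten, and stopping silently at the first unexpected byte are all assumptions made here.

```python
def parse_float64(text: str) -> float:
    """Parse [sign] digits [. digits] into a float (sketch, no exponent)."""
    data = text.encode("ascii")
    i, n = 0, len(data)

    # Optional sign (accepting "+" is an assumption).
    sign = 1.0
    if i < n and data[i] in b"+-":
        if data[i] == ord("-"):
            sign = -1.0
        i += 1

    # Integer part: value = value * 10 + digit.
    value = 0.0
    while i < n and 0x30 <= data[i] <= 0x39:
        value = value * 10.0 + (data[i] - 0x30)
        i += 1

    # Fractional part: collect digits and a matching power of ten.
    if i < n and data[i] == ord("."):
        i += 1
        frac, divisor = 0.0, 1.0
        while i < n and 0x30 <= data[i] <= 0x39:
            frac = frac * 10.0 + (data[i] - 0x30)
            divisor *= 10.0
            i += 1
        value += frac / divisor

    return sign * value
```

A digit-by-digit float parser like this is simple but not guaranteed to be correctly rounded for every input; production-grade parsers use more elaborate algorithms.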

Implementation details:
- parse_float64 handles sign, integer part, and fractional part using
  f64 arithmetic in WASM
- to_string uses a 20-byte temp buffer with reverse digit extraction
- strip returns a pointer into the original string (zero-copy)
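The reverse-digit approach for to_string can be sketched in Python as below. Twenty bytes is exactly enough for a sign plus the 19 digits of the most negative 64-bit integer. That Vera's Int is a 64-bit signed integer is an assumption here, as is the explicit range check.

```python
I64_MIN, I64_MAX = -(2 ** 63), 2 ** 63 - 1  # assumed Int range


def to_string(n: int) -> bytes:
    """Fill a 20-byte buffer from the end with digits, then prepend '-'."""
    if not I64_MIN <= n <= I64_MAX:
        raise OverflowError("value does not fit in a signed 64-bit integer")

    buf = bytearray(20)
    pos = len(buf)       # write position, moving right to left
    magnitude = abs(n)

    # Emit at least one digit so that 0 becomes "0".
    while True:
        pos -= 1
        buf[pos] = ord("0") + magnitude % 10
        magnitude //= 10
        if magnitude == 0:
            break

    if n < 0:
        pos -= 1
        buf[pos] = ord("-")

    # Only the filled tail of the buffer is the result.
    return bytes(buf[pos:])
```

Extracting the least significant digit first avoids having to know the digit count up front; the digits simply land in the right order because the buffer is filled backwards.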

Tests: 34 new tests (5 checker + 29 codegen), all passing.
Updated string_ops.vera example to demonstrate all eight string built-ins.

Further progress on aallan#52 and aallan#134.

Co-Authored-By: Claude <noreply@anthropic.invalid>
@rlseaman rlseaman changed the title Implement string_length, string_concat, string_slice built-ins Implement eight string/conversion built-in operations Mar 2, 2026
rlseaman commented Mar 2, 2026

Contributor Author

Hi Alasdair! Hot-wired indeed!

aallan previously approved these changes Mar 2, 2026

aallan left a comment

Owner

Thanks so much for this! The implementation follows all the project's patterns perfectly, the test coverage is thorough, and the WASM codegen is clean and well-commented. The strip zero-copy approach is particularly clever.

I'm going to push a small commit to align two type signatures with the spec:

  • string_length will return Nat (was Int): spec §4.13 specifies a non-negative return value
  • string_slice will take Nat indices (was Int): spec §4.13 specifies non-negative positions

One other spec discrepancy: parse_nat should return Result<Nat, String> per spec §9 rather than bare Nat. That's a bigger change — it requires new codegen infrastructure for built-in functions returning ADTs, plus digit validation and error paths — so I'll open an issue to track it separately rather than hold up this PR.
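For a sense of what the Result-returning variant involves, here is a small Python sketch with digit validation and explicit error paths. The `Ok`/`Err` classes, the name `parse_nat_checked`, and the error messages are all hypothetical; the spec's actual Result ADT and error strings are not shown in this PR.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Ok:
    value: int


@dataclass(frozen=True)
class Err:
    message: str


Result = Union[Ok, Err]  # stand-in for Result<Nat, String>


def parse_nat_checked(text: str) -> Result:
    """Parse a natural number, reporting bad input instead of guessing."""
    digits = text.lstrip(" ")  # keep the leading-space tolerance
    if not digits:
        return Err("empty input")
    value = 0
    for ch in digits:
        if not ("0" <= ch <= "9"):
            return Err(f"invalid digit {ch!r}")
        value = value * 10 + (ord(ch) - ord("0"))
    return Ok(value)
```

The extra cost on the compiler side is that the built-in must construct a tagged value on the heap rather than return a bare integer, which is the new codegen infrastructure mentioned above.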

I'll handle the version bump, CHANGELOG, docs updates, and spec updates in a follow-up PR after merging this. Really appreciate the contribution!

- string_length: return type INT → NAT (spec §4.13 says non-negative)
- string_slice: param types INT, INT → NAT, NAT (spec §4.13 says
  non-negative positions)
- Update test_string_slice_ok to use @nat params

Co-Authored-By: Claude <noreply@anthropic.invalid>
@aallan aallan merged commit 2d29324 into aallan:main Mar 2, 2026
10 checks passed
aallan added a commit that referenced this pull request Mar 2, 2026
Version bump and documentation updates following PR #173
(eight string/conversion built-in operations by @rlseaman).

- Version 0.0.49 → 0.0.50
- CHANGELOG: document all 8 operations, note parse_nat limitation (#174)
- spec/04-expressions.md §4.13: list all 8 operations with signatures,
  remove "Not yet implemented" banner, add parse_nat Result note
- SKILLS.md: new "Built-in Functions" section with string operations
- vera/README.md: add 8 string functions to built-ins table
- README.md: update test/example counts, strike #134, add #174 to roadmap
- TESTING.md: update test counts (1,267 tests, 15 examples)
- CLAUDE.md: update example count (14 → 15)
- CONTRIBUTING.md: add spec alignment guidance for built-in functions
- Fix README allowlist line numbers (scripts + test)

Co-Authored-By: Claude <noreply@anthropic.invalid>
aallan added a commit that referenced this pull request Mar 4, 2026
All 8 string built-in operations (string_concat, to_string, string_slice,
strip, parse_nat, parse_float64, char_code, string_length) were implemented
in WASM codegen across v0.0.50 (PR #173) and v0.0.60 (#174). This updates
limitation tables in spec/11, spec/12, README, and vera/README to reflect
that #52 is complete. Also fixes stale #53 and #110 rows in spec/12-runtime
to match spec/11-compilation. GC for string memory remains in #51.

Co-Authored-By: Claude <noreply@anthropic.invalid>