Add string utility built-ins (reverse, pad, chars, trim variants) #470

@aallan

Summary

Vera's string built-ins cover the most common operations but are missing several utilities that come up frequently in text processing and formatting. Two distinct gaps:

  1. Character-level ops (string_reverse, string_pad_*, string_chars, string_trim_*): these replace the hand-written per-character recursion that such operations need today.
  2. Structural splits (string_lines, string_words): string_split takes only a single delimiter character, so splitting on line endings (including \r\n) or on runs of whitespace currently requires regex.

Proposed API

Transformation

  • string_reverse(@String) -> @String — reverse a string (Unicode-aware, by grapheme cluster)
  • string_pad_start(@String, @Nat, @String) -> @String — pad to target length with fill string (left-pad)
  • string_pad_end(@String, @Nat, @String) -> @String — pad to target length with fill string (right-pad)
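
A minimal Python sketch of the padding semantics, under two assumptions the issue does not yet settle: a multi-character fill is repeated and truncated the way JavaScript's padStart()/padEnd() do, and length is measured in code points. Python's str.rjust()/str.ljust() accept only a single fill character, so the sketch builds the padding by hand.

```python
def _padding(missing: int, fill: str) -> str:
    """Build exactly `missing` characters by repeating and truncating `fill`."""
    # Repeat the fill one time more than strictly needed, then cut to size.
    return (fill * (missing // len(fill) + 1))[:missing]


def string_pad_start(s: str, target_len: int, fill: str) -> str:
    """Left-pad `s` to `target_len` code points (assumed semantics)."""
    missing = target_len - len(s)
    if missing <= 0 or not fill:
        return s  # already long enough, or nothing to pad with
    return _padding(missing, fill) + s


def string_pad_end(s: str, target_len: int, fill: str) -> str:
    """Right-pad `s` to `target_len` code points (assumed semantics)."""
    missing = target_len - len(s)
    if missing <= 0 or not fill:
        return s
    return s + _padding(missing, fill)
```

Returning the input unchanged for an empty fill or a too-small target keeps both functions total, which should also make the "result length >= input length" property easy to state.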

Decomposition

  • string_chars(@String) -> @Array<String> — split into individual characters (grapheme clusters) as an array of single-character strings. Bridge primitive: once this exists, string operations become array operations (e.g. string_reverse = string_from_chars(array_reverse(string_chars(s))), character predicates become array_any(string_chars(s), is_digit)).
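
The bridge pattern can be sketched in Python as follows. This is an illustration, not the Vera implementation: string_chars here uses list(), which splits by code point rather than by grapheme cluster. string_from_chars is assumed to be the inverse join, since the issue names it but does not propose it. is_all_digits is a hypothetical helper.

```python
def string_chars(s: str) -> list[str]:
    """Split into single-code-point strings (not grapheme clusters)."""
    return list(s)


def string_from_chars(chars: list[str]) -> str:
    """Assumed inverse of string_chars: concatenate the pieces."""
    return "".join(chars)


def string_reverse(s: str) -> str:
    """Reverse expressed purely as an array operation on the characters."""
    return string_from_chars(list(reversed(string_chars(s))))


def is_all_digits(s: str) -> bool:
    """Hypothetical character predicate built on string_chars."""
    return all(c.isdigit() for c in string_chars(s))
```

Note that "e\u0301" (an "e" followed by a combining acute accent) yields two elements under this sketch. A grapheme-aware string_chars would return one, so the choice of unit affects both string_reverse and the length property.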

Structural splits

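  • string_lines(@String) -> @Array<String> — split on line terminators (\n, \r\n, \r). Comparable to Python's str.splitlines(), Rust's str::lines(), and Java's String.lines(), although the three differ in which terminators they recognize: Rust does not split on a bare \r, and Python also splits on further characters such as \v and \f. The current workaround requires string_split("\n") and manual \r stripping, which LLMs get wrong on Windows-style line endings.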
  • string_words(@String) -> @Array<String> — split on whitespace runs (spaces, tabs, newlines), discarding empty segments. Mirrors Python's str.split() with no argument. Currently impossible with string_split (single-delimiter only); requires regex.
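
A Python sketch of the two splits as described above. It restricts string_lines to exactly \n, \r\n, and \r, and assumes a trailing terminator does not produce a final empty line (the behavior of str.splitlines()). That trailing-terminator rule is an assumption, not something the issue states.

```python
import re

# \r\n must come first so a Windows line ending counts as one terminator.
_LINE_TERMINATOR = re.compile(r"\r\n|\n|\r")


def string_lines(s: str) -> list[str]:
    """Split on \\n, \\r\\n, or \\r only."""
    if not s:
        return []
    parts = _LINE_TERMINATOR.split(s)
    # Assumed: a trailing terminator ends the last line instead of
    # starting a new empty one.
    if parts[-1] == "":
        parts.pop()
    return parts


def string_words(s: str) -> list[str]:
    """Split on whitespace runs, discarding empty segments."""
    # str.split() with no argument treats any Unicode whitespace as a
    # separator, which is broader than spaces, tabs, and newlines.
    return s.split()
```

Using an explicit regex rather than str.splitlines() keeps the terminator set to the three listed. For example, str.splitlines() would also split "a\x0bb" at the vertical tab, while this sketch leaves it intact.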

Trimming variants

  • string_trim_start(@String) -> @String — remove leading whitespace only
  • string_trim_end(@String) -> @String — remove trailing whitespace only

Note: string_strip already exists for trimming both sides.

Implementation

  • environment.py: Register as pure functions
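  • codegen/api.py: Host imports delegating to Python string methods (.rjust() for pad_start, .ljust() for pad_end, .lstrip(), .rstrip(), list() for chars, .splitlines() for lines, .split() with no arg for words). Caveats: .rjust()/.ljust() accept only a single-character fill, list() iterates code points rather than grapheme clusters, and .splitlines() recognizes more terminators than \n, \r\n, and \r.
  • Browser runtime: padStart(), padEnd(), trimStart(), trimEnd(), spread operator for chars (this also iterates code points, not grapheme clusters), regex-based split for lines/words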
  • Verification:
    • string_reverse preserves length
    • string_pad_start / string_pad_end: result length >= input length
    • string_trim_*: result length <= input length
    • string_chars: result length = string_length of input
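    • string_lines / string_words: no result element contains a separator (a line terminator or whitespace, respectively), and string_words never returns an empty element. Rejoining the results with one fixed separator recovers the input only in the simple case where the input used exactly that separator between elements.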
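
The length properties above can be spot-checked with a small Python sketch. It uses Python built-ins as stand-ins for the proposed functions and measures length in code points, both of which are assumptions. check_length_properties and SAMPLES are hypothetical names, and the split properties are left out.

```python
def check_length_properties(s: str, target: int = 8, fill: str = " ") -> bool:
    """Return True if every length property holds for `s` (code-point lengths)."""
    n = len(s)
    return all([
        len(s[::-1]) == n,                # string_reverse preserves length
        len(s.rjust(target, fill)) >= n,  # string_pad_start never shortens
        len(s.ljust(target, fill)) >= n,  # string_pad_end never shortens
        len(s.lstrip()) <= n,             # string_trim_start never lengthens
        len(s.rstrip()) <= n,             # string_trim_end never lengthens
        len(list(s)) == n,                # string_chars count equals length
    ])


# Sample inputs: empty, plain, padded, mixed line endings, a combining
# sequence, and a string longer than the pad target.
SAMPLES = ["", "abc", "  padded  ", "line1\r\nline2", "e\u0301", "a-very-long-string"]

all_hold = all(check_length_properties(s) for s in SAMPLES)
```

The string_chars property holds here only because both sides count code points. If string_chars splits by grapheme cluster, string_length would have to count clusters as well.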

Priority

Medium. Less urgent than array utilities, sleep, and random. Within this issue, priority order:

  1. string_chars — bridge primitive; unlocks array-combinator approach to string processing
  2. string_lines / string_words — real capability gaps; string_split can't currently do these correctly
  3. string_pad_start / string_pad_end — common formatting need
  4. string_reverse, string_trim_start, string_trim_end — convenience; once string_chars exists, several of these become one-liners

Labels: enhancement
