Skip to content

Add character classification built-ins (is_digit, is_alpha, etc.) #471

@aallan

Description

@aallan

Summary

Vera has string_char_code and string_from_char_code for working with individual characters, but no character classification or case-conversion functions for single characters. These are fundamental for parsing, validation, and text processing — and hand-rolling the equivalents via ASCII range comparisons is a common source of off-by-one errors for LLMs.

Proposed API

All functions take a String (first character is examined) and return Bool or String:

Classification (boolean)

  • is_digit(@String) -> @Bool — ASCII 0–9
  • is_alpha(@String) -> @Bool — ASCII a–z, A–Z
  • is_alphanumeric(@String) -> @Bool — ASCII letter or digit
  • is_whitespace(@String) -> @Bool — space, tab, newline, carriage return
  • is_upper(@String) -> @Bool — ASCII A–Z
  • is_lower(@String) -> @Bool — ASCII a–z

All classification functions return false for empty strings.

Case conversion (single-character)

  • char_to_upper(@String) -> @String — uppercase the first character; return other characters unchanged.
  • char_to_lower(@String) -> @String — lowercase the first character; return other characters unchanged.

Both return the empty string unchanged. Operate on the first character of the input; callers can slice if they want whole-string conversion (string_upper / string_lower already exist for that).

Rationale for adding single-character case conversion

Whole-string string_upper / string_lower already exist. But the common "capitalize the first letter of a word" pattern currently requires:

string_concat(
  string_upper(string_slice(s, 0, 1)),
  string_slice(s, 1, string_length(s))
)

That's verbose and every string_slice offset is an error surface. char_to_upper / char_to_lower combine naturally with string_chars (once #470 lands) for building case-transformation pipelines.

Implementation

  • environment.py: Register as pure functions — (STRING,) -> BOOL for classifiers, (STRING,) -> STRING for case conversion
  • codegen/api.py: Host imports using Python's str.isdigit(), str.isalpha(), etc. on the first character; s[0].upper() + s[1:] (and .lower()) for the case converters
  • Browser runtime: Regex or char-code range checks for classification; s[0].toUpperCase() + s.slice(1) for case conversion
  • Verification: Boolean predicates — standard Z3 encoding. Can relate to string_char_code axioms (e.g. is_digit(s) <=> 48 <= string_char_code(s, 0) <= 57). Case conversion: result length equals input length.

Design note

These operate on the first character of a string rather than taking a char type, because Vera has no Char type — characters are represented as single-character strings (same as Elm, PureScript). This is consistent with string_char_code(@String, @Nat) -> @Int which also operates on a string position.

Priority

Lower priority than array utilities, sleep, and random. Within this issue: classification functions and case converters ship together (same implementation pattern, same design note). Useful for hand-written parsers, input validation, and the first-letter-capitalize pattern.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions