Summary
Vera has string_char_code and string_from_char_code for working with individual characters, but no character classification or case-conversion functions for single characters. These are fundamental for parsing, validation, and text processing — and hand-rolling the equivalents via ASCII range comparisons is a common source of off-by-one errors for LLMs.
Proposed API
All functions take a String (first character is examined) and return Bool or String:
Classification (boolean)
is_digit(@String) -> @Bool — ASCII 0–9
is_alpha(@String) -> @Bool — ASCII a–z, A–Z
is_alphanumeric(@String) -> @Bool — ASCII letter or digit
is_whitespace(@String) -> @Bool — space, tab, newline, carriage return
is_upper(@String) -> @Bool — ASCII A–Z
is_lower(@String) -> @Bool — ASCII a–z
All classification functions return false for empty strings.
Case conversion (single-character)
char_to_upper(@String) -> @String — uppercase the first character; return other characters unchanged.
char_to_lower(@String) -> @String — lowercase the first character; return other characters unchanged.
Both return the empty string unchanged. Operate on the first character of the input; callers can slice if they want whole-string conversion (string_upper / string_lower already exist for that).
Rationale for adding single-character case conversion
Whole-string string_upper / string_lower already exist. But the common "capitalize the first letter of a word" pattern currently requires:
string_concat(
string_upper(string_slice(s, 0, 1)),
string_slice(s, 1, string_length(s))
)
That's verbose and every string_slice offset is an error surface. char_to_upper / char_to_lower combine naturally with string_chars (once #470 lands) for building case-transformation pipelines.
Implementation
- environment.py: Register as pure functions —
(STRING,) -> BOOL for classifiers, (STRING,) -> STRING for case conversion
- codegen/api.py: Host imports using Python's
str.isdigit(), str.isalpha(), etc. on the first character; s[0].upper() + s[1:] (and .lower()) for the case converters
- Browser runtime: Regex or char-code range checks for classification;
s[0].toUpperCase() + s.slice(1) for case conversion
- Verification: Boolean predicates — standard Z3 encoding. Can relate to
string_char_code axioms (e.g. is_digit(s) <=> 48 <= string_char_code(s, 0) <= 57). Case conversion: result length equals input length.
Design note
These operate on the first character of a string rather than taking a char type, because Vera has no Char type — characters are represented as single-character strings (same as Elm, PureScript). This is consistent with string_char_code(@String, @Nat) -> @Int which also operates on a string position.
Priority
Lower priority than array utilities, sleep, and random. Within this issue: classification functions and case converters ship together (same implementation pattern, same design note). Useful for hand-written parsers, input validation, and the first-letter-capitalize pattern.
Summary
Vera has
string_char_codeandstring_from_char_codefor working with individual characters, but no character classification or case-conversion functions for single characters. These are fundamental for parsing, validation, and text processing — and hand-rolling the equivalents via ASCII range comparisons is a common source of off-by-one errors for LLMs.Proposed API
All functions take a
String(first character is examined) and returnBoolorString:Classification (boolean)
is_digit(@String) -> @Bool— ASCII 0–9is_alpha(@String) -> @Bool— ASCII a–z, A–Zis_alphanumeric(@String) -> @Bool— ASCII letter or digitis_whitespace(@String) -> @Bool— space, tab, newline, carriage returnis_upper(@String) -> @Bool— ASCII A–Zis_lower(@String) -> @Bool— ASCII a–zAll classification functions return
falsefor empty strings.Case conversion (single-character)
char_to_upper(@String) -> @String— uppercase the first character; return other characters unchanged.char_to_lower(@String) -> @String— lowercase the first character; return other characters unchanged.Both return the empty string unchanged. Operate on the first character of the input; callers can slice if they want whole-string conversion (
string_upper/string_loweralready exist for that).Rationale for adding single-character case conversion
Whole-string
string_upper/string_loweralready exist. But the common "capitalize the first letter of a word" pattern currently requires:That's verbose and every
string_sliceoffset is an error surface.char_to_upper/char_to_lowercombine naturally withstring_chars(once #470 lands) for building case-transformation pipelines.Implementation
(STRING,) -> BOOLfor classifiers,(STRING,) -> STRINGfor case conversionstr.isdigit(),str.isalpha(), etc. on the first character;s[0].upper() + s[1:](and.lower()) for the case converterss[0].toUpperCase() + s.slice(1)for case conversionstring_char_codeaxioms (e.g.is_digit(s) <=> 48 <= string_char_code(s, 0) <= 57). Case conversion: result length equals input length.Design note
These operate on the first character of a string rather than taking a char type, because Vera has no
Chartype — characters are represented as single-character strings (same as Elm, PureScript). This is consistent withstring_char_code(@String, @Nat) -> @Intwhich also operates on a string position.Priority
Lower priority than array utilities, sleep, and random. Within this issue: classification functions and case converters ship together (same implementation pattern, same design note). Useful for hand-written parsers, input validation, and the first-letter-capitalize pattern.