This library is a set of APIs defined with module types, and a set of modules and functors implementing one or more of those interfaces.
The APIs define what a character and a string of characters should be.
See the INSTALL file for build instructions and/or the documentation website.
The library is “packed” under the Sosa toplevel module.
We have, in the sub-module Api:
- BASIC_CHARACTER: characters of any length.
- NATIVE_CONVERSIONS: functions to transform from/to native OCaml strings.
- BASIC_STRING: immutable strings of (potentially abstract) characters:
    - includes NATIVE_CONVERSIONS,
    - contains a functor to provide a thread agnostic output function:
      Make_output: OUTPUT_MODEL → sig val output: ... end.
- UNSAFELY_MUTABLE: mutability of some string implementations (“unsafe” meaning that they break immutability invariants/assumptions).
- MINIMALISTIC_MUTABLE_STRING: abstract mutable string used as argument of the Of_mutable functor.
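To illustrate what a thread agnostic output functor can look like, here is a minimal sketch. The names OUTPUT_MODEL_SKETCH, Make_output_sketch, output_all, and Sync_model are hypothetical, and the signature is a deliberate simplification chosen for this example; it is not the library's actual OUTPUT_MODEL.

```ocaml
(* Hypothetical, simplified model of "some way to sequence writes".
   This is an assumption for illustration, not Sosa's real signature. *)
module type OUTPUT_MODEL_SKETCH = sig
  type 'a thread
  type channel
  val return : 'a -> 'a thread
  val bind : 'a thread -> ('a -> 'b thread) -> 'b thread
  val output : channel -> string -> unit thread
end

(* The functor only uses return/bind/output, so it works the same way
   with a synchronous model or with a cooperative-threads model. *)
module Make_output_sketch (M : OUTPUT_MODEL_SKETCH) = struct
  (* Write each string in order, sequencing through the model's bind. *)
  let output_all (ch : M.channel) (parts : string list) : unit M.thread =
    List.fold_left
      (fun acc s -> M.bind acc (fun () -> M.output ch s))
      (M.return ())
      parts
end

(* A trivial synchronous model that writes into a Buffer. *)
module Sync_model = struct
  type 'a thread = 'a
  type channel = Buffer.t
  let return x = x
  let bind x f = f x
  let output buf s = Buffer.add_string buf s
end

module Sync_output = Make_output_sketch (Sync_model)

let () =
  let buf = Buffer.create 16 in
  Sync_output.output_all buf ["so"; "sa"];
  assert (Buffer.contents buf = "sosa")
```

Swapping Sync_model for a module built on an asynchronous library would change how the writes are scheduled without touching the functor body.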
The Native_character module implements BASIC_CHARACTER with
OCaml's char type.
The Native_string module implements BASIC_STRING with OCaml's string type,
treated as immutable (and hence uses Native_character as its character type).
The Native_bytes module implements BASIC_STRING
and UNSAFELY_MUTABLE with OCaml's bytes type.
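The reason such mutability is called “unsafe” can be sketched with a small, hypothetical module: a type presented through a string-like interface but backed by OCaml's bytes. The module name Bytes_backed_sketch and the function mutate_exn are assumptions made for this example and are not part of the library's API.

```ocaml
(* Hypothetical sketch: an abstract string type backed by mutable bytes.
   Names are illustrative only, not Sosa's API. *)
module Bytes_backed_sketch : sig
  type t
  val of_native_string : string -> t
  val to_native_string : t -> string
  (* In-place update; raises Invalid_argument if index is out of bounds. *)
  val mutate_exn : t -> index:int -> char -> unit
end = struct
  type t = Bytes.t
  let of_native_string = Bytes.of_string
  let to_native_string = Bytes.to_string
  let mutate_exn t ~index c = Bytes.set t index c
end

let () =
  let open Bytes_backed_sketch in
  let s = of_native_string "sosa" in
  (* alias refers to the same underlying buffer as s *)
  let alias = s in
  mutate_exn s ~index:0 'S';
  (* Code holding alias sees the change, although it never asked for
     one: this is the immutability assumption being broken. *)
  assert (to_native_string alias = "Sosa")
```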
List_of is a functor: BASIC_CHARACTER → BASIC_STRING, i.e., it creates a
string data structure made of a list of characters.
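The idea of a functor from characters to list-backed strings can be sketched as follows. CHARACTER_SKETCH, STRING_SKETCH, List_of_sketch, and Char_sketch are hypothetical names with heavily reduced signatures, assumed here only for illustration; the real module types in Api are richer.

```ocaml
(* Hypothetical, reduced signatures (assumptions, not Sosa's Api). *)
module type CHARACTER_SKETCH = sig
  type t
  val to_native_string : t -> string
end

module type STRING_SKETCH = sig
  type character
  type t
  val of_character_list : character list -> t
  val length : t -> int
  val get : t -> index:int -> character option
  val concat : t list -> t
  val to_native_string : t -> string
end

(* Build a string implementation out of any character implementation
   by storing the characters in a plain list. *)
module List_of_sketch (C : CHARACTER_SKETCH) :
  STRING_SKETCH with type character = C.t = struct
  type character = C.t
  type t = character list
  let of_character_list l = l
  let length = List.length
  let get t ~index =
    (* List.nth_opt raises on negative indices, so guard first. *)
    if index < 0 then None else List.nth_opt t index
  let concat = List.concat
  let to_native_string t =
    String.concat "" (List.map C.to_native_string t)
end

(* A character implementation based on OCaml's char. *)
module Char_sketch = struct
  type t = char
  let to_native_string c = String.make 1 c
end

module Char_list = List_of_sketch (Char_sketch)

let () =
  let s = Char_list.of_character_list ['s'; 'o'; 's'; 'a'] in
  assert (Char_list.length s = 4);
  assert (Char_list.get s ~index:1 = Some 'o');
  assert (Char_list.to_native_string (Char_list.concat [s; s]) = "sosasosa")
```

Because the string type stays abstract in the result signature, callers cannot rely on the list representation, which is what lets different implementations be swapped.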
The functor Of_mutable uses an implementation of
MINIMALISTIC_MUTABLE_STRING to build a BASIC_STRING.
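A minimal sketch of this construction, under assumed signatures: the functor keeps the mutable representation private and copies on every update, so values look immutable from the outside. MUTABLE_SKETCH, Of_mutable_sketch, and Int_array_sketch are hypothetical names, and the operations listed are a guess at what a “minimalistic” mutable string might need, not the library's actual definition.

```ocaml
(* Hypothetical minimal mutable string (an assumption, not Sosa's
   MINIMALISTIC_MUTABLE_STRING). *)
module type MUTABLE_SKETCH = sig
  type character
  type t
  val empty : t
  val make : int -> character -> t
  val length : t -> int
  val get : t -> int -> character
  val set : t -> int -> character -> unit
end

module Of_mutable_sketch (M : MUTABLE_SKETCH) : sig
  type t
  val of_list : M.character list -> t
  val length : t -> int
  val get : t -> index:int -> M.character option
  (* Functional update: returns a new string, leaves the input as is. *)
  val set : t -> index:int -> v:M.character -> t option
end = struct
  type t = M.t
  let of_list = function
    | [] -> M.empty
    | first :: _ as l ->
      let m = M.make (List.length l) first in
      List.iteri (fun i c -> M.set m i c) l;
      m
  let length = M.length
  let get t ~index =
    if index >= 0 && index < M.length t then Some (M.get t index) else None
  let set t ~index ~v =
    let n = M.length t in
    if index < 0 || index >= n then None
    else begin
      (* Fresh copy filled with v, then restore every other position. *)
      let copy = M.make n v in
      for i = 0 to n - 1 do
        if i <> index then M.set copy i (M.get t i)
      done;
      Some copy
    end
end

(* Arrays of integers as the mutable backing store. *)
module Int_array_sketch = struct
  type character = int
  type t = int array
  let empty = [||]
  let make = Array.make
  let length = Array.length
  let get = Array.get
  let set = Array.set
end

module S = Of_mutable_sketch (Int_array_sketch)

let () =
  let s = S.of_list [0x73; 0xE9] in
  match S.set s ~index:0 ~v:0x53 with
  | None -> assert false
  | Some s' ->
    assert (S.get s ~index:0 = Some 0x73);  (* original is unchanged *)
    assert (S.get s' ~index:0 = Some 0x53);
    assert (S.length s' = 2)
```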
The Int_utf8_character module implements BASIC_CHARACTER with
OCaml integers (int) representing UTF-8 characters (the module
handles at most 31 bits, even though RFC 3629
restricts code points to end at U+10FFFF; cf. also
Wikipedia). Note that the function is_whitespace considers
only ASCII whitespace (useful when writing parsers, for example).
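The 31-bit limit corresponds to the original UTF-8 scheme, which allows sequences of up to 6 bytes, whereas RFC 3629 stops at 4 bytes and U+10FFFF. The sketch below shows that encoding on plain integers. The function names utf8_size, to_utf8, and is_ascii_whitespace are hypothetical, the exact set of characters the library's is_whitespace accepts is an assumption, and the code assumes a 64-bit OCaml where the literal 0x7FFF_FFFF fits in an int.

```ocaml
(* Number of bytes needed for a code point in the original (pre-RFC 3629)
   UTF-8 scheme: up to 6 bytes, i.e. 31 bits of payload. *)
let utf8_size (c : int) : int option =
  if c < 0 then None
  else if c <= 0x7F then Some 1
  else if c <= 0x7FF then Some 2
  else if c <= 0xFFFF then Some 3
  else if c <= 0x1F_FFFF then Some 4
  else if c <= 0x3FF_FFFF then Some 5
  else if c <= 0x7FFF_FFFF then Some 6
  else None

(* Encode an integer code point as a UTF-8 byte sequence. *)
let to_utf8 (c : int) : string option =
  match utf8_size c with
  | None -> None
  | Some 1 -> Some (String.make 1 (Char.chr c))
  | Some n ->
    let b = Bytes.create n in
    (* Leading byte: n high bits set, then the top payload bits. *)
    let lead_mask = (0xFF lsl (8 - n)) land 0xFF in
    Bytes.set b 0 (Char.chr (lead_mask lor (c lsr (6 * (n - 1)))));
    (* Continuation bytes: 10xxxxxx, six payload bits each. *)
    for i = 1 to n - 1 do
      let shift = 6 * (n - 1 - i) in
      Bytes.set b i (Char.chr (0x80 lor ((c lsr shift) land 0x3F)))
    done;
    Some (Bytes.to_string b)

(* ASCII-only whitespace test; the accepted set is an assumption. *)
let is_ascii_whitespace (c : int) : bool =
  match c with
  | 0x20 | 0x09 | 0x0A | 0x0D -> true
  | _ -> false

let () =
  assert (to_utf8 0xE9 = Some "\xC3\xA9");        (* U+00E9 *)
  assert (to_utf8 0x20AC = Some "\xE2\x82\xAC");  (* U+20AC *)
  (* Beyond U+10FFFF: rejected by RFC 3629, accepted by this scheme. *)
  assert (utf8_size 0x11_0000 = Some 4);
  assert (utf8_size 0x7FFF_FFFF = Some 6);
  (* Non-ASCII spaces such as U+00A0 are not treated as whitespace. *)
  assert (is_ascii_whitespace 0x20);
  assert (not (is_ascii_whitespace 0xA0))
```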
See the file test/main.ml for usage examples. The
library is tested with:
- native strings and characters,
- lists of native characters (List_of(Native_character)),
- lists of integers representing UTF-8 characters (List_of(utf8-int array)),
- arrays of integers representing UTF-8 characters (Of_mutable(utf8-int)),
- bigarrays of 8-bit integers (Of_mutable(int8 Bigarray1.t)).
The tests depend on the Nonstd, unix, and bigarray libraries.
Build and run them with:
make test
./sosa_tests
and you may add the basic benchmarks to the process with:
./sosa_tests bench