Releases: jacksonllee/rustling
v0.8.0
v0.7.0
Added
- Word segmentation:
  - Added a `score` method for the HMM and DAG-HMM segmenters.
  - The `predict` method can optionally output offsets, i.e., the (start, end) indices of each segmented word in the original string.
- CHAT parsing: Support custom tier names other than the standard %mor and %gra.
- Python model classes are now subclassable.
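To illustrate what (start, end) offsets of segmented words look like, here is a minimal sketch in plain Python. It does not call rustling; `word_offsets` is a hypothetical helper, and the end-exclusive index convention is an assumption rather than something these notes confirm.

```python
def word_offsets(original: str, words: list[str]) -> list[tuple[int, int]]:
    """Return a (start, end) pair for each word, located left to right.

    Assumption: `end` is exclusive, so original[start:end] == word.
    """
    offsets = []
    cursor = 0
    for word in words:
        # Search from the end of the previous word so repeated words
        # map to successive positions instead of the first match.
        start = original.index(word, cursor)
        end = start + len(word)
        offsets.append((start, end))
        cursor = end
    return offsets


words = ["this", "is", "a", "test"]
print(word_offsets("thisisatest", words))
# [(0, 4), (4, 6), (6, 7), (7, 11)]
```

With such offsets, each segmented word can be mapped back to its exact span in the unsegmented input.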
Changed
- Ngram counters: `Ngrams.most_common` now sorts tuples lexicographically when counts are tied.
- CHAT parsing:
  - If a date is available at `Headers`'s `date`, it's now a Python `datetime.date` object instead of a string.
  - In handling the main tier transcription for creating `Token` objects:
    - Special form markers suffixed with "@" are now stripped.
    - Words that have partially parenthetical material have the parentheses removed, e.g., (un)til -> until, sit(ting) -> sitting.
  - Renamed the `CHAT.raw` attribute to `CHAT.audible` for a best-effort, audibly faithful transcription string, to facilitate automatic speech recognition, forced alignment, etc.
  - A subset of the testchat/bad dataset is now used to validate CHAT data format.
- Refactored core Rust code so that Rust-only consumers no longer need PyO3/Python.
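Two of the behavior changes above can be sketched in plain Python: lexicographic tie-breaking among equally frequent ngrams, and removal of parentheses from partially parenthesized words. This is an illustration only, not rustling's implementation; `most_common_with_ties` and `strip_partial_parens` are hypothetical names, and the sketch assumes ties are broken in ascending tuple order.

```python
import re
from collections import Counter


def most_common_with_ties(counts, n=None):
    """Rank ngrams by descending count, breaking ties lexicographically."""
    # Negating the count sorts high counts first; the tuple itself
    # is the secondary key, so tied ngrams come out in ascending order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked if n is None else ranked[:n]


def strip_partial_parens(word):
    """Remove parenthesis characters but keep the enclosed letters."""
    return re.sub(r"[()]", "", word)


bigrams = Counter({("the", "dog"): 2, ("a", "cat"): 2, ("the", "cat"): 3})
print(most_common_with_ties(bigrams))
# [(('the', 'cat'), 3), (('a', 'cat'), 2), (('the', 'dog'), 2)]

print(strip_partial_parens("(un)til"), strip_partial_parens("sit(ting)"))
# until sitting
```

A deterministic tie-break makes the output stable across runs and platforms, which helps when results are compared in tests.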
v0.6.0
Added
- Hidden Markov Model (HMM)
- Word segmentation: Added DAG-HMM word segmenter
- CHAT parsing: Added `from_utterances` method
Changed
- Models are now persisted as a zstd-compressed FlatBuffers binary.
v0.5.0
- Added CHAT data parsing
Full Changelog: v0.4.0...v0.5.0
v0.4.0
- Added language models
Full Changelog: v0.3.0...v0.4.0
v0.3.0
- Added the averaged perceptron tagger
- Started the Python Sphinx docs and set up the Read The Docs site
Full Changelog: v0.2.0...v0.3.0
v0.2.0
First public release