Releases: jacksonllee/rustling

v0.8.0

20 Mar 23:16

Added

  • Support for data formats:
    • CoNLL-U for Universal Dependencies
    • ELAN for annotated multimedia data
    • TextGrid for Praat annotations
    • SRT for subtitles
  • CHAT data handling:
    • Added a convenience function read_chat.
    • Added from_git and from_url methods for remote data sources.

v0.7.0

14 Mar 18:54

Added

  • Word segmentation:
    • Added score method for the HMM and DAG-HMM segmenters.
    • The predict method can optionally output the (start, end) offsets
      of segmented words relative to the original string.
  • CHAT parsing: Support custom tier names other than the standard %mor and %gra.
  • Python model classes are now subclassable.
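
To illustrate what (start, end) offsets for segmented words look like, here is a minimal sketch in plain Python. It does not call rustling: the helper name word_offsets is hypothetical, and it assumes that the segmented words concatenate back to the original string and that the end index is exclusive. The library's actual convention may differ.

```python
def word_offsets(words):
    """Return end-exclusive (start, end) index pairs for each word.

    Assumes the words, joined without separators, reproduce the
    original string (an assumption for this sketch only).
    """
    offsets = []
    start = 0
    for word in words:
        end = start + len(word)
        offsets.append((start, end))
        start = end
    return offsets


original = "thisisatest"
words = ["this", "is", "a", "test"]
offsets = word_offsets(words)

# Slicing the original string with each offset pair recovers the word.
recovered = [original[s:e] for s, e in offsets]
```

With offsets of this kind, a caller can map each segmented word back to its position in the input without searching for it again.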

Changed

  • Ngram counters: Ngrams.most_common now sorts tuples lexicographically
    when counts are tied.
  • CHAT parsing:
    • If a date is available in the Headers date field,
      it is now a Python datetime.date object instead of a string.
    • In handling the main tier transcription for creating Token objects:
      • Special form markers suffixed with "@" are now stripped.
      • Words with partially parenthesized material have the parentheses
        removed, e.g., (un)til -> until, sit(ting) -> sitting.
    • Renamed the CHAT.raw attribute to CHAT.audible for a best-effort,
      audibly faithful transcription string, to facilitate automatic speech recognition,
      forced alignment, etc.
    • A subset of the testchat/bad dataset is now used to validate CHAT data format.
  • Refactored core Rust code so that Rust-only consumers no longer need PyO3/Python.
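
The tie-breaking rule for Ngrams.most_common can be sketched in plain Python as follows. This is an illustration of the idea only, not the library's implementation: it uses collections.Counter in place of the Ngrams class, the helper name most_common_with_ties is hypothetical, and it assumes ties are resolved in ascending lexicographic order of the n-gram tuples.

```python
from collections import Counter


def most_common_with_ties(counts, n=None):
    """Rank n-grams by descending count, breaking ties by the
    ascending lexicographic order of the n-gram tuple (assumed)."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked if n is None else ranked[:n]


bigrams = Counter(
    {
        ("the", "dog"): 2,
        ("a", "cat"): 2,
        ("the", "cat"): 3,
    }
)
ranked = most_common_with_ties(bigrams)
top_two = most_common_with_ties(bigrams, n=2)
```

A deterministic tie-break of this kind makes the output independent of insertion order, which keeps results reproducible across runs.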

v0.6.0

05 Mar 22:14
a3d0672

Added

  • Hidden Markov Model (HMM)
  • Word segmentation: Added DAG-HMM word segmenter
  • CHAT parsing: Added from_utterances method

Changed

  • Models are now persisted as a zstd-compressed FlatBuffers binary.

v0.5.0

18 Feb 09:43

  • Added CHAT data parsing

Full Changelog: v0.4.0...v0.5.0

v0.4.0

08 Feb 15:48

  • Added language models

Full Changelog: v0.3.0...v0.4.0

v0.3.0

06 Feb 07:02

  • Added the averaged perceptron tagger
  • Started the Python Sphinx docs and set up the Read The Docs site

Full Changelog: v0.2.0...v0.3.0

v0.2.0

04 Feb 19:39

First public release