Ability to train from memory #544

Merged
n1t0 merged 8 commits into master from trainer-experiments
Nov 28, 2020

n1t0 commented on Nov 25, 2020

Adds the ability to train from an Iterator in Rust, and anything that can be used as an iterator in Python too.

Training a tokenizer from `datasets` or from a `List[str]` takes roughly as much time as training from files (cf. examples/train_with_datasets.py).
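A minimal sketch of what training from memory might look like on the Python side. It assumes the new entry point is exposed as `Tokenizer.train_from_iterator`, as in released versions of `tokenizers`. The helper names `batch_iterator` and `train_from_memory`, the BPE model choice, and the vocabulary size are illustrative and not taken from this PR.

```python
from typing import Iterable, Iterator, List


def batch_iterator(texts: Iterable[str], batch_size: int = 1000) -> Iterator[List[str]]:
    """Yield lists of up to `batch_size` strings from any iterable of strings."""
    batch: List[str] = []
    for text in texts:
        batch.append(text)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def train_from_memory(texts: List[str], vocab_size: int = 100):
    """Hypothetical helper: train a small BPE tokenizer from in-memory strings."""
    # Imported lazily so batch_iterator can be used without the library installed.
    from tokenizers import Tokenizer
    from tokenizers.models import BPE
    from tokenizers.pre_tokenizers import Whitespace
    from tokenizers.trainers import BpeTrainer

    tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
    tokenizer.pre_tokenizer = Whitespace()
    trainer = BpeTrainer(vocab_size=vocab_size, special_tokens=["[UNK]"])
    # Assumption: the iterator may yield either strings or lists of strings;
    # `length` is only a hint for progress reporting.
    tokenizer.train_from_iterator(batch_iterator(texts), trainer=trainer, length=len(texts))
    return tokenizer
```

Batching is optional here; passing the list of strings directly should also work, but yielding batches keeps memory use predictable when the source is a large dataset.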

Fix #198 & Fix #524

Still need to add:

  • Documentation (API Reference + examples)

n1t0 force-pushed the trainer-experiments branch 4 times, most recently from 3580858 to 6e066d8, on November 28, 2020 17:02
n1t0 force-pushed the trainer-experiments branch from 6e066d8 to f5ec740 on November 28, 2020 17:13
n1t0 merged commit 49bd055 into master on Nov 28, 2020
n1t0 deleted the trainer-experiments branch on November 28, 2020 17:29
n1t0 mentioned this pull request on Nov 28, 2020

Development

Successfully merging this pull request may close these issues.

  • Extract the word-counts to each Trainer
  • Training a model from in-memory data