TF/Numpy variants for all DataCollator classes #13105
sgugger left a comment:
Great addition! We could also add the NumPy part for the Flax/JAX folks.
sgugger left a comment:
Left a few more nits to polish the PR.
More updates done - please note that tests will fail until all of the data collators are updated, because I removed the top-level imports. I definitely won't be merging this until that's done, don't worry!
All the classes are in! Thank you to @aromans and @sdwalker62, whose PR #12199 I cannibalized for MLM and its variants. The next step is finishing the tests and making sure all of this actually works.
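For readers less familiar with what the MLM collators do to a batch, here is a minimal NumPy sketch of BERT-style masking. It assumes the common 80/10/10 scheme (of the positions selected for prediction, 80% become the mask token, 10% become a random token, 10% stay unchanged) and the convention that labels of -100 are ignored by the loss. The function `numpy_mask_tokens_sketch`, its parameters and the example token ids are hypothetical; this is an illustration, not the code merged in this PR.

```python
import numpy as np


def numpy_mask_tokens_sketch(input_ids, mask_token_id, vocab_size,
                             special_tokens_mask=None, mlm_probability=0.15, rng=None):
    """Hypothetical helper: BERT-style MLM masking on a NumPy batch.

    Returns (masked_inputs, labels); labels are -100 wherever no prediction is needed.
    """
    rng = np.random.default_rng() if rng is None else rng
    inputs = np.array(input_ids, copy=True)  # never modify the caller's array
    labels = np.array(input_ids, copy=True)

    # Select roughly `mlm_probability` of the positions as prediction targets.
    probability_matrix = np.full(labels.shape, mlm_probability)
    if special_tokens_mask is not None:
        # Special tokens ([CLS], [SEP], padding, ...) are never selected.
        probability_matrix[np.asarray(special_tokens_mask, dtype=bool)] = 0.0
    masked_indices = rng.random(labels.shape) < probability_matrix
    labels[~masked_indices] = -100  # loss is only computed on selected positions

    # 80% of the selected positions are replaced with the mask token.
    replaced = (rng.random(labels.shape) < 0.8) & masked_indices
    inputs[replaced] = mask_token_id

    # Half of the remaining 20% (10% overall) get a random vocabulary id.
    random_words = (rng.random(labels.shape) < 0.5) & masked_indices & ~replaced
    inputs[random_words] = rng.integers(0, vocab_size, size=int(random_words.sum()))

    # The last 10% of selected positions keep their original token.
    return inputs, labels


# Example usage with illustrative BERT-like ids (101=[CLS], 102=[SEP], 103=[MASK]).
batch = np.array([[101, 7592, 2088, 102], [101, 2023, 2003, 102]])
special = np.isin(batch, [101, 102])
masked, labels = numpy_mask_tokens_sketch(
    batch, mask_token_id=103, vocab_size=30522,
    special_tokens_mask=special, rng=np.random.default_rng(0),
)
```

Because the selected positions are random, the exact output varies with the seed; what stays fixed is that special tokens are never selected and unselected positions pass through unchanged.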
sgugger left a comment:
LGTM, thanks a lot for adapting all of those and writing all the tests.
…g import is found
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
…ionLanguageModeling
…hem were making us fail code quality checks
Force-pushed from a53734b to 4b9cfb5
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
LysandreJik left a comment:
Looks good to me, thank you @Rocketknight1! Thanks for writing such extensive tests.
Hi @aromans and @sdwalker62, we're ready to merge now. I just realized I'll need your GitHub no-reply e-mail addresses to add you as co-authors, though - see the docs here.
Thanks!
It's in, and all authors have been properly credited! If you want to delete the messages with your e-mails (in case of spambot harvesting), feel free.
This is a draft PR again. I've written an example of what a TF variant of one of our data collators would look like. If we're happy with this format, it should be easy to extend it to NumPy/JAX as well and to apply the same pattern to the other data collators; I'll probably add most of them to this PR before merging it. Let me know what you think!
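One way to picture the proposed format: a single collator class that keeps separate framework-specific code paths and picks one based on a `return_tensors` argument, importing TensorFlow or PyTorch only inside the method that needs it (in line with removing the top-level imports). The sketch below is a hypothetical padding collator built on those assumptions; the class name, the method names `numpy_call`/`tf_call`/`torch_call` and the `"np"`/`"tf"`/`"pt"` strings are illustrative and are not a description of the final API.

```python
import numpy as np


class PaddingCollatorSketch:
    """Hypothetical collator: pads lists of token ids and returns tensors for
    the framework selected by `return_tensors` ("np", "tf" or "pt")."""

    def __init__(self, pad_token_id=0, return_tensors="np"):
        self.pad_token_id = pad_token_id
        self.return_tensors = return_tensors

    def __call__(self, features):
        # Dispatch to one framework-specific implementation.
        if self.return_tensors == "np":
            return self.numpy_call(features)
        if self.return_tensors == "tf":
            return self.tf_call(features)
        if self.return_tensors == "pt":
            return self.torch_call(features)
        raise ValueError(f"Unknown return_tensors value: {self.return_tensors!r}")

    def numpy_call(self, features):
        # Right-pad every sequence to the longest one in the batch.
        max_len = max(len(f) for f in features)
        input_ids = np.full((len(features), max_len), self.pad_token_id, dtype=np.int64)
        attention_mask = np.zeros((len(features), max_len), dtype=np.int64)
        for i, f in enumerate(features):
            input_ids[i, : len(f)] = f
            attention_mask[i, : len(f)] = 1
        return {"input_ids": input_ids, "attention_mask": attention_mask}

    def tf_call(self, features):
        # Imported lazily so NumPy/JAX users never need TensorFlow installed.
        import tensorflow as tf

        batch = self.numpy_call(features)
        return {k: tf.convert_to_tensor(v) for k, v in batch.items()}

    def torch_call(self, features):
        # Imported lazily for the same reason.
        import torch

        batch = self.numpy_call(features)
        return {k: torch.from_numpy(v) for k, v in batch.items()}


# Example usage: the NumPy path needs only NumPy.
collator = PaddingCollatorSketch(pad_token_id=0, return_tensors="np")
batch = collator([[5, 6, 7], [8, 9]])
```

Routing the TF and PyTorch paths through the NumPy implementation keeps the sketch short; a real TF variant might instead operate on `tf.Tensor`s directly so it can run inside a `tf.data` pipeline.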