Skip to content

TF/Numpy variants for all DataCollator classes#13105

Merged
Rocketknight1 merged 36 commits intomasterfrom
tf_data_collators
Aug 31, 2021
Merged

TF/Numpy variants for all DataCollator classes#13105
Rocketknight1 merged 36 commits intomasterfrom
tf_data_collators

Conversation

@Rocketknight1
Copy link
Copy Markdown
Member

@Rocketknight1 Rocketknight1 commented Aug 12, 2021

This is a draft PR again - I've written an example of what a TF variant of one of our data collators would look like. If we're happy with this format, it should be easy to expand it to support Numpy/JAX as well, and to do the same for other data collators, and I'll probably add most of the other data collators to this PR before merging it. Let me know what you think!

Copy link
Copy Markdown
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great addition! We could also add the NumPy part for the Flax/Jax folks

Comment thread src/transformers/data/data_collator.py Outdated
Comment thread src/transformers/data/data_collator.py
Copy link
Copy Markdown
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few more nits to polish the PR.

Comment thread src/transformers/data/data_collator.py Outdated
Comment thread src/transformers/data/data_collator.py Outdated
Comment thread src/transformers/data/data_collator.py
Comment thread src/transformers/data/data_collator.py Outdated
Comment thread src/transformers/data/data_collator.py Outdated
@Rocketknight1
Copy link
Copy Markdown
Member Author

More updates done - please note that tests will fail until all of the data collators are updated, because I removed the top-level imports. I definitely won't be merging this until that's done, don't worry!

@Rocketknight1 Rocketknight1 changed the title Adding a TF variant of the DataCollatorForTokenClassification TF/Numpy variants for all DataCollator classes Aug 19, 2021
@Rocketknight1
Copy link
Copy Markdown
Member Author

All the classes are in! Thank you to @aromans and @sdwalker62, whose PR #12199 I cannibalized for MLM and its variants. Next step is finishing tests and making sure all of this actually works.

Copy link
Copy Markdown
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks a lot for adapting all of those and writing all the tests.

Comment thread tests/test_data_collator.py Outdated
Comment thread src/transformers/data/data_collator.py Outdated
Comment thread src/transformers/data/data_collator.py Outdated
Rocketknight1 and others added 3 commits August 30, 2021 19:31
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Copy link
Copy Markdown
Member

@LysandreJik LysandreJik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thank you @Rocketknight1! Thanks for writing such extensive tests.

@Rocketknight1
Copy link
Copy Markdown
Member Author

Hi @aromans and @sdwalker62, we're ready to merge now. I just realized I'll need your Github no-reply e-mail addresses to add you though - see the docs here.

@sdwalker62
Copy link
Copy Markdown
Contributor

dalton_walker@icloud.com

@Rocketknight1
Copy link
Copy Markdown
Member Author

Thanks!

@aromans
Copy link
Copy Markdown
Contributor

aromans commented Aug 31, 2021

@Rocketknight1 Rocketknight1 merged commit 854260c into master Aug 31, 2021
@Rocketknight1 Rocketknight1 deleted the tf_data_collators branch August 31, 2021 12:06
@Rocketknight1
Copy link
Copy Markdown
Member Author

It's in, and all authors have been properly credited! If you want to delete the messages with your e-mails (in case of spambot harvesting), feel free.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants