Conversation
|
The documentation is not available anymore as the PR was closed or merged. |
|
Thank you very much for the proposed fix. Before reviewing this PR, I would like to see if it is possible to keep the new feature in |
There was a problem hiding this comment.
If both @SaulLu and @Narsil agree that this is the correct fix, then I'm ok to merge it like this to make master green.
Note: from transformers perspective it would obviously be better to have a 0.12.1 revert the breaking change, as all versions of transformers that preceded this fix will continue to be broken.
|
Summarizing an oral discussion around our options: TL;DR Ultimately the balance between 1/ The change in 2/ The A potential less clunky way would be to add The promoted way 3/ Forward compatibility. The main caveat to this proposed change, is that earlier versions of 4/ Use of It's a legacy thing, and is not changed to not break BC, but causes issues on its own: #15785 (comment) Using 5/ Tokenizer tests |
|
Would it be possible to consider a deprecation cycle for the tokenizer change, for example with an opt-in flag for the new behavior? Doing this would both keep all previous versions working with 0.12.0, while providing support for the new behavior. This would allow us to prepare for the breaking change and have at least a few versions that support this before dropping support of the current behavior. |
|
Ok Will revert in |
What does this PR do?
tokenizers
0.12changed the waydecoder.decode(works.Instead of returning directly a string, it returns a list of strings (the "decoded" parts), which enables the decoders to be chained (and hence customized more easily).
The fix works by simply joining those parts for versions >= 0.12.
Fixes ##16520
Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.