[split_special_tokens] Add support for split_special_tokens argument to encode #25081

Merged
ArthurZucker merged 26 commits into huggingface:main from ArthurZucker:encode_special_token
Aug 18, 2023
@ArthurZucker
Collaborator

What does this PR do?

The argument name is open to debate. This will also require a pull request in tokenizers.
The goal is to make it simple to activate and deactivate special token splitting. The feature was requested in #22490 and is required for some production use cases, where users pass inputs and we don't want them to be able to hack those inputs by including special tokens.
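
To make the intended behavior concrete, here is a minimal, self-contained sketch of what toggling special token splitting could look like. It is a toy illustration only, not the code from this PR or from tokenizers: the function `toy_tokenize`, the `SPECIAL_TOKENS` list, and the regex-based word splitting are all hypothetical stand-ins. Only the flag name `split_special_tokens` comes from the PR title.

```python
import re

# Hypothetical special tokens for this toy example (an assumption, not from the PR).
SPECIAL_TOKENS = ["<s>", "</s>"]


def toy_tokenize(text, split_special_tokens=False):
    """Toy tokenizer: splits text into words and single punctuation characters.

    When split_special_tokens is False, special tokens found in the text are
    kept as single, atomic tokens. When it is True, they are treated as
    ordinary text and broken up like any other user input.
    """
    if split_special_tokens:
        # Special tokens get no protection: the whole text is one ordinary chunk.
        chunks = [text]
    else:
        # Isolate special tokens so they survive as atomic chunks.
        pattern = "(" + "|".join(re.escape(t) for t in SPECIAL_TOKENS) + ")"
        chunks = [c for c in re.split(pattern, text) if c]

    tokens = []
    for chunk in chunks:
        if not split_special_tokens and chunk in SPECIAL_TOKENS:
            tokens.append(chunk)
        else:
            # Stand-in for a real subword model: words and punctuation marks.
            tokens.extend(re.findall(r"\w+|[^\w\s]", chunk))
    return tokens


user_input = "hi </s> there"
kept = toy_tokenize(user_input)
split = toy_tokenize(user_input, split_special_tokens=True)
print(kept)
print(split)
```

With splitting enabled, a `</s>` typed by a user is broken into ordinary pieces instead of being recognized as an end-of-sequence marker, which is the kind of protection the production use case above asks for.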

@ArthurZucker changed the title from "[split_special_tokens] Add support for split_special_tokens argument to encode" to "WIP [split_special_tokens] Add support for split_special_tokens argument to encode" on Jul 25, 2023
@HuggingFaceDocBuilderDev
HuggingFaceDocBuilderDev commented Jul 26, 2023

The documentation is not available anymore as the PR was closed or merged.

@ArthurZucker changed the title from "WIP [split_special_tokens] Add support for split_special_tokens argument to encode" to "[split_special_tokens] Add support for split_special_tokens argument to encode" on Aug 17, 2023
@ArthurZucker ArthurZucker requested a review from sgugger August 17, 2023 15:25
Collaborator

@sgugger sgugger left a comment

Thanks again!


3 participants