
Add skipSpecialTokens option to Tokenizer.decode #148

Merged: pcuenca merged 1 commit into huggingface:main from finnvoor:main on Dec 26, 2024


Conversation

@finnvoor
Contributor

Merry Christmas! 🎅

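This PR adds a `skipSpecialTokens` option to `Tokenizer.decode`. Default arguments can't be used because `Tokenizer` is a protocol (protocol requirements can't declare default parameter values), so I added a separate function, `decode(tokens:skipSpecialTokens:)`, and kept `func decode(tokens:)` as a convenience overload to maintain source compatibility.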
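A minimal sketch of the pattern described above, assuming a simplified, hypothetical `SimpleTokenizer` protocol and `ToyTokenizer` type (neither is the library's actual `Tokenizer`). The protocol requirement takes the flag explicitly, and a protocol extension supplies the old `decode(tokens:)` signature by forwarding `false`.

```swift
// Hypothetical, simplified stand-in for a tokenizer protocol (not the library's type).
protocol SimpleTokenizer {
    // Protocol requirements cannot declare default argument values,
    // so the flag is an explicit parameter here.
    func decode(tokens: [Int], skipSpecialTokens: Bool) -> String
}

extension SimpleTokenizer {
    // Convenience overload so existing `decode(tokens:)` call sites keep compiling.
    func decode(tokens: [Int]) -> String {
        decode(tokens: tokens, skipSpecialTokens: false)
    }
}

// Toy conforming type with a fixed, hypothetical vocabulary.
struct ToyTokenizer: SimpleTokenizer {
    let vocab: [Int: String] = [0: "<s>", 1: "Hello", 2: " world", 3: "</s>"]
    let specialTokenIDs: Set<Int> = [0, 3]

    func decode(tokens: [Int], skipSpecialTokens: Bool) -> String {
        tokens
            // Drop special-token IDs only when the caller asks for it.
            .filter { !(skipSpecialTokens && specialTokenIDs.contains($0)) }
            // Map each ID to its string piece, ignoring unknown IDs.
            .compactMap { vocab[$0] }
            .joined()
    }
}

let tokenizer = ToyTokenizer()
let ids = [0, 1, 2, 3]
print(tokenizer.decode(tokens: ids))                          // <s>Hello world</s>
print(tokenizer.decode(tokens: ids, skipSpecialTokens: true)) // Hello world
```

In this sketch, conforming types implement only the two-argument requirement; the extension gives every conformer the one-argument form for free.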

@pcuenca (Member) left a comment


Looks great to me, thank you @finnvoor 🙌, and merry Christmas!

Comment on lines +221 to +224
```swift
XCTAssertEqual(
    tokenizer.decode(tokens: edgeCase.encoded.input_ids, skipSpecialTokens: true),
    edgeCase.decoded_without_special
)
```

Nice!

@pcuenca merged commit 44e2c04 into huggingface:main on Dec 26, 2024
@pcuenca mentioned this pull request on Jan 29, 2025
