
🚨 Support updating template processors #1652

Merged
McPatate merged 22 commits into main from sequential-post-processor Jan 28, 2025


@ArthurZucker (Collaborator) commented Oct 14, 2024

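Goal: make it possible to update the template processor inside an existing tokenizer's post-processor in place, for example: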

```python
from tokenizers import Tokenizer
from tokenizers.processors import TemplateProcessing

tokenizer = Tokenizer.from_pretrained("meta-llama/Llama-3.3-70B-Instruct")
print(tokenizer.post_processor)  # inspect the current post-processor

# Replace the processor at index 1 of the post-processor with a new template
tokenizer.post_processor[1] = TemplateProcessing(
    single="[CLS] $0 [SEP]",
    pair="[CLS] $A [SEP] $B:1 [SEP]:1",
    special_tokens=[("[CLS]", 1), ("[SEP]", 0)],
)
```
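The template strings in the goal follow the `TemplateProcessing` syntax: `$A` and `$B` stand for the first and second input sequence, a `:N` suffix sets the type id (default 0), and any other piece is a special token. As a rough illustration of what these templates produce, here is a simplified, pure-Python expansion. `expand_template` is a hypothetical helper, not part of `tokenizers`; it treats `$0` as "sequence A with type id 0" (an assumption about the shortcut syntax) and ignores special-token ids, offsets, and attention masks.

```python
from __future__ import annotations


def expand_template(template: str, seq_a: list[str], seq_b: list[str] | None = None):
    """Hypothetical, simplified expansion of a TemplateProcessing-style template.

    Returns (tokens, type_ids). This is an illustration, not the tokenizers implementation.
    """
    tokens: list[str] = []
    type_ids: list[int] = []
    for piece in template.split():
        # Split off an optional ":<type_id>" suffix; the type id defaults to 0.
        name, _, type_suffix = piece.partition(":")
        type_id = int(type_suffix) if type_suffix else 0
        if name.startswith("$"):
            ident = name[1:]
            if ident in ("", "A", "a"):
                chunk = seq_a
            elif ident in ("B", "b"):
                if seq_b is None:
                    raise ValueError("template references $B but no second sequence was given")
                chunk = seq_b
            elif ident.isdigit():
                # Assumed shortcut: "$N" means sequence A with type id N.
                chunk = seq_a
                if not type_suffix:
                    type_id = int(ident)
            else:
                raise ValueError(f"unknown sequence id: {piece}")
            tokens.extend(chunk)
            type_ids.extend([type_id] * len(chunk))
        else:
            # Any other piece is a special token, e.g. "[CLS]".
            tokens.append(name)
            type_ids.append(type_id)
    return tokens, type_ids


print(expand_template("[CLS] $0 [SEP]", ["hello", "world"]))
# (['[CLS]', 'hello', 'world', '[SEP]'], [0, 0, 0, 0])
print(expand_template("[CLS] $A [SEP] $B:1 [SEP]:1", ["hello"], ["world"]))
# (['[CLS]', 'hello', '[SEP]', 'world', '[SEP]'], [0, 0, 0, 1, 1])
```

In the pair template, the `:1` suffixes mark the second sequence and its trailing `[SEP]` with type id 1, which is how BERT-style models distinguish the two segments.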

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

McPatate force-pushed the sequential-post-processor branch from 11533c5 to 4bb595b January 14, 2025 02:37
McPatate marked this pull request as ready for review January 16, 2025 02:33

@McPatate (Member) left a comment


Self-approving here since this is your PR, @ArthurZucker. Waiting for your review before merging.

@ArthurZucker (Collaborator, Author) left a comment


LGTM! Would just add Python tests! 😉

Let's check `__setitem__`, and also that the item returned by `__getitem__` is mutable.
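Reading "set_item" and "get_item" as Python's `__setitem__` and `__getitem__`, the two requested checks can be illustrated with a toy, pure-Python model. `ToyTemplate` and `ToySequence` are hypothetical stand-ins that do not use the `tokenizers` API; they only show what each property means: assigning by index replaces an element, and the element returned by indexing is the stored object itself rather than a detached copy, so mutating it updates the sequence.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTemplate:
    """Hypothetical stand-in for a template processor: just the two templates."""
    single: str
    pair: str


@dataclass
class ToySequence:
    """Hypothetical stand-in for a sequence of post-processors."""
    processors: list = field(default_factory=list)

    def __getitem__(self, index: int):
        # Return the stored object itself (not a copy), so edits to it are visible here.
        return self.processors[index]

    def __setitem__(self, index: int, processor) -> None:
        # Replace the processor at `index` in place.
        self.processors[index] = processor


seq = ToySequence([
    ToyTemplate("$A", "$A $B:1"),
    ToyTemplate("[CLS] $A [SEP]", "[CLS] $A [SEP] $B:1 [SEP]:1"),
])

# Check 1: __setitem__ replaces the element at the given index.
seq[1] = ToyTemplate("<s> $A </s>", "<s> $A </s> $B:1 </s>:1")
assert seq[1].single == "<s> $A </s>"

# Check 2: the object returned by __getitem__ is live, so mutating it updates the sequence.
proc = seq[1]
proc.single = "<s> $A"
assert seq[1] is proc
assert seq[1].single == "<s> $A"
```

The second check is the one that can fail silently: if `__getitem__` returned a copy, `proc.single = ...` would still succeed but leave the sequence unchanged.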

McPatate force-pushed the sequential-post-processor branch from d37229f to ff80e9f January 27, 2025 23:03
McPatate changed the title from "Support updating template processors" to "🚨 Support updating template processors" Jan 28, 2025
McPatate merged commit c45aebd into main Jan 28, 2025
McPatate deleted the sequential-post-processor branch January 28, 2025 13:58
Narsil added a commit that referenced this pull request Mar 11, 2025

3 participants