Skip to content

Improve handling of missing vocab_file attribute in HFTokenizerConverter#677

Merged
wenbingl merged 2 commits intomicrosoft:mainfrom
kazssym:bugfix/hf-missing-vocab_file
Mar 26, 2024
Merged

Improve handling of missing vocab_file attribute in HFTokenizerConverter#677
wenbingl merged 2 commits intomicrosoft:mainfrom
kazssym:bugfix/hf-missing-vocab_file

Conversation

@kazssym
Copy link
Copy Markdown
Contributor

@kazssym kazssym commented Mar 24, 2024

This commit updates HFTokenizerConverter to handle cases where the hf_tokenizer object might not have a vocab_file attribute.

Changes:

  • Uses getattr to retrieve the vocab_file attribute for flexibility
  • Stores the retrieved value in a separate variable vocab_file for clarity
  • Checks if vocab_file is None before checking its existence

This ensures the converter works correctly even with tokenizers that don't define a vocab_file attribute.

…erter

This commit updates `HFTokenizerConverter` to handle cases where the `hf_tokenizer` object might not have a `vocab_file` attribute.

Changes:

* Uses `getattr` to retrieve the `vocab_file` attribute for flexibility
* Stores the retrieved value in a separate variable `vocab_file` for clarity
* Checks if `vocab_file` is `None` before checking its existence

This ensures the converter works correctly even with tokenizers that don't define a `vocab_file` attribute.
@kazssym kazssym requested a review from a team as a code owner March 24, 2024 08:43
@kazssym
Copy link
Copy Markdown
Contributor Author

kazssym commented Mar 24, 2024

I'm not sure but it looks like GPT2Tokenizer/-Fast lacks the vocab_file attribute.

Copy link
Copy Markdown
Contributor

@wenbingl wenbingl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixing!

@sayanshaw24
Copy link
Copy Markdown
Collaborator

/azp run onnxruntime-extensions.CI

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@wenbingl wenbingl merged commit 31f129c into microsoft:main Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants