Add pointer to Microsoft.ML.Tokenizers#37

Merged
dmitry-brazhenko merged 1 commit into dmitry-brazhenko:main from ericstj:reccomendMLTokenizers on Apr 8, 2024

@ericstj ericstj commented Apr 5, 2024

Add benchmark info for Microsoft.ML.Tokenizers and direct folks to this package.

cc @dmitry-brazhenko @tarekgh @luisquintanilla

and .NET Standard 2.0, making it compatible with a wide range of frameworks.

> [!Important]
> The functionality in `SharpToken` has been added to [`Microsoft.ML.Tokenizers`](https://www.nuget.org/packages/Microsoft.ML.Tokenizers). `Microsoft.ML.Tokenizers` is a tokenizer library being developed by the .NET team and going forward, the central place for tokenizer development in .NET. By using `Microsoft.ML.Tokenizers`, you should see improved performance over existing tokenizer library implementations, including `SharpToken`. A stable release of `Microsoft.ML.Tokenizers` is expected alongside the .NET 9.0 release (November 2024). Instructions for migration can be found at https://github.com/dotnet/machinelearning/blob/main/docs/code/microsoft-ml-tokenizers-migration-guide.md.

Should we specify the Tiktoken tokenizer so that users are not left unsure which tokenizer to use when replacing SharpToken? I see the benchmark code has it, but that may not be obvious to users: `Tokenizer.CreateTiktokenForModel("gpt-4");`

@ericstj ericstj Apr 5, 2024

Sure, I just copied this from the text that @luisquintanilla helped draft for DeepDev. We can mention Tiktoken.

Edit: I see you meant this more in terms of migration. I think that should be added to the migration guide. Do you think you can do that, @tarekgh?

@dmitry-brazhenko dmitry-brazhenko merged commit c7de8c0 into dmitry-brazhenko:main Apr 8, 2024
3 participants