-
Notifications
You must be signed in to change notification settings - Fork 4.5k
Closed
Labels
.NETIssue or Pull requests regarding .NET codeIssue or Pull requests regarding .NET code
Description
The existing GPT3Tokenizer class uses p50k_base encoding. GPT-3.5, GPT-4, and text-embedding-ada-002 all rely on cl100k_base encoding.
The below table is extracted from the OpenAI notebook on token counting.
| Encoding name | OpenAI models |
|---|---|
| cl100k_base | gpt-4, gpt-3.5-turbo, text-embedding-ada-002 |
| p50k_base | Codex models, text-davinci-002, text-davinci-003 |
| r50k_base (or gpt2) | GPT-3 models like davinci |
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
.NETIssue or Pull requests regarding .NET codeIssue or Pull requests regarding .NET code
Type
Projects
Status
Sprint: Done