Use tokenizer.vocab_size() instead of hardcoding 32000 when converting#142

Merged
ggerganov merged 1 commit into ggml-org:master from Ronsor:patch-2 on Mar 15, 2023

Conversation

@Ronsor (Contributor) commented Mar 14, 2023

When converting the model + tokenizer, use the vocabulary size returned by the tokenizer rather than assuming 32000.

Special tokens or other new tokens may be added to the tokenizer, so it's best not to assume the vocabulary is exactly 32000 tokens.
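A minimal sketch of the idea behind this change (not the actual convert-pth-to-ggml.py code; `DummyTokenizer` is a hypothetical stand-in exposing the SentencePiece-style `vocab_size()` and `id_to_piece()` methods):

```python
# Sketch only: query the tokenizer for its vocabulary size instead of
# assuming a fixed 32000-entry vocabulary when writing out the model.

def collect_vocab(tokenizer):
    """Return all token pieces, sized by the tokenizer itself."""
    return [tokenizer.id_to_piece(i) for i in range(tokenizer.vocab_size())]

class DummyTokenizer:
    """Hypothetical stand-in whose vocabulary has grown past the default
    (e.g. extra special tokens were added)."""
    _pieces = ["<unk>", "<s>", "</s>", "hello", "world", "<extra_0>"]

    def vocab_size(self):
        return len(self._pieces)

    def id_to_piece(self, i):
        return self._pieces[i]

vocab = collect_vocab(DummyTokenizer())
print(len(vocab))  # 6 entries here, not a hardcoded 32000
```

With a real SentencePiece tokenizer the same two calls exist on `SentencePieceProcessor`, which is why reading the size from the tokenizer is safer than a constant.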

…th-to-ggml.py

There are ways that special tokens or other new tokens could be added to the tokenizer; therefore it's probably best not to assume the vocabulary is only 32000 tokens.
@ggerganov ggerganov merged commit 956dfda into ggml-org:master Mar 15, 2023
blackhole89 pushed a commit that referenced this pull request Mar 15, 2023
…th-to-ggml.py (#142)

There are ways that special tokens or other new tokens could be added to the tokenizer; therefore it's probably best not to assume the vocabulary is only 32000 tokens.
@Ronsor Ronsor deleted the patch-2 branch March 17, 2023 00:57
SamuelOliveirads pushed a commit to SamuelOliveirads/llama.cpp that referenced this pull request Dec 29, 2025
* Not working bf16_r4

* Adding bf16_r8

Small performance gain compared to bf16: 258 t/s vs 234 t/s.
I suspect this is still suboptimal.

* bf16_rx: Very slightly faster by interleaving 16 rows

258 t/s -> 263 t/s

* Rename bf16_r4 to bf16_r16

We are interleaving 16 rows now.

* Cleanup unused stuff

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
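As a rough illustration of the row-interleaving idea mentioned in the commits above (a hypothetical layout sketch, not the actual bf16_r16 kernel code): packing 16 rows so that a given column of all 16 rows sits contiguously lets a SIMD kernel load one column-slice per step.

```python
# Hypothetical sketch: interleave 16 rows of a matrix so that column j of
# all 16 rows is stored contiguously. Not the actual bf16_r16 code.
INTERLEAVE = 16

def interleave_rows(rows):
    """rows: list of INTERLEAVE equal-length rows; returns packed order."""
    assert len(rows) == INTERLEAVE
    packed = []
    for j in range(len(rows[0])):       # for each column position
        for r in range(INTERLEAVE):     # take that column from every row
            packed.append(rows[r][j])
    return packed
```

The renaming in the commit log (bf16_r4 to bf16_r16) reflects exactly this change in the interleave factor.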