Use tokenizer.vocab_size() instead of hardcoding 32000 when converting#142

Merged
ggerganov merged 1 commit into ggml-org:master from Ronsor:patch-2 on Mar 15, 2023

Conversation

@Ronsor (Contributor) commented Mar 14, 2023

When converting the model + tokenizer, use the vocabulary size returned by the tokenizer rather than assuming 32000.

Special tokens or other new tokens may be added to the tokenizer, so it's best not to assume the vocabulary is exactly 32000 tokens.
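A minimal sketch of the idea behind this change (not the actual convert-pth-to-ggml.py code; `DummyTokenizer` is a hypothetical stand-in exposing the SentencePiece-style `vocab_size()` and `id_to_piece()` methods):

```python
# Sketch only: query the tokenizer for its vocabulary size instead of
# assuming a fixed 32000-entry vocabulary when writing out the model.

def collect_vocab(tokenizer):
    """Return all token pieces, sized by the tokenizer itself."""
    return [tokenizer.id_to_piece(i) for i in range(tokenizer.vocab_size())]

class DummyTokenizer:
    """Hypothetical stand-in whose vocabulary has grown past the default
    (e.g. extra special tokens were added)."""
    _pieces = ["<unk>", "<s>", "</s>", "hello", "world", "<extra_0>"]

    def vocab_size(self):
        return len(self._pieces)

    def id_to_piece(self, i):
        return self._pieces[i]

vocab = collect_vocab(DummyTokenizer())
print(len(vocab))  # 6 entries here, not a hardcoded 32000
```

With a real SentencePiece tokenizer the same two calls exist on `SentencePieceProcessor`, which is why reading the size from the tokenizer is safer than a constant.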

…th-to-ggml.py

There are ways that special tokens or other new tokens could be added to the tokenizer; therefore it's probably best not to assume the vocabulary is only 32000 tokens.
@ggerganov ggerganov merged commit 956dfda into ggml-org:master Mar 15, 2023
blackhole89 pushed a commit that referenced this pull request Mar 15, 2023
…th-to-ggml.py (#142)

There are ways that special tokens or other new tokens could be added to the tokenizer; therefore it's probably best not to assume the vocabulary is only 32000 tokens.
@Ronsor Ronsor deleted the patch-2 branch March 17, 2023 00:57
SamuelOliveirads pushed a commit to SamuelOliveirads/llama.cpp that referenced this pull request Dec 29, 2025
* Not working bf16_r4

* Adding bf16_r8

Small performance gain compared to bf16: 258 t/s vs 234 t/s.
I suspect this is still suboptimal.

* bf16_rx: Very slightly faster by interleaving 16 rows

258 t/s -> 263 t/s

* Rename bf16_r4 to bf16_r16

We are interleaving 16 rows now.

* Cleanup unused stuff

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
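As a rough illustration of the row-interleaving idea mentioned in the commits above (a hypothetical layout sketch, not the actual bf16_r16 kernel code): packing 16 rows so that a given column of all 16 rows sits contiguously lets a SIMD kernel load one column-slice per step.

```python
# Hypothetical sketch: interleave 16 rows of a matrix so that column j of
# all 16 rows is stored contiguously. Not the actual bf16_r16 code.
INTERLEAVE = 16

def interleave_rows(rows):
    """rows: list of INTERLEAVE equal-length rows; returns packed order."""
    assert len(rows) == INTERLEAVE
    packed = []
    for j in range(len(rows[0])):       # for each column position
        for r in range(INTERLEAVE):     # take that column from every row
            packed.append(rows[r][j])
    return packed
```

The renaming in the commit log (bf16_r4 to bf16_r16) reflects exactly this change in the interleave factor.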