Fixed Llama-3_1-Nemotron-51B doesn't work when 4K or more tokens #11008
Merged
ggerganov merged 8 commits into ggml-org:master from ymcki:master on Dec 31, 2024
Conversation
slaren approved these changes Dec 31, 2024
ggerganov approved these changes Dec 31, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Feb 26, 2025

* conflict resolution
* move comments after bracket to its own line
* DeciLMCausalModel now reads rope_theta from config.json properly
This is to fix this bug:
#11002
After comparing the parameters of the Llama-3.1-70B and 51B ggufs while loading them with llama-cli, I noticed exactly one difference: rope_theta (500000.0 vs 10000.0). According to the config.json of the 51B model, this value should be 500000.0. That means the current convert_hf_to_gguf.py doesn't read rope_theta for DeciLMCausalModel. I fixed that and made this PR.
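The fix amounts to reading rope_theta from the model's config.json instead of silently falling back to a default. A minimal sketch of the idea, assuming a standalone helper (the function name and fallback value here are illustrative, not the actual convert_hf_to_gguf.py code):

```python
import json
from pathlib import Path


def read_rope_theta(model_dir: str, default: float = 10000.0) -> float:
    """Read rope_theta from a model's config.json, falling back to a default.

    If the converter skips this lookup, the gguf ends up carrying the
    fallback (10000.0) instead of the model's real value (500000.0 for
    the 51B model), which breaks generation at long context lengths.
    """
    config = json.loads((Path(model_dir) / "config.json").read_text())
    return float(config.get("rope_theta", default))
```

With a config.json containing `"rope_theta": 500000.0`, this returns 500000.0; without the key, it falls back to 10000.0, which matches the buggy behavior described above.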
I generated a gguf with the correct rope_theta of 500000.0. It works with llama.cpp b4380 or above without recompilation, since I only fixed convert_hf_to_gguf.py without touching the C code.
https://huggingface.co/ymcki/Llama-3_1-Nemotron-51B-Instruct-GGUF/blob/main/Llama-3_1-Nemotron-51B-Instruct.imatrix.Q4_K_M.gguf
As a side note, inspecting the tokenizer_config.json of Llama-3.1-70B, I found that it also has both eos_token and eot_token set to '<|eot_id|>'. Therefore, it is probably not a typo for the 51B model, so I also removed the four lines in set_vocab related to this.
This gets rid of the following warning without causing any problems:
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
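One way to sanity-check the eos token claim on a downloaded model is to read tokenizer_config.json directly. A quick illustrative helper (not part of the PR; the dict-vs-string handling is an assumption about how some configs store tokens):

```python
import json
from pathlib import Path


def eos_token_of(model_dir: str):
    """Return the eos_token string from tokenizer_config.json, if present.

    For Llama-3.1-style instruct models this is expected to be
    '<|eot_id|>', so an identical eot token in the 51B config is
    consistent rather than a typo.
    """
    cfg = json.loads((Path(model_dir) / "tokenizer_config.json").read_text())
    tok = cfg.get("eos_token")
    # Some configs store tokens as {"content": "..."} dicts, not plain strings.
    return tok.get("content") if isinstance(tok, dict) else tok
```

Running this against both the 70B and 51B model directories and comparing the results is enough to confirm whether the tokenizer configs agree.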