[BUGFIX] Mistral format yarn apply_scale support. by juliendenize · Pull Request #22612 · ggml-org/llama.cpp

juliendenize · 2026-05-02T10:05:05Z

Overview

This PR fixes the Mistral-format conversion so that it properly handles apply_scale for YaRN scaling. Previously, models with apply_scale=True in params.json still had it disabled, which led to poor performance at medium-to-long context lengths.
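For readers unfamiliar with the flag, here is a minimal sketch of what "applying the scale" means in standard YaRN: an attention-temperature factor of 0.1 * ln(s) + 1 for a context-extension factor s > 1. The mapping from apply_scale to this factor, the params.json "yarn" layout, and the default used when apply_scale is missing are all hypothetical assumptions for illustration; the actual convert_hf_to_gguf.py logic is not shown in this PR page.

```python
import math


def yarn_mscale(factor: float) -> float:
    """Standard YaRN attention-temperature factor: 0.1 * ln(s) + 1 for s > 1."""
    if factor <= 1.0:
        return 1.0
    return 0.1 * math.log(factor) + 1.0


def attention_factor(factor: float, apply_scale: bool) -> float:
    # Hypothetical mapping: apply_scale=True keeps the YaRN factor,
    # apply_scale=False disables it (a factor of 1.0, i.e. no scaling).
    return yarn_mscale(factor) if apply_scale else 1.0


# Hypothetical params.json excerpt; only apply_scale is named in the PR.
params = {"yarn": {"factor": 4.0, "apply_scale": True}}
yarn = params["yarn"]
# Assumed default of False when the key is absent.
print(attention_factor(yarn["factor"], yarn.get("apply_scale", False)))
```

The bug described above corresponds to the second branch being taken even when params.json says apply_scale=True.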

Additional information

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: NO

juliendenize · 2026-05-02T11:24:14Z

Sorry for closing and reopening back and forth; I was just checking whether my understanding was correct.

IIUC at llama-model.cpp line 2871:

ml.get_key(LLM_KV_ROPE_SCALING_YARN_LOG_MUL,   hparams.rope_yarn_log_mul, 0.0f);

sets the value to 0 when it is unset, so explicitly setting it to 0, as I did in the PR, should have the same effect.

My goal here is to make sure standard YaRN scaling is applied.

CISC · 2026-05-02T13:19:56Z

ml.get_key(LLM_KV_ROPE_SCALING_YARN_LOG_MUL,   hparams.rope_yarn_log_mul, 0.0f);
does put the value to 0 when unset. So if I just set it explicitly as I did in the PR, it should have the same effect.

I think someone misunderstood: the last parameter is a bool (so the 0.0f passed there converts to false), meaning the key is not required. hparams.rope_yarn_log_mul is left untouched (it defaults to 0.0f) if the key doesn't exist.
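A small Python analogue of the loader behavior makes the point concrete: a float literal passed where a bool `required` flag is expected becomes false, so a missing key simply leaves the field at its default, and writing 0.0 explicitly ends in the same state. This is a hypothetical sketch that only mirrors the semantics described in the comment; the function, the key string, and the dict-based "hparams" are illustrative assumptions, not llama.cpp's API.

```python
def get_key(metadata: dict, key: str, target: dict, field: str, required: bool = True) -> bool:
    """Hypothetical analogue of the loader's get_key: copy metadata[key] into
    target[field] if present; raise only when the key is required and missing."""
    if key in metadata:
        target[field] = metadata[key]
        return True
    if required:
        raise KeyError(f"key not found in model: {key}")
    return False  # key absent and optional: leave target[field] untouched


# In C++, passing 0.0f to a bool parameter converts it to false; bool(0.0) mirrors that.
required = bool(0.0)

KEY = "rope.scaling.yarn_log_multiplier"  # hypothetical key string

# Case 1: key absent from the GGUF metadata; the struct default (0.0) survives.
unset = {"rope_yarn_log_mul": 0.0}
get_key({}, KEY, unset, "rope_yarn_log_mul", required)

# Case 2: converter writes the key explicitly as 0.0.
explicit = {"rope_yarn_log_mul": 0.0}
get_key({KEY: 0.0}, KEY, explicit, "rope_yarn_log_mul", required)

print(unset == explicit)
```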

isaac-mcfadyen · 2026-05-02T14:32:46Z

Note that Mistral applied a fix upstream themselves after Unsloth notified them of the issue: https://huggingface.co/mistralai/Mistral-Medium-3.5-128B/discussions/18

~~Once this is merged, will the combination of proper apply_scale=True and the mscale_all_dim=0.0 cause any issues?~~
EDIT: Ignore me, I re-read this and realized that's exactly what the apply_scale param would do with this PR...
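Why mscale_all_dim=0.0 amounts to standard YaRN scaling: as far as I understand the Transformers YaRN parameter computation, the DeepSeek-style ratio get_mscale(factor, mscale) / get_mscale(factor, mscale_all_dim) is used only when both values are set and non-zero; otherwise it falls back to the plain 0.1 * ln(s) + 1 factor. The sketch below mirrors that logic under this assumption; it is not the llama.cpp or Transformers source.

```python
import math


def get_mscale(scale: float, mscale: float = 1.0) -> float:
    """YaRN attention-temperature factor, optionally weighted by mscale."""
    if scale <= 1.0:
        return 1.0
    return 0.1 * mscale * math.log(scale) + 1.0


def yarn_attention_factor(factor: float, mscale=None, mscale_all_dim=None) -> float:
    # DeepSeek-style ratio only when both values are set and non-zero.
    # 0.0 is falsy, so mscale_all_dim=0.0 falls back to standard YaRN.
    if mscale and mscale_all_dim:
        return get_mscale(factor, mscale) / get_mscale(factor, mscale_all_dim)
    return get_mscale(factor)


print(yarn_attention_factor(8.0, mscale=1.0, mscale_all_dim=0.0))  # standard YaRN factor
print(yarn_attention_factor(8.0, mscale=1.0, mscale_all_dim=1.0))  # ratio cancels to 1.0
```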


juliendenize · 2026-05-02T19:01:42Z

I think someone misunderstood, the last parameter is a bool, false meaning the key is not required, leaving hparams.rope_yarn_log_mul untouched (defaults to 0.0f) if it doesn't exist.

Yes, I get it now: explicitly setting it to 0 ends up having the same behavior, so that's OK! Thanks a lot for approving!
Edit: OK, based on your last commit I hadn't actually gotten it. Thanks for the contribution ^^'

Note that Mistral applied a fix upstream themselves after Unsloth notified them of the issue: https://huggingface.co/mistralai/Mistral-Medium-3.5-128B/discussions/18
EDIT: Ignore me, I re-read this and realized that's exactly what the apply_scale param would do with this PR...

Thanks for double-checking; this is indeed related to the same issue. The fix on the model hub is for the Transformers conversion. A while back, with the help of CISC and Son, we added an alternative Mistral-format conversion path for cases where the Transformers weights are not accessible, but before this change it would have ended up with the same issue.

* [BUGFIX] Mistral format apply_scale support.
* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* fix misunderstood boolean parameters

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

Ports 15 upstream commits (05e141a..5d44db6) that touched the monolithic convert_hf_to_gguf.py into the new conversion/*.py layout introduced by the refactor split.

New text/mmproj architectures registered: GraniteSpeechForConditionalGeneration, MiMoV2ForCausalLM, MiniCPMV4_6ForConditionalGeneration, Sarashina2VisionForCausalLM, SarvamMoEForCausalLM (+ modeling_sarvam_moe.SarvamMoEForCausalLM).

Notable changes:
- filter_tensors classmethod added to ModelBase/TextModel/MmprojModel and wired into index_tensors; many model classes refactored to move tensor-name skip/rename logic out of modify_tensors and into filter_tensors (upstream ggml-org#22597).
- LlamaModel._repack_nvfp4 override (Q/K RoPE permutation, ggml-org#22611).
- MistralModel yarn apply_scale support (ggml-org#22612).
- Gemma4Model._generate_nvfp4_tensors override for 26B NVFP4 (ggml-org#22804).
- LlavaVisionModel image-break token fallback for Mistral params.json -1 placeholders (ggml-org#22914).
- Pixtral 12B --mistral-format conversion fixes (ggml-org#22981).
- FP8 KV-cache scales fix (ggml-org#22818) and uint dtype byteswap disable (ggml-org#18908).

New files: conversion/sarashina2.py (Sarashina2VL text + vision)


[BUGFIX] Mistral format apply_scale support.

744218d

juliendenize requested a review from CISC as a code owner May 2, 2026 10:05

github-actions bot added the python label (python script changes) May 2, 2026

juliendenize closed this May 2, 2026

juliendenize reopened this May 2, 2026

juliendenize closed this May 2, 2026

juliendenize reopened this May 2, 2026

CISC approved these changes May 2, 2026


Comment thread convert_hf_to_gguf.py Outdated

Update convert_hf_to_gguf.py

f602da7

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

fix misunderstood boolean parameters

0573838

CISC added the merge ready label (A maintainer can use this label to indicate that they consider the changes final and ready to merge.) May 2, 2026

CISC reviewed May 2, 2026


Comment thread convert_hf_to_gguf.py

ggerganov approved these changes May 3, 2026


CISC merged commit 048a490 into ggml-org:master May 3, 2026
1 check passed


CISC merged 3 commits into ggml-org:master from juliendenize:mistral_format_apply_scale


4 participants
