convert : support rope_scaling type and rope_type #13349
Conversation
The same code is copied in multiple places, so I think it's better to group it into a new function like

Edit: or we can extend

Btw, which model(s) have you been testing with?
The problem with extending the base

I tested with a few models I still had the original files for; not every single one I touched, but I'm fairly sure I didn't break anything. :)
ngxson
left a comment
I still think having this as a dedicated function like `self.set_rope_config()` will make it easier to maintain. We can optionally check whether `rope_scaling["factor"]` has a good value (i.e. non-null), but it's up to you anyway.
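A minimal sketch of what such a shared helper could look like, including the optional non-null check on `factor`. The writer interface and key names below are stubs for illustration, not llama.cpp's actual `gguf` API, and `set_rope_config` is the hypothetical name suggested in the review:

```python
class StubWriter:
    """Stand-in for a GGUF-style key/value writer (illustrative only)."""
    def __init__(self):
        self.kv = {}

    def add(self, key, value):
        self.kv[key] = value


def set_rope_config(writer, hparams):
    """Write rope-scaling metadata from a transformers-style config dict."""
    rope_scaling = hparams.get("rope_scaling") or {}
    # accept both the old "type" key and the newer "rope_type" key
    rope_type = rope_scaling.get("rope_type", rope_scaling.get("type"))
    if rope_type is None:
        return
    factor = rope_scaling.get("factor")
    # optionally check that "factor" has a good (non-null) value
    if factor is None:
        raise ValueError("rope_scaling.factor is missing or null")
    writer.add("rope.scaling.type", rope_type)
    writer.add("rope.scaling.factor", factor)
```

Centralizing the lookup like this would also make it a single place to handle the special cases mentioned below.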
I absolutely agree, but I worry about the special cases. :) I'll see what I can do...
I will merge this as-is for now and make a new PR later. Deduplication of the rope code requires careful thought (I think we can also deduplicate the llama3 rope_freqs calculations).
* origin/master: (39 commits)
  server : vision support via libmtmd (ggml-org#12898)
  sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (ggml-org#12858)
  metal : optimize MoE for large batches (ggml-org#13388)
  CUDA: FA support for Deepseek (Ampere or newer) (ggml-org#13306)
  llama : do not crash if there is no CPU backend (ggml-org#13395)
  CUDA: fix crash on large batch size for MoE models (ggml-org#13384)
  imatrix : Add --parse-special for enabling parsing of special tokens in imatrix calculation (ggml-org#13389)
  llama-run: add support for downloading models from ModelScope (ggml-org#13370)
  mtmd : fix batch_view for m-rope (ggml-org#13397)
  llama : one-off chat template fix for Mistral-Small-2503 (ggml-org#13398)
  rpc : add rpc_msg_set_tensor_hash_req (ggml-org#13353)
  vulkan: Allow up to 4096 elements for mul_mat_id row_ids (ggml-org#13326)
  server : (webui) rename has_multimodal --> modalities (ggml-org#13393)
  ci : limit write permission to only the release step + fixes (ggml-org#13392)
  mtmd : Expose helper_decode_image_chunk (ggml-org#13366)
  server : (webui) fix a very small misalignment (ggml-org#13387)
  server : (webui) revamp the input area, plus many small UI improvements (ggml-org#13365)
  convert : support rope_scaling type and rope_type (ggml-org#13349)
  mtmd : fix the calculation of n_tokens for smolvlm (ggml-org#13381)
  context : allow cache-less context for embeddings (ggml-org#13108)
  ...
At some point transformers renamed the `rope_scaling` key `type` to `rope_type`, so support both.
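The compatibility lookup this implies can be sketched in a few lines; the function name here is illustrative, not the PR's actual code:

```python
def get_rope_type(rope_scaling):
    """Return the rope scaling type from a transformers-style rope_scaling
    dict, preferring the newer "rope_type" key over the older "type" key."""
    return rope_scaling.get("rope_type", rope_scaling.get("type"))


# older-style config
print(get_rope_type({"type": "linear", "factor": 2.0}))   # linear
# newer-style config
print(get_rope_type({"rope_type": "yarn", "factor": 4.0}))  # yarn
```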