convert: text-only support for GLM-4.1V-9B-Thinking #14823
Merged
CISC merged 1 commit into ggml-org:master on Jul 23, 2025
CISC
reviewed
Jul 23, 2025
* use language_model part only, ignore visual layers
* fix rope_dim calculation
ad66a8f to
d959644
CISC
approved these changes
Jul 23, 2025
4 tasks
Contributor
Please do the mmproj next! This VLM is supposed to be really good.
taronaeo
pushed a commit
to taronaeo/llama.cpp-s390x
that referenced
this pull request
Jul 25, 2025
* use language_model part only, ignore visual layers
* fix rope_dim calculation
gabe-l-hart
added a commit
to gabe-l-hart/llama.cpp
that referenced
this pull request
Jul 25, 2025
* origin/master:
  docs : update HOWTO‑add‑model.md for ModelBase and new model classes (ggml-org#14874)
  ggml : remove invalid portPos specifiers from dot files (ggml-org#14838)
  context : restore preemptive sched reset when LLAMA_SET_ROWS=0 (ggml-org#14870)
  mtmd : fix 32-bit narrowing issue in export-lora and mtmd clip (ggml-org#14503)
  rpc : check for null buffers in get/set/copy tensor endpoints (ggml-org#14868)
  sched : fix multiple evaluations of the same graph with pipeline parallelism (ggml-org#14855)
  musa: upgrade musa sdk to rc4.2.0 (ggml-org#14498)
  sync : ggml
  cmake : fix usage issues (ggml/1257)
  ggml-cpu : remove stdlib include from repack.cpp (ggml/1276)
  context : perform output reorder lazily upon access after sync (ggml-org#14853)
  chat : fix kimi-k2 chat template (ggml-org#14852)
  sycl: fixed semantics of block offset calculation (ggml-org#14814)
  llama : fix MiniCPM inference after Granite Four changes (ggml-org#14850)
  docs: add libcurl-dev install hint for Linux distros (ggml-org#14801)
  metal : fix fusion across different encoders (ggml-org#14849)
  sycl: fix undefined variable in work group size check (ggml-org#14843)
  convert : text-only support for GLM-4.1V-9B-Thinking (ggml-org#14823)
  CUDA: fix overflow in FA, tune performance (ggml-org#14840)
  CUDA: fix compilation with GGML_CUDA_F16 (ggml-org#14837)
Contributor
I made some quants! https://huggingface.co/unsloth/GLM-4.1V-9B-Thinking-GGUF
blime4
referenced
this pull request
in blime4/llama.cpp
Feb 5, 2026
* use language_model part only, ignore visual layers
* fix rope_dim calculation
Fix #14495
This is my first attempt to contribute to llama.cpp.
I used Transformers to compare its layers with those of GLM-4-9B-0414; the text-model structure appears identical.
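As a rough illustration of the "use language_model part only, ignore visual layers" change, the sketch below drops vision tensors and renames the text tensors so they line up with the GLM-4 text layout. The prefixes `model.visual.` and `model.language_model.` and the helper name `map_text_only_tensor_name` are assumptions for illustration, not necessarily what the conversion script uses.

```python
from typing import Optional

# Assumed checkpoint prefixes; verify against the actual tensor names.
VISUAL_PREFIX = "model.visual."
LANGUAGE_PREFIX = "model.language_model."


def map_text_only_tensor_name(name: str) -> Optional[str]:
    """Return the text-only tensor name, or None if the tensor should be skipped."""
    if name.startswith(VISUAL_PREFIX):
        # Vision tower weights are ignored in a text-only conversion.
        return None
    if name.startswith(LANGUAGE_PREFIX):
        # Strip the "language_model." segment so the name matches the text-only layout.
        return "model." + name[len(LANGUAGE_PREFIX):]
    # Anything else (for example an output head) passes through unchanged.
    return name
```

Returning `None` for skipped tensors keeps the filtering and renaming in one place, so a caller can simply drop any tensor that maps to `None`.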
The config for GLM-4.1V-9B-Thinking is missing the head_dim field.
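A minimal sketch of how a rope_dim calculation can tolerate a missing `head_dim`: fall back to `hidden_size // num_attention_heads`, then scale by the rotary factor. The function name `rope_dimension_count`, the `partial_rotary_factor` key, and the 0.5 default are assumptions here, and the numbers in the usage note are illustrative rather than taken from the model's config.

```python
def rope_dimension_count(hparams: dict, default_rotary_factor: float = 0.5) -> int:
    """Compute the number of rotary dimensions from a config-like dict."""
    head_dim = hparams.get("head_dim")
    if head_dim is None:
        # Config omits head_dim: derive it from the hidden size and head count.
        head_dim = hparams["hidden_size"] // hparams["num_attention_heads"]
    # Assumed key and default; only part of each head may be rotated.
    factor = hparams.get("partial_rotary_factor", default_rotary_factor)
    return int(head_dim * factor)
```

For example, with the illustrative values `{"hidden_size": 4096, "num_attention_heads": 32}` the derived head_dim is 128 and the function returns 64.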