Name and Version
llama.cpp version: b9038
Operating systems
Windows
GGML backends
CUDA
Hardware
RTX 5090
Models
Mistral Medium 3.5 128B Q4_K_M
mmproj.GGUF
Problem description & steps to reproduce
Use the official Hugging Face (HF) model as input: https://huggingface.co/mistralai/Mistral-Medium-3.5-128B
Create the GGUF model from the HF model:
convert_hf_to_gguf.py /path/to/hf-model-directory --outfile /path/to/model.gguf --outtype auto
Create the mmproj GGUF model using the command below:
convert_hf_to_gguf.py /path/to/hf-model-directory --outfile /path/to/mmproj.gguf --mistral-common --mmproj --outtype auto
Create the Q4_K_M model using the command below:
llama-quantize --output-tensor-type q8_0 model.gguf output-q4_k_m.gguf q4_k_m
Try to run inference using llama-mtmd-cli.exe:
llama-mtmd-cli.exe -m "Path/to/Mistral-Medium-3.5-128B-Q4_K_M.gguf" --mmproj "Path/to/Mistral-Medium-3.5-128B-mmproj.gguf" --image "test.jpg" -p "What is shown in this image ?" --jinja
Loading the mmproj file fails with this error:
clip_init: failed to load model 'D:\Models\LLM-VLM-models\DawnRidge\HF-Official-GGUF-model\Mistral-Medium-3.5-128B-mmproj.gguf': operator(): unable to find tensor v.token_embd.img_break
mtmd_init_from_file: error: Failed to load CLIP model from D:\Models\LLM-VLM-models\DawnRidge\HF-Official-GGUF-model\Mistral-Medium-3.5-128B-mmproj.gguf
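To narrow this down, it may help to check whether the converted mmproj file contains the tensor named in the error at all, or stores it under a different name. The following is a minimal sketch, assuming the `gguf` Python package from llama.cpp's `gguf-py` directory is installed; `missing_tensors` and `list_tensor_names` are hypothetical helper names introduced here for illustration, and the only tensor name checked is the one quoted in the error message.

```python
import sys
from typing import Iterable, List

# Tensor name copied verbatim from the clip_init error message.
REQUIRED_TENSORS = ["v.token_embd.img_break"]


def missing_tensors(present: Iterable[str],
                    required: Iterable[str] = REQUIRED_TENSORS) -> List[str]:
    """Return the required tensor names that are absent from `present`."""
    present_set = set(present)
    return [name for name in required if name not in present_set]


def list_tensor_names(gguf_path: str) -> List[str]:
    """Read all tensor names from a GGUF file.

    Assumption: the `gguf` package (llama.cpp's gguf-py) is installed.
    It is imported lazily so the pure helper above works without it.
    """
    from gguf import GGUFReader

    reader = GGUFReader(gguf_path)
    return [tensor.name for tensor in reader.tensors]


# Runs only when a single path argument is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    names = list_tensor_names(sys.argv[1])
    print(f"{len(names)} tensors found")
    for name in missing_tensors(names):
        print(f"missing: {name}")
    # Show similarly named tensors, in case the converter used another name.
    for name in names:
        if "img_break" in name:
            print(f"similar: {name}")
```

Saved under a name of your choice and run with the path to the mmproj file as its only argument, this prints the tensor count, any required tensor that is absent, and any tensor whose name contains `img_break`. If the tensor is reported missing, the problem is more likely in the conversion step than in `llama-mtmd-cli`.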
First Bad Commit
Not known
Relevant log output
llama-mtmd-cli-verbose.txt
llama-mtmd-cli-error.txt