Name and Version
❯ ./llama-server --version
version: 9494 (c8d6a00)
built with GNU 16.1.1 for Linux x86_64
Operating systems
Linux
GGML backends
CUDA
Hardware
Intel i7-13620H + NVIDIA RTX 4060 Laptop GPU (8 GB)
Models
Gemma-4-E4B-Uncensored-HauhauCS-Aggressive-Q5_K_P.gguf
mmproj-Gemma-4-E4B-Uncensored-HauhauCS-Aggressive-f16.gguf
Problem description & steps to reproduce
When I try to load Gemma 4 E4B together with its mmproj, llama-server aborts with SIGABRT while loading the model. Command:
❯ taskset -c 0,2,4,6,8,10 ~/programs/llama.cpp/build/bin/llama-server -fa on --no-mmap --port 8081 --host 0.0.0.0 --reasoning auto -np 1 -m ~/Downloads/models/gemmae4b/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive-Q5_K_P.gguf --mmproj ./mmproj-Gemma-4-E4B-Uncensored-HauhauCS-Aggressive-f16.gguf --no-warmup --jinja --image-min-tokens 560 --image-max-tokens 1120 -ub 512 -b 512 -fitt 50 -c 8000 --temp 1 --webui-mcp-proxy --threads 6 -dio --cache-ram 0 --prio 2 -nocb
It stops at:
/home/turbo/programs/llama.cpp/tools/mtmd/clip.cpp:4391: Unknown projector type
The full log, including the backtrace, is under Relevant log output.
When I asked an AI assistant about it, it said one line in llama.cpp is wrong. Here is its response:
Symptom
Loading a Gemma 4 mmproj that has both vision and audio (clip.vision.projector_type = 'gemma4v', clip.audio.projector_type = 'gemma4a') triggers SIGABRT during model load.
Unknown projector type
clip.cpp:4391: GGML_ABORT("Unknown projector type")
Backtrace
clip_n_mmproj_embd()
→ mtmd_context::mtmd_context()
→ mtmd_get_memory_usage()
→ server_context_impl::load_model()
Root Cause
Commit 951fa5c ("add model") incorrectly replaced case PROJECTOR_TYPE_GEMMA4A with case PROJECTOR_TYPE_GEMMA4UA in clip_n_mmproj_embd(). These are distinct projector types:
| Type | Purpose | Your mmproj |
| --- | --- | --- |
| GEMMA4A | Conformer audio encoder (mel spectrogram) | ✅ clip.audio.projector_type = 'gemma4a' |
| GEMMA4UA | Encoder-free raw-waveform audio | ❌ not used |
The gemma4a case is handled everywhere else in clip.cpp (builder, hparams, tensor loading) — only clip_n_mmproj_embd was missed.
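To make the failure mode concrete, here is a minimal C++ sketch of the switch pattern described above. Everything in it is hypothetical: `ProjectorType`, `FakeTensor` and the two `n_mmproj_embd_*` functions are simplified stand-ins, not llama.cpp's real definitions (the real clip_n_mmproj_embd() covers many more projector types), and `std::nullopt` takes the place of `GGML_ABORT("Unknown projector type")`. GEMMA4UA is left out because this report does not show how the real code handles it.

```cpp
#include <cassert>   // for the checks in the usage example
#include <cstdint>
#include <optional>

// Hypothetical, simplified stand-ins for llama.cpp's projector enum and tensor.
enum class ProjectorType { Gemma4V, Gemma4UV, Gemma4A };

struct FakeTensor {
    int64_t ne[4];  // ggml-style sizes; ne[0] is the innermost dimension
};

// Before the fix: Gemma4A is missing from the case list, so it falls through
// to default. std::nullopt stands in for GGML_ABORT("Unknown projector type").
std::optional<int64_t> n_mmproj_embd_before(ProjectorType type, const FakeTensor & mm_input_proj_w) {
    switch (type) {
        case ProjectorType::Gemma4V:
        case ProjectorType::Gemma4UV:
            return mm_input_proj_w.ne[1];
        default:
            return std::nullopt;
    }
}

// After the fix: Gemma4A shares the same lookup as the vision types.
std::optional<int64_t> n_mmproj_embd_after(ProjectorType type, const FakeTensor & mm_input_proj_w) {
    switch (type) {
        case ProjectorType::Gemma4V:
        case ProjectorType::Gemma4UV:
        case ProjectorType::Gemma4A:
            return mm_input_proj_w.ne[1];
        default:
            // Unreachable in this sketch; kept to mirror the real abort path.
            return std::nullopt;
    }
}
```

With a made-up weight of `ne = {1152, 2048, 1, 1}`, `n_mmproj_embd_before(ProjectorType::Gemma4A, w)` returns no value (the point where the real code aborts), while `n_mmproj_embd_after` returns 2048.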
Fix (one line)
In tools/mtmd/clip.cpp, function clip_n_mmproj_embd():
case PROJECTOR_TYPE_GEMMA4V:
case PROJECTOR_TYPE_GEMMA4UV:
+ case PROJECTOR_TYPE_GEMMA4A:
    return ctx->model.mm_input_proj_w->ne[1];
Both GEMMA4V (vision) and GEMMA4A (audio) store their input projection in the same mm_input_proj_w field (loaded from "mm.input_projection.weight" and "mm.a.input_projection.weight" respectively), so the dimension lookup is identical.
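Why `ne[1]`: in ggml, `ggml_mul_mat(a, b)` requires `a->ne[0] == b->ne[0]` and produces a tensor with `ne = {a->ne[1], b->ne[1], b->ne[2], b->ne[3]}`. Assuming the input projection is applied as `ggml_mul_mat(mm_input_proj_w, cur)` (this report does not show that code), `ne[1]` of the weight is the width of the embeddings handed to the language model. The sketch below reproduces only that shape rule; `Shape` and `mul_mat_shape` are hypothetical helpers, and ggml's broadcasting over dims 2 and 3 is ignored.

```cpp
#include <array>
#include <cassert>   // for the checks in the usage example
#include <cstdint>
#include <stdexcept>

// ggml-style sizes: ne[0] is the innermost (fastest-varying) dimension.
using Shape = std::array<int64_t, 4>;

// Output shape of ggml_mul_mat(a, b): the shared inner dimension ne[0] is
// contracted away, and the result is {a.ne[1], b.ne[1], b.ne[2], b.ne[3]}.
// Broadcasting over dims 2 and 3 is not modelled here.
Shape mul_mat_shape(const Shape & a, const Shape & b) {
    if (a[0] != b[0]) {
        throw std::invalid_argument("mul_mat: ne[0] of both operands must match");
    }
    return {a[1], b[1], b[2], b[3]};
}
```

For a made-up projection weight `{1152, 2048, 1, 1}` (n_in, n_out) applied to `{1152, 256, 1, 1}` (n_in, n_tokens), the result is `{2048, 256, 1, 1}`: one 2048-wide embedding per token, which is the weight's `ne[1]`.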
First Bad Commit
No response
Relevant log output
Logs
❯ taskset -c 0,2,4,6,8,10 ~/programs/llama.cpp/build/bin/llama-server -fa on --no-mmap --port 8081 --host 0.0.0.0 --reasoning auto -np 1 -m ~/Downloads/models/gemmae4b/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive-Q5_K_P.gguf --mmproj ./mmproj-Gemma-4-E4B-Uncensored-HauhauCS-Aggressive-f16.gguf --no-warmup --jinja --image-min-tokens 560 --image-max-tokens 1120 -ub 512 -b 512 -fitt 50 -c 8000 --temp 1 --webui-mcp-proxy --threads 6 -dio --cache-ram 0 --prio 2 -nocb
0.00.121.396 I log_info: verbosity = 3 (adjust with the `-lv N` CLI arg)
0.00.121.398 I device_info:
0.00.197.914 I - CUDA0 : NVIDIA GeForce RTX 4060 Laptop GPU (7834 MiB, 7718 MiB free)
0.00.197.920 I - CPU : 13th Gen Intel(R) Core(TM) i7-13620H (15687 MiB, 15687 MiB free)
0.00.197.968 I system_info: n_threads = 6 (n_threads_batch = 6) / 16 | CUDA : ARCHS = 890 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | FA_ALL_QUANTS = 1 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
0.00.198.006 I srv init: running without SSL
0.00.198.028 I srv init: using 15 threads for HTTP server
0.00.198.095 W srv llama_server: -----------------
0.00.198.096 W srv llama_server: CORS proxy is enabled, do not expose server to untrusted environments
0.00.198.096 W srv llama_server: This feature is EXPERIMENTAL and may be removed or changed in future versions
0.00.198.096 W srv llama_server: -----------------
0.00.198.099 I srv start: binding port with default address family
0.00.199.236 I srv llama_server: loading model
0.00.199.247 I srv load_model: loading model '/home/turbo/Downloads/models/gemmae4b/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive-Q5_K_P.gguf'
/home/turbo/programs/llama.cpp/tools/mtmd/clip.cpp:4391: Unknown projector type
[New LWP 367501]
[New LWP 367500]
[New LWP 367499]
[New LWP 367498]
[New LWP 367497]
[New LWP 367496]
[New LWP 367495]
[New LWP 367494]
[New LWP 367493]
[New LWP 367492]
[New LWP 367491]
[New LWP 367490]
[New LWP 367489]
[New LWP 367488]
[New LWP 367487]
[New LWP 367486]
[New LWP 367485]
[New LWP 367471]
[New LWP 367470]
[New LWP 367452]
[New LWP 367451]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
0x00007fc81aeb4e22 in ?? () from /usr/lib/libc.so.6
#0 0x00007fc81aeb4e22 in ?? () from /usr/lib/libc.so.6
#1 0x00007fc81aea8178 in ?? () from /usr/lib/libc.so.6
#2 0x00007fc81af2fa6b in wait4 () from /usr/lib/libc.so.6
#3 0x00007fc827b3e9fb in ggml_print_backtrace () from /home/turbo/programs/llama.cpp/build/bin/libggml-base.so.0
#4 0x00007fc827b3eb8e in ggml_abort () from /home/turbo/programs/llama.cpp/build/bin/libggml-base.so.0
#5 0x00007fc8281937a6 in clip_n_mmproj_embd(clip_ctx const*) () from /home/turbo/programs/llama.cpp/build/bin/libmtmd.so.0
#6 0x00007fc8280e24b1 in mtmd_context::mtmd_context(char const*, llama_model const*, mtmd_context_params const&, bool) () from /home/turbo/programs/llama.cpp/build/bin/libmtmd.so.0
#7 0x00007fc8280ddb24 in mtmd_get_memory_usage(char const*, mtmd_context_params) () from /home/turbo/programs/llama.cpp/build/bin/libmtmd.so.0
#8 0x00007fc828396c04 in server_context_impl::load_model(common_params&) () from /home/turbo/programs/llama.cpp/build/bin/libllama-server-impl.so
#9 0x00007fc8282d3ac2 in llama_server(int, char**) () from /home/turbo/programs/llama.cpp/build/bin/libllama-server-impl.so
#10 0x00007fc81ae27c8e in ?? () from /usr/lib/libc.so.6
#11 0x00007fc81ae27dcb in __libc_start_main () from /usr/lib/libc.so.6
#12 0x000055b523cd9075 in _start ()
[Inferior 1 (process 367449) detached]
fish: Job 1, 'taskset -c 0,2,4,6,8,10 ~/progr…' terminated by signal SIGABRT (Abort)