Description:
When loading the gpt-oss-20b model, llama.cpp fails with an invalid tensor type error. The failure is the same for every quantization of this model that was tried.
Error message:
gguf_init_from_file_impl: tensor 'blk.0.ffn_down_exps.weight' has invalid ggml type 39 (NONE)
gguf_init_from_file_impl: failed to read tensor info
llama_model_load: error loading model: llama_model_loader: failed to load model from I:\SKLAD\!Models_GGUF\Unsloth\gpt-oss-20b-Q4_K_M.gguf
Reproduction steps:
- Download any quantization version of gpt-oss-20b GGUF model
- Run the following command:
llama-cli.exe -m I:\SKLAD\!Models_GGUF\Unsloth\gpt-oss-20b-Q4_K_M.gguf --threads 6 --prio 2 --ctx-size 32768 --flash-attn --batch-size 24 --n-predict -2 --min-p 0.0 --mlock --no-mmap --temp 0.3 --n-gpu-layers 99 --top-k 20 --top-p 0.8 --repeat-penalty 1.0 --multiline-input --no-display-prompt
Expected behavior:
The model should load successfully and be ready for inference.
Actual behavior:
The loader fails with the same invalid tensor type error for every quantization tried.
System information:
- Windows 10 [Version 10.0.19045.5487]
- NVIDIA GeForce RTX 4060 Ti (16GB VRAM)
- llama.cpp build 6082 (5aa1105d) with MSVC 19.43.34810.0 for x64
Additional information:
- The error persists across multiple quantization versions of the same model (tried Q4_K_M and others)
- The common factor is the tensor 'blk.0.ffn_down_exps.weight' having an invalid type (NONE)
- This suggests one of the following causes:
  - A fundamental issue with the GGUF conversion of this particular model
  - An incompatibility between the model's architecture and the current llama.cpp implementation
  - Corrupted source files used for conversion
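One way to separate these causes is to read the tensor table directly, without llama.cpp. The sketch below is a minimal Python reader, assuming the GGUF v2/v3 layout (magic, version, tensor count, metadata key/value pairs, then for each tensor its name, dimensions, type id and data offset). The function name `tensor_types` and the script itself are hypothetical and not part of llama.cpp or its `gguf` Python package. It prints the raw type id stored for each `blk.0.` tensor and counts tensors per id. If every quantization reports the same id 39 for the same tensors, the files agree with each other and the id is likely just one this build has no name for (the `(NONE)` in the log appears to be printed for ids outside the build's type table), which would point toward an incompatibility rather than corruption.

```python
import struct
import sys
from collections import Counter
from typing import BinaryIO, Iterator, Tuple

# Byte sizes of the fixed-width GGUF metadata value types (ids per the GGUF spec).
_SCALAR_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str) -> int:
    """Read one little-endian value described by a struct format such as '<Q'."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise EOFError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    data = f.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of file")
    return data.decode("utf-8", errors="replace")


def _skip_value(f: BinaryIO, vtype: int) -> None:
    """Skip one metadata value; only the tensor table is of interest here."""
    if vtype == _STRING:
        f.seek(_read(f, "<Q"), 1)
    elif vtype == _ARRAY:
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        if elem_type in _SCALAR_SIZES:
            f.seek(_SCALAR_SIZES[elem_type] * count, 1)
        else:  # arrays of strings (or nested arrays) are walked element by element
            for _ in range(count):
                _skip_value(f, elem_type)
    elif vtype in _SCALAR_SIZES:
        f.seek(_SCALAR_SIZES[vtype], 1)
    else:
        raise ValueError(f"unknown metadata value type {vtype}")


def tensor_types(f: BinaryIO) -> Iterator[Tuple[str, int]]:
    """Yield (tensor name, raw ggml type id) for each tensor in a GGUF v2/v3 stream."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError(f"unsupported GGUF version {version}")
    n_tensors = _read(f, "<Q")
    n_kv = _read(f, "<Q")
    for _ in range(n_kv):
        _read_string(f)                 # metadata key
        _skip_value(f, _read(f, "<I"))  # value type id, then the value itself
    for _ in range(n_tensors):
        name = _read_string(f)
        n_dims = _read(f, "<I")
        f.seek(8 * n_dims, 1)           # one uint64 per dimension
        ggml_type = _read(f, "<I")      # the id the loader rejected (39 in the log)
        _read(f, "<Q")                  # offset into the tensor data section
        yield name, ggml_type


if __name__ == "__main__" and len(sys.argv) > 1:
    counts: Counter = Counter()
    with open(sys.argv[1], "rb") as fh:
        for name, type_id in tensor_types(fh):
            counts[type_id] += 1
            if name.startswith("blk.0."):
                print(f"{name}: ggml type id {type_id}")
    print("tensors per ggml type id:", dict(sorted(counts.items())))
```

Run it once per downloaded file, e.g. `python gguf_types.py <path-to-model.gguf>` (script name hypothetical), and compare the output across quantizations.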
Troubleshooting steps attempted:
- Verified CUDA is working (device detected successfully)
- Confirmed sufficient VRAM is available (15GB free)
- Tried multiple quantization versions of the same model
- Verified file integrity (no download errors)
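The file integrity check can be made concrete by comparing each download's SHA-256 with the hash published by the host. A minimal sketch, assuming the model page lists a SHA-256 for the file (Hugging Face shows one for files stored with Git LFS); the script and the `sha256_of` helper are hypothetical. It hashes in 1 MiB chunks so a multi-gigabyte GGUF never has to fit in memory.

```python
import hashlib
import sys


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__" and len(sys.argv) == 3:
    actual = sha256_of(sys.argv[1])
    expected = sys.argv[2].strip().lower()
    print("match" if actual == expected else f"MISMATCH: got {actual}")
```

A match only rules out damage in transit; it says nothing about whether the conversion that produced the file was correct.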
Full console output:
Microsoft Windows [Version 10.0.19045.5487]
(c) Microsoft Corporation. All rights reserved.
C:\Windows\System32>C:\llama.cpp\build\bin\Release\llama-cli.exe -m I:\SKLAD\!Models_GGUF\Unsloth\gpt-oss-20b-Q4_K_M.gguf --threads 6 --prio 2 --ctx-size 32768 --flash-attn --batch-size 24 --n-predict -2 --min-p 0.0 --mlock --no-mmap --temp 0.3 --n-gpu-layers 99 --top-k 20 --top-p 0.8 --repeat-penalty 1.0 --multiline-input --no-display-prompt
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
build: 6082 (5aa1105d) with MSVC 19.43.34810.0 for x64
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4060 Ti) - 15225 MiB free
gguf_init_from_file_impl: tensor 'blk.0.ffn_down_exps.weight' has invalid ggml type 39 (NONE)
gguf_init_from_file_impl: failed to read tensor info
llama_model_load: error loading model: llama_model_loader: failed to load model from I:\SKLAD\!Models_GGUF\Unsloth\gpt-oss-20b-Q4_K_M.gguf
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'I:\SKLAD\!Models_GGUF\Unsloth\gpt-oss-20b-Q4_K_M.gguf'
main: error: unable to load model
C:\Windows\System32>