[Bug] Failed to load gpt-oss-20b GGUF model - Invalid tensor type across multiple quantizations #3124

Description:
Loading the `gpt-oss-20b` GGUF model with llama.cpp fails while reading tensor info: the loader rejects tensor `blk.0.ffn_down_exps.weight` as having an invalid ggml type (39). The same failure occurs with every quantization of this model that I tried.

Error message:
```
gguf_init_from_file_impl: tensor 'blk.0.ffn_down_exps.weight' has invalid ggml type 39 (NONE)
gguf_init_from_file_impl: failed to read tensor info
llama_model_load: error loading model: llama_model_loader: failed to load model from I:\SKLAD\!Models_GGUF\Unsloth\gpt-oss-20b-Q4_K_M.gguf
```

Reproduction steps:
1. Download any quantization of the gpt-oss-20b GGUF model (for example, the Unsloth Q4_K_M file used below)
2. Run the following command:
```
llama-cli.exe -m I:\SKLAD\!Models_GGUF\Unsloth\gpt-oss-20b-Q4_K_M.gguf --threads 6 --prio 2 --ctx-size 32768 --flash-attn --batch-size 24 --n-predict -2 --min-p 0.0 --mlock --no-mmap --temp 0.3 --n-gpu-layers 99 --top-k 20 --top-p 0.8 --repeat-penalty 1.0 --multiline-input --no-display-prompt
```

Expected behavior:
The model should load successfully and be ready for inference.

Actual behavior:
Loading aborts with the invalid tensor type error shown above, for every quantization tried.

System information:
- Windows 10 [Version 10.0.19045.5487]
- NVIDIA GeForce RTX 4060 Ti (16GB VRAM)
- llama.cpp build 6082 (5aa1105d) with MSVC 19.43.34810.0 for x64

Additional information:
1. The error persists across multiple quantizations of the same model (tried Q4_K_M and others)
2. In every case the failing tensor is `blk.0.ffn_down_exps.weight`, reported with ggml type ID 39, which this build prints as NONE
3. This suggests one of the following:
   - A problem in the GGUF conversion of this particular model
   - An incompatibility between the model's architecture or tensor format and this llama.cpp build
   - Corrupted source files used for conversion
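
One way to narrow down the three possibilities above is to read the tensor info table straight from the file and see which ggml type IDs it declares. The following is a minimal sketch, not part of llama.cpp. It assumes the little-endian GGUF v2/v3 layout (magic, version, tensor count, metadata key/value count, the metadata entries, then per-tensor name, dimensions, type ID, and offset). `read_tensor_types` and `count_types` are hypothetical helper names.

```python
import struct
from collections import Counter

# struct format chars for fixed-width GGUF metadata value types
# (assumed IDs: 0..7 scalars, 8 string, 9 array, 10..12 64-bit scalars).
_SCALAR_FMT = {0: "B", 1: "b", 2: "H", 3: "h", 4: "I", 5: "i",
               6: "f", 7: "?", 10: "Q", 11: "q", 12: "d"}
_STRING, _ARRAY = 8, 9


def _read(f, fmt):
    """Read one little-endian value described by a struct format char."""
    size = struct.calcsize("<" + fmt)
    data = f.read(size)
    if len(data) != size:
        raise EOFError("unexpected end of file")
    return struct.unpack("<" + fmt, data)[0]


def _read_string(f):
    """GGUF strings are a uint64 length followed by raw bytes."""
    length = _read(f, "Q")
    return f.read(length).decode("utf-8", errors="replace")


def _skip_value(f, vtype):
    """Skip over one metadata value without decoding it."""
    if vtype == _STRING:
        _read_string(f)
    elif vtype == _ARRAY:
        elem_type = _read(f, "I")
        count = _read(f, "Q")
        if elem_type in _SCALAR_FMT:
            f.seek(count * struct.calcsize("<" + _SCALAR_FMT[elem_type]), 1)
        else:
            for _ in range(count):
                _skip_value(f, elem_type)
    elif vtype in _SCALAR_FMT:
        f.seek(struct.calcsize("<" + _SCALAR_FMT[vtype]), 1)
    else:
        raise ValueError(f"unknown metadata value type {vtype}")


def read_tensor_types(f):
    """Return [(tensor_name, ggml_type_id), ...] from a binary GGUF stream."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "I")
    if version < 2:
        raise ValueError(f"unsupported GGUF version {version}")
    n_tensors = _read(f, "Q")
    n_kv = _read(f, "Q")
    for _ in range(n_kv):
        _read_string(f)                  # metadata key
        _skip_value(f, _read(f, "I"))    # value type, then the value
    tensors = []
    for _ in range(n_tensors):
        name = _read_string(f)
        n_dims = _read(f, "I")
        f.seek(8 * n_dims, 1)            # one uint64 per dimension
        type_id = _read(f, "I")
        _read(f, "Q")                    # data offset, not needed here
        tensors.append((name, type_id))
    return tensors


def count_types(tensors):
    """Count how many tensors use each ggml type ID."""
    return Counter(type_id for _, type_id in tensors)
```

To use it, open the model with `open(path, "rb")`, pass the stream to `read_tensor_types`, and pass the result to `count_types`. If type ID 39 appears in every downloaded quantization, the files agree with each other, which would make a corrupted download less likely than a type that this build does not recognize.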

Troubleshooting steps attempted:
- Verified CUDA is working (device detected successfully)
- Confirmed sufficient VRAM is available (15GB free)
- Tried multiple quantization versions of the same model
- Verified file integrity (no download errors)
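
The integrity check above only rules out download errors. A stricter check is to compare the file's SHA-256 against a checksum published by the model host, assuming one is available. A minimal sketch using only the Python standard library follows; `sha256_of_file` is a hypothetical helper name.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so a multi-GB GGUF is never fully held in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Call it with the path to the `.gguf` file and compare the returned hex string with the published value. A match means the local copy is byte-identical to the hosted one, so any remaining fault lies in the file's contents or in the loader.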

Full console output:
```
Microsoft Windows [Version 10.0.19045.5487]
(c) Microsoft Corporation. All rights reserved.

C:\Windows\System32>C:\llama.cpp\build\bin\Release\llama-cli.exe -m I:\SKLAD\!Models_GGUF\Unsloth\gpt-oss-20b-Q4_K_M.gguf  --threads 6 --prio 2 --ctx-size 32768 --flash-attn --batch-size 24 --n-predict -2  --min-p 0.0 --mlock --no-mmap --temp 0.3 --n-gpu-layers 99  --top-k 20 --top-p 0.8 --repeat-penalty 1.0 --multiline-input --no-display-prompt
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
build: 6082 (5aa1105d) with MSVC 19.43.34810.0 for x64
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4060 Ti) - 15225 MiB free
gguf_init_from_file_impl: tensor 'blk.0.ffn_down_exps.weight' has invalid ggml type 39 (NONE)
gguf_init_from_file_impl: failed to read tensor info
llama_model_load: error loading model: llama_model_loader: failed to load model from I:\SKLAD\!Models_GGUF\Unsloth\gpt-oss-20b-Q4_K_M.gguf
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'I:\SKLAD\!Models_GGUF\Unsloth\gpt-oss-20b-Q4_K_M.gguf'
main: error: unable to load model

C:\Windows\System32>
```
