
[Bug] Failed to load gpt-oss-20b GGUF model - Invalid tensor type across multiple quantizations #3124

@Oleg777778

Description:
Loading the gpt-oss-20b GGUF model fails: llama.cpp reports that a tensor has an invalid ggml type. The error is the same for every quantization of the model tried so far.

Error message:

gguf_init_from_file_impl: tensor 'blk.0.ffn_down_exps.weight' has invalid ggml type 39 (NONE)
gguf_init_from_file_impl: failed to read tensor info
llama_model_load: error loading model: llama_model_loader: failed to load model from I:\SKLAD\!Models_GGUF\Unsloth\gpt-oss-20b-Q4_K_M.gguf
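To see which ggml type IDs the file actually declares, and whether blk.0.ffn_down_exps.weight is the only affected tensor, the tensor-info table can be dumped without going through llama.cpp's loader. Below is a minimal sketch that assumes GGUF version 2 or later (little-endian, 64-bit counts) and uses only the Python standard library. list_tensor_types and its helpers are hypothetical names, not part of llama.cpp or its gguf-py package. The script reports raw type IDs rather than names, so an ID the local build does not recognize (39 here) still shows up. For comparison, the ggml_type enum in recent versions of ggml.h defines 39 as GGML_TYPE_MXFP4. The "(NONE)" in the log means this build's type table has no name for that ID.

```python
import struct
import sys
from collections import Counter
from typing import BinaryIO, List, Tuple

# Byte sizes of fixed-width GGUF metadata value types, keyed by value-type id.
# Ids 8 (STRING) and 9 (ARRAY) are variable-length and handled separately.
_SCALAR_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}


def _read(f: BinaryIO, fmt: str) -> int:
    """Read one little-endian integer described by a struct format character."""
    size = struct.calcsize("<" + fmt)
    data = f.read(size)
    if len(data) != size:
        raise EOFError("unexpected end of file")
    return struct.unpack("<" + fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    """Read a GGUF string: uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "Q")
    data = f.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of file")
    return data.decode("utf-8", errors="replace")


def _skip_value(f: BinaryIO, value_type: int) -> None:
    """Advance past one metadata value without interpreting it."""
    if value_type in _SCALAR_SIZES:
        f.seek(_SCALAR_SIZES[value_type], 1)
    elif value_type == 8:  # STRING
        _read_string(f)
    elif value_type == 9:  # ARRAY: uint32 element type, uint64 count, elements
        elem_type = _read(f, "I")
        count = _read(f, "Q")
        if elem_type in _SCALAR_SIZES:
            f.seek(_SCALAR_SIZES[elem_type] * count, 1)
        else:
            for _ in range(count):
                _skip_value(f, elem_type)
    else:
        raise ValueError(f"unknown metadata value type {value_type}")


def list_tensor_types(path: str) -> List[Tuple[str, int]]:
    """Return (tensor name, raw ggml type id) for every tensor-info entry."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file (bad magic)")
        version = _read(f, "I")
        if version < 2:  # v1 used 32-bit counts; not handled here
            raise ValueError(f"unsupported GGUF version {version}")
        n_tensors = _read(f, "Q")
        n_kv = _read(f, "Q")
        for _ in range(n_kv):
            _read_string(f)                # metadata key
            _skip_value(f, _read(f, "I"))  # value type id, then the value
        tensors = []
        for _ in range(n_tensors):
            name = _read_string(f)
            n_dims = _read(f, "I")
            f.seek(8 * n_dims, 1)          # one uint64 per dimension
            type_id = _read(f, "I")        # 32-bit ggml_type id
            f.seek(8, 1)                   # uint64 offset into the data section
            tensors.append((name, type_id))
        return tensors


if __name__ == "__main__" and len(sys.argv) > 1:
    tensors = list_tensor_types(sys.argv[1])
    for type_id, n in sorted(Counter(t for _, t in tensors).items()):
        print(f"ggml type {type_id}: {n} tensors")
```

Run it as `python list_gguf_types.py I:\SKLAD\!Models_GGUF\Unsloth\gpt-oss-20b-Q4_K_M.gguf`, where the script name is hypothetical. It prints how many tensors use each type ID.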

Reproduction steps:

  1. Download any quantization of the gpt-oss-20b GGUF model (e.g. gpt-oss-20b-Q4_K_M.gguf)
  2. Run the following command:
llama-cli.exe -m I:\SKLAD\!Models_GGUF\Unsloth\gpt-oss-20b-Q4_K_M.gguf --threads 6 --prio 2 --ctx-size 32768 --flash-attn --batch-size 24 --n-predict -2 --min-p 0.0 --mlock --no-mmap --temp 0.3 --n-gpu-layers 99 --top-k 20 --top-p 0.8 --repeat-penalty 1.0 --multiline-input --no-display-prompt

Expected behavior:
The model should load successfully and be ready for inference.

Actual behavior:
The loader fails with the same invalid tensor type error for every quantization tried.

System information:

  • Windows 10 [Version 10.0.19045.5487]
  • NVIDIA GeForce RTX 4060 Ti (16GB VRAM)
  • llama.cpp build 6082 (5aa1105d) with MSVC 19.43.34810.0 for x64

Additional information:

  1. The error persists across multiple quantization versions of the same model (tried Q4_K_M and others)
  2. In each case the rejected tensor is 'blk.0.ffn_down_exps.weight', reported as having an invalid type (NONE)
  3. This suggests one of the following:
    • A fundamental issue with the GGUF conversion of this particular model
    • An incompatibility between the model's architecture and current llama.cpp implementation
    • Corrupted source files used for conversion
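The log names only one tensor, probably the first one the loader rejected. It does not show whether other expert tensors share the problem, or whether every quantization fails on the same set. The sketch below compares files to help separate a shared cause (first two hypotheses) from a damaged individual file (third hypothesis). It assumes you already have (tensor name, type ID) pairs for each file, for example from a GGUF tensor-info dump. It also assumes a set of the type IDs your build recognizes, which would come from the ggml_type enum in that build's ggml.h. The function name and both inputs are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def unsupported_tensors(
    files: Dict[str, Iterable[Tuple[str, int]]],
    known_type_ids: Set[int],
) -> Tuple[Dict[str, List[str]], Set[str]]:
    """Return, per file, the tensors whose ggml type id is not in known_type_ids,
    plus the tensor names that are unsupported in every file."""
    per_file: Dict[str, List[str]] = {
        path: [name for name, type_id in tensors if type_id not in known_type_ids]
        for path, tensors in files.items()
    }
    if not per_file:
        return per_file, set()
    # Names rejected in all files suggest a shared cause rather than one bad download.
    common = set.intersection(*(set(names) for names in per_file.values()))
    return per_file, common
```

A non-empty common set across independently downloaded quantizations points away from random file corruption.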

Troubleshooting steps attempted:

  • Verified CUDA is working (device detected successfully)
  • Confirmed sufficient VRAM is available (15GB free)
  • Tried multiple quantization versions of the same model
  • Confirmed the files downloaded without errors
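A download that finishes without errors does not by itself rule out a truncated or altered file. Comparing the file's SHA-256 digest with the one published for it is a stronger check; model hosts such as Hugging Face list a SHA256 value for files stored with Git LFS. Below is a minimal standard-library sketch. The function names are hypothetical, and you would copy the expected digest from the hosting page. On Windows, `certutil -hashfile <file> SHA256` computes the same digest without Python.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks
    so a multi-gigabyte model is never held in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare against a digest copied from the model's download page."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```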

Microsoft Windows [Version 10.0.19045.5487]
(c) Microsoft Corporation. All rights reserved.

C:\Windows\System32>C:\llama.cpp\build\bin\Release\llama-cli.exe -m I:\SKLAD\!Models_GGUF\Unsloth\gpt-oss-20b-Q4_K_M.gguf --threads 6 --prio 2 --ctx-size 32768 --flash-attn --batch-size 24 --n-predict -2 --min-p 0.0 --mlock --no-mmap --temp 0.3 --n-gpu-layers 99 --top-k 20 --top-p 0.8 --repeat-penalty 1.0 --multiline-input --no-display-prompt
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
build: 6082 (5aa1105d) with MSVC 19.43.34810.0 for x64
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4060 Ti) - 15225 MiB free
gguf_init_from_file_impl: tensor 'blk.0.ffn_down_exps.weight' has invalid ggml type 39 (NONE)
gguf_init_from_file_impl: failed to read tensor info
llama_model_load: error loading model: llama_model_loader: failed to load model from I:\SKLAD\!Models_GGUF\Unsloth\gpt-oss-20b-Q4_K_M.gguf
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'I:\SKLAD\!Models_GGUF\Unsloth\gpt-oss-20b-Q4_K_M.gguf'
main: error: unable to load model

C:\Windows\System32>
