[BUG] GGUF files created from Gemma 3 models lose the 'vision' capability #2290

1. **Bug Description:**
When following the [colab notebook for Gemma3_(4B)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3_(4B).ipynb), the model exported to the .gguf file does not have the 'vision' capability when used in Ollama.

2. **Reproduction Steps:**
- **Get model and tokenizer from unsloth:**
```py
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name = "unsloth/gemma-3-4b-it",
    max_seq_length = 2048, # Choose any for long context!
    load_in_4bit = True,  # 4 bit quantization to reduce memory
    load_in_8bit = False, # [NEW!] A bit more accurate, uses 2x memory
    full_finetuning = False, # [NEW!] We have full finetuning now!
    # token = "hf_...", # use one if using gated models
)
```

---
- **Save LoRA adapters** (note the directory: the GGUF file has to be saved to the same one)
```py
model.save_pretrained("mano-wii/gemma-3-finetune")  # Local saving
tokenizer.save_pretrained("mano-wii/gemma-3-finetune")
```

---
- **Save to GGUF**
```py
model.save_pretrained_gguf(
    "mano-wii/gemma-3-finetune",
    quantization_type = "Q8_0", # For now only Q8_0, BF16, F16 supported
)
```

---
- **Optionally push to Hugging Face:**
```py
from unsloth_zoo.saving_utils import prepare_saving
hf_token = "hf_..."  # Hugging Face token with write access
repo_id = "mano-wii/gemma-3-finetune-gguf"
prepare_saving(
    model,
    repo_id,
    push_to_hub = True,
    max_shard_size = "50GB",
    private = False,
    token = hf_token,
)

from huggingface_hub import HfApi
api = HfApi(token = hf_token)
api.upload_folder(
    folder_path = "mano-wii",
    repo_id = repo_id,
    repo_type = "model",
    allow_patterns = ["*.gguf"],
)
```

3. **Expected Behavior:**
Running `ollama show hf.co/mano-wii/gemma-3-finetune-gguf` should list `vision` under `Capabilities`, as it does for the stock `gemma3` model:
```
PS D:\> ollama show gemma3
  Model
    architecture        gemma3
    parameters          4.3B
    context length      8192
    embedding length    2560
    quantization        Q4_K_M

  Capabilities
    completion
    vision

  Parameters
    stop           "<end_of_turn>"
    temperature    0.1
```
   
4. **Actual Behavior:**
`vision` is missing from `Capabilities` in the output of `ollama show hf.co/mano-wii/gemma-3-finetune-gguf`:
```
PS D:\> ollama show hf.co/mano-wii/gemma-3-finetune-gguf
  Model
    architecture        gemma3
    parameters          3.9B
    context length      131072
    embedding length    2560
    quantization        unknown

  Capabilities
    completion

  Parameters
    stop           "<end_of_turn>"
    temperature    0.1
```
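The comparison between the two outputs can also be checked in a script. Below is a minimal sketch that parses the text printed by `ollama show` and tests for a capability. It assumes the layout shown in this report (section headers indented by two spaces, entries by four or more), which is not a documented format and may change between Ollama versions. The function names are hypothetical, and the sample strings are abbreviated copies of the outputs above.

```py
def parse_sections(show_output: str) -> dict[str, list[str]]:
    """Group the entries of `ollama show` text output under their section headers."""
    sections: dict[str, list[str]] = {}
    current = None
    for raw in show_output.splitlines():
        stripped = raw.strip()
        if not stripped:
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        if indent == 2:
            # Assumption: headers such as "Model" or "Capabilities" use two spaces.
            current = stripped
            sections[current] = []
        elif indent >= 4 and current is not None:
            sections[current].append(stripped)
    return sections


def has_capability(show_output: str, name: str) -> bool:
    """True if `name` appears as an entry in the Capabilities section."""
    return name in parse_sections(show_output).get("Capabilities", [])


# Abbreviated copies of the two outputs in this report.
EXPECTED_OUTPUT = """\
  Model
    architecture        gemma3
    parameters          4.3B

  Capabilities
    completion
    vision
"""

ACTUAL_OUTPUT = """\
  Model
    architecture        gemma3
    parameters          3.9B

  Capabilities
    completion
"""

print(has_capability(EXPECTED_OUTPUT, "vision"))  # True
print(has_capability(ACTUAL_OUTPUT, "vision"))    # False
```

In practice the input would be the captured output of `ollama show <model>`, for example obtained via `subprocess.run(..., capture_output=True, text=True).stdout`.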

