[Bug] `ValueError: Invalid input type. Must be a single image, a list of images, or a list of batches of images.` while doing GRPO on Gemma3-4B with multiple images

1. Did you update? `pip install --upgrade unsloth unsloth_zoo`

No, because upgrading leads to the following error:

`ModuleNotFoundError: No module named 'unsloth_zoo.tiled_mlp'`

(I have since upgraded both packages; see the update below.)

2. `Colab` or `Kaggle` or local / cloud

`local`

3. Number GPUs used, use `nvidia-smi`

CUDA Version: `NVIDIA-SMI 580.82.07              Driver Version: 580.82.07      CUDA Version: 13.0`

Number of GPUs: `2`

Type: `NVIDIA A100-SXM4-80GB`


4. Which notebook? Please link!

A modified version of [Gemma3 Vision GRPO notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3_(4B)-Vision-GRPO.ipynb)

5. Which Unsloth version, TRL version, transformers version, PyTorch version?

I used the following lines to check:

```python
import unsloth
import trl
import transformers
import torch

print(f"Unsloth version: {unsloth.__version__}")
print(f"TRL version: {trl.__version__}")
print(f"Transformers version: {transformers.__version__}")
print(f"PyTorch version: {torch.__version__}")
```


The output is:

```
Unsloth version: 2025.11.3
TRL version: 0.22.2
Transformers version: 4.56.2
PyTorch version: 2.8.0+cu128
```


6. Which trainer? `SFTTrainer`, `GRPOTrainer` etc

`GRPOTrainer`

Here is a minimal example, similar to the code in the notebook mentioned above:

```python
from PIL import Image

# `tokenizer` and `dataset` are assumed to be defined earlier, as in the notebook.


def make_conversation(example):
    # The user's text prompt
    text_content = example["overall_prompt"]

    # Load both images for this sample
    image_1 = Image.open(example["img_1_path"]).convert("RGB")
    image_2 = Image.open(example["img_2_path"]).convert("RGB")
    image_list = [image_1, image_2]

    # Construct the prompt in the multi-modal chat format
    prompt = [
        {
            "role": "user",
            "content": [
                {"type": "image"},  # Placeholder for image 1
                {"type": "image"},  # Placeholder for image 2
                {"type": "text", "text": text_content},  # The text part of the prompt
            ],
        },
    ]

    # The actual image data is kept separate for the processor
    return {"prompt": prompt, "image": image_list, "answer": example["answer"]}


def apply_template(example):
    # Render the chat messages into a single prompt string
    example["prompt"] = tokenizer.apply_chat_template(
        example["prompt"],
        tokenize=False,
        add_generation_prompt=False,
    )
    return example


dataset = dataset.map(make_conversation)
dataset = dataset.map(apply_template)
```

The error appears to be raised because none of the following checks in `image_utils` (inside `make_nested_list_of_images()`) match the input:

```python
    if (
        isinstance(images, (list, tuple))
        and all(isinstance(images_i, (list, tuple)) for images_i in images)
        and all(is_valid_list_of_images(images_i) for images_i in images)
    ):
        return images

    # If it's a list of images, it's a single batch, so convert it to a list of lists
    if isinstance(images, (list, tuple)) and is_valid_list_of_images(images):
        if is_pil_image(images[0]) or images[0].ndim == expected_ndims:
            return [images]
        if images[0].ndim == expected_ndims + 1:
            return [list(image) for image in images]

    # If it's a single image, convert it to a list of lists
    if is_valid_image(images):
        if is_pil_image(images) or images.ndim == expected_ndims:
            return [[images]]
        if images.ndim == expected_ndims + 1:
            return [list(images)]
```

The value of `images` just before these checks is:

```
images in make_nested_list_of_images(): [[[<PIL.PngImagePlugin.PngImageFile image mode=RGB size=512x512 at 0x7FBBB452C220>, <PIL.PngImagePlugin.PngImageFile image mode=RGB size=512x512 at 0x7FBBB452C340>]], [[<PIL.PngImagePlugin.PngImageFile image mode=RGB size=512x512 at 0x7FBBB452C1F0>, <PIL.PngImagePlugin.PngImageFile image mode=RGB size=512x512 at 0x7FBBB452C400>]], [[<PIL.PngImagePlugin.PngImageFile image mode=RGB size=512x512 at 0x7FBBB452C280>, <PIL.PngImagePlugin.PngImageFile image mode=RGB size=512x512 at 0x7FBBB452C4C0>]], [[<PIL.PngImagePlugin.PngImageFile image mode=RGB size=512x512 at 0x7FBBB452C490>, <PIL.PngImagePlugin.PngImageFile image mode=RGB size=512x512 at 0x7FBBB452C580>]]]
```
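To make the failure easier to reproduce without a model or Pillow, here is a minimal sketch of the three checks. Everything in it is a stand-in: `FakeImage`, `is_image`, `is_list_of_images`, and `nest_images` are hypothetical names, not the `transformers` API, and the final `raise` assumes the real function raises the error from the title once all checks fall through.

```python
class FakeImage:
    """Hypothetical stand-in for a PIL image, so the sketch runs without Pillow."""


def is_image(x):
    # Stand-in for the library's single-image validity check
    return isinstance(x, FakeImage)


def is_list_of_images(x):
    # Stand-in for the library's "non-empty list of valid images" check
    return isinstance(x, (list, tuple)) and len(x) > 0 and all(is_image(i) for i in x)


def nest_images(images):
    # Case 1: already a list of batches (list of lists of images)
    if (
        isinstance(images, (list, tuple))
        and all(isinstance(batch, (list, tuple)) for batch in images)
        and all(is_list_of_images(batch) for batch in images)
    ):
        return images
    # Case 2: a flat list of images is treated as a single batch
    if is_list_of_images(images):
        return [images]
    # Case 3: a single image
    if is_image(images):
        return [[images]]
    # Assumption: nothing matched, so the error from the title is raised
    raise ValueError(
        "Invalid input type. Must be a single image, a list of images, "
        "or a list of batches of images."
    )


a, b = FakeImage(), FakeImage()
two_levels = [[a, b], [a, b]]        # list of batches: accepted
three_levels = [[[a, b]], [[a, b]]]  # same shape as the value printed above

accepted = nest_images(two_levels)
try:
    nest_images(three_levels)
    failed = False
except ValueError:
    failed = True
```

With two levels of nesting the first check matches; with three levels each inner element is a list rather than an image, so all three checks are skipped.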

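So it seems that each sample's pair of images is somehow wrapped in one extra list: `images` is a list of four single-element lists, each holding a list of two PIL images. That is three levels of nesting, while the checks above accept at most two (a list of lists of images), which would explain the error.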
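As a diagnostic only, the sketch below measures the nesting depth and strips one redundant level. `FakeImage`, `nesting_depth`, and `unwrap_extra_level` are hypothetical helpers written for this report; they are not part of Unsloth, TRL, or `transformers`, and I have not identified where the extra list is actually introduced.

```python
class FakeImage:
    """Hypothetical stand-in for a PIL image, so the sketch runs without Pillow."""


def nesting_depth(obj):
    """Count the list/tuple levels wrapped around the first leaf element."""
    depth = 0
    while isinstance(obj, (list, tuple)) and len(obj) > 0:
        obj = obj[0]
        depth += 1
    return depth


def unwrap_extra_level(batch):
    """Turn [[[a, b]], [[c, d]]] into [[a, b], [c, d]]; leave other shapes alone."""
    unwrapped = []
    for sample in batch:
        is_singleton_wrapper = (
            isinstance(sample, (list, tuple))
            and len(sample) == 1
            and isinstance(sample[0], (list, tuple))
        )
        unwrapped.append(list(sample[0]) if is_singleton_wrapper else sample)
    return unwrapped


# Same shape as the value printed above: 4 samples, 2 images each, one extra list
observed = [[[FakeImage(), FakeImage()]] for _ in range(4)]
fixed = unwrap_extra_level(observed)

print(nesting_depth(observed), nesting_depth(fixed))  # prints: 3 2
```

A list with depth 2 is the "list of batches" shape that the first check accepts, so this may help confirm whether the extra level is the only problem.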


Happy to provide any other information needed to debug this.


### Update:

I upgraded `unsloth` and `unsloth_zoo`, but the error persists.

I upgraded them with:
`pip install --upgrade --force-reinstall --no-deps unsloth unsloth_zoo`
