[Bug] ValueError: Invalid input type. Must be a single image, a list of images, or a list of batches of images. while doing GRPO on Gemma3-4B with multiple images #3605

@backpropagator

  1. Did you update? pip install --upgrade unsloth unsloth_zoo

No, because upgrading leads to the following error:

ModuleNotFoundError: No module named 'unsloth_zoo.tiled_mlp'

(I updated this, see the update below)
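A quick way to confirm whether the installed unsloth_zoo ships the missing submodule is to probe for it with the standard library. This is only a diagnostic sketch: the helper name `has_module` is hypothetical, and a missing `unsloth_zoo.tiled_mlp` would suggest, but not prove, that unsloth and unsloth_zoo are at mismatched versions.

```python
import importlib.util


def has_module(dotted_name: str) -> bool:
    """Return True if `dotted_name` can be located.

    find_spec imports the parent package first, so any error raised while
    importing the parent is treated as "not available" here.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except Exception:
        return False


if __name__ == "__main__":
    for name in ("unsloth_zoo", "unsloth_zoo.tiled_mlp"):
        print(f"{name}: {'found' if has_module(name) else 'missing'}")
```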

  1. Colab or Kaggle or local / cloud

local

  1. Number GPUs used, use nvidia-smi

NVIDIA-SMI 580.82.07, Driver Version: 580.82.07, CUDA Version: 13.0

Number of GPUs: 2

Type: NVIDIA A100-SXM4-80GB

  1. Which notebook? Please link!

A modified version of the Gemma3 Vision GRPO notebook

  1. Which Unsloth version, TRL version, transformers version, PyTorch version?

I used the following lines to check:

import unsloth
import trl
import transformers
import torch

print(f"Unsloth version: {unsloth.__version__}")
print(f"TRL version: {trl.__version__}")
print(f"Transformers version: {transformers.__version__}")
print(f"PyTorch version: {torch.__version__}")

The output is:

Unsloth version: 2025.11.3
TRL version: 0.22.2
Transformers version: 4.56.2
PyTorch version: 2.8.0+cu128
  1. Which trainer? SFTTrainer, GRPOTrainer etc

GRPOTrainer

Here is a minimal example, similar to the code in the notebook mentioned above:

from PIL import Image

# `dataset` and `tokenizer` are assumed to be defined earlier, as in the notebook.

def make_conversation(example):
    # The user's text prompt
    text_content = example["overall_prompt"]

    image_1 = Image.open(example['img_1_path']).convert("RGB")
    image_2 = Image.open(example['img_2_path']).convert("RGB")

    image_list = [image_1, image_2]

    # Construct the prompt in the desired multi-modal format
    prompt = [
        {
            "role": "user",
            "content": [
                {"type": "image"},  # Placeholder for the image 1
                {"type": "image"},  # Placeholder for the image 2
                {"type": "text", "text": text_content},  # The text part of the prompt
            ],
        },
    ]

    # The actual image data is kept separate for the processor
    return {"prompt": prompt, "image": image_list, "answer": example["answer"]}


def apply_template(example):
    example["prompt"] = tokenizer.apply_chat_template(
        example["prompt"],
        tokenize=False,
        add_generation_prompt=False 
    )
    return example



dataset = dataset.map(make_conversation)
dataset = dataset.map(apply_template)
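Because each conversation carries two `{"type": "image"}` placeholders, the `image` field should hold exactly two images per example. A sanity check along the following lines could be run on a mapped example before training. The helper names `count_image_placeholders` and `check_example` are hypothetical, strings stand in for the PIL images, and the check works on the structured prompt, so it would have to run before `apply_template` turns the prompt into a string.

```python
def count_image_placeholders(prompt):
    """Count {"type": "image"} entries across all messages of a structured prompt."""
    total = 0
    for message in prompt:
        content = message.get("content", [])
        if isinstance(content, str):
            continue  # plain-text message, no placeholders
        total += sum(1 for part in content if part.get("type") == "image")
    return total


def check_example(example):
    """Raise if the number of placeholders differs from the number of images."""
    n_placeholders = count_image_placeholders(example["prompt"])
    n_images = len(example["image"])
    if n_placeholders != n_images:
        raise ValueError(
            f"{n_placeholders} image placeholder(s) but {n_images} image(s)"
        )
    return True


# Example with stand-in objects instead of real PIL images
example = {
    "prompt": [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "image"},
                {"type": "text", "text": "Compare the two images."},
            ],
        }
    ],
    "image": ["<image 1>", "<image 2>"],
}
check_example(example)
```

If this check passes for every example, the mismatch is more likely introduced later in the pipeline (for instance when samples are batched) than in the dataset itself.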

It seems that none of the following checks passes when the code enters make_nested_list_of_images in image_utils:

    if (
        isinstance(images, (list, tuple))
        and all(isinstance(images_i, (list, tuple)) for images_i in images)
        and all(is_valid_list_of_images(images_i) for images_i in images)
    ):
        return images

    # If it's a list of images, it's a single batch, so convert it to a list of lists
    if isinstance(images, (list, tuple)) and is_valid_list_of_images(images):
        if is_pil_image(images[0]) or images[0].ndim == expected_ndims:
            return [images]
        if images[0].ndim == expected_ndims + 1:
            return [list(image) for image in images]

    # If it's a single image, convert it to a list of lists
    if is_valid_image(images):
        if is_pil_image(images) or images.ndim == expected_ndims:
            return [[images]]
        if images.ndim == expected_ndims + 1:
            return [list(images)]

The value of images just before these checks is:

images in make_nested_list_of_images(): [[[<PIL.PngImagePlugin.PngImageFile image mode=RGB size=512x512 at 0x7FBBB452C220>, <PIL.PngImagePlugin.PngImageFile image mode=RGB size=512x512 at 0x7FBBB452C340>]], [[<PIL.PngImagePlugin.PngImageFile image mode=RGB size=512x512 at 0x7FBBB452C1F0>, <PIL.PngImagePlugin.PngImageFile image mode=RGB size=512x512 at 0x7FBBB452C400>]], [[<PIL.PngImagePlugin.PngImageFile image mode=RGB size=512x512 at 0x7FBBB452C280>, <PIL.PngImagePlugin.PngImageFile image mode=RGB size=512x512 at 0x7FBBB452C4C0>]], [[<PIL.PngImagePlugin.PngImageFile image mode=RGB size=512x512 at 0x7FBBB452C490>, <PIL.PngImagePlugin.PngImageFile image mode=RGB size=512x512 at 0x7FBBB452C580>]]]

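So it seems the images are somehow wrapped in an extra list level: each sample arrives as [[image_1, image_2]] rather than [image_1, image_2], so none of the checks above matches and the ValueError is raised.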
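To make the extra level concrete, the nesting depth of the value can be measured and the redundant single-element wrapper removed. The following is only a diagnostic sketch, not a fix inside Unsloth or transformers: `nesting_depth` and `unwrap_extra_level` are hypothetical helpers, and strings stand in for the PIL images so the snippet runs without any image files.

```python
def nesting_depth(obj):
    """Number of list/tuple levels wrapped around the first leaf element."""
    depth = 0
    while isinstance(obj, (list, tuple)) and len(obj) > 0:
        obj = obj[0]
        depth += 1
    return depth


def unwrap_extra_level(batch):
    """Turn [[[a, b]], [[c, d]]] into [[a, b], [c, d]].

    Only samples that are a single-element list wrapping another list are
    unwrapped; anything else is passed through unchanged.
    """
    fixed = []
    for sample in batch:
        if (
            isinstance(sample, (list, tuple))
            and len(sample) == 1
            and isinstance(sample[0], (list, tuple))
        ):
            fixed.append(list(sample[0]))
        else:
            fixed.append(sample)
    return fixed


# Same shape as the value printed above: 4 samples, each [[img_1, img_2]]
images = [[["img_1", "img_2"]] for _ in range(4)]
print(nesting_depth(images))        # 3: batch -> wrapper -> images
flat = unwrap_extra_level(images)
print(nesting_depth(flat))          # 2: batch -> images
```

A depth of 2 (a list of per-sample image lists) is the shape that the first check in the snippet above accepts.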

Happy to provide any other information needed to debug this.

Update:

I updated the unsloth and unsloth_zoo libraries; however, the error still persists.

I updated the libraries with:
pip install --upgrade --force-reinstall --no-deps unsloth unsloth_zoo
