
apply_chat_template(tokenize=True) crashes on assistant messages with tool calls and no content #45290

@qgallouedec


System Info

Transformers 5.5.0

Who can help?

@zucchini-nlp @Rocketknight1

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

ProcessorMixin.apply_chat_template raises KeyError: 'content' when tokenize=True and the conversation contains an assistant message with tool_calls but no content key.

from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")

messages = [
    [
        {"role": "user", "content": [{"type": "text", "text": "dummy"}]},
        {"role": "assistant", "tool_calls": [{"type": "function", "function": {"name": "foo", "arguments": {}}}]},
    ]
]

processor.apply_chat_template(messages, tokenize=True)
KeyError: 'content'
  File "processing_utils.py", line 1807, in apply_chat_template
    visuals = [content for content in message["content"] if content["type"] in ["image", "video"]]
                                      ~~~~~~~^^^^^^^^^^^

The error comes from these lines in processing_utils.py:

if tokenize:
    batch_images, batch_videos = [], []
    batch_audios = []
    for conversation in conversations:
        images, videos = [], []
        for message in conversation:
            visuals = [content for content in message["content"] if content["type"] in ["image", "video"]]

This code assumes that every turn contains a content key. However, AFAICT, the rest of the codebase treats content as optional for assistant turns with tool calls.

Possible fix

Guard the access with .get("content") or skip messages without content:

for message in conversation:
    content = message.get("content") or []
    visuals = [c for c in content if isinstance(c, dict) and c.get("type") in ["image", "video"]]
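
To check the guard without loading a model, here is a minimal standalone sketch of the same idea. The helper name collect_visuals and the dummy image entry are hypothetical, for illustration only; this is not the library's code. The extra branch for string content is my own addition, on the assumption that plain-string content carries no image or video parts.

```python
def collect_visuals(conversation):
    """Gather image/video content parts, tolerating messages without content.

    Hypothetical helper for illustration; not part of transformers.
    """
    visuals = []
    for message in conversation:
        # `content` may be missing (or None) on assistant turns that only carry tool_calls.
        content = message.get("content") or []
        if isinstance(content, str):
            # Assumption: plain-string content has no image/video parts.
            continue
        visuals.extend(
            part
            for part in content
            if isinstance(part, dict) and part.get("type") in ("image", "video")
        )
    return visuals


conversation = [
    {"role": "user", "content": [{"type": "text", "text": "dummy"}, {"type": "image", "url": "dummy.png"}]},
    {"role": "assistant", "tool_calls": [{"type": "function", "function": {"name": "foo", "arguments": {}}}]},
]

# The assistant turn has no content key, yet no KeyError is raised.
print(collect_visuals(conversation))
```

With this guard, the assistant turn contributes no visuals, and the user turn's image entry is still picked up.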

Expected behavior

apply_chat_template(tokenize=True) should succeed when an assistant message has tool_calls but no content key.
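
Until this is fixed, one possible caller-side workaround is to add an empty content list to such messages before calling apply_chat_template. The sketch below is untested against the library. The helper name with_empty_content is hypothetical, and I have not checked whether a given chat template renders an empty list the same way as a missing key.

```python
import copy


def with_empty_content(conversations):
    """Return a deep copy in which every message has a `content` value.

    Hypothetical helper for illustration; not part of transformers.
    """
    patched = copy.deepcopy(conversations)  # leave the caller's data untouched
    for conversation in patched:
        for message in conversation:
            # Covers both a missing key and an explicit None.
            if message.get("content") is None:
                message["content"] = []
    return patched


messages = [
    [
        {"role": "user", "content": [{"type": "text", "text": "dummy"}]},
        {"role": "assistant", "tool_calls": [{"type": "function", "function": {"name": "foo", "arguments": {}}}]},
    ]
]

patched = with_empty_content(messages)
print(patched[0][1]["content"])
```

The patched batch would then be passed as processor.apply_chat_template(with_empty_content(messages), tokenize=True).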
