[BUG] Qwen3.6-35B-A3B / llama-server merges consecutive images into 2 frames, causing incorrect image count and partial image understanding #24303

## Summary

When using `llama-server` with `Qwen3.6-35B-A3B` and a matching `mmproj`, consecutive images in a single user message are sometimes merged into super-frames. As a result, 4 uploaded images are interpreted as 2 images, and the model can only describe part of the visual content.

## Environment

- `llama.cpp` release: `b9553`
- Model: `Qwen3.6-35B-A3B-Q4_K_M.gguf`
- mmproj: `mmproj-Qwen3.6-35B-A3B-BF16.gguf`
- Server: `llama-server`
- UI: browser chat page
- Also reproducible through the OpenAI-compatible API

## Reproduction steps

1. Start `llama-server` with:
   ```bash
   llama-server -m Qwen3.6-35B-A3B-Q4_K_M.gguf --mmproj mmproj-Qwen3.6-35B-A3B-BF16.gguf
   ```

2. Open the web UI.

3. Upload 4 images in one message.

4. Ask: `How many images are there?`

5. Observe that the model answers `2` instead of `4`.

## API reproduction

The request works correctly when each image is preceded by a text part:

```json
{
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "[image 1]" },
        { "type": "image_url", "image_url": { "url": "<BASE64>" } },
        { "type": "text", "text": "[image 2]" },
        { "type": "image_url", "image_url": { "url": "<BASE64>" } },
        { "type": "text", "text": "[image 3]" },
        { "type": "image_url", "image_url": { "url": "<BASE64>" } },
        { "type": "text", "text": "[image 4]" },
        { "type": "image_url", "image_url": { "url": "<BASE64>" } },
        { "type": "text", "text": "How many images are there?" }
      ]
    }
  ]
}
```

In this case, the model answers `4`.

However, when the same 4 images are sent consecutively without text separators:

```json
{
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "image_url", "image_url": { "url": "<BASE64>" } },
        { "type": "image_url", "image_url": { "url": "<BASE64>" } },
        { "type": "image_url", "image_url": { "url": "<BASE64>" } },
        { "type": "image_url", "image_url": { "url": "<BASE64>" } },
        { "type": "text", "text": "How many images are there?" }
      ]
    }
  ]
}
```

the model answers `2`.
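To compare the two request shapes from a script, the sketch below builds both bodies from the same list of images. It rests on a few assumptions: the images are sent as base64 `data:` URIs in `image_url.url`, the server is reachable at `http://localhost:8080/v1/chat/completions` (adjust host and port to your setup), and the helper names `to_data_uri`, `build_body`, and `ask` are illustrative only, not part of llama.cpp.

```python
import base64
import json
import urllib.request

# Assumed endpoint: llama-server's OpenAI-compatible route; change as needed.
URL = "http://localhost:8080/v1/chat/completions"


def to_data_uri(raw: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URI."""
    return f"data:{mime};base64,{base64.b64encode(raw).decode('ascii')}"


def build_body(images: list[bytes], separated: bool) -> dict:
    """Build a chat request; with separated=True a text label precedes each image."""
    content = []
    for i, raw in enumerate(images, start=1):
        if separated:
            content.append({"type": "text", "text": f"[image {i}]"})
        content.append({"type": "image_url", "image_url": {"url": to_data_uri(raw)}})
    content.append({"type": "text", "text": "How many images are there?"})
    return {"messages": [{"role": "user", "content": content}]}


def ask(body: dict, url: str = URL) -> str:
    """POST the request and return the assistant's reply text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

With four images loaded as bytes, calling `ask(build_body(images, separated=True))` and `ask(build_body(images, separated=False))` back to back should show the `4` versus `2` difference described above, if the behavior reproduces on your build.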

## Expected behavior

* All 4 images should be treated as 4 separate images.
* The model should answer `4`.
* The model should be able to describe content from all 4 images independently.

## Actual behavior

* Consecutive images appear to be merged into 2 units.
* The model answers `2`.
* When asked about the image contents, the model describes only part of the visual content.

## Notes

This seems related to the recent frame-merge / super-frame behavior for consecutive images in Qwen-VL-style models.

It looks like the merge is triggered only when images are consecutive in the content array. If text is inserted between images, the issue disappears.
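Until the merge behavior is addressed, a client can apply the text-separator observation itself. The helper below is a minimal sketch of such a workaround, not a fix: it inserts a short text part between any two adjacent `image_url` parts and leaves everything else untouched. The function name `separate_images` and the `[image N]` label format are assumptions chosen to mirror the working request above, and the approach has only been observed to help in the 4-image case reported here.

```python
def separate_images(content: list[dict]) -> list[dict]:
    """Return a copy of `content` with a text label between adjacent images."""
    out: list[dict] = []
    image_no = 0
    for part in content:
        if part.get("type") == "image_url":
            image_no += 1
            # Add a separator only when the previous part is also an image.
            if out and out[-1].get("type") == "image_url":
                out.append({"type": "text", "text": f"[image {image_no}]"})
        out.append(part)
    return out
```

Content that already has text between its images passes through unchanged, so the helper can be applied to every outgoing user message.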

## Possible impact

This breaks multimodal behavior for users who upload multiple images in one turn, because the model may under-count images and miss part of the visual context.


