chat/completions endpoint returns 500 when mmproj is loaded (Qwen3.5-27B VLM)

## Bug Description

The `/v1/chat/completions` endpoint returns a 500 error with `"Failed to parse input at pos 25"` when a multimodal projector (`--mmproj`) is loaded alongside the model. The `/completion` endpoint works fine with the same model.

## Environment

- **llama-server version:** 1 (d6f999b), built with GNU 11.4.0 for Linux x86_64
- **Model:** `Qwen3.5-27B-Q8_0.gguf` + `Qwen3.5-27B-mmproj-BF16.gguf`
- **Hardware:** 2x RTX 3090 (48GB VRAM), Linux x86_64
- **Launch flags:**
  ```
  llama-server --host 127.0.0.1 --metrics --port 41131 \
    --remap-developer-role --alias qwen3.5-27b --cont-batching \
    --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn on \
    --model /models/gguf/qwen3.5-27b/Qwen3.5-27B-Q8_0.gguf \
    --mmproj /models/gguf/qwen3.5-27b/Qwen3.5-27B-mmproj-BF16.gguf \
    --n-gpu-layers 999 --parallel 1
  ```

## Reproduction

**Failing request** (`/v1/chat/completions`):
```bash
curl -s http://127.0.0.1:41131/v1/chat/completions -X POST \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3.5-27b","messages":[{"role":"user","content":"hi"}],"max_tokens":8}'
```

Response:
```json
{"error":{"code":500,"message":"Failed to parse input at pos 25: ","type":"server_error"}}
```

Both content formats fail: `"content": "string"` and `"content": [{"type": "text", "text": "..."}]` produce the same error, differing only in the reported position (25 and 53 respectively).

**Working request** (`/completion` with manual Qwen chat template):
```bash
curl -s http://127.0.0.1:41131/completion -X POST \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3.5-27b","prompt":"<|im_start|>user\nhi<|im_end|>\n<|im_start|>assistant\n","n_predict":16,"stop":["<|im_end|>"]}'
```

This request succeeds and returns a valid completion.

## Root Cause Analysis

The GGUF file **does** contain a valid chat template at `tokenizer.chat_template` (a Qwen vision template with image/video handling). However, the `/props` endpoint does not report a `chat_template`, which suggests that llama-server ignores or disables the embedded template when `--mmproj` is loaded.

Without a chat template, the chat completions endpoint would be unable to process the `messages` array, which would explain the "Failed to parse input" error.
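
The `/props` observation can be checked with a small script. This is a minimal sketch, assuming the server from the launch flags above is reachable and that a loaded template would appear under a top-level `chat_template` key in the `/props` response; the function names are hypothetical.

```python
import json
import urllib.request


def has_chat_template(props: dict) -> bool:
    """Return True if a decoded /props response carries a non-empty chat template."""
    template = props.get("chat_template")
    return isinstance(template, str) and template.strip() != ""


def fetch_props(base_url: str) -> dict:
    """GET <base_url>/props and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/props", timeout=10) as resp:
        return json.load(resp)


# Example (needs the running server from this report, so it is left commented out):
# print(has_chat_template(fetch_props("http://127.0.0.1:41131")))
```

Running the same check with and without `--mmproj` would show whether the projector is what makes the template disappear.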

## Expected Behavior

The `/v1/chat/completions` endpoint should work with VLM models that have mmproj loaded, using the embedded chat template from the GGUF metadata. Text-only chat requests should be handled normally, and multimodal requests (with `image_url` content parts) should route through the vision pipeline.
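
For reference, the two request shapes described above might be built as follows. This is a sketch only: the helper names are hypothetical, the image part follows the OpenAI-style `image_url` content format, and the data URI is a placeholder rather than a real image.

```python
def text_request(model: str, text: str, max_tokens: int = 8) -> dict:
    """Text-only chat request, same shape as the failing request in this report."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": text}],
        "max_tokens": max_tokens,
    }


def image_request(model: str, text: str, image_url: str, max_tokens: int = 64) -> dict:
    """Multimodal chat request with one text part and one image_url part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": text},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }


# Placeholder data URI (hypothetical); substitute a real base64-encoded image.
example = image_request("qwen3.5-27b", "Describe this image.", "data:image/png;base64,...")
```

Both payloads would be POSTed to `/v1/chat/completions`; the first should go through the text path only, the second through the vision pipeline.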

## Workaround

Use the `/completion` endpoint with the Qwen chat template applied manually:
```
<|im_start|>user\n{message}<|im_end|>\n<|im_start|>assistant\n<think>\n</think>\n
```
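
The text-only workaround can be scripted roughly as below. This is a single-turn sketch with hypothetical helper names. The `empty_think_block` flag exists only because the template above appends an empty `<think>` block while the working request under Reproduction does not.

```python
import json


def qwen_prompt(message: str, empty_think_block: bool = True) -> str:
    """Wrap one user message in the Qwen chat template used in this report."""
    prompt = f"<|im_start|>user\n{message}<|im_end|>\n<|im_start|>assistant\n"
    if empty_think_block:
        # Matches the workaround template; omitted in the Reproduction request.
        prompt += "<think>\n</think>\n"
    return prompt


def completion_payload(message: str, n_predict: int = 16) -> dict:
    """Build a /completion request with the same fields as the working request."""
    return {
        "prompt": qwen_prompt(message),
        "n_predict": n_predict,
        "stop": ["<|im_end|>"],
    }


def completion_body(message: str) -> bytes:
    """JSON-encode the payload for an HTTP POST to /completion."""
    return json.dumps(completion_payload(message)).encode("utf-8")
```

Letting `json.dumps` handle the escaping avoids hand-writing `\n` sequences inside a shell-quoted JSON string.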

For image inputs, use the `image_data` parameter with the `/completion` endpoint.
