Eval bug: llama-server crashes with Qwen3.6-35B-A3B #24223

### Name and Version

```
version: 9525 (ad1b88ca0)
built with MSVC 19.44.35221.0 for Windows AMD64
```

### Operating systems

Windows

### GGML backends

CPU, CUDA

### Hardware

AMD Ryzen 9 5900X + NVIDIA GeForce RTX 3090 (24 GB)

### Models

Qwen3.6-35B-A3B-UD-IQ4_XS.gguf

### Problem description & steps to reproduce

`llama-server` crashes with an access violation on a chat completion request with Qwen3.6-35B-A3B. The model is partially offloaded to CPU.

```shell
llama-server -m Qwen3.6-35B-A3B-UD-IQ4_XS.gguf
```

Steps:
1. Start the server with the command above.
2. Send a chat request with a short prompt, e.g. `{"messages":[{"role":"user","content":"Hi"}]}`. It completes normally.
3. Send a chat request with a longer prompt (a few hundred tokens). The server crashes.
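
For anyone trying to reproduce this, steps 2 and 3 could be scripted roughly as follows. This is a minimal sketch, not the exact client I used: it assumes the server is listening on the default `127.0.0.1:8080` and is reached through the OpenAI-compatible `/v1/chat/completions` endpoint. The helper names (`build_payload`, `long_prompt`, `send`) and the filler text are made up for illustration, and the real token count of the long prompt depends on the tokenizer.

```python
import json
import urllib.request

# Assumption: llama-server was started without --host/--port, so it listens
# on its default address and serves the OpenAI-compatible chat endpoint there.
URL = "http://127.0.0.1:8080/v1/chat/completions"


def build_payload(content: str) -> bytes:
    """Encode a single-turn chat request body like the one shown in step 2."""
    body = {"messages": [{"role": "user", "content": content}]}
    return json.dumps(body).encode("utf-8")


def long_prompt(repeats: int = 40) -> str:
    """Return filler text; the wording is arbitrary, only the length matters."""
    return " ".join(["The quick brown fox jumps over the lazy dog."] * repeats)


def send(content: str, url: str = URL, timeout: float = 600.0) -> dict:
    """POST one chat request and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=build_payload(content),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `send("Hi")` corresponds to step 2 and `send(long_prompt())` to step 3. On an affected build the second call would presumably raise a connection error once the server process dies.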

### First Bad Commit

7acb4e8cd2ce21f457d1298e75fad729520d263c - hparams : refactor `hparams.n_layer` (#24060)

The parent commit 3ecfb150a does not crash.

### Relevant log output

Log: [llama-server-lv4.log](https://github.com/user-attachments/files/28658054/llama-server-lv4.log)

The following is a stack trace from a custom crash handler [I use on Windows](https://gist.github.com/aldehir/25bd77168354146e602a7b9be949125c).

<details>
<summary>Crash / stack trace</summary>

```
=== CRASH (unhandled exception) ===
Exception code:    0xC0000005
Exception address: 0x00007FFCD1A60BD3

Stack trace:
  #0   0x00007ffcd1a60bd3 _NLG_Return2+0x5a3
  #1   0x00007ffc506f34c6 ggml_vec_cpy_f32+0x56 (ggml/src/ggml-cpu/vec.h:119)
  #2   0x00007ffc506e3f2b ggml_compute_forward_set_f32+0x2fb (ggml/src/ggml-cpu/ops.cpp:4599)
  #3   0x00007ffc506399ad ggml_graph_compute_thread+0xdd (ggml/src/ggml-cpu/ggml-cpu.c:3062)
  #4   0x00007ffcbe501801 vcomp_fork+0x2d1
  #5   0x00007ffcbe5017c2 vcomp_fork+0x292
  #6   0x00007ffcbe509041 vcomp_atomic_div_r8+0xb81
  #7   0x00007ffcbe5016e1 vcomp_fork+0x1b1
  #8   0x00007ffc5063974d ggml_graph_compute+0x19d (ggml/src/ggml-cpu/ggml-cpu.c:3333)
  #9   0x00007ffc5063c23e ggml_backend_cpu_graph_compute+0xbe (ggml/src/ggml-cpu/ggml-cpu.cpp:191)
  #10  0x00007ffc6dd96eeb ggml_backend_sched_compute_splits+0x58b (ggml/src/ggml-backend.cpp:1678)
  #11  0x00007ffc2a7ece14 llama_context::graph_compute+0xa4 (src/llama-context.cpp:2334)
  #12  0x00007ffc2a7f0ec6 llama_context::process_ubatch+0xf6 (src/llama-context.cpp:1317)
  #13  0x00007ffc2a7ea47b llama_context::decode+0x68b (src/llama-context.cpp:1795)
  #14  0x00007ffc2a7f4cfb llama_decode+0xb (src/llama-context.cpp:3933)
  #15  0x00007ffbc9030d9a server_context_impl::update_slots+0x3d5a (tools/server/server-context.cpp:3186)
  #16  0x00007ffbc90c94ed server_queue::start_loop+0x65d (tools/server/server-queue.cpp:166)
  #17  0x00007ffbc8f09e0c llama_server+0x346c (tools/server/server.cpp:354)
  #18  0x00007ff77cd32008 __scrt_common_main_seh+0x10c
  #19  0x00007ffceb82e957 BaseThreadInitThunk+0x17
  #20  0x00007ffceda8427c RtlUserThreadStart+0x2c
```

From minidump:

```
ExceptionCode: c0000005 (Access violation)
Attempt to write to address 000000205df9b0a0
```
</details>
