Misc. bug: Gemma 4 template changes causing degraded inference speed #21784

### Name and Version

llama-cli --version
version: 8744 (d7ff074c8)
built with Clang 19.1.5 for Windows x86_64



### Operating systems

Windows

### Which llama.cpp modules do you know to be affected?

llama-server

### Command line

```shell

```

### Problem description & steps to reproduce

Token generation speed drops from 98 t/s to 70 t/s with the new reasoning-budget template changes in llama.cpp.
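For scale, the relative slowdown can be worked out from the two figures above. This is only arithmetic on the reported numbers, not a new measurement; the variable names are illustrative.

```shell
# Throughput figures as reported in this issue (tokens per second).
good=98   # before the regression
bad=70    # after the regression

# Relative drop, as a percentage of the original throughput.
drop=$(awk -v g="$good" -v b="$bad" 'BEGIN { printf "%.1f", (g - b) / g * 100 }')
echo "throughput drop: ${drop}%"
```

That puts the regression at a bit over a quarter of the original generation speed.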

The last good build that works is: https://github.com/ggml-org/llama.cpp/releases/tag/b8742

The build where the performance degradation starts is: https://github.com/ggml-org/llama.cpp/releases/tag/b8744

I've been using the Windows x64 Vulkan build.


**Verification:**

I also tested the same build with the `--chat-template` parameter to select a different template, and performance goes back up to 98 t/s. However, that is not a fix, because all templates are broken for me with any of these newer builds.
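A rough sketch of what that override looks like on the command line. The model path is a placeholder and `chatml` is used only as an example of a built-in template name; the exact template tried in this report is not stated, so both are assumptions.

```shell
# Hypothetical model path; substitute the GGUF file actually in use.
MODEL_PATH="./gemma-4.gguf"
# "chatml" is one of llama.cpp's built-in template names, chosen here only as an example.
TEMPLATE="chatml"

# Print the command instead of running it, so it can be reviewed first.
echo llama-server -m "$MODEL_PATH" --chat-template "$TEMPLATE"
```

Running the printed command on the same build, once with and once without `--chat-template`, is the comparison described above.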

Performance on other models is fine.


**What should happen:**

* You should be able to use the Windows x64 Vulkan build (or other builds, if those are affected too), and the default template should not unnecessarily degrade performance.

* Alternative templates shouldn't cause Gemma 4 to spit out its own template in responses. I didn't identify which commit introduced that issue, but if it weren't a problem, switching templates would at least have been a temporary way to avoid the performance issue.


### First Bad Commit

https://github.com/ggml-org/llama.cpp/commit/d7ff074c87ecacd57d5760e2f678866ba9fe7149

### Relevant log output

<details>
<summary>Logs</summary>


```console

```
</details>
