Eval bug: Segmentation fault when tokenizing with long sequences of repeated characters #21113

### Name and Version

ggml_cuda_init: found 1 CUDA devices (Total VRAM: 81037 MiB):
  Device 0: NVIDIA A100 80GB PCIe, compute capability 8.0, VMM: yes, VRAM: 81037 MiB
version: 8376 (67a2209fa)
built with GNU 9.4.0 for Linux x86_64

### Operating systems

Linux

### GGML backends

CUDA

### Hardware

NVIDIA A100 80GB PCIe

### Models

[unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF](https://huggingface.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF) Qwen3-Coder-30B-A3B-Instruct-UD-Q8_K_XL.gguf

### Problem description & steps to reproduce

The crash reproduces reliably with the attached input `test.txt`, which contains 43,695 'A' characters.

I launched `llama-server` with:
```sh
nohup llama-server -m "Qwen3-Coder-30B-A3B-Instruct-UD-Q8_K_XL.gguf" --port 8000 --jinja -ngl 99 --ctx-size $((64*1024)) --temp 0.7 --min-p 0.0 --top-p 0.8 --top-k 20 --repeat-penalty 1.05 > llama-server.log 2>&1 &
```
I then sent a POST request to the `/tokenize` endpoint with the long input string as the content:
```sh
curl --request POST --url http://localhost:8000/tokenize --header "Content-Type: application/json" --data "{\"content\": \"$(cat test.txt)\"}"
```

This input file `test.txt` was generated by the test case generator `TestFusion` developed in our [STAR](https://star.inf.usi.ch/#/home) lab.

[test.txt](https://github.com/user-attachments/files/26322351/test.txt)
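
If the attachment is unavailable, an input of the same shape can probably be recreated locally. The following is a minimal sketch, assuming the file holds nothing but 43,695 'A' characters with no trailing newline (the attached file may differ in that detail). `payload.json` is a hypothetical file name used here so the request body does not have to be passed on the command line.

```sh
# Assumption: the input is exactly 43,695 'A' bytes and nothing else.
# Emit 43,695 NUL bytes and translate each one into 'A'.
head -c 43695 /dev/zero | tr '\0' 'A' > test.txt

# Sanity check: print the byte count of the generated file.
wc -c < test.txt

# Write the JSON request body to a file (payload.json is a hypothetical name).
# This is safe here only because the content has no characters that need JSON escaping.
printf '{"content": "%s"}' "$(cat test.txt)" > payload.json
```

The body can then be sent with `curl --request POST --url http://localhost:8000/tokenize --header "Content-Type: application/json" --data @payload.json`, which should exercise the same endpoint as the command above.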

### First Bad Commit

The bug may be related to this pull request: https://github.com/ggml-org/llama.cpp/pull/17786

### Relevant log output

<details>
<summary>Logs</summary>


```console
curl: (52) Empty reply from server

[2]-  Segmentation fault      (core dumped) nohup llama-server -m "Qwen3-Coder-30B-A3B-Instruct-UD-Q8_K_XL.gguf" --port 8000 --jinja -ngl 99 --ctx-size $((64*1024)) --temp 0.7 --min-p 0.0 --top-p 0.8 --top-k 20 --repeat-penalty 1.05 > llama-server.log 2>&1
```
</details>
