
Bug: llama-server crashes (segfault) when processing prompts with repeated identical characters #17636

@allenzz-dev

Issue Summary

llama-server crashes with a segmentation fault when processing prompts containing a large number of repeated identical characters (e.g., 'A' * 10000), but works perfectly fine with real natural language text of the same or larger size.

Environment

  • llama.cpp version: b7148-0543f928a
  • Backend: ROCm (also reproduced on Vulkan with AMDVLK)
  • GPU: AMD (96GB VRAM)
  • OS: Ubuntu Linux
  • Model: unsloth/gpt-oss-120b-GGUF (Q8_0)

Command to Start Server

LLAMA_CHAT_TEMPLATE_KWARGS='{"reasoning_effort": "medium"}' ./llama-server \
  -m ~/model/unsloth/gpt-oss-120b-GGUF/new/gpt-oss-120b-Q8_0-00001-of-00002.gguf \
  --host 0.0.0.0 --port 11435 -ngl 99 --ctx-size 32768 \
  -b 256 -ub 128 --no-warmup --n-predict 8192 \
  --top-k 60 --top-p 0.9 --repeat-penalty 1.1 \
  --jinja --chat-template-kwargs '{"reasoning_effort": "medium"}' \
  --no-mmap --mlock -np 1 -sps 0.0 --cache-reuse 0

Steps to Reproduce

  1. Start llama-server with the above command
  2. Send a request with repeated characters:
curl -X POST http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"'"$(python3 -c "print('A'*10000)")"' Say OK"}],"stream":true}'
  3. Server crashes with: Segmentation fault (core dumped)
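
For scripted testing, the same request body can be built in Python (a sketch mirroring the curl call above; the actual send is left commented out, since the server segfaults before a response arrives):

```python
import json
import urllib.request

# Build the same body the curl command sends.
payload = {
    "messages": [{"role": "user", "content": "A" * 10000 + " Say OK"}],
    "stream": True,
}
body = json.dumps(payload).encode()

# Sending the request (uncomment with a running server; the server
# crashes before responding):
# req = urllib.request.Request(
#     "http://localhost:11435/v1/chat/completions",
#     data=body,
#     headers={"Content-Type": "application/json"},
# )
# urllib.request.urlopen(req)
```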

Expected Behavior

The server should either process the request (even if slowly) or return an error; it should not crash.

Actual Behavior

Server immediately crashes with segfault. The crash occurs:

  • At a threshold of ~10,000 repeated characters
  • Regardless of backend (ROCm or Vulkan)
  • Even when called from localhost

Key Observation: Real Text Works Fine

The same server handles real natural language text of much larger sizes without any issues:

# This works perfectly - 182,000 characters of real text
real_text = "The quick brown fox..." * 500
# Request succeeds; server processes 27,341 tokens without crashing

# This crashes - only 10,000 repeated characters
repeated_text = "A" * 10000
# Server crashes with segfault

Server Logs Before Crash

main: model loaded
main: server is listening on http://0.0.0.0:11435
main: starting the main loop...
srv  update_slots: all slots are idle
./runllama.sh: line 21: 4471 Segmentation fault (core dumped) ...

The crash happens immediately upon receiving the request; no prompt-processing logs appear.

Possible Causes

Based on investigation, this could be related to:

  1. Tokenizer handling of repeated tokens - The BPE tokenizer may have edge cases with highly repetitive input
  2. DRY sampler - The sampler chain includes dry, which detects repeated sequences
  3. Repeat penalty calculation - Edge case when all tokens are identical

Additional Context

  • The WebUI on the same server can handle large prompts (27,341 tokens) of real text
  • The crash threshold is consistent (~10,000 repeated characters)
  • This was tested with multiple configurations (with/without -fa, different batch sizes, etc.)

Workaround

Use real/varied text content instead of repeated characters. For actual use cases with natural language, the server works correctly.
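
Where repeated-character input cannot be avoided upstream, a hypothetical client-side guard can cap character runs before sending the request. This is a sketch: `collapse_runs` and its `max_run` cap of 64 are invented here (chosen well below the ~10,000-character crash threshold), and it mitigates rather than fixes the underlying segfault.

```python
import re

def collapse_runs(text, max_run=64):
    """Cap any run of a single repeated character at max_run
    occurrences (hypothetical client-side mitigation, not a fix
    for the server-side crash)."""
    pattern = r"(.)\1{%d,}" % (max_run - 1)
    return re.sub(pattern, lambda m: m.group(1) * max_run, text)

print(len(collapse_runs("A" * 10000)))  # -> 64
```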

Metadata

Labels

  • bug: Something isn't working
  • chat parser: Issues related to the chat parser and chat templates
  • medium severity: Used to report medium severity bugs in llama.cpp (e.g. malfunctioning features but still usable)
  • server
