Name and Version
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 81037 MiB):
Device 0: NVIDIA A100 80GB PCIe, compute capability 8.0, VMM: yes, VRAM: 81037 MiB
version: 8376 (67a2209)
built with GNU 9.4.0 for Linux x86_64
Operating systems
Linux
GGML backends
CUDA
Hardware
NVIDIA A100 80GB PCIe
Models
unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF Qwen3-Coder-30B-A3B-Instruct-UD-Q8_K_XL.gguf
Problem description & steps to reproduce
The bug is very easy to reproduce with the given input test.txt, which contains 43,695 'A' characters.
I launched llama-server with
nohup llama-server -m "Qwen3-Coder-30B-A3B-Instruct-UD-Q8_K_XL.gguf" --port 8000 --jinja -ngl 99 --ctx-size $((64*1024)) --temp 0.7 --min-p 0.0 --top-p 0.8 --top-k 20 --repeat-penalty 1.05 > llama-server.log 2>&1 &
I then sent a POST request to the /tokenize endpoint with the long input string as the content:
curl --request POST --url http://localhost:8000/tokenize --header "Content-Type: application/json" --data "{\"content\": \"$(cat test.txt)\"}"
The input file test.txt was generated by TestFusion, a test case generator developed in our STAR lab.
test.txt
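For anyone without the attachment, the input can probably be recreated locally. The sketch below is a minimal approximation, not the TestFusion output itself: it assumes test.txt is exactly 43,695 ASCII 'A' characters with no trailing newline, and the helper names and the payload.json file name are hypothetical.

```python
import json
from pathlib import Path

# Character count stated in the report; the real attachment may differ
# in details such as a trailing newline (assumption).
N_CHARS = 43_695


def build_payload(n_chars: int = N_CHARS) -> str:
    """Return a JSON body for POST /tokenize holding n_chars 'A' characters."""
    return json.dumps({"content": "A" * n_chars})


def write_test_file(path: str = "test.txt", n_chars: int = N_CHARS) -> Path:
    """Write n_chars 'A' characters to path, without a trailing newline."""
    target = Path(path)
    target.write_text("A" * n_chars, encoding="ascii")
    return target


if __name__ == "__main__":
    # Recreate the input file and a ready-to-send request body.
    write_test_file()
    Path("payload.json").write_text(build_payload(), encoding="ascii")
```

The generated body can then be sent with `curl --request POST --url http://localhost:8000/tokenize --header "Content-Type: application/json" --data @payload.json`, which should be equivalent to the inline `$(cat test.txt)` form above as long as the file contains only 'A' characters.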
First Bad Commit
The bug may be related to this pull request: #17786
Relevant log output
curl: (52) Empty reply from server
[2]- Segmentation fault (core dumped) nohup llama-server -m "Qwen3-Coder-30B-A3B-Instruct-UD-Q8_K_XL.gguf" --port 8000 --jinja -ngl 99 --ctx-size $((64*1024)) --temp 0.7 --min-p 0.0 --top-p 0.8 --top-k 20 --repeat-penalty 1.05 > llama-server.log 2>&1