Closed
Name and Version
./build/bin/llama-cli --version
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
version: 7946 (7a4f97d19)
built with Clang 21.1.7 for Linux x86_64
Operating systems
Linux
GGML backends
CUDA
Hardware
Ryzen 5950X + NVIDIA RTX 3090
Models
Qwen3-Coder-Next-80B-A3B (unsloth UD-Q6_K_XL)
Problem description & steps to reproduce
llama-server terminates with an uncaught C++ exception thrown from llama_grammar_accept_token while generating the response to a follow-up /v1/chat/completions request (backtrace in the log output below). Server launch command:

./build/bin/llama-server --jinja --model ~/llama/Qwen3-Coder-Next-GGUF/UD-Q6_K_XL/Qwen3-Coder-Next-UD-Q6_K_XL-00001-of-00002.gguf --threads -1 -fa 1 -ctv q8_0 -ctk q8_0 --host 0.0.0.0 --temp 1.0 --top-k 40 --top-p 0.95 --min-p 0.01 -c 131072 --no-direct-io -fitt 384
First Bad Commit
No response
Relevant log output
slot print_timing: id 3 | task 0 |
prompt eval time = 57251.21 ms / 10634 tokens ( 5.38 ms per token, 185.74 tokens per second)
eval time = 4467.24 ms / 33 tokens ( 135.37 ms per token, 7.39 tokens per second)
total time = 61718.45 ms / 10667 tokens
slot release: id 3 | task 0 | stop processing: n_tokens = 10666, truncated = 0
srv update_slots: all slots are idle
srv log_server_r: done request: POST /v1/chat/completions 192.168.2.218 200
srv params_from_: Chat format: Qwen3 Coder
slot get_availabl: id 3 | task -1 | selected slot by LCP similarity, sim_best = 0.914 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 3 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> ?top-p -> min-p -> ?xtc -> ?temp-ext -> dist
slot launch_slot_: id 3 | task 40 | processing task, is_child = 0
slot update_slots: id 3 | task 40 | new prompt, n_ctx_slot = 131072, n_keep = 0, task.n_tokens = 11675
slot update_slots: id 3 | task 40 | n_tokens = 10666, memory_seq_rm [10666, end)
slot update_slots: id 3 | task 40 | prompt processing progress, n_tokens = 11611, batch.n_tokens = 945, progress = 0.994518
slot update_slots: id 3 | task 40 | n_tokens = 11611, memory_seq_rm [11611, end)
slot update_slots: id 3 | task 40 | prompt processing progress, n_tokens = 11675, batch.n_tokens = 64, progress = 1.000000
slot update_slots: id 3 | task 40 | prompt done, n_tokens = 11675, batch.n_tokens = 64
slot init_sampler: id 3 | task 40 | init sampler, took 1.06 ms, tokens: text = 11675, total = 11675
slot update_slots: id 3 | task 40 | created context checkpoint 2 of 8 (pos_min = 11610, pos_max = 11610, size = 75.376 MiB)
0x0000737214b10813 in __GI___wait4 (pid=630236, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
warning: 30 ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory
#0 0x0000737214b10813 in __GI___wait4 (pid=630236, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30 in ../sysdeps/unix/sysv/linux/wait4.c
#1 0x00007372179a14f6 in ggml_print_backtrace () from /home/morbo/git/llama.cpp/build/bin/libggml-base.so.0
#2 0x00007372179b6de6 in ggml_uncaught_exception() () from /home/morbo/git/llama.cpp/build/bin/libggml-base.so.0
#3 0x0000737214ebb0da in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x0000737214ea5a55 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x0000737214ebb391 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x0000737217cb5571 in llama_grammar_accept_token(llama_grammar&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /home/morbo/git/llama.cpp/build/bin/libllama.so.0
#7 0x0000737217cb4b0f in llama_grammar_accept_impl(llama_grammar&, int) () from /home/morbo/git/llama.cpp/build/bin/libllama.so.0
#8 0x00005e31b24060b9 in common_sampler_accept(common_sampler*, int, bool) ()
#9 0x00005e31b21b0a39 in server_context_impl::update_slots() ()
#10 0x00005e31b22645cc in server_queue::start_loop(long) ()
#11 0x00005e31b20e7498 in main ()