Regression: AI responds with {" when requested with tools and stream=true #9988

@mgoltzsche

Description

LocalAI version:

LocalAI v4.3.1
container: localai/localai:v4.3.1-gpu-vulkan

Environment, CPU architecture, OS, and Version:

Ubuntu 24.04 host with an AMD Ryzen 7 5800X CPU and an AMD Radeon RX 6600 GPU

Linux max-machine 6.8.0-117-generic #117-Ubuntu SMP PREEMPT_DYNAMIC Tue May  5 19:26:24 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux

Describe the bug

The LLM response consists only of {" when the chat completion API request specifies tools and enables streaming.
This is a regression: it worked in LocalAI v4.0.0 but stopped working at some point before LocalAI v4.3.1.

Relates to #9419 and #9363.

To Reproduce

  1. Start LocalAI, e.g.:
     docker run -ti --rm --network=host --privileged -v "$(pwd)/data/models:/models" -v "$(pwd)/data/backends:/backends" localai/localai:v4.3.1-gpu-vulkan --address 127.0.0.1:8080
  2. Download the qwen3-4b model.
  3. Request a chat completion with streaming enabled and at least one tool definition:
     curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "qwen3-4b", "messages": [{"role": "user", "content": "Hello"}], "stream": true, "tools": [{"name":"exec","type":"function","function":{"parameters":{"type":"object","properties":{"cmd":{"type":"string"}}}, "description":"execute a command"}}]}'
  4. Observe that the streamed response consists only of {":
data: {"created":1779737479,"object":"chat.completion.chunk","id":"f05914e1-931e-4d92-920f-fa952f8d4734","model":"qwen3-4b","choices":[{"index":0,"finish_reason":null,"delta":{"role":"assistant","content":null}}]}

data: {"created":1779737479,"object":"chat.completion.chunk","id":"f05914e1-931e-4d92-920f-fa952f8d4734","model":"qwen3-4b","choices":[{"index":0,"finish_reason":null,"delta":{"content":"{\""}}]}

data: {"created":1779737479,"object":"chat.completion.chunk","id":"f05914e1-931e-4d92-920f-fa952f8d4734","model":"qwen3-4b","choices":[{"index":0,"finish_reason":"stop","delta":{"content":null}}]}

data: [DONE]
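
To check the stream without reading the chunks by eye, the content deltas can be joined programmatically. The following is a minimal sketch, not part of LocalAI: `collect_stream` is a hypothetical helper name, and `SAMPLE` is an abridged copy of the chunks above (the `created`, `object`, `id` and `model` fields are dropped for brevity).

```python
import json


def collect_stream(lines):
    """Join the content deltas of an OpenAI-style SSE chat completion stream.

    Returns (text, finish_reason). Hypothetical helper for illustration only.
    """
    parts = []
    finish_reason = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip the blank lines that separate SSE events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content is not None:
                parts.append(content)
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason


# Abridged copy of the chunks reported in this issue.
SAMPLE = r'''
data: {"choices":[{"index":0,"finish_reason":null,"delta":{"role":"assistant","content":null}}]}

data: {"choices":[{"index":0,"finish_reason":null,"delta":{"content":"{\""}}]}

data: {"choices":[{"index":0,"finish_reason":"stop","delta":{"content":null}}]}

data: [DONE]
'''.splitlines()

text, reason = collect_stream(SAMPLE)
print(repr(text), reason)
```

For the sample above, the joined text is just {" and the finish reason is stop, which matches the behavior described in this report.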

Expected behavior

The response should contain actual content; in this example, a greeting would be appropriate. Alternatively, given the tool definition, the model could call a tool if the prompt asked for it.
In either case, {" is not a useful response.
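
A side observation, not a confirmed cause: in the logs below, the backend reports the tool with "name":"" and the generated grammar contains root-0-name ::= "\"\"". The request in the reproduction steps places name next to type, whereas the OpenAI chat completions schema nests it inside function. Whether this is related to the regression is not established here. The sketch below only shows how the tool definition could be rewritten into the nested shape for a comparison run; `normalize_tool` is a hypothetical helper, not a LocalAI API.

```python
import copy


def normalize_tool(tool):
    """Return a copy of `tool` with a top-level "name" moved into "function".

    Hypothetical helper for illustration; the input is left unmodified.
    """
    fixed = copy.deepcopy(tool)
    function = fixed.setdefault("function", {})
    # Only move the name if the nested definition does not already have one.
    if "name" in fixed and not function.get("name"):
        function["name"] = fixed.pop("name")
    return fixed


# Tool definition as sent in the reproduction request.
reported = {
    "name": "exec",
    "type": "function",
    "function": {
        "parameters": {"type": "object", "properties": {"cmd": {"type": "string"}}},
        "description": "execute a command",
    },
}

print(normalize_tool(reported))
```

Re-running the reproduction with the normalized definition would show whether the empty tool name in the logs plays any role in the {" output.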

Logs

LocalAI logs
CPU info:
model name	: AMD Ryzen 7 5800X 8-Core Processor
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap ibpb_exit_to_user
CPU:    AVX    found OK
CPU:    AVX2   found OK
CPU: no AVX512 found
May 25 19:33:54 DEBUG GPU vendor detected via ghw vendor="amd" caller={caller.file="/build/pkg/xsysinfo/gpu.go"  caller.L=157 } 
May 25 19:33:54 DEBUG GPU vendor gpuVendor="amd" caller={caller.file="/build/pkg/system/state.go"  caller.L=77 } 
May 25 19:33:54 DEBUG VRAM detected via binary tools total_vram=8589934592 caller={caller.file="/build/pkg/xsysinfo/gpu.go"  caller.L=116 } 
May 25 19:33:54 DEBUG Total available VRAM vram=8589934592 caller={caller.file="/build/pkg/system/state.go"  caller.L=79 } 
May 25 19:33:54 INFO  Using forced capability run file capabilityRunFile="/run/localai/capability" capability="vulkan\n" env="" caller={caller.file="/build/pkg/system/capabilities.go"  caller.L=118 } 
May 25 19:33:54 INFO  Starting LocalAI threads=8 modelsPath="//models" caller={caller.file="/build/core/application/startup.go"  caller.L=39 } 
May 25 19:33:54 INFO  LocalAI version version="v4.3.1 (1dcd1ae915c69d79e3219b88b85b45a3639a3c74)" caller={caller.file="/build/core/application/startup.go"  caller.L=40 } 
May 25 19:33:54 INFO  LocalAI Assistant in-memory MCP server initialised tools=21 read_only=false caller={caller.file="/build/core/http/endpoints/mcp/localai_assistant.go"  caller.L=78 } 
...
May 25 19:33:56 INFO  LocalAI is started and running address="127.0.0.1:8080" caller={caller.file="/build/core/cli/run.go"  caller.L=560 } 
May 25 19:33:56 INFO  Agent pool started (standalone/LocalAGI mode) stateDir="//data" apiURL="http://127.0.0.1:8080" caller={caller.file="/build/core/services/agentpool/agent_pool.go"  caller.L=338 } 
May 25 19:33:56 DEBUG HTTP request method="GET" path="/api/operations" status=200 caller={caller.file="/build/core/http/app.go"  caller.L=203 } 
  ↳ repeated 2x
May 25 19:33:58 DEBUG HTTP request method="GET" path="/api/resources" status=200 caller={caller.file="/build/core/http/app.go"  caller.L=203 } 
May 25 19:34:06 DEBUG HTTP request method="GET" path="/api/operations" status=200 caller={caller.file="/build/core/http/app.go"  caller.L=203 } 
May 25 19:34:06 DEBUG Using reported capability reportedCapability="vulkan" capMap=map[amd:rocm-llama-cpp default:cpu-llama-cpp intel:intel-sycl-f16-llama-cpp metal:metal-llama-cpp nvidia:cuda12-llama-cpp nvidia-cuda-12:cuda12-llama-cpp nvidia-cuda-13:cuda13-llama-cpp nvidia-l4t:nvidia-l4t-arm64-llama-cpp nvidia-l4t-cuda-12:nvidia-l4t-arm64-llama-cpp nvidia-l4t-cuda-13:cuda13-nvidia-l4t-arm64-llama-cpp vulkan:vulkan-llama-cpp] caller={caller.file="/build/pkg/system/capabilities.go"  caller.L=71 } 
May 25 19:34:06 DEBUG Capability not in map, falling back to default reportedCapability="vulkan" capMap=map[default:cpu-ik-llama-cpp] caller={caller.file="/build/pkg/system/capabilities.go"  caller.L=81 } 
...
May 25 19:34:07 INFO  BackendLoader starting modelID="qwen3-4b" backend="llama-cpp" model="Qwen3-4B.Q4_K_M.gguf" caller={caller.file="/build/pkg/model/initializers.go"  caller.L=169 } 
May 25 19:34:07 DEBUG Loading model in memory from file file="/models/Qwen3-4B.Q4_K_M.gguf" caller={caller.file="/build/pkg/model/loader.go"  caller.L=336 } 
May 25 19:34:07 DEBUG Loading Model with gRPC modelID="qwen3-4b" file="/models/Qwen3-4B.Q4_K_M.gguf" backend="llama-cpp" options={llama-cpp Qwen3-4B.Q4_K_M.gguf qwen3-4b {{}} 0x333c9546e308 map[] 20 2 true} caller={caller.file="/build/pkg/model/initializers.go"  caller.L=54 } 
May 25 19:34:07 DEBUG Loading external backend uri="/backends/vulkan-llama-cpp/run.sh" caller={caller.file="/build/pkg/model/initializers.go"  caller.L=87 } 
May 25 19:34:07 DEBUG external backend is file file=&{run.sh 1668 493 {0 63914565083 0x943ecc0} {2304 26363832 1 33261 0 0 0 0 1668 4096 8 {1779662828 980616651} {1778968283 0} {1779306410 336789385} [0 0 0]}} caller={caller.file="/build/pkg/model/initializers.go"  caller.L=90 } 
May 25 19:34:07 DEBUG Loading GRPC Process process="/backends/vulkan-llama-cpp/run.sh" caller={caller.file="/build/pkg/model/process.go"  caller.L=145 } 
May 25 19:34:07 DEBUG GRPC Service will be running id="qwen3-4b" address="127.0.0.1:40959" caller={caller.file="/build/pkg/model/process.go"  caller.L=147 } 
May 25 19:34:07 DEBUG GRPC Service state dir dir="/tmp/go-processmanager1275839256" caller={caller.file="/build/pkg/model/process.go"  caller.L=171 } 
May 25 19:34:07 DEBUG GRPC Service Started caller={caller.file="/build/pkg/model/initializers.go"  caller.L=102 } 
May 25 19:34:07 DEBUG Wait for the service to start up caller={caller.file="/build/pkg/model/initializers.go"  caller.L=115 } 
May 25 19:34:07 DEBUG Options options=ContextSize:40960 Seed:1596861957 NBatch:512 MMap:true NGPULayers:99999999 Threads:8 FlashAttention:"auto" Options:"gpu" caller={caller.file="/build/pkg/model/initializers.go"  caller.L=116 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+++ realpath run.sh" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="++ dirname /backends/vulkan-llama-cpp/run.sh" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ CURDIR=/backends/vulkan-llama-cpp" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ cd /" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ echo 'CPU info:'" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stdout id="qwen3-4b-127.0.0.1:40959" line="CPU info:" caller={caller.file="/build/pkg/model/process.go"  caller.L=200 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ grep -e 'model\\sname' /proc/cpuinfo" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ head -1" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stdout id="qwen3-4b-127.0.0.1:40959" line="model name\t: AMD Ryzen 7 5800X 8-Core Processor" caller={caller.file="/build/pkg/model/process.go"  caller.L=200 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ grep -e flags /proc/cpuinfo" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ head -1" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stdout id="qwen3-4b-127.0.0.1:40959" line="flags\t\t: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap ibpb_exit_to_user" caller={caller.file="/build/pkg/model/process.go"  caller.L=200 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ BINARY=llama-cpp-fallback" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ grep -q -e '\\savx\\s' /proc/cpuinfo" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stdout id="qwen3-4b-127.0.0.1:40959" line="CPU:    AVX    found OK" caller={caller.file="/build/pkg/model/process.go"  caller.L=200 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ echo 'CPU:    AVX    found OK'" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ '[' -e /backends/vulkan-llama-cpp/llama-cpp-avx ']'" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ BINARY=llama-cpp-avx" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ grep -q -e '\\savx2\\s' /proc/cpuinfo" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stdout id="qwen3-4b-127.0.0.1:40959" line="CPU:    AVX2   found OK" caller={caller.file="/build/pkg/model/process.go"  caller.L=200 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ echo 'CPU:    AVX2   found OK'" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ '[' -e /backends/vulkan-llama-cpp/llama-cpp-avx2 ']'" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ BINARY=llama-cpp-avx2" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ grep -q -e '\\savx512f\\s' /proc/cpuinfo" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ '[' -n '' ']'" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="++ uname" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ '[' Linux == Darwin ']'" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ export LD_LIBRARY_PATH=/backends/vulkan-llama-cpp/lib:" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ LD_LIBRARY_PATH=/backends/vulkan-llama-cpp/lib:" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ '[' -d /backends/vulkan-llama-cpp/lib/rocblas/library ']'" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ '[' -f /backends/vulkan-llama-cpp/lib/ld.so ']'" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ echo 'Using lib/ld.so'" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ echo 'Using binary: llama-cpp-avx2'" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stdout id="qwen3-4b-127.0.0.1:40959" line="Using lib/ld.so" caller={caller.file="/build/pkg/model/process.go"  caller.L=200 } 
May 25 19:34:07 DEBUG GRPC stdout id="qwen3-4b-127.0.0.1:40959" line="Using binary: llama-cpp-avx2" caller={caller.file="/build/pkg/model/process.go"  caller.L=200 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ exec /backends/vulkan-llama-cpp/lib/ld.so /backends/vulkan-llama-cpp/llama-cpp-avx2 --addr 127.0.0.1:40959" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="WARNING: All log messages before absl::InitializeLog() is called are written to STDERR" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="I0000 00:00:1779737647.832791     221 config.cc:230] gRPC experiments enabled: call_status_override_on_cancellation, event_engine_dns, event_engine_listener, http2_stats_fix, monitoring_experiment, pick_first_new, trace_record_callops, work_serializer_clears_time_cache, work_serializer_dispatch" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="I0000 00:00:1779737647.832990     221 ev_epoll1_linux.cc:125] grpc epoll fd: 4" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="I0000 00:00:1779737647.833140     221 server_builder.cc:392] Synchronous server. Num CQs: 1, Min pollers: 1, Max Pollers: 2, CQ timeout (msec): 10000" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="I0000 00:00:1779737647.834309     221 ev_epoll1_linux.cc:359] grpc epoll fd: 5" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="I0000 00:00:1779737647.834696     221 tcp_socket_utils.cc:634] TCP_USER_TIMEOUT is available. TCP_USER_TIMEOUT will be used thereafter" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stdout id="qwen3-4b-127.0.0.1:40959" line="Server listening on 127.0.0.1:40959" caller={caller.file="/build/pkg/model/process.go"  caller.L=200 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="start_llama_server: starting llama server" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="start_llama_server: waiting for model to be loaded" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:08 INFO  Backend upgrade available (new build) backend="vulkan-llama-cpp" caller={caller.file="/build/core/application/upgrade_checker.go"  caller.L=197 } 
May 25 19:34:08 INFO  Backend upgrade available (new build) backend="vulkan-whisper" caller={caller.file="/build/core/application/upgrade_checker.go"  caller.L=197 } 
May 25 19:34:08 INFO  Backend upgrade available (new build) backend="piper" caller={caller.file="/build/core/application/upgrade_checker.go"  caller.L=197 } 
May 25 19:34:08 DEBUG HTTP request method="GET" path="/api/operations" status=200 caller={caller.file="/build/core/http/app.go"  caller.L=203 } 
May 25 19:34:08 DEBUG HTTP request method="GET" path="/api/resources" status=200 caller={caller.file="/build/core/http/app.go"  caller.L=203 } 
May 25 19:34:09 DEBUG HTTP request method="GET" path="/api/operations" status=200 caller={caller.file="/build/core/http/app.go"  caller.L=203 } 
May 25 19:34:09 DEBUG GRPC Service Ready caller={caller.file="/build/pkg/model/initializers.go"  caller.L=123 } 
May 25 19:34:09 DEBUG GRPC: Loading model with options options={{{} [] [] 0x333c95551c20} 0 [] Qwen3-4B.Q4_K_M.gguf 40960 1596861957 512 false false true false false false false 99999999   8 0 0 0 0 /models/Qwen3-4B.Q4_K_M.gguf   false 0 false   0     0 false    0 false false 0 0 0  false  0 0 0   0 0 0 0  auto false //models [] [] [gpu]   [] false [] } caller={caller.file="/build/pkg/model/initializers.go"  caller.L=146 } 
May 25 19:34:09 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.02.018.170 I system info: n_threads = 8, n_threads_batch = -1, total_threads = 16" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:09 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.02.018.173 I " caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:09 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.02.018.194 I system_info: n_threads = 8 / 16 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | " caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:09 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.02.018.194 I " caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:09 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.02.018.196 I srv    load_model: loading model '/models/Qwen3-4B.Q4_K_M.gguf'" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:09 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.02.018.212 I common_init_result: fitting params to device memory ..." caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:09 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.02.018.212 I common_init_result: (for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on)" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:10 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.02.206.779 W common_fit_params: failed to fit params to free device memory: n_gpu_layers already set by user to 99999999, abort" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:10 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.02.295.976 W load: control-looking token: 128247 '</s>' was not control-type; this is probably a bug in the model. its type will be overridden" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:10 DEBUG HTTP request method="GET" path="/api/operations" status=200 caller={caller.file="/build/core/http/app.go"  caller.L=203 } 
May 25 19:34:10 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.146.264 W common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.686.375 I srv    load_model: initializing slots, n_slots = 1" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG HTTP request method="GET" path="/api/operations" status=200 caller={caller.file="/build/core/http/app.go"  caller.L=203 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.896.215 W common_speculative_init: no implementations specified for speculative decoding" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.896.220 I slot   load_model: id  0 | task -1 | new slot, n_ctx = 40960" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.896.237 I srv    load_model: prompt cache is enabled, size limit: no limit" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.896.238 I srv    load_model: use `--cache-ram 0` to disable the prompt cache" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.896.238 I srv    load_model: for more info see https://github.com/ggml-org/llama.cpp/pull/16391" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.896.252 W srv          init: --cache-idle-slots requires --kv-unified, disabling" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.901.174 I init: chat template, example_format: '<|im_start|>system" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="You are a helpful assistant<|im_end|>" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="<|im_start|>user" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="Hello<|im_end|>" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="<|im_start|>assistant" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="Hi there<|im_end|>" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="<|im_start|>user" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="How are you?<|im_end|>" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="<|im_start|>assistant" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="'" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.904.110 I srv          init: init: chat template, thinking = 1" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG Model already loaded in memory model="qwen3-4b" caller={caller.file="/build/pkg/model/loader.go"  caller.L=374 } 
May 25 19:34:11 DEBUG Checking model availability model="qwen3-4b" caller={caller.file="/build/pkg/model/loader.go"  caller.L=386 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.918.162 I srv  parse_option: Using grammar: root-1-name ::= \"\\\"answer\\\"\"" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="root-1 ::= \"{\" space \"\\\"arguments\\\"\" space \":\" space root-1-arguments \",\" space \"\\\"name\\\"\" space \":\" space root-1-name \"}\" space" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="root-0 ::= \"{\" space \"\\\"arguments\\\"\" space \":\" space root-0-arguments \",\" space \"\\\"name\\\"\" space \":\" space root-0-name \"}\" space" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="root ::= root-0 | root-1" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="space ::= \" \"?" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="freestring ::= (" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="\t\t\t[^\\x00] |" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="\t\t\t\"\\\\\" ([\"\\\\/bfnrt] | \"u\" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="\t\t  )* space" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="string ::= \"\\\"\" (" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="\t\t\t[^\"\\\\] |" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="\t\t\t\"\\\\\" ([\"\\\\/bfnrt] | \"u\" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="\t\t  )* \"\\\"\" space" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="root-0-arguments ::= \"{\" space \"\\\"cmd\\\"\" space \":\" space string \"}\" space" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="root-0-name ::= \"\\\"\\\"\"" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="root-1-arguments ::= \"{\" space \"\\\"message\\\"\" space \":\" space string \"}\" space" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.918.167 I srv  parse_option: [TOOLS DEBUG] parse_options: Checking for tools in proto, tools().empty()=0, tools().size()=163" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.918.167 I srv  parse_option: [TOOLS DEBUG] parse_options: Tools string from proto (first 500 chars): [{\"type\":\"function\",\"function\":{\"name\":\"\",\"description\":\"execute a command\",\"strict\":false,\"parameters\":{\"properties\":{\"cmd\":{\"type\":\"string\"}},\"type\":\"object\"}}}]" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.918.187 I srv  parse_option: Extracted tools from proto: [{\"type\":\"function\",\"function\":{\"name\":\"\",\"description\":\"execute a command\",\"strict\":false,\"parameters\":{\"properties\":{\"cmd\":{\"type\":\"string\"}},\"type\":\"object\"}}}]" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.918.187 I srv  parse_option: [TOOLS DEBUG] parse_options: Successfully parsed 1 tools from Go layer" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.918.188 I srv  parse_option: [TOOLS DEBUG] parse_options: Tool 0: " caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.918.190 I srv  parse_option: [TOOLS DEBUG] parse_options: Tools successfully added to data, count: 1" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.04.002.467 I start_llama_server: model loaded" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.04.002.477 I slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = -1" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.04.002.478 I srv  get_availabl: updating prompt cache" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.04.002.482 I srv          load:  - looking for better prompt, base f_keep = -1.000, sim = 0.000" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.04.002.484 I srv        update:  - cache state: 0 prompts, 0.000 MiB (limits: 0.000 MiB, 40960 tokens, 40960 est)" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.04.002.485 I srv  get_availabl: prompt cache update took 0.01 ms" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.04.002.543 I slot launch_slot_: id  0 | task 0 | processing task, is_child = 0" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:12 DEBUG LLM result result="" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 25 19:34:12 DEBUG LLM result(processed) result="" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 25 19:34:12 DEBUG Sending chunk chunk="{\"created\":1779737647,\"object\":\"chat.completion.chunk\",\"id\":\"f3c531da-4964-43c9-8008-bb7f8758e817\",\"model\":\"qwen3-4b\",\"choices\":[{\"index\":0,\"finish_reason\":null,\"delta\":{\"role\":\"assistant\",\"content\":null}}]}" caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=394 } 
May 25 19:34:12 DEBUG Sending chunk chunk="{\"created\":1779737647,\"object\":\"chat.completion.chunk\",\"id\":\"f3c531da-4964-43c9-8008-bb7f8758e817\",\"model\":\"qwen3-4b\",\"choices\":[{\"index\":0,\"finish_reason\":null,\"delta\":{\"content\":\"{\\\"\"}}]}" caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=394 } 
May 25 19:34:12 DEBUG LLM result result="{\"" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 25 19:34:12 DEBUG LLM result(processed) result="{\"" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 25 19:34:12 DEBUG LLM result result="{\"arguments" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 25 19:34:12 DEBUG LLM result result="{\"arguments\":" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\":" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\": {\"" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"message" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\": {\"message" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"message\":" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\": {\"message\":" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"message\": \"" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello\"}," caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello\"}," caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello\"}, \"" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello\"}, \"" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello\"}, \"name" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello\"}, \"name" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello\"}, \"name\":" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello\"}, \"name\":" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello\"}, \"name\": \"" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello\"}, \"name\": \"" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello\"}, \"name\": \"answer" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello\"}, \"name\": \"answer" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello\"}, \"name\": \"answer\"}" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello\"}, \"name\": \"answer\"}" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 25 19:34:12 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.04.593.529 I slot print_timing: id  0 | task 0 | " caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:12 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="prompt eval time =     297.67 ms /   185 tokens (    1.61 ms per token,   621.49 tokens per second)" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:12 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="       eval time =     293.29 ms /    16 tokens (   18.33 ms per token,    54.55 tokens per second)" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:12 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="      total time =     590.96 ms /   201 tokens" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:12 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.04.593.556 I slot      release: id  0 | task 0 | stop processing: n_tokens = 200, truncated = 0" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:12 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.04.593.563 I srv  update_slots: all slots are idle" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello\"}, \"name\": \"answer\"}" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello\"}, \"name\": \"answer\"}" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 25 19:34:12 DEBUG [ChatDeltas] streaming completed, accumulated deltas from C++ autoparser total_deltas=15 caller={caller.file="/build/core/backend/llm.go"  caller.L=257 } 
May 25 19:34:12 DEBUG [ChatDeltas] received deltas from backend total_deltas=15 content_chunks=15 reasoning_chunks=0 tool_call_chunks=0 caller={caller.file="/build/pkg/functions/chat_deltas.go"  caller.L=31 } 
May 25 19:34:12 DEBUG [ChatDeltas] deltas present but no tool calls found, falling back to text parsing caller={caller.file="/build/pkg/functions/chat_deltas.go"  caller.L=67 } 
May 25 19:34:12 DEBUG [ChatDeltas] no pre-parsed tool calls, falling back to Go-side text parsing caller={caller.file="/build/core/http/endpoints/openai/chat_stream_workers.go"  caller.L=339 } 
May 25 19:34:12 DEBUG ParseTextContent result="{\"arguments\": {\"message\": \"Hello\"}, \"name\": \"answer\"}" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=270 } 
May 25 19:34:12 DEBUG CaptureLLMResult config=[] caller={caller.file="/build/pkg/functions/parse.go"  caller.L=271 } 
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello\"}, \"name\": \"answer\"}" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello\"}, \"name\": \"answer\"}" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello\"}, \"name\": \"answer\"}" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=866 } 
May 25 19:34:12 DEBUG LLM result(function cleanup) result="{\"arguments\": {\"message\": \"Hello\"}, \"name\": \"answer\"}" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=874 } 
May 25 19:34:12 DEBUG Function return result="{\"arguments\": {\"message\": \"Hello\"}, \"name\": \"answer\"}" parsed=[map[arguments:map[message:Hello] name:answer]] caller={caller.file="/build/pkg/functions/parse.go"  caller.L=902 } 
May 25 19:34:12 DEBUG [ParseFunctionCall] trying PEG parser caller={caller.file="/build/pkg/functions/parse.go"  caller.L=1004 } 
May 25 19:34:12 DEBUG [PEG] starting PEG tool call parsing caller={caller.file="/build/pkg/functions/peg_integration.go"  caller.L=22 } 
May 25 19:34:12 DEBUG [PEG] auto-detecting format across all presets caller={caller.file="/build/pkg/functions/peg_integration.go"  caller.L=97 } 
May 25 19:34:12 DEBUG [PEG] parse succeeded content_len=53 reasoning_len=0 tool_calls=0 caller={caller.file="/build/pkg/functions/peg_integration.go"  caller.L=522 } 
  ↳ repeated 7x
May 25 19:34:12 DEBUG [PEG] no tool calls found by any format caller={caller.file="/build/pkg/functions/peg_integration.go"  caller.L=116 } 
May 25 19:34:12 DEBUG [ParseFunctionCall] PEG parser found no tool calls caller={caller.file="/build/pkg/functions/parse.go"  caller.L=1011 } 
May 25 19:34:12 DEBUG [ChatDeltas] final tool call decision tool_calls=1 text_content="" caller={caller.file="/build/core/http/endpoints/openai/chat_stream_workers.go"  caller.L=346 } 
May 25 19:34:12 DEBUG No choices in the response, skipping caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=370 } 
May 25 19:34:12 DEBUG Stream ended caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=576 } 
May 25 19:34:12 INFO  HTTP request method="POST" path="/v1/chat/completions" status=200 caller={caller.file="/build/core/http/app.go"  caller.L=205 }

Additional context

The bug can also be reproduced using other models such as gemma-3-4b-it.
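
To check other models or versions for the same symptom, a small client-side helper can summarize the streamed chunks. The following is a minimal sketch, not LocalAI code: the function names `summarize_stream` and `looks_truncated` are hypothetical, and the truncation check is only a heuristic that assumes the symptom is text content that opens a JSON object but never completes it, with no `tool_calls` deltas.

```python
import json


def summarize_stream(sse_lines):
    """Collect text content, tool-call deltas and the finish reason
    from OpenAI-style `data: {...}` server-sent event lines."""
    content_parts = []
    tool_calls = []
    finish_reason = None
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("content"):
                content_parts.append(delta["content"])
            tool_calls.extend(delta.get("tool_calls") or [])
            finish_reason = choice.get("finish_reason") or finish_reason
    return {
        "content": "".join(content_parts),
        "tool_calls": tool_calls,
        "finish_reason": finish_reason,
    }


def looks_truncated(summary):
    """Heuristic (assumption): the stream carried no tool calls and its
    text content starts a JSON object that does not parse."""
    content = summary["content"]
    if summary["tool_calls"] or not content.startswith("{"):
        return False
    try:
        json.loads(content)
    except json.JSONDecodeError:
        return True
    return False
```

Fed the three `data:` lines from the reproduction above, `summarize_stream` yields the content `{"` with no tool calls, and `looks_truncated` flags it. The lines can come from any source, for example the captured output of the `curl` command split on newlines.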
