Skip to content

Regression: Model responds with tool call JSON object string within the content field when stream+tools enabled #10052

@mgoltzsche

Description

@mgoltzsche

LocalAI version:

LocalAI v4.3.3
container: localai/localai:v4.3.3-gpu-vulkan

Environment, CPU architecture, OS, and Version:

Ubuntu 24.04 host with an AMD Ryzen 7 5800X CPU and an AMD Radeon RX 6600 GPU

Linux max-machine 6.8.0-117-generic #117-Ubuntu SMP PREEMPT_DYNAMIC Tue May  5 19:26:24 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux

Describe the bug
The chat completion endpoint returns the JSON-formatted tool call object without an id as string within the content field in addition to a tool call when stream=true and tools provided.
(Also, it always calls a tool, in doubt it calls the answer tool even if the client didn't define it.)

Relates to #9985 and #9988.

To Reproduce

  1. Start LocalAI, e.g. docker run -ti --rm --network=host --privileged -v "$(pwd)/data/models:/models" -v "$(pwd)/data/backends:/backends" localai/localai:v4.3.3-gpu-vulkan --address 127.0.0.1:8080 --debug
  2. Download the qwen3-4b model.
  3. Request chat completion with streaming enabled and at least one tool definition: curl -s http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "qwen3-4b", "messages": [{"role": "user", "content": "Say Hello"}], "stream": true, "tools": [{"name":"exec","type":"function","function":{"parameters":{"type":"object","properties":{"cmd":{"type":"string"}}}, "description":"execute a command"}}]}' | sed -E 's/^data: //' | jq .choices[].delta
  4. Observe that the response contains the tool call as JSON object string (without id field) in addition to the actual tool call:
{
  "role": "assistant",
  "content": null
}
{
  "content": "{\""
}
{
  "content": "a"
}
{
  "content": "rg"
}
{
  "content": "uments"
}
{
  "content": "\":"
}
{
  "content": " {\""
}
{
  "content": "message"
}
{
  "content": "\":"
}
{
  "content": " \""
}
{
  "content": "Hello"
}
{
  "content": ","
}
{
  "content": " how"
}
{
  "content": " can"
}
{
  "content": " I"
}
{
  "content": " assist"
}
{
  "content": " you"
}
{
  "content": " today"
}
{
  "content": "?\""
}
{
  "content": "},"
}
{
  "content": " \""
}
{
  "content": "name"
}
{
  "content": "\":"
}
{
  "content": " \""
}
{
  "content": "a"
}
{
  "content": "ns"
}
{
  "content": "we"
}
{
  "content": "r"
}
{
  "content": "\"}"
}
{
  "role": "assistant",
  "content": null,
  "tool_calls": [
    {
      "index": 0,
      "id": "53db6833-387e-4ba8-9204-5cc8adcaf9f2",
      "type": "function",
      "function": {
        "name": "answer",
        "arguments": "{\"message\":\"Hello, how can I assist you today?\"}"
      }
    }
  ]
}
{
  "content": null
}

Expected behavior

The endpoint should not respond with a JSON object but the actual message (Hello, how can I assist you today? in this case) within the content field.
Ideally, the endpoint should not return a call of a tool that the client didn't define (answer; but that's not a priority and behaves like that for a while already).

Logs

LocalAI 4.3.3 logs
CPU info:
model name	: AMD Ryzen 7 5800X 8-Core Processor
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap ibpb_exit_to_user
CPU:    AVX    found OK
CPU:    AVX2   found OK
CPU: no AVX512 found
May 28 21:10:24 DEBUG GPU vendor detected via ghw vendor="amd" caller={caller.file="/build/pkg/xsysinfo/gpu.go"  caller.L=164 } 
May 28 21:10:24 DEBUG GPU vendor gpuVendor="amd" caller={caller.file="/build/pkg/system/state.go"  caller.L=77 } 
May 28 21:10:25 DEBUG VRAM detected via binary tools total_vram=8589934592 caller={caller.file="/build/pkg/xsysinfo/gpu.go"  caller.L=123 } 
May 28 21:10:25 DEBUG Total available VRAM vram=8589934592 caller={caller.file="/build/pkg/system/state.go"  caller.L=79 } 
May 28 21:10:25 INFO  Using forced capability run file capabilityRunFile="/run/localai/capability" capability="vulkan\n" env="" caller={caller.file="/build/pkg/system/capabilities.go"  caller.L=118 } 
May 28 21:10:25 INFO  Starting LocalAI threads=8 modelsPath="//models" caller={caller.file="/build/core/application/startup.go"  caller.L=46 } 
May 28 21:10:25 INFO  LocalAI version version="v4.3.3 (b81a6d01b35c36f3b1731230a8a6ba11bca5f06f)" caller={caller.file="/build/core/application/startup.go"  caller.L=47 } 
May 28 21:10:25 INFO  LocalAI Assistant in-memory MCP server initialised tools=29 read_only=false caller={caller.file="/build/core/http/endpoints/mcp/localai_assistant.go"  caller.L=78 } 
May 28 21:10:25 DEBUG CPU capabilities capabilities=[3dnowprefetch abm adx aes aperfmperf apic arat avic avx avx2 bmi1 bmi2 bpext cat_l3 cdp_l3 clflush clflushopt clwb clzero cmov cmp_legacy constant_tsc cpb cpuid cqm cqm_llc cqm_mbm_local cqm_mbm_total cqm_occup_llc cr8_legacy cx16 cx8 de debug_swap decodeassists erms extapic extd_apicid f16c flushbyasid fma fpu fsgsbase fsrm fxsr fxsr_opt ht hw_pstate ibpb ibpb_exit_to_user ibrs ibs invpcid irperf lahf_lm lbrv lm mba mca mce misalignsse mmx mmxext monitor movbe msr mtrr mwaitx nonstop_tsc nopl npt nrip_save nx ospke osvw overflow_recov pae pat pausefilter pclmulqdq pdpe1gb perfctr_core perfctr_llc perfctr_nb pfthreshold pge pku pni popcnt pse pse36 rapl rdpid rdpru rdrand rdseed rdt_a rdtscp rep_good sep sha_ni skinit smap smca smep ssbd sse sse2 sse4_1 sse4_2 sse4a ssse3 stibp succor svm_lock syscall tce topoext tsc tsc_scale umip user_shstk v_spec_ctrl v_vmsave_vmload vaes vgif vmcb_clean vme vmmcall vpclmulqdq wbnoinvd wdt x2apic xgetbv1 xsave xsavec xsaveerptr xsaveopt xsaves] caller={caller.file="/build/core/application/startup.go"  caller.L=55 } 
May 28 21:10:25 DEBUG GPU count count=1 caller={caller.file="/build/core/application/startup.go"  caller.L=60 } 
May 28 21:10:25 DEBUG GPU gpu="card #1 @0000:0d:00.0 -> driver: 'amdgpu' class: 'Display controller' vendor: 'Advanced Micro Devices, Inc. [AMD/ATI]' product: 'Navi 23 [Radeon RX 6600/6600 XT/6600M]'" caller={caller.file="/build/core/application/startup.go"  caller.L=62 } 
May 28 21:10:25 INFO  stats: using in-memory ring buffer (no-auth single-user mode) caller={caller.file="/build/core/application/startup.go"  caller.L=180 } 
May 28 21:10:25 INFO  stats: fallback user wired local_user_id="a1f59fa9-2ed5-494c-b0e7-8801a455a395" caller={caller.file="/build/core/application/startup.go"  caller.L=190 } 
May 28 21:10:25 INFO  pii: filter enabled patterns=6 config_path="" persisted_overrides=0 caller={caller.file="/build/core/application/startup.go"  caller.L=225 } 
May 28 21:10:25 DEBUG agent_tasks.json not found, starting with empty tasks caller={caller.file="/build/core/services/agentpool/job_persister_file.go"  caller.L=72 } 
May 28 21:10:25 INFO  Loaded tasks from persister count=0 caller={caller.file="/build/core/services/agentpool/agent_jobs.go"  caller.L=141 } 
May 28 21:10:25 DEBUG agent_jobs.json not found, starting with empty jobs caller={caller.file="/build/core/services/agentpool/job_persister_file.go"  caller.L=99 } 
May 28 21:10:25 INFO  Loaded jobs from persister count=0 caller={caller.file="/build/core/services/agentpool/agent_jobs.go"  caller.L=150 } 
May 28 21:10:25 INFO  AgentJobService started retention_days=30 caller={caller.file="/build/core/services/agentpool/agent_jobs.go"  caller.L=1439 } 
May 28 21:10:25 DEBUG [inference_defaults] applying defaults for model modelID="qwen3-4b" family=map[min_p:0 repeat_penalty:1 temperature:0.6 top_k:20 top_p:0.95] caller={caller.file="/build/core/config/inference_defaults.go"  caller.L=90 } 
May 28 21:10:25 DEBUG [gguf] guessDefaultsFromFile: NGPULayers set NGPULayers=0xa8d050ee250 modelName="Models Qwen Qwen3 4B" caller={caller.file="/build/core/config/gguf.go"  caller.L=48 } 
May 28 21:10:25 DEBUG [gguf] guessDefaultsFromFile: template already set name="qwen3-4b" modelName="Models Qwen Qwen3 4B" caller={caller.file="/build/core/config/gguf.go"  caller.L=69 } 
May 28 21:10:25 DEBUG No system backends found caller={caller.file="/build/core/gallery/backends.go"  caller.L=588 } 
May 28 21:10:25 DEBUG Registering backend name="vulkan-llama-cpp" runFile="/backends/vulkan-llama-cpp/run.sh" caller={caller.file="/build/core/gallery/backends.go"  caller.L=698 } 
May 28 21:10:25 DEBUG Registering backend name="llama-cpp" runFile="/backends/vulkan-llama-cpp/run.sh" caller={caller.file="/build/core/gallery/backends.go"  caller.L=698 } 
May 28 21:10:25 INFO  Preloading models path="//models" caller={caller.file="/build/core/config/model_config_loader.go"  caller.L=306 } 

  Model name: qwen3-4b                                                        

...
May 28 21:10:25 INFO  core/startup process completed! caller={caller.file="/build/core/application/startup.go"  caller.L=488 } 
May 28 21:10:25 DEBUG Enabling CSRF middleware (Sec-Fetch-Site mode) caller={caller.file="/build/core/http/app.go"  caller.L=307 } 
May 28 21:10:25 INFO  LocalAI is started and running address="127.0.0.1:8080" caller={caller.file="/build/core/cli/run.go"  caller.L=570 } 
May 28 21:10:25 INFO  Agent pool started (standalone/LocalAGI mode) stateDir="//data" apiURL="http://127.0.0.1:8080" caller={caller.file="/build/core/services/agentpool/agent_pool.go"  caller.L=338 } 
May 28 21:10:26 DEBUG HTTP request method="GET" path="/api/operations" status=200 caller={caller.file="/build/core/http/app.go"  caller.L=201 } 
May 28 21:10:32 DEBUG context local model name not found, setting to the first model first model name="qwen3-4b" caller={caller.file="/build/core/http/middleware/request.go"  caller.L=115 } 
May 28 21:10:32 DEBUG [inference_defaults] applying defaults for model modelID="qwen3-4b" family=map[min_p:0 repeat_penalty:1 temperature:0.6 top_k:20 top_p:0.95] caller={caller.file="/build/core/config/inference_defaults.go"  caller.L=90 } 
May 28 21:10:33 DEBUG HTTP request method="GET" path="/api/operations" status=200 caller={caller.file="/build/core/http/app.go"  caller.L=201 } 
May 28 21:10:33 DEBUG [gguf] guessDefaultsFromFile: NGPULayers set NGPULayers=0xa8d050ee250 modelName="Models Qwen Qwen3 4B" caller={caller.file="/build/core/config/gguf.go"  caller.L=48 } 
May 28 21:10:33 DEBUG [gguf] guessDefaultsFromFile: template already set name="qwen3-4b" modelName="Models Qwen Qwen3 4B" caller={caller.file="/build/core/config/gguf.go"  caller.L=69 } 
May 28 21:10:33 DEBUG input.Input input="<nil>" caller={caller.file="/build/core/http/middleware/request.go"  caller.L=525 } 
May 28 21:10:33 DEBUG Chat endpoint configuration read config=&{/models/qwen3-4b.yaml {%- if tools %}
...
May 28 21:10:33 DEBUG Tool call routing decision shouldUseFn=true len(input.Functions)=1 len(input.Tools)=1 config.ShouldUseFunctions()=true config.FunctionToCall()="" caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=244 } 
May 28 21:10:33 DEBUG Response needs to process functions caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=311 } 
May 28 21:10:33 DEBUG Parameters config=&{/models/qwen3-4b.yaml {%- if tools %}
...
May 28 21:10:33 DEBUG Stream request received caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=385 } 
May 28 21:10:33 INFO  BackendLoader starting modelID="qwen3-4b" backend="llama-cpp" model="Qwen3-4B.Q4_K_M.gguf" caller={caller.file="/build/pkg/model/initializers.go"  caller.L=169 } 
May 28 21:10:33 DEBUG Loading model in memory from file file="/models/Qwen3-4B.Q4_K_M.gguf" caller={caller.file="/build/pkg/model/loader.go"  caller.L=336 } 
May 28 21:10:33 DEBUG Loading Model with gRPC modelID="qwen3-4b" file="/models/Qwen3-4B.Q4_K_M.gguf" backend="llama-cpp" options={llama-cpp Qwen3-4B.Q4_K_M.gguf qwen3-4b 0xa8d0538c7e0 0xa8d0488c008 map[] 20 2 true} caller={caller.file="/build/pkg/model/initializers.go"  caller.L=54 } 
May 28 21:10:33 DEBUG Loading external backend uri="/backends/vulkan-llama-cpp/run.sh" caller={caller.file="/build/pkg/model/initializers.go"  caller.L=87 } 
May 28 21:10:33 DEBUG external backend is file file=&{run.sh 1668 493 {0 63915522359 0x9477280} {2304 26364732 1 33261 0 0 0 0 1668 4096 8 {1780002602 670885365} {1779925559 0} {1780002566 92829890} [0 0 0]}} caller={caller.file="/build/pkg/model/initializers.go"  caller.L=90 } 
May 28 21:10:33 DEBUG Loading GRPC Process process="/backends/vulkan-llama-cpp/run.sh" caller={caller.file="/build/pkg/model/process.go"  caller.L=145 } 
May 28 21:10:33 DEBUG GRPC Service will be running id="qwen3-4b" address="127.0.0.1:45241" caller={caller.file="/build/pkg/model/process.go"  caller.L=147 } 
May 28 21:10:33 DEBUG GRPC Service state dir dir="/tmp/go-processmanager2472996243" caller={caller.file="/build/pkg/model/process.go"  caller.L=171 } 
May 28 21:10:33 DEBUG GRPC Service Started caller={caller.file="/build/pkg/model/initializers.go"  caller.L=102 } 
May 28 21:10:33 DEBUG Wait for the service to start up caller={caller.file="/build/pkg/model/initializers.go"  caller.L=115 } 
May 28 21:10:33 DEBUG Options options=ContextSize:40960  Seed:186017436  NBatch:512  MMap:true  NGPULayers:99999999  Threads:8  FlashAttention:"auto"  Options:"use_jinja:true" caller={caller.file="/build/pkg/model/initializers.go"  caller.L=116 } 
May 28 21:10:33 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="+++ realpath run.sh" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:33 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="++ dirname /backends/vulkan-llama-cpp/run.sh" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:33 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="+ CURDIR=/backends/vulkan-llama-cpp" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:33 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="+ cd /" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:33 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="+ echo 'CPU info:'" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:33 DEBUG GRPC stdout id="qwen3-4b-127.0.0.1:45241" line="CPU info:" caller={caller.file="/build/pkg/model/process.go"  caller.L=200 } 
May 28 21:10:33 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="+ grep -e 'model\\sname' /proc/cpuinfo" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:33 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="+ head -1" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:33 DEBUG GRPC stdout id="qwen3-4b-127.0.0.1:45241" line="model name\t: AMD Ryzen 7 5800X 8-Core Processor" caller={caller.file="/build/pkg/model/process.go"  caller.L=200 } 
May 28 21:10:33 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="+ grep -e flags /proc/cpuinfo" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:33 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="+ head -1" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:33 DEBUG GRPC stdout id="qwen3-4b-127.0.0.1:45241" line="flags\t\t: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap ibpb_exit_to_user" caller={caller.file="/build/pkg/model/process.go"  caller.L=200 } 
May 28 21:10:33 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="+ BINARY=llama-cpp-fallback" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:33 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="+ grep -q -e '\\savx\\s' /proc/cpuinfo" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:33 DEBUG GRPC stdout id="qwen3-4b-127.0.0.1:45241" line="CPU:    AVX    found OK" caller={caller.file="/build/pkg/model/process.go"  caller.L=200 }
...
May 28 21:10:35 DEBUG GRPC Service Ready caller={caller.file="/build/pkg/model/initializers.go"  caller.L=123 } 
May 28 21:10:35 DEBUG GRPC: Loading model with options options={{{} [] [] 0xa8d04d205f8} 0 [] Qwen3-4B.Q4_K_M.gguf 40960 186017436 512 false false true false false false false 99999999   8 0 0 0 0 /models/Qwen3-4B.Q4_K_M.gguf   false 0 false   0     0 false    0 false false 0 0 0  false  0 0 0   0 0 0 0  auto false //models [] [] [use_jinja:true]   [] false []  <nil>} caller={caller.file="/build/pkg/model/initializers.go"  caller.L=146 } 
May 28 21:10:35 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.02.022.470 I system info: n_threads = 8, n_threads_batch = -1, total_threads = 16" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:35 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.02.022.472 I " caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:35 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.02.022.493 I system_info: n_threads = 8 / 16 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | " caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:35 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.02.022.493 I " caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:35 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.02.022.494 I srv    load_model: loading model '/models/Qwen3-4B.Q4_K_M.gguf'" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:35 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.02.022.520 I common_init_result: fitting params to device memory ..." caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:35 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.02.022.520 I common_init_result: (for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on)" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:35 DEBUG No system backends found caller={caller.file="/build/core/gallery/backends.go"  caller.L=588 } 
  ↳ repeated 1x
May 28 21:10:35 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.02.207.115 W common_fit_params: failed to fit params to free device memory: n_gpu_layers already set by user to 99999999, abort" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:35 DEBUG Using reported capability reportedCapability="vulkan" capMap=map[amd:rocm-llama-cpp default:cpu-llama-cpp intel:intel-sycl-f16-llama-cpp metal:metal-llama-cpp nvidia:cuda12-llama-cpp nvidia-cuda-12:cuda12-llama-cpp nvidia-cuda-13:cuda13-llama-cpp nvidia-l4t:nvidia-l4t-arm64-llama-cpp nvidia-l4t-cuda-12:nvidia-l4t-arm64-llama-cpp nvidia-l4t-cuda-13:cuda13-nvidia-l4t-arm64-llama-cpp vulkan:vulkan-llama-cpp] caller={caller.file="/build/pkg/system/capabilities.go"  caller.L=71 } 
May 28 21:10:35 DEBUG Capability not in map, falling back to default reportedCapability="vulkan" capMap=map[default:cpu-ik-llama-cpp] caller={caller.file="/build/pkg/system/capabilities.go"  caller.L=81 } 
May 28 21:10:35 DEBUG Using reported capability reportedCapability="vulkan" capMap=map[amd:rocm-turboquant default:cpu-turboquant intel:intel-sycl-f16-turboquant nvidia:cuda12-turboquant nvidia-cuda-12:cuda12-turboquant nvidia-cuda-13:cuda13-turboquant nvidia-l4t:nvidia-l4t-arm64-turboquant nvidia-l4t-cuda-12:nvidia-l4t-arm64-turboquant nvidia-l4t-cuda-13:cuda13-nvidia-l4t-arm64-turboquant vulkan:vulkan-turboquant] caller={caller.file="/build/pkg/system/capabilities.go"  caller.L=71 } 
...
May 28 21:10:35 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.02.301.776 W load: control-looking token: 128247 '</s>' was not control-type; this is probably a bug in the model. its type will be overridden" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:36 DEBUG All backends up to date caller={caller.file="/build/core/application/upgrade_checker.go"  caller.L=185 } 
May 28 21:10:36 DEBUG HTTP request method="GET" path="/api/operations" status=200 caller={caller.file="/build/core/http/app.go"  caller.L=201 } 
May 28 21:10:36 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.03.272.054 I common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.03.871.437 I srv    load_model: initializing slots, n_slots = 1" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG HTTP request method="GET" path="/api/operations" status=200 caller={caller.file="/build/core/http/app.go"  caller.L=201 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.210.614 W common_speculative_init: no implementations specified for speculative decoding" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.210.619 I slot   load_model: id  0 | task -1 | new slot, n_ctx = 40960" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.210.635 I srv    load_model: prompt cache is enabled, size limit: no limit" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.210.636 I srv    load_model: use `--cache-ram 0` to disable the prompt cache" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.210.636 I srv    load_model: for more info see https://github.com/ggml-org/llama.cpp/pull/16391" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.210.636 I srv    load_model: context checkpoints enabled, max = 32, min spacing = 256" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.210.649 I srv          init: idle slots will be saved to prompt cache and cleared upon starting a new task" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.220.967 I init: chat template, example_format: '<|im_start|>system" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="You are a helpful assistant<|im_end|>" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="<|im_start|>user" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="Hello<|im_end|>" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="<|im_start|>assistant" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="Hi there<|im_end|>" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="<|im_start|>user" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="How are you?<|im_end|>" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="<|im_start|>assistant" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="'" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.226.932 I srv          init: init: chat template, thinking = 1" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG Model already loaded in memory model="qwen3-4b" caller={caller.file="/build/pkg/model/loader.go"  caller.L=374 } 
May 28 21:10:37 DEBUG Checking model availability model="qwen3-4b" caller={caller.file="/build/pkg/model/loader.go"  caller.L=386 } 
May 28 21:10:37 DEBUG [gguf] DetectThinkingSupportFromBackend: thinking support detected supports_thinking=true thinking_forced_open=false thinking_start_token="" caller={caller.file="/build/core/config/gguf.go"  caller.L=199 } 
May 28 21:10:37 DEBUG [gguf] DetectThinkingSupportFromBackend: tool format markers detected format_type="json_native" section_start="" func_name_prefix="" caller={caller.file="/build/core/config/gguf.go"  caller.L=171 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.251.499 I srv  parse_option: Using grammar: space ::= \" \"?" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="freestring ::= (" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="\t\t\t[^\\x00] |" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="\t\t\t\"\\\\\" ([\"\\\\/bfnrt] | \"u\" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="\t\t  )* space" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="string ::= \"\\\"\" (" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="\t\t\t[^\"\\\\] |" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="\t\t\t\"\\\\\" ([\"\\\\/bfnrt] | \"u\" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="\t\t  )* \"\\\"\" space" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="root-0-arguments ::= \"{\" space \"\\\"cmd\\\"\" space \":\" space string \"}\" space" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="root-0-name ::= \"\\\"\\\"\"" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="root-0 ::= \"{\" space \"\\\"arguments\\\"\" space \":\" space root-0-arguments \",\" space \"\\\"name\\\"\" space \":\" space root-0-name \"}\" space" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="root-1-arguments ::= \"{\" space \"\\\"message\\\"\" space \":\" space string \"}\" space" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="root-1-name ::= \"\\\"answer\\\"\"" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="root-1 ::= \"{\" space \"\\\"arguments\\\"\" space \":\" space root-1-arguments \",\" space \"\\\"name\\\"\" space \":\" space root-1-name \"}\" space" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="root ::= root-0 | root-1" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.251.503 I srv  parse_option: [TOOLS DEBUG] parse_options: Checking for tools in proto, tools().empty()=0, tools().size()=163" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.251.504 I srv  parse_option: [TOOLS DEBUG] parse_options: Tools string from proto (first 500 chars): [{\"type\":\"function\",\"function\":{\"name\":\"\",\"description\":\"execute a command\",\"strict\":false,\"parameters\":{\"properties\":{\"cmd\":{\"type\":\"string\"}},\"type\":\"object\"}}}]" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.251.514 I srv  parse_option: Extracted tools from proto: [{\"type\":\"function\",\"function\":{\"name\":\"\",\"description\":\"execute a command\",\"strict\":false,\"parameters\":{\"properties\":{\"cmd\":{\"type\":\"string\"}},\"type\":\"object\"}}}]" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.251.515 I srv  parse_option: [TOOLS DEBUG] parse_options: Successfully parsed 1 tools from Go layer" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.251.535 I srv  parse_option: [TOOLS DEBUG] parse_options: Tool 0: " caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.251.538 I srv  parse_option: [TOOLS DEBUG] parse_options: Tools successfully added to data, count: 1" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.251.568 I srv  PredictStrea: [CONTENT DEBUG] PredictStream: Message 0 FINAL STATE: content type=string, has_value=1" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.251.569 I srv  PredictStrea: [CONTENT DEBUG] PredictStream: Running final safety check on 1 messages" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.251.570 I srv  PredictStrea: [CONTENT DEBUG] PredictStream: Safety check message 0: content OK, type=string" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.251.576 I srv  PredictStrea: [TOOLS DEBUG] PredictStream: has_grammar_from_go=1, data.contains(\"tools\")=1, data.contains(\"grammar\")=1" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.251.576 I srv  PredictStrea: [TOOLS DEBUG] PredictStream: grammar type=string, empty=0" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.251.577 I srv  PredictStrea: Grammar provided from Go layer - using it instead of template-generated grammar" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.251.586 I srv  PredictStrea: [CONTENT DEBUG] PredictStream: Before oaicompat_chat_params_parse - checking 1 messages" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.251.587 I srv  PredictStrea: [CONTENT DEBUG] PredictStream: BEFORE TEMPLATE - Message 0 (role=user): content type=string" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.258.871 I srv  params_from_: Chat format: peg-native" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.303.014 I start_llama_server: model loaded" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.303.024 I slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = -1" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.303.024 I srv  get_availabl: updating prompt cache" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.303.028 I srv          load:  - looking for better prompt, base f_keep = -1.000, sim = 0.000" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.303.029 I srv        update:  - cache state: 0 prompts, 0.000 MiB (limits: 0.000 MiB, 40960 tokens, 40960 est)" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.303.030 I srv  get_availabl: prompt cache update took 0.00 ms" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.303.129 I slot launch_slot_: id  0 | task 0 | processing task, is_child = 0" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:37 DEBUG LLM result result="" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 28 21:10:37 DEBUG LLM result(processed) result="" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 28 21:10:37 DEBUG Sending chunk chunk="{\"created\":1780002633,\"object\":\"chat.completion.chunk\",\"id\":\"5ccc918f-8c53-4832-981c-352093c59c71\",\"model\":\"qwen3-4b\",\"choices\":[{\"index\":0,\"finish_reason\":null,\"delta\":{\"role\":\"assistant\",\"content\":null}}]}" caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=504 } 
May 28 21:10:37 DEBUG Sending chunk chunk="{\"created\":1780002633,\"object\":\"chat.completion.chunk\",\"id\":\"5ccc918f-8c53-4832-981c-352093c59c71\",\"model\":\"qwen3-4b\",\"choices\":[{\"index\":0,\"finish_reason\":null,\"delta\":{\"content\":\"{\\\"\"}}]}" caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=504 } 
May 28 21:10:37 DEBUG LLM result result="{\"" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 28 21:10:37 DEBUG LLM result(processed) result="{\"" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 28 21:10:37 DEBUG LLM result result="{\"a" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 28 21:10:37 DEBUG LLM result(processed) result="{\"a" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 28 21:10:37 DEBUG Sending chunk chunk="{\"created\":1780002633,\"object\":\"chat.completion.chunk\",\"id\":\"5ccc918f-8c53-4832-981c-352093c59c71\",\"model\":\"qwen3-4b\",\"choices\":[{\"index\":0,\"finish_reason\":null,\"delta\":{\"content\":\"a\"}}]}" caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=504 } 
May 28 21:10:37 DEBUG LLM result result="{\"arg" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 28 21:10:37 DEBUG LLM result(processed) result="{\"arg" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 28 21:10:37 DEBUG Sending chunk chunk="{\"created\":1780002633,\"object\":\"chat.completion.chunk\",\"id\":\"5ccc918f-8c53-4832-981c-352093c59c71\",\"model\":\"qwen3-4b\",\"choices\":[{\"index\":0,\"finish_reason\":null,\"delta\":{\"content\":\"rg\"}}]}" caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=504 } 
May 28 21:10:37 DEBUG LLM result result="{\"arguments" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 28 21:10:37 DEBUG LLM result(processed) result="{\"arguments" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 28 21:10:37 DEBUG Sending chunk chunk="{\"created\":1780002633,\"object\":\"chat.completion.chunk\",\"id\":\"5ccc918f-8c53-4832-981c-352093c59c71\",\"model\":\"qwen3-4b\",\"choices\":[{\"index\":0,\"finish_reason\":null,\"delta\":{\"content\":\"uments\"}}]}" caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=504 } 
May 28 21:10:37 DEBUG LLM result result="{\"arguments\":" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 28 21:10:37 DEBUG LLM result(processed) result="{\"arguments\":" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 28 21:10:37 DEBUG Sending chunk chunk="{\"created\":1780002633,\"object\":\"chat.completion.chunk\",\"id\":\"5ccc918f-8c53-4832-981c-352093c59c71\",\"model\":\"qwen3-4b\",\"choices\":[{\"index\":0,\"finish_reason\":null,\"delta\":{\"content\":\"\\\":\"}}]}" caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=504 } 
May 28 21:10:37 DEBUG LLM result result="{\"arguments\": {\"" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 28 21:10:37 DEBUG LLM result(processed) result="{\"arguments\": {\"" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 28 21:10:37 DEBUG Sending chunk chunk="{\"created\":1780002633,\"object\":\"chat.completion.chunk\",\"id\":\"5ccc918f-8c53-4832-981c-352093c59c71\",\"model\":\"qwen3-4b\",\"choices\":[{\"index\":0,\"finish_reason\":null,\"delta\":{\"content\":\" {\\\"\"}}]}" caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=504 } 
May 28 21:10:37 DEBUG LLM result result="{\"arguments\": {\"message" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 28 21:10:37 DEBUG LLM result(processed) result="{\"arguments\": {\"message" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 28 21:10:37 DEBUG Sending chunk chunk="{\"created\":1780002633,\"object\":\"chat.completion.chunk\",\"id\":\"5ccc918f-8c53-4832-981c-352093c59c71\",\"model\":\"qwen3-4b\",\"choices\":[{\"index\":0,\"finish_reason\":null,\"delta\":{\"content\":\"message\"}}]}" caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=504 } 
May 28 21:10:37 DEBUG LLM result result="{\"arguments\": {\"message\":" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 28 21:10:37 DEBUG LLM result(processed) result="{\"arguments\": {\"message\":" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 28 21:10:37 DEBUG Sending chunk chunk="{\"created\":1780002633,\"object\":\"chat.completion.chunk\",\"id\":\"5ccc918f-8c53-4832-981c-352093c59c71\",\"model\":\"qwen3-4b\",\"choices\":[{\"index\":0,\"finish_reason\":null,\"delta\":{\"content\":\"\\\":\"}}]}" caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=504 } 
May 28 21:10:38 DEBUG LLM result result="{\"arguments\": {\"message\": \"" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 28 21:10:38 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 28 21:10:38 DEBUG Sending chunk chunk="{\"created\":1780002633,\"object\":\"chat.completion.chunk\",\"id\":\"5ccc918f-8c53-4832-981c-352093c59c71\",\"model\":\"qwen3-4b\",\"choices\":[{\"index\":0,\"finish_reason\":null,\"delta\":{\"content\":\" \\\"\"}}]}" caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=504 } 
May 28 21:10:38 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 28 21:10:38 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 28 21:10:38 DEBUG Sending chunk chunk="{\"created\":1780002633,\"object\":\"chat.completion.chunk\",\"id\":\"5ccc918f-8c53-4832-981c-352093c59c71\",\"model\":\"qwen3-4b\",\"choices\":[{\"index\":0,\"finish_reason\":null,\"delta\":{\"content\":\"Hello\"}}]}" caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=504 } 
May 28 21:10:38 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello!\"" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 28 21:10:38 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello!\"" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 28 21:10:38 DEBUG Sending chunk chunk="{\"created\":1780002633,\"object\":\"chat.completion.chunk\",\"id\":\"5ccc918f-8c53-4832-981c-352093c59c71\",\"model\":\"qwen3-4b\",\"choices\":[{\"index\":0,\"finish_reason\":null,\"delta\":{\"content\":\"!\\\"\"}}]}" caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=504 } 
May 28 21:10:38 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello!\"}," caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 28 21:10:38 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello!\"}," caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 28 21:10:38 DEBUG Sending chunk chunk="{\"created\":1780002633,\"object\":\"chat.completion.chunk\",\"id\":\"5ccc918f-8c53-4832-981c-352093c59c71\",\"model\":\"qwen3-4b\",\"choices\":[{\"index\":0,\"finish_reason\":null,\"delta\":{\"content\":\"},\"}}]}" caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=504 } 
May 28 21:10:38 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello!\"}, \"" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 28 21:10:38 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello!\"}, \"" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 28 21:10:38 DEBUG Sending chunk chunk="{\"created\":1780002633,\"object\":\"chat.completion.chunk\",\"id\":\"5ccc918f-8c53-4832-981c-352093c59c71\",\"model\":\"qwen3-4b\",\"choices\":[{\"index\":0,\"finish_reason\":null,\"delta\":{\"content\":\" \\\"\"}}]}" caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=504 } 
May 28 21:10:38 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello!\"}, \"name" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 28 21:10:38 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello!\"}, \"name" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 28 21:10:38 DEBUG Sending chunk chunk="{\"created\":1780002633,\"object\":\"chat.completion.chunk\",\"id\":\"5ccc918f-8c53-4832-981c-352093c59c71\",\"model\":\"qwen3-4b\",\"choices\":[{\"index\":0,\"finish_reason\":null,\"delta\":{\"content\":\"name\"}}]}" caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=504 } 
May 28 21:10:38 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello!\"}, \"name\":" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 28 21:10:38 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello!\"}, \"name\":" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 28 21:10:38 DEBUG Sending chunk chunk="{\"created\":1780002633,\"object\":\"chat.completion.chunk\",\"id\":\"5ccc918f-8c53-4832-981c-352093c59c71\",\"model\":\"qwen3-4b\",\"choices\":[{\"index\":0,\"finish_reason\":null,\"delta\":{\"content\":\"\\\":\"}}]}" caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=504 } 
May 28 21:10:38 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello!\"}, \"name\": \"" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 28 21:10:38 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello!\"}, \"name\": \"" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 28 21:10:38 DEBUG Sending chunk chunk="{\"created\":1780002633,\"object\":\"chat.completion.chunk\",\"id\":\"5ccc918f-8c53-4832-981c-352093c59c71\",\"model\":\"qwen3-4b\",\"choices\":[{\"index\":0,\"finish_reason\":null,\"delta\":{\"content\":\" \\\"\"}}]}" caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=504 } 
May 28 21:10:38 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello!\"}, \"name\": \"a" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 28 21:10:38 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello!\"}, \"name\": \"a" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 28 21:10:38 DEBUG Sending chunk chunk="{\"created\":1780002633,\"object\":\"chat.completion.chunk\",\"id\":\"5ccc918f-8c53-4832-981c-352093c59c71\",\"model\":\"qwen3-4b\",\"choices\":[{\"index\":0,\"finish_reason\":null,\"delta\":{\"content\":\"a\"}}]}" caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=504 } 
May 28 21:10:38 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello!\"}, \"name\": \"ans" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 28 21:10:38 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello!\"}, \"name\": \"ans" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 28 21:10:38 DEBUG Sending chunk chunk="{\"created\":1780002633,\"object\":\"chat.completion.chunk\",\"id\":\"5ccc918f-8c53-4832-981c-352093c59c71\",\"model\":\"qwen3-4b\",\"choices\":[{\"index\":0,\"finish_reason\":null,\"delta\":{\"content\":\"ns\"}}]}" caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=504 } 
May 28 21:10:38 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello!\"}, \"name\": \"answe" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 28 21:10:38 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello!\"}, \"name\": \"answe" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 28 21:10:38 DEBUG Sending chunk chunk="{\"created\":1780002633,\"object\":\"chat.completion.chunk\",\"id\":\"5ccc918f-8c53-4832-981c-352093c59c71\",\"model\":\"qwen3-4b\",\"choices\":[{\"index\":0,\"finish_reason\":null,\"delta\":{\"content\":\"we\"}}]}" caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=504 } 
May 28 21:10:38 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello!\"}, \"name\": \"answer" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 28 21:10:38 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello!\"}, \"name\": \"answer" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 28 21:10:38 DEBUG Sending chunk chunk="{\"created\":1780002633,\"object\":\"chat.completion.chunk\",\"id\":\"5ccc918f-8c53-4832-981c-352093c59c71\",\"model\":\"qwen3-4b\",\"choices\":[{\"index\":0,\"finish_reason\":null,\"delta\":{\"content\":\"r\"}}]}" caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=504 } 
May 28 21:10:38 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello!\"}, \"name\": \"answer\"}" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 28 21:10:38 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello!\"}, \"name\": \"answer\"}" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 28 21:10:38 DEBUG Sending chunk chunk="{\"created\":1780002633,\"object\":\"chat.completion.chunk\",\"id\":\"5ccc918f-8c53-4832-981c-352093c59c71\",\"model\":\"qwen3-4b\",\"choices\":[{\"index\":0,\"finish_reason\":null,\"delta\":{\"content\":\"\\\"}\"}}]}" caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=504 } 
May 28 21:10:38 DEBUG Sending chunk chunk="{\"created\":1780002633,\"object\":\"chat.completion.chunk\",\"id\":\"5ccc918f-8c53-4832-981c-352093c59c71\",\"model\":\"qwen3-4b\",\"choices\":[{\"index\":0,\"finish_reason\":null,\"delta\":{\"role\":\"assistant\",\"content\":null,\"tool_calls\":[{\"index\":0,\"id\":\"5ccc918f-8c53-4832-981c-352093c59c71\",\"type\":\"function\",\"function\":{\"name\":\"answer\",\"arguments\":\"{\\\"message\\\":\\\"Hello!\\\"}\"}}]}}]}" caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=504 } 
May 28 21:10:38 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.908.858 I slot print_timing: id  0 | task 0 | prompt eval time =     155.70 ms /    10 tokens (   15.57 ms per token,    64.23 tokens per second)" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:38 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.908.860 I slot print_timing: id  0 | task 0 |        eval time =     450.01 ms /    22 tokens (   20.46 ms per token,    48.89 tokens per second)" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:38 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.908.861 I slot print_timing: id  0 | task 0 |       total time =     605.71 ms /    32 tokens" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:38 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.908.861 I slot print_timing: id  0 | task 0 |    graphs reused =         21" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:38 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.908.872 I slot      release: id  0 | task 0 | stop processing: n_tokens = 31, truncated = 0" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:38 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:45241" line="0.04.908.877 I srv  update_slots: all slots are idle" caller={caller.file="/build/pkg/model/process.go"  caller.L=187 } 
May 28 21:10:38 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello!\"}, \"name\": \"answer\"}" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 28 21:10:38 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello!\"}, \"name\": \"answer\"}" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 28 21:10:38 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello!\"}, \"name\": \"answer\"}" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 28 21:10:38 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello!\"}, \"name\": \"answer\"}" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 28 21:10:38 DEBUG [ChatDeltas] streaming completed, accumulated deltas from C++ autoparser total_deltas=21 caller={caller.file="/build/core/backend/llm.go"  caller.L=257 } 
May 28 21:10:38 DEBUG [ChatDeltas] received deltas from backend total_deltas=21 content_chunks=21 reasoning_chunks=0 tool_call_chunks=0 caller={caller.file="/build/pkg/functions/chat_deltas.go"  caller.L=31 } 
May 28 21:10:38 DEBUG [ChatDeltas] deltas present but no tool calls found, falling back to text parsing caller={caller.file="/build/pkg/functions/chat_deltas.go"  caller.L=67 } 
May 28 21:10:38 DEBUG [ChatDeltas] no pre-parsed tool calls, falling back to Go-side text parsing caller={caller.file="/build/core/http/endpoints/openai/chat_stream_workers.go"  caller.L=446 } 
May 28 21:10:38 DEBUG ParseTextContent result="" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=270 } 
May 28 21:10:38 DEBUG CaptureLLMResult config=[] caller={caller.file="/build/pkg/functions/parse.go"  caller.L=271 } 
May 28 21:10:38 DEBUG LLM result result="" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=256 } 
May 28 21:10:38 DEBUG LLM result(processed) result="" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=264 } 
May 28 21:10:38 DEBUG LLM result result="" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=876 } 
May 28 21:10:38 DEBUG LLM result(function cleanup) result="" caller={caller.file="/build/pkg/functions/parse.go"  caller.L=884 } 
May 28 21:10:38 DEBUG Function return result="" parsed=[] caller={caller.file="/build/pkg/functions/parse.go"  caller.L=912 } 
May 28 21:10:38 DEBUG [ParseFunctionCall] trying PEG parser caller={caller.file="/build/pkg/functions/parse.go"  caller.L=1014 } 
May 28 21:10:38 DEBUG [PEG] starting PEG tool call parsing caller={caller.file="/build/pkg/functions/peg_integration.go"  caller.L=22 } 
May 28 21:10:38 DEBUG [PEG] using auto-detected markers from C++ backend format_type="json_native" section_start="" section_end="" per_call_start="<tool_call>\n" per_call_end="</tool_call>" func_name_prefix="" func_name_suffix="" func_close="" arg_name_prefix="" arg_name_suffix="" arg_value_prefix="" arg_value_suffix="" arg_separator="" name_field="name" args_field="arguments" id_field="" reasoning_start="<think>\n" reasoning_end="\n</think>\n\n" caller={caller.file="/build/pkg/functions/peg_integration.go"  caller.L=27 } 
May 28 21:10:38 DEBUG [PEG] parse succeeded content_len=0 reasoning_len=0 tool_calls=0 caller={caller.file="/build/pkg/functions/peg_integration.go"  caller.L=522 } 
May 28 21:10:38 DEBUG [PEG] markers-based parser found no tool calls caller={caller.file="/build/pkg/functions/peg_integration.go"  caller.L=54 } 
May 28 21:10:38 DEBUG [PEG] auto-detecting format across all presets caller={caller.file="/build/pkg/functions/peg_integration.go"  caller.L=97 } 
May 28 21:10:38 DEBUG [PEG] parse succeeded content_len=0 reasoning_len=0 tool_calls=0 caller={caller.file="/build/pkg/functions/peg_integration.go"  caller.L=522 } 
  ↳ repeated 7x
May 28 21:10:38 DEBUG [PEG] no tool calls found by any format caller={caller.file="/build/pkg/functions/peg_integration.go"  caller.L=116 } 
May 28 21:10:38 DEBUG [ParseFunctionCall] PEG parser found no tool calls caller={caller.file="/build/pkg/functions/parse.go"  caller.L=1021 } 
May 28 21:10:38 DEBUG [ChatDeltas] final tool call decision tool_calls=0 text_content="" caller={caller.file="/build/core/http/endpoints/openai/chat_stream_workers.go"  caller.L=460 } 
May 28 21:10:38 DEBUG No choices in the response, skipping caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=440 } 
  ↳ repeated 3x
May 28 21:10:38 DEBUG Stream ended caller={caller.file="/build/core/http/endpoints/openai/chat.go"  caller.L=714 } 
May 28 21:10:38 INFO  HTTP request method="POST" path="/v1/chat/completions" status=200 caller={caller.file="/build/core/http/app.go"  caller.L=203 }

Additional context

The bug can also be reproduced using other models such as gemma-3-4b-it on llama-cpp.

Before running the steps to reproduce, I deleted all models and backends by wiping my local ./data directory to ensure I am using the latest llama-cpp backend.

When using LocalAI 4.0.0 the response looks as follows (not great but slightly better: the response content is not a JSON object but the tool call is duplicated and the undefined answer tool is already called):

{
  "role": "assistant",
  "content": null,
  "tool_calls": [
    {
      "index": 0,
      "id": "b915e248-b02d-4abd-a163-f2732699347a",
      "type": "function",
      "function": {
        "name": "answer",
        "arguments": "{\"message\":\"Hello, how can I assist you today?\"}"
      }
    }
  ]
}
{
  "role": "assistant",
  "content": null,
  "tool_calls": [
    {
      "index": 1,
      "id": "b915e248-b02d-4abd-a163-f2732699347a",
      "type": "function",
      "function": {
        "name": "answer",
        "arguments": "{\"message\":\"Hello, how can I assist you today?\"}"
      }
    }
  ]
}
{
  "role": "assistant",
  "content": null
}
{
  "content": "Hello, how can I assist you today?"
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions