CPU info:
model name : AMD Ryzen 7 5800X 8-Core Processor
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap ibpb_exit_to_user
CPU: AVX found OK
CPU: AVX2 found OK
CPU: no AVX512 found
May 25 19:33:54 DEBUG GPU vendor detected via ghw vendor="amd" caller={caller.file="/build/pkg/xsysinfo/gpu.go" caller.L=157 }
May 25 19:33:54 DEBUG GPU vendor gpuVendor="amd" caller={caller.file="/build/pkg/system/state.go" caller.L=77 }
May 25 19:33:54 DEBUG VRAM detected via binary tools total_vram=8589934592 caller={caller.file="/build/pkg/xsysinfo/gpu.go" caller.L=116 }
May 25 19:33:54 DEBUG Total available VRAM vram=8589934592 caller={caller.file="/build/pkg/system/state.go" caller.L=79 }
May 25 19:33:54 INFO Using forced capability run file capabilityRunFile="/run/localai/capability" capability="vulkan\n" env="" caller={caller.file="/build/pkg/system/capabilities.go" caller.L=118 }
May 25 19:33:54 INFO Starting LocalAI threads=8 modelsPath="//models" caller={caller.file="/build/core/application/startup.go" caller.L=39 }
May 25 19:33:54 INFO LocalAI version version="v4.3.1 (1dcd1ae915c69d79e3219b88b85b45a3639a3c74)" caller={caller.file="/build/core/application/startup.go" caller.L=40 }
May 25 19:33:54 INFO LocalAI Assistant in-memory MCP server initialised tools=21 read_only=false caller={caller.file="/build/core/http/endpoints/mcp/localai_assistant.go" caller.L=78 }
...
May 25 19:33:56 INFO LocalAI is started and running address="127.0.0.1:8080" caller={caller.file="/build/core/cli/run.go" caller.L=560 }
May 25 19:33:56 INFO Agent pool started (standalone/LocalAGI mode) stateDir="//data" apiURL="http://127.0.0.1:8080" caller={caller.file="/build/core/services/agentpool/agent_pool.go" caller.L=338 }
May 25 19:33:56 DEBUG HTTP request method="GET" path="/api/operations" status=200 caller={caller.file="/build/core/http/app.go" caller.L=203 }
↳ repeated 2x
May 25 19:33:58 DEBUG HTTP request method="GET" path="/api/resources" status=200 caller={caller.file="/build/core/http/app.go" caller.L=203 }
May 25 19:34:06 DEBUG HTTP request method="GET" path="/api/operations" status=200 caller={caller.file="/build/core/http/app.go" caller.L=203 }
May 25 19:34:06 DEBUG Using reported capability reportedCapability="vulkan" capMap=map[amd:rocm-llama-cpp default:cpu-llama-cpp intel:intel-sycl-f16-llama-cpp metal:metal-llama-cpp nvidia:cuda12-llama-cpp nvidia-cuda-12:cuda12-llama-cpp nvidia-cuda-13:cuda13-llama-cpp nvidia-l4t:nvidia-l4t-arm64-llama-cpp nvidia-l4t-cuda-12:nvidia-l4t-arm64-llama-cpp nvidia-l4t-cuda-13:cuda13-nvidia-l4t-arm64-llama-cpp vulkan:vulkan-llama-cpp] caller={caller.file="/build/pkg/system/capabilities.go" caller.L=71 }
May 25 19:34:06 DEBUG Capability not in map, falling back to default reportedCapability="vulkan" capMap=map[default:cpu-ik-llama-cpp] caller={caller.file="/build/pkg/system/capabilities.go" caller.L=81 }
...
May 25 19:34:07 INFO BackendLoader starting modelID="qwen3-4b" backend="llama-cpp" model="Qwen3-4B.Q4_K_M.gguf" caller={caller.file="/build/pkg/model/initializers.go" caller.L=169 }
May 25 19:34:07 DEBUG Loading model in memory from file file="/models/Qwen3-4B.Q4_K_M.gguf" caller={caller.file="/build/pkg/model/loader.go" caller.L=336 }
May 25 19:34:07 DEBUG Loading Model with gRPC modelID="qwen3-4b" file="/models/Qwen3-4B.Q4_K_M.gguf" backend="llama-cpp" options={llama-cpp Qwen3-4B.Q4_K_M.gguf qwen3-4b {{}} 0x333c9546e308 map[] 20 2 true} caller={caller.file="/build/pkg/model/initializers.go" caller.L=54 }
May 25 19:34:07 DEBUG Loading external backend uri="/backends/vulkan-llama-cpp/run.sh" caller={caller.file="/build/pkg/model/initializers.go" caller.L=87 }
May 25 19:34:07 DEBUG external backend is file file=&{run.sh 1668 493 {0 63914565083 0x943ecc0} {2304 26363832 1 33261 0 0 0 0 1668 4096 8 {1779662828 980616651} {1778968283 0} {1779306410 336789385} [0 0 0]}} caller={caller.file="/build/pkg/model/initializers.go" caller.L=90 }
May 25 19:34:07 DEBUG Loading GRPC Process process="/backends/vulkan-llama-cpp/run.sh" caller={caller.file="/build/pkg/model/process.go" caller.L=145 }
May 25 19:34:07 DEBUG GRPC Service will be running id="qwen3-4b" address="127.0.0.1:40959" caller={caller.file="/build/pkg/model/process.go" caller.L=147 }
May 25 19:34:07 DEBUG GRPC Service state dir dir="/tmp/go-processmanager1275839256" caller={caller.file="/build/pkg/model/process.go" caller.L=171 }
May 25 19:34:07 DEBUG GRPC Service Started caller={caller.file="/build/pkg/model/initializers.go" caller.L=102 }
May 25 19:34:07 DEBUG Wait for the service to start up caller={caller.file="/build/pkg/model/initializers.go" caller.L=115 }
May 25 19:34:07 DEBUG Options options=ContextSize:40960 Seed:1596861957 NBatch:512 MMap:true NGPULayers:99999999 Threads:8 FlashAttention:"auto" Options:"gpu" caller={caller.file="/build/pkg/model/initializers.go" caller.L=116 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+++ realpath run.sh" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="++ dirname /backends/vulkan-llama-cpp/run.sh" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ CURDIR=/backends/vulkan-llama-cpp" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ cd /" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ echo 'CPU info:'" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stdout id="qwen3-4b-127.0.0.1:40959" line="CPU info:" caller={caller.file="/build/pkg/model/process.go" caller.L=200 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ grep -e 'model\\sname' /proc/cpuinfo" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ head -1" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stdout id="qwen3-4b-127.0.0.1:40959" line="model name\t: AMD Ryzen 7 5800X 8-Core Processor" caller={caller.file="/build/pkg/model/process.go" caller.L=200 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ grep -e flags /proc/cpuinfo" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ head -1" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stdout id="qwen3-4b-127.0.0.1:40959" line="flags\t\t: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap ibpb_exit_to_user" caller={caller.file="/build/pkg/model/process.go" caller.L=200 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ BINARY=llama-cpp-fallback" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ grep -q -e '\\savx\\s' /proc/cpuinfo" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stdout id="qwen3-4b-127.0.0.1:40959" line="CPU: AVX found OK" caller={caller.file="/build/pkg/model/process.go" caller.L=200 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ echo 'CPU: AVX found OK'" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ '[' -e /backends/vulkan-llama-cpp/llama-cpp-avx ']'" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ BINARY=llama-cpp-avx" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ grep -q -e '\\savx2\\s' /proc/cpuinfo" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stdout id="qwen3-4b-127.0.0.1:40959" line="CPU: AVX2 found OK" caller={caller.file="/build/pkg/model/process.go" caller.L=200 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ echo 'CPU: AVX2 found OK'" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ '[' -e /backends/vulkan-llama-cpp/llama-cpp-avx2 ']'" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ BINARY=llama-cpp-avx2" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ grep -q -e '\\savx512f\\s' /proc/cpuinfo" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ '[' -n '' ']'" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="++ uname" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ '[' Linux == Darwin ']'" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ export LD_LIBRARY_PATH=/backends/vulkan-llama-cpp/lib:" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ LD_LIBRARY_PATH=/backends/vulkan-llama-cpp/lib:" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ '[' -d /backends/vulkan-llama-cpp/lib/rocblas/library ']'" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ '[' -f /backends/vulkan-llama-cpp/lib/ld.so ']'" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ echo 'Using lib/ld.so'" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ echo 'Using binary: llama-cpp-avx2'" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stdout id="qwen3-4b-127.0.0.1:40959" line="Using lib/ld.so" caller={caller.file="/build/pkg/model/process.go" caller.L=200 }
May 25 19:34:07 DEBUG GRPC stdout id="qwen3-4b-127.0.0.1:40959" line="Using binary: llama-cpp-avx2" caller={caller.file="/build/pkg/model/process.go" caller.L=200 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="+ exec /backends/vulkan-llama-cpp/lib/ld.so /backends/vulkan-llama-cpp/llama-cpp-avx2 --addr 127.0.0.1:40959" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="WARNING: All log messages before absl::InitializeLog() is called are written to STDERR" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="I0000 00:00:1779737647.832791 221 config.cc:230] gRPC experiments enabled: call_status_override_on_cancellation, event_engine_dns, event_engine_listener, http2_stats_fix, monitoring_experiment, pick_first_new, trace_record_callops, work_serializer_clears_time_cache, work_serializer_dispatch" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="I0000 00:00:1779737647.832990 221 ev_epoll1_linux.cc:125] grpc epoll fd: 4" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="I0000 00:00:1779737647.833140 221 server_builder.cc:392] Synchronous server. Num CQs: 1, Min pollers: 1, Max Pollers: 2, CQ timeout (msec): 10000" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="I0000 00:00:1779737647.834309 221 ev_epoll1_linux.cc:359] grpc epoll fd: 5" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="I0000 00:00:1779737647.834696 221 tcp_socket_utils.cc:634] TCP_USER_TIMEOUT is available. TCP_USER_TIMEOUT will be used thereafter" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stdout id="qwen3-4b-127.0.0.1:40959" line="Server listening on 127.0.0.1:40959" caller={caller.file="/build/pkg/model/process.go" caller.L=200 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="start_llama_server: starting llama server" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:07 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="start_llama_server: waiting for model to be loaded" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:08 INFO Backend upgrade available (new build) backend="vulkan-llama-cpp" caller={caller.file="/build/core/application/upgrade_checker.go" caller.L=197 }
May 25 19:34:08 INFO Backend upgrade available (new build) backend="vulkan-whisper" caller={caller.file="/build/core/application/upgrade_checker.go" caller.L=197 }
May 25 19:34:08 INFO Backend upgrade available (new build) backend="piper" caller={caller.file="/build/core/application/upgrade_checker.go" caller.L=197 }
May 25 19:34:08 DEBUG HTTP request method="GET" path="/api/operations" status=200 caller={caller.file="/build/core/http/app.go" caller.L=203 }
May 25 19:34:08 DEBUG HTTP request method="GET" path="/api/resources" status=200 caller={caller.file="/build/core/http/app.go" caller.L=203 }
May 25 19:34:09 DEBUG HTTP request method="GET" path="/api/operations" status=200 caller={caller.file="/build/core/http/app.go" caller.L=203 }
May 25 19:34:09 DEBUG GRPC Service Ready caller={caller.file="/build/pkg/model/initializers.go" caller.L=123 }
May 25 19:34:09 DEBUG GRPC: Loading model with options options={{{} [] [] 0x333c95551c20} 0 [] Qwen3-4B.Q4_K_M.gguf 40960 1596861957 512 false false true false false false false 99999999 8 0 0 0 0 /models/Qwen3-4B.Q4_K_M.gguf false 0 false 0 0 false 0 false false 0 0 0 false 0 0 0 0 0 0 0 auto false //models [] [] [gpu] [] false [] } caller={caller.file="/build/pkg/model/initializers.go" caller.L=146 }
May 25 19:34:09 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.02.018.170 I system info: n_threads = 8, n_threads_batch = -1, total_threads = 16" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:09 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.02.018.173 I " caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:09 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.02.018.194 I system_info: n_threads = 8 / 16 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | " caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:09 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.02.018.194 I " caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:09 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.02.018.196 I srv load_model: loading model '/models/Qwen3-4B.Q4_K_M.gguf'" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:09 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.02.018.212 I common_init_result: fitting params to device memory ..." caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:09 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.02.018.212 I common_init_result: (for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on)" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:10 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.02.206.779 W common_fit_params: failed to fit params to free device memory: n_gpu_layers already set by user to 99999999, abort" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:10 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.02.295.976 W load: control-looking token: 128247 '</s>' was not control-type; this is probably a bug in the model. its type will be overridden" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:10 DEBUG HTTP request method="GET" path="/api/operations" status=200 caller={caller.file="/build/core/http/app.go" caller.L=203 }
May 25 19:34:10 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.146.264 W common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.686.375 I srv load_model: initializing slots, n_slots = 1" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG HTTP request method="GET" path="/api/operations" status=200 caller={caller.file="/build/core/http/app.go" caller.L=203 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.896.215 W common_speculative_init: no implementations specified for speculative decoding" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.896.220 I slot load_model: id 0 | task -1 | new slot, n_ctx = 40960" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.896.237 I srv load_model: prompt cache is enabled, size limit: no limit" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.896.238 I srv load_model: use `--cache-ram 0` to disable the prompt cache" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.896.238 I srv load_model: for more info see https://github.com/ggml-org/llama.cpp/pull/16391" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.896.252 W srv init: --cache-idle-slots requires --kv-unified, disabling" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.901.174 I init: chat template, example_format: '<|im_start|>system" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="You are a helpful assistant<|im_end|>" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="<|im_start|>user" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="Hello<|im_end|>" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="<|im_start|>assistant" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="Hi there<|im_end|>" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="<|im_start|>user" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="How are you?<|im_end|>" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="<|im_start|>assistant" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="'" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.904.110 I srv init: init: chat template, thinking = 1" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG Model already loaded in memory model="qwen3-4b" caller={caller.file="/build/pkg/model/loader.go" caller.L=374 }
May 25 19:34:11 DEBUG Checking model availability model="qwen3-4b" caller={caller.file="/build/pkg/model/loader.go" caller.L=386 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.918.162 I srv parse_option: Using grammar: root-1-name ::= \"\\\"answer\\\"\"" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="root-1 ::= \"{\" space \"\\\"arguments\\\"\" space \":\" space root-1-arguments \",\" space \"\\\"name\\\"\" space \":\" space root-1-name \"}\" space" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="root-0 ::= \"{\" space \"\\\"arguments\\\"\" space \":\" space root-0-arguments \",\" space \"\\\"name\\\"\" space \":\" space root-0-name \"}\" space" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="root ::= root-0 | root-1" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="space ::= \" \"?" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="freestring ::= (" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="\t\t\t[^\\x00] |" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="\t\t\t\"\\\\\" ([\"\\\\/bfnrt] | \"u\" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="\t\t )* space" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="string ::= \"\\\"\" (" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="\t\t\t[^\"\\\\] |" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="\t\t\t\"\\\\\" ([\"\\\\/bfnrt] | \"u\" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="\t\t )* \"\\\"\" space" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="root-0-arguments ::= \"{\" space \"\\\"cmd\\\"\" space \":\" space string \"}\" space" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="root-0-name ::= \"\\\"\\\"\"" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="root-1-arguments ::= \"{\" space \"\\\"message\\\"\" space \":\" space string \"}\" space" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.918.167 I srv parse_option: [TOOLS DEBUG] parse_options: Checking for tools in proto, tools().empty()=0, tools().size()=163" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.918.167 I srv parse_option: [TOOLS DEBUG] parse_options: Tools string from proto (first 500 chars): [{\"type\":\"function\",\"function\":{\"name\":\"\",\"description\":\"execute a command\",\"strict\":false,\"parameters\":{\"properties\":{\"cmd\":{\"type\":\"string\"}},\"type\":\"object\"}}}]" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.918.187 I srv parse_option: Extracted tools from proto: [{\"type\":\"function\",\"function\":{\"name\":\"\",\"description\":\"execute a command\",\"strict\":false,\"parameters\":{\"properties\":{\"cmd\":{\"type\":\"string\"}},\"type\":\"object\"}}}]" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.918.187 I srv parse_option: [TOOLS DEBUG] parse_options: Successfully parsed 1 tools from Go layer" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.918.188 I srv parse_option: [TOOLS DEBUG] parse_options: Tool 0: " caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.03.918.190 I srv parse_option: [TOOLS DEBUG] parse_options: Tools successfully added to data, count: 1" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.04.002.467 I start_llama_server: model loaded" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.04.002.477 I slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = -1" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.04.002.478 I srv get_availabl: updating prompt cache" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.04.002.482 I srv load: - looking for better prompt, base f_keep = -1.000, sim = 0.000" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.04.002.484 I srv update: - cache state: 0 prompts, 0.000 MiB (limits: 0.000 MiB, 40960 tokens, 40960 est)" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.04.002.485 I srv get_availabl: prompt cache update took 0.01 ms" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:11 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.04.002.543 I slot launch_slot_: id 0 | task 0 | processing task, is_child = 0" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:12 DEBUG LLM result result="" caller={caller.file="/build/pkg/functions/parse.go" caller.L=256 }
May 25 19:34:12 DEBUG LLM result(processed) result="" caller={caller.file="/build/pkg/functions/parse.go" caller.L=264 }
May 25 19:34:12 DEBUG Sending chunk chunk="{\"created\":1779737647,\"object\":\"chat.completion.chunk\",\"id\":\"f3c531da-4964-43c9-8008-bb7f8758e817\",\"model\":\"qwen3-4b\",\"choices\":[{\"index\":0,\"finish_reason\":null,\"delta\":{\"role\":\"assistant\",\"content\":null}}]}" caller={caller.file="/build/core/http/endpoints/openai/chat.go" caller.L=394 }
May 25 19:34:12 DEBUG Sending chunk chunk="{\"created\":1779737647,\"object\":\"chat.completion.chunk\",\"id\":\"f3c531da-4964-43c9-8008-bb7f8758e817\",\"model\":\"qwen3-4b\",\"choices\":[{\"index\":0,\"finish_reason\":null,\"delta\":{\"content\":\"{\\\"\"}}]}" caller={caller.file="/build/core/http/endpoints/openai/chat.go" caller.L=394 }
May 25 19:34:12 DEBUG LLM result result="{\"" caller={caller.file="/build/pkg/functions/parse.go" caller.L=256 }
May 25 19:34:12 DEBUG LLM result(processed) result="{\"" caller={caller.file="/build/pkg/functions/parse.go" caller.L=264 }
May 25 19:34:12 DEBUG LLM result result="{\"arguments" caller={caller.file="/build/pkg/functions/parse.go" caller.L=256 }
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments" caller={caller.file="/build/pkg/functions/parse.go" caller.L=264 }
May 25 19:34:12 DEBUG LLM result result="{\"arguments\":" caller={caller.file="/build/pkg/functions/parse.go" caller.L=256 }
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\":" caller={caller.file="/build/pkg/functions/parse.go" caller.L=264 }
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"" caller={caller.file="/build/pkg/functions/parse.go" caller.L=256 }
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\": {\"" caller={caller.file="/build/pkg/functions/parse.go" caller.L=264 }
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"message" caller={caller.file="/build/pkg/functions/parse.go" caller.L=256 }
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\": {\"message" caller={caller.file="/build/pkg/functions/parse.go" caller.L=264 }
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"message\":" caller={caller.file="/build/pkg/functions/parse.go" caller.L=256 }
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\": {\"message\":" caller={caller.file="/build/pkg/functions/parse.go" caller.L=264 }
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"message\": \"" caller={caller.file="/build/pkg/functions/parse.go" caller.L=256 }
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"" caller={caller.file="/build/pkg/functions/parse.go" caller.L=264 }
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello" caller={caller.file="/build/pkg/functions/parse.go" caller.L=256 }
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello" caller={caller.file="/build/pkg/functions/parse.go" caller.L=264 }
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello\"}," caller={caller.file="/build/pkg/functions/parse.go" caller.L=256 }
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello\"}," caller={caller.file="/build/pkg/functions/parse.go" caller.L=264 }
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello\"}, \"" caller={caller.file="/build/pkg/functions/parse.go" caller.L=256 }
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello\"}, \"" caller={caller.file="/build/pkg/functions/parse.go" caller.L=264 }
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello\"}, \"name" caller={caller.file="/build/pkg/functions/parse.go" caller.L=256 }
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello\"}, \"name" caller={caller.file="/build/pkg/functions/parse.go" caller.L=264 }
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello\"}, \"name\":" caller={caller.file="/build/pkg/functions/parse.go" caller.L=256 }
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello\"}, \"name\":" caller={caller.file="/build/pkg/functions/parse.go" caller.L=264 }
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello\"}, \"name\": \"" caller={caller.file="/build/pkg/functions/parse.go" caller.L=256 }
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello\"}, \"name\": \"" caller={caller.file="/build/pkg/functions/parse.go" caller.L=264 }
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello\"}, \"name\": \"answer" caller={caller.file="/build/pkg/functions/parse.go" caller.L=256 }
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello\"}, \"name\": \"answer" caller={caller.file="/build/pkg/functions/parse.go" caller.L=264 }
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello\"}, \"name\": \"answer\"}" caller={caller.file="/build/pkg/functions/parse.go" caller.L=256 }
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello\"}, \"name\": \"answer\"}" caller={caller.file="/build/pkg/functions/parse.go" caller.L=264 }
May 25 19:34:12 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.04.593.529 I slot print_timing: id 0 | task 0 | " caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:12 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="prompt eval time = 297.67 ms / 185 tokens ( 1.61 ms per token, 621.49 tokens per second)" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:12 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line=" eval time = 293.29 ms / 16 tokens ( 18.33 ms per token, 54.55 tokens per second)" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:12 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line=" total time = 590.96 ms / 201 tokens" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:12 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.04.593.556 I slot release: id 0 | task 0 | stop processing: n_tokens = 200, truncated = 0" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:12 DEBUG GRPC stderr id="qwen3-4b-127.0.0.1:40959" line="0.04.593.563 I srv update_slots: all slots are idle" caller={caller.file="/build/pkg/model/process.go" caller.L=187 }
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello\"}, \"name\": \"answer\"}" caller={caller.file="/build/pkg/functions/parse.go" caller.L=256 }
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello\"}, \"name\": \"answer\"}" caller={caller.file="/build/pkg/functions/parse.go" caller.L=264 }
May 25 19:34:12 DEBUG [ChatDeltas] streaming completed, accumulated deltas from C++ autoparser total_deltas=15 caller={caller.file="/build/core/backend/llm.go" caller.L=257 }
May 25 19:34:12 DEBUG [ChatDeltas] received deltas from backend total_deltas=15 content_chunks=15 reasoning_chunks=0 tool_call_chunks=0 caller={caller.file="/build/pkg/functions/chat_deltas.go" caller.L=31 }
May 25 19:34:12 DEBUG [ChatDeltas] deltas present but no tool calls found, falling back to text parsing caller={caller.file="/build/pkg/functions/chat_deltas.go" caller.L=67 }
May 25 19:34:12 DEBUG [ChatDeltas] no pre-parsed tool calls, falling back to Go-side text parsing caller={caller.file="/build/core/http/endpoints/openai/chat_stream_workers.go" caller.L=339 }
May 25 19:34:12 DEBUG ParseTextContent result="{\"arguments\": {\"message\": \"Hello\"}, \"name\": \"answer\"}" caller={caller.file="/build/pkg/functions/parse.go" caller.L=270 }
May 25 19:34:12 DEBUG CaptureLLMResult config=[] caller={caller.file="/build/pkg/functions/parse.go" caller.L=271 }
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello\"}, \"name\": \"answer\"}" caller={caller.file="/build/pkg/functions/parse.go" caller.L=256 }
May 25 19:34:12 DEBUG LLM result(processed) result="{\"arguments\": {\"message\": \"Hello\"}, \"name\": \"answer\"}" caller={caller.file="/build/pkg/functions/parse.go" caller.L=264 }
May 25 19:34:12 DEBUG LLM result result="{\"arguments\": {\"message\": \"Hello\"}, \"name\": \"answer\"}" caller={caller.file="/build/pkg/functions/parse.go" caller.L=866 }
May 25 19:34:12 DEBUG LLM result(function cleanup) result="{\"arguments\": {\"message\": \"Hello\"}, \"name\": \"answer\"}" caller={caller.file="/build/pkg/functions/parse.go" caller.L=874 }
May 25 19:34:12 DEBUG Function return result="{\"arguments\": {\"message\": \"Hello\"}, \"name\": \"answer\"}" parsed=[map[arguments:map[message:Hello] name:answer]] caller={caller.file="/build/pkg/functions/parse.go" caller.L=902 }
May 25 19:34:12 DEBUG [ParseFunctionCall] trying PEG parser caller={caller.file="/build/pkg/functions/parse.go" caller.L=1004 }
May 25 19:34:12 DEBUG [PEG] starting PEG tool call parsing caller={caller.file="/build/pkg/functions/peg_integration.go" caller.L=22 }
May 25 19:34:12 DEBUG [PEG] auto-detecting format across all presets caller={caller.file="/build/pkg/functions/peg_integration.go" caller.L=97 }
May 25 19:34:12 DEBUG [PEG] parse succeeded content_len=53 reasoning_len=0 tool_calls=0 caller={caller.file="/build/pkg/functions/peg_integration.go" caller.L=522 }
↳ repeated 7x
May 25 19:34:12 DEBUG [PEG] no tool calls found by any format caller={caller.file="/build/pkg/functions/peg_integration.go" caller.L=116 }
May 25 19:34:12 DEBUG [ParseFunctionCall] PEG parser found no tool calls caller={caller.file="/build/pkg/functions/parse.go" caller.L=1011 }
May 25 19:34:12 DEBUG [ChatDeltas] final tool call decision tool_calls=1 text_content="" caller={caller.file="/build/core/http/endpoints/openai/chat_stream_workers.go" caller.L=346 }
May 25 19:34:12 DEBUG No choices in the response, skipping caller={caller.file="/build/core/http/endpoints/openai/chat.go" caller.L=370 }
May 25 19:34:12 DEBUG Stream ended caller={caller.file="/build/core/http/endpoints/openai/chat.go" caller.L=576 }
May 25 19:34:12 INFO HTTP request method="POST" path="/v1/chat/completions" status=200 caller={caller.file="/build/core/http/app.go" caller.L=205 }
LocalAI version:
LocalAI v4.3.1
container: localai/localai:v4.3.1-gpu-vulkan
Environment, CPU architecture, OS, and Version:
Ubuntu 24.04 host with an AMD Ryzen 7 5800X CPU and an AMD Radeon RX 6600 GPU
Describe the bug
The LLM response consists only of {" when the chat completion API request specifies tools and enables streaming. This is a regression: it worked in LocalAI v4.0.0 but stopped working at some point before LocalAI v4.3.1.
Relates to #9419 and #9363.
To Reproduce
Start LocalAI:
docker run -ti --rm --network=host --privileged -v "$(pwd)/data/models:/models" -v "$(pwd)/data/backends:/backends" localai/localai:v4.3.1-gpu-vulkan --address 127.0.0.1:8080
Make sure the qwen3-4b model is available.
Send a streaming chat completion request that specifies tools:
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "qwen3-4b", "messages": [{"role": "user", "content": "Hello"}], "stream": true, "tools": [{"name":"exec","type":"function","function":{"parameters":{"type":"object","properties":{"cmd":{"type":"string"}}}, "description":"execute a command"}}]}'
The response content is only {".
Expected behavior
The response should contain actual content: In the example a greeting would be appropriate. Alternatively, given the tool definition, the model could decide to call a tool, if requested to do so.
However, {" is not a useful response.
Logs
LocalAI logs
Additional context
The bug can also be reproduced using other models such as gemma-3-4b-it.
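To compare models or versions quickly, it can help to reduce a streamed response to its accumulated content and tool calls. The following is only a minimal client-side sketch, not part of LocalAI. It assumes the stream uses OpenAI-style server-sent events ("data: {...}" lines terminated by "data: [DONE]") with "choices[].delta.content" and "choices[].delta.tool_calls" fields. The function name summarize_stream is hypothetical.

```python
import json
from typing import Iterable


def summarize_stream(lines: Iterable[str]) -> dict:
    """Accumulate content and tool-call deltas from OpenAI-style SSE lines.

    Assumption: each event is a single 'data: <json>' line and the stream
    ends with 'data: [DONE]'.
    """
    content = []
    tool_calls = {}  # tool call index -> {"name": str, "arguments": str}
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # Chunks without choices contribute nothing to the summary.
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            if delta.get("content"):
                content.append(delta["content"])
            for call in delta.get("tool_calls") or []:
                slot = tool_calls.setdefault(
                    call.get("index", 0), {"name": "", "arguments": ""}
                )
                fn = call.get("function") or {}
                slot["name"] += fn.get("name") or ""
                slot["arguments"] += fn.get("arguments") or ""
    return {
        "content": "".join(content),
        "tool_calls": [tool_calls[i] for i in sorted(tool_calls)],
    }
```

Passing the lines of the streamed HTTP response body to summarize_stream gives one dictionary per request. For the request above, a result whose "content" is just {" and whose "tool_calls" list is empty would match the behavior described in this report.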