feat(local-ai): update image docker.io/localai/localai v4.3.6 → v4.4.0 #49004
Merged
truecharts-admin merged 1 commit on Jun 11, 2026
📝 Linting results:
✔️ Linting [charts/stable/local-ai]: Passed - Took 0 seconds
✅ Linting: Passed - Took 1 second
Crow-Control (Member) approved these changes on Jun 11, 2026 and left a comment:
Auto approved automated PR
This PR contains the following updates:
docker.io/localai/localai: v4.3.6 → v4.4.0 (digest d62ab7b → 78a86bf)

⚠️ Warning
Some dependencies could not be looked up. Check the Dependency Dashboard for more information.
Add the preset :preserveSemverRanges to your config if you don't want to pin your dependencies.

Release Notes
mudler/LocalAI (docker.io/localai/localai)
v4.4.0 (Compare Source)
🎉 LocalAI 4.4.0 Release! 🚀
LocalAI 4.4.0 is out!
This is a big, multimodal-and-distributed release. Two brand-new audio backends land - parakeet.cpp (NVIDIA NeMo Parakeet ASR) and CrispASR (a multi-architecture ASR and TTS engine) - alongside native object detection + segmentation (rfdetr-cpp), video understanding in llama-cpp, and LTX-2 video generation in stablediffusion-ggml. Distributed mode grows up: prefix-cache-aware routing is on by default, and file transfers become resumable. There's a new intelligent middleware layer for request routing, PII filtering and cloud-model proxying, a security hardening pass that closes a credential-leak class across every outbound HTTP client, an interactive local-ai chat CLI, RAG source citations for agents, and a long run of reasoning / tool-call streaming fixes.

📌 TL;DR
- Two new audio backends: parakeet-cpp (NeMo FastConformer TDT/CTC/RNNT, streaming, word/segment timestamps) and crispasr (many ASR architectures + TTS in one binary).
- Video understanding in llama-cpp via mtmd, and video generation via LTX-2 in stablediffusion-ggml.
- Native object detection + segmentation: the rfdetr-cpp backend (RF-DETR), 32 prebuilt GGUFs, bbox + per-detection PNG masks.
- Security: pkg/httpclient refuses cross-host credential-leaking redirects across every outbound client (GHSA-3mj3-57v2-4636).
- TTS: per-request instructions + a generic params map plumbed end to end (Qwen3-TTS VoiceDesign / CustomVoice, Chatterbox).
- An interactive local-ai chat CLI with /models, /model and /clear.
- RAG source citations for agents, via a Sources: block from the Knowledge Base.

🚀 New Features & Major Enhancements
🎙️ Audio Gets Serious: Two New ASR Backends
This release doubles down on speech-to-text with two independent, cgo-less Go backends (purego, CGO_ENABLED=0). Each ships with a full CI matrix, a gallery importer and docs.

parakeet-cpp - NVIDIA NeMo Parakeet (#10084). Wraps parakeet.cpp, a C++/ggml port of NeMo Parakeet (FastConformer TDT/CTC/RNNT/hybrid) that matches the upstream PyTorch models on CPU. It provides text transcription, OpenAI-compatible word timestamps, and cache-aware streaming (16 kHz PCM chunks, <EOU>/<EOB> utterance boundaries). GGUFs for all 10 Parakeet models × 5 quants ship in mudler/parakeet-cpp-gguf. Follow-ups in this cycle made it production-grade:
- get_segment_offsets (sentence-punctuation boundaries by default, opt-in segment_gap_threshold silence splitting in encoder frames). Streaming FinalResult segments now carry start/end when the library exposes the ABI v4 JSON entry points.
- nemotron-3.5-asr multilingual streaming (#10199) + per-request language selection.

crispasr - many architectures + TTS in one backend (#10099). Wraps CrispASR (a whisper.cpp/ggml fork, MIT) through its session C-ABI. One backend serves ASR or TTS depending on the loaded model. The architecture is auto-detected from the GGUF or forced via backend:. The gallery gains 36 entries suffixed -crispasr (32 ASR + 4 TTS). Model options: backend:, codec:, speaker:, voice:.

🧭 Intelligent Middleware: Routing, PII Filtering & Cloud Proxies
A new middleware layer (#9802) analyzes, routes, filters and transforms chat requests before they hit a model.
- /v1/chat/completions, Anthropic /v1/messages, and /v1/completions. A per-model PII pattern editor lives in the model config UI.
- proxy_connect + proxy_traffic audit events, and restores its listener from runtime_settings.json on restart.

Usage stats are recorded end to end and surfaced in REST, the UI, and MCP. Outbound clients used by this path were also the trigger for the security pass below.
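The PII-filtering stage can be pictured as a redaction pass over message text before a request is forwarded to a model. The Go sketch below is a minimal illustration. It assumes each rule is a named regular expression whose matches are replaced with a [NAME] placeholder. The rule names, placeholder format and redact function are hypothetical; they are not LocalAI's per-model PII pattern schema.

```go
package main

import (
	"fmt"
	"regexp"
)

// piiPattern is a hypothetical named redaction rule; LocalAI's real
// per-model PII pattern format may differ.
type piiPattern struct {
	Name string
	Re   *regexp.Regexp
}

// defaultPatterns are illustrative rules only (RE2 syntax, as used by Go's regexp).
var defaultPatterns = []piiPattern{
	{Name: "EMAIL", Re: regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)},
	{Name: "IPV4", Re: regexp.MustCompile(`\b(?:\d{1,3}\.){3}\d{1,3}\b`)},
}

// redact applies each rule in order, replacing every match with [NAME]
// so the text can be forwarded without the matched values.
func redact(text string, patterns []piiPattern) string {
	for _, p := range patterns {
		text = p.Re.ReplaceAllString(text, "["+p.Name+"]")
	}
	return text
}

func main() {
	msg := "Contact alice@example.com from host 10.0.0.12"
	fmt.Println(redact(msg, defaultPatterns)) // prints "Contact [EMAIL] from host [IPV4]"
}
```

Applying the rules in order keeps the pass simple. A real filter would also have to decide how redaction interacts with responses and streamed chunks, which this sketch ignores.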
🛰️ Distributed Mode v4
Distributed mode keeps maturing across routing, security and resilience.
Prefix-cache-aware routing, on by default (#10071). Routing now biases toward the replica that already holds the relevant KV/prefix cache. This is a load-guarded hint that never routes worse than today's round-robin. A generic prefix tree (pkg/radixtree) maps cumulative prompt-prefix hashes to nodes. core/services/nodes/prefixcache turns the rendered prompt into a deterministic xxhash chain and makes a filter-then-score decision: it narrows to load-eligible replicas, then prefers the longest-prefix match. The result is fed as a preferredNodeID into the existing atomic SELECT ... FOR UPDATE pick. Observations sync across frontends over NATS. Round-robin is the floor; disable with --distributed-prefix-cache=false.

NATS JWT auth + TLS/mTLS (#10159). Previously, anyone with access to the NATS port could publish backend-install messages or agent jobs (an SSRF / accidental-exposure risk). This release adds JWT authentication and TLS/mTLS options, with workers acquiring and auto-refreshing their NATS credentials. It is complemented by worker file-transfer registration-token enforcement (#10183).
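The filter-then-score decision behind prefix-cache-aware routing can be sketched in a few dozen lines of Go. This is a simplified model with several assumptions:
- FNV-1a from the standard library stands in for xxhash.
- A flat map replaces the pkg/radixtree prefix tree.
- The 16-byte chunk size and the load threshold are hypothetical.
- There is no NATS sync or database pick.

It illustrates the cumulative hash chain, the load filter, the longest-prefix preference and the round-robin floor.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// chunkSize is a hypothetical block size, in bytes, for hashing prompt prefixes.
const chunkSize = 16

// prefixHashes returns a chain of cumulative hashes: element i covers
// prompt[0:(i+1)*chunkSize]. FNV-1a stands in for the xxhash chain.
func prefixHashes(prompt string) []uint64 {
	h := fnv.New64a()
	var out []uint64
	for start := 0; start+chunkSize <= len(prompt); start += chunkSize {
		h.Write([]byte(prompt[start : start+chunkSize])) // Sum64 covers all bytes written so far
		out = append(out, h.Sum64())
	}
	return out
}

// replica is a hypothetical view of a backend node and its current load.
type replica struct {
	ID       string
	InFlight int
}

// router is a simplified stand-in: a flat map from prefix hash to node ID
// instead of a radix tree, with no cross-frontend sync.
type router struct {
	seen    map[uint64]string // prefix hash -> node that last served it
	maxLoad int               // load guard: replicas at or above this are skipped
	rr      int               // round-robin cursor, the fallback "floor"
}

// pick filters to load-eligible replicas, then prefers the one holding the
// longest matching prefix, falling back to round-robin.
func (r *router) pick(prompt string, replicas []replica) string {
	if len(replicas) == 0 {
		return ""
	}
	var eligible []replica
	for _, rep := range replicas {
		if rep.InFlight < r.maxLoad {
			eligible = append(eligible, rep)
		}
	}
	if len(eligible) == 0 {
		eligible = replicas // every node is busy: ignore the guard
	}

	hashes := prefixHashes(prompt)
	best, bestLen := "", 0
	for i, hv := range hashes {
		node, ok := r.seen[hv]
		if !ok {
			continue
		}
		for _, rep := range eligible {
			if rep.ID == node && i+1 > bestLen {
				best, bestLen = node, i+1
			}
		}
	}
	if best == "" {
		best = eligible[r.rr%len(eligible)].ID
		r.rr++
	}
	// Record which node now holds these prefixes for future requests.
	for _, hv := range hashes {
		r.seen[hv] = best
	}
	return best
}

func main() {
	r := &router{seen: map[uint64]string{}, maxLoad: 4}
	nodes := []replica{{ID: "node-a"}, {ID: "node-b"}}
	system := "You are a helpful assistant. Answer briefly."
	first := r.pick(system+" What is Go?", nodes)
	second := r.pick(system+" What is Rust?", nodes)
	fmt.Println(first, second) // prints "node-a node-a": the shared prefix pins the second request
}
```

Each hash covers the whole prefix up to its chunk. Barring collisions, two prompts therefore share a hash only if they share every byte before it, so a single map lookup stands in for a prefix comparison.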
Resumable file transfers (#10109). Large model GGUFs sent over flaky or throttled links no longer restart from byte 0. The worker's PUT /v1/files/<key> honors Content-Range, with 308/416 resume semantics, X-Content-SHA256 binding and final-hash verification. The master-side stager HEAD-probes for the last accepted offset and resumes from there. It switches to an outer time budget (LOCALAI_FILE_TRANSFER_BUDGET, default 1h) with exponential backoff.

ds4 layer-split distributed inference (#10098). This adds manual layer-split inference for the ds4 backend. A coordinator owns layers 0:K and listens; workers dial in and own higher ranges, each loading only its slice of the GGUF. Workers run a new dependency-free ds4-worker binary, driven via local-ai worker ds4-distributed. Fully backward-compatible when ds4_role is absent.

Operational glue. Boot-time gallery prefetch via LOCALAI_PREFETCH_MODELS (#10108), and a gated X-LocalAI-Node response header for attribution (#9976). Fixes in this area:
- self-heal stale "model not loaded" routing (#10181)
- stage directory-based models to remote nodes (#10175)
- in-flight tracking for non-LLM methods - VAD, diarize, voice (#10238)
- the reconciler survives frontend restarts (#9981)
- cross-replica OpCache sync (#9983)
- the reinstall/upgrade UI no longer sticks on "reinstalling" (#10214)

🎥 Video, Both Directions
Video input / understanding in llama-cpp (#10216). Video-capable multimodal models (e.g. SmolVLM2-Video) can now be sent a video in a chat request, mirroring the existing image and audio paths. This tracks the upstream mtmd video landing (ggml-org/llama.cpp#24269). grpc-server.cpp forwards request->videos() into the mtmd files vector on both the template and non-template paths. The React chat UI accepts video/*, renders an inline <video controls> player, and emits video_url content parts. allow_video is auto-gated by whether the loaded mmproj supports it. ffmpeg/ffprobe (already in the runtime image) extract frames.

Video generation via LTX-2 (#9980). stablediffusion-ggml wires audio_vae_path and embeddings_connectors_path through to the upstream LTX-2 fields. It also adds a new gallery/ltx-ggml.yaml template (T2V / I2V / FLF2V recipes) and six LTX-2.3 22B GGUF gallery entries (dev + distilled, UD-Q4_K_M / Q4_K_M / Q8_0). Each entry bundles the text encoder, video VAE, audio VAE and embeddings connectors. Follow-up fixes wired the diffusion_model flag and vae_decode_only: false for the i2v/flf2v paths (#9986, #9987), and muxed LTX-2 audio into the output MP4 (#9990).
rfdetr-cppA new Go native gRPC backend (#10028) dlopens
librfdetr.so(built from mudler/rf-detr.cpp) and exposes the RF-DETR pipeline through LocalAI'sDetectRPC. Supports all 5 detection variants (Nano…Large) and 3 segmentation variants (SegNano/SegSmall/SegMedium) at F32/F16/Q8_0/Q4_K, with 32 prebuilt GGUFs on HuggingFace. Detection returns bbox + class_name + confidence; segmentation adds per-detection PNG-encoded masks. Matches PyTorch on CPU (sub-pixel bbox match, mask IoU 0.99+), with an HF gallery importer that auto-routes GGUF repos to the native backend.🗣️ TTS: Per-Request Instructions & Params
The instructions field of the OpenAI-compatible /v1/audio/speech endpoint was silently dropped at the HTTP→gRPC boundary, so style/voice could only come from static YAML. PR #10172 plumbs a generic per-request instructions string and an optional backend-specific params map end to end (proto, schema, core/backend/tts.go). This enables per-line emotion/style (Qwen3-TTS CustomVoice, Chatterbox) and describe-a-voice (Qwen3-TTS VoiceDesign) from a single model config. The change is fully backward compatible: an empty instructions field falls back to the YAML.

Also: Qwen3-TTS request-language normalization for flexible matching (#10174), and LocalVQE v1.3 with input/output spectrogram views in the Audio Transform UI (#10113).
🧠 Reasoning & Tool-Call Streaming Hardening
A focused run of correctness fixes for reasoning models and streaming tool calls:
- reasoning_effort is honored per request and forwarded to the backend so jinja models can act on it (#10082, #10184).
- <think> parsing: stop <think> leaking into content in pure-content mode (#9991), stop a prefilled <think> from swallowing tag-less answers (#10225), and don't auto-enable self-spec MTP for draft-only assistant GGUFs (#10208).

💻 local-ai chat + 📚 RAG Citations + 🛰️ Realtime

- The local-ai chat command connects to a running server over the OpenAI-compatible API, streams completions, and supports /models, /model <name>, /clear and /exit. This keeps local-ai run focused on the server lifecycle. (Fixes #1535.)
- RAG citations for agents: a Sources: block lists the original documents, deduplicated per source, with the citation-free version saved to long-term memory. (Closes #9331.)
- The LOCALAI_WEBRTC_NAT_1TO1_IPS / LOCALAI_WEBRTC_ICE_INTERFACES knobs fix /v1/realtime calls dropping a few seconds in under Docker host networking (unroutable docker0/veth candidates).
- /api/operations poller across UI consumers (#10029) and a React bundle code-split (#10042).

🧩 Backend Capability Registration & Startup Speed
- BackendCapabilities (#10107), and add face/speaker-recognition constants registering insightface + speaker-recognition (#10110).
What's Changed
Bug fixes 🐛
Exciting New Features 🎉
🧠 Models
📖 Documentation and examples
👒 Dependencies
- 3f40e73c367ad9f0c1b1819f28c7348c26aa340d by @localai-b in https://github.com/mudler/LocalAI/pull/10097
- ba00a8a88c4c5810a3d1fed6b7b8fa2b44b82fdc by @localai-b in https://github.com/mudler/LocalAI/pull/10095
- d2797b86670622b6538123b4aeb5fbb6be2653c5 by @localai-b in https://github.com/mudler/LocalAI/pull/10094
- d6588daa800058dfa54f1d7ea695b1a810c8ae18 by @localai-b in https://github.com/mudler/LocalAI/pull/10093
- cb45f68068081af01e7092e91b038ee353eb56be by @localai-b in https://github.com/mudler/LocalAI/pull/10116
- fe69461618ffc50ba8afa65c25cc6c6e34d4537f by @localai-b in https://github.com/mudler/LocalAI/pull/10117
- be65ac7511b30379b003626c15224798929e33d4 by @localai-b in https://github.com/mudler/LocalAI/pull/10118
- 399739d5c5978351f39e3454bfbfbab4f369088f by @localai-b in https://github.com/mudler/LocalAI/pull/10119
- 23ee03506a91ac3d3f0071b40e66a430eebdfa1d by @localai-b in https://github.com/mudler/LocalAI/pull/10130
- 7948df8ac1070f5f6881b8d34675821893eb97d6 by @localai-b in https://github.com/mudler/LocalAI/pull/10127
- 8a7c48209d7882a7ce79a6b306270e4703194543 by @localai-b in https://github.com/mudler/LocalAI/pull/10129
- 5dcb71166686799f0d873eab7386234302d05ecf by @localai-b in https://github.com/mudler/LocalAI/pull/10128
- 05e60432bcb5bc2113f8c395a41e86497c11504a by @localai-b in https://github.com/mudler/LocalAI/pull/10115
- 9edf17c3ada66e0f881dcff155492867db7ac4cf by @localai-b in https://github.com/mudler/LocalAI/pull/10141
- 2d40a8b2adcdf8b5b0ca0535f3bb7801b6ba13e5 by @localai-b in https://github.com/mudler/LocalAI/pull/10144
- 610e664ba7cfe3af46125ed1b5a1184fccb51bcd by @localai-b in https://github.com/mudler/LocalAI/pull/10140
- 5c394fdc8b564eff6faacc50a139529d875f0e36 by @localai-b in https://github.com/mudler/LocalAI/pull/10143
- 477c0e82e2699b35a65fd0a1ed6fe66b41087dfe by @localai-b in https://github.com/mudler/LocalAI/pull/10142
- 94a220cd6745e6e3f8de62870b66fd5b9bc92700 by @localai-b in https://github.com/mudler/LocalAI/pull/10168
- 1f9ee88e09c258053fa59d5e05e23dfb10fa0b13 by @localai-b in https://github.com/mudler/LocalAI/pull/10166
- 13d54e110e1538e0f0bc3af0680b9ab246cfb48d by @localai-b in https://github.com/mudler/LocalAI/pull/10145
- 136e5d36c17083da0321fd96512dc7b263f94a44 by @localai-b in https://github.com/mudler/LocalAI/pull/10165
- b11fe5bca78ad8b342dd559a43d76df3984bb447 by @localai-b in https://github.com/mudler/LocalAI/pull/10167
- 1520eda980564241434b791ce2bbbd128c4be9ea by @localai-b in https://github.com/mudler/LocalAI/pull/10180
- 7c158fbb4aec1bdc9c81d6ca0e785139f4826fae by @localai-b in https://github.com/mudler/LocalAI/pull/10179
- 99613cb720b65036237d44b52f753b51f75c2797 by @localai-b in https://github.com/mudler/LocalAI/pull/10178
- 0.22.1 by @localai-b in https://github.com/mudler/LocalAI/pull/10188
- 843600590f96a31467a5199f827c253f34c110f7 by @localai-b in https://github.com/mudler/LocalAI/pull/10198
- 6b9de3dbaa21ae95ea80638e5ee836795cc48c93 by @localai-b in https://github.com/mudler/LocalAI/pull/10190
- abd0087dcc92ec5ad1f96f9fd86c49eb26a5ce67 by @localai-b in https://github.com/mudler/LocalAI/pull/10204
- a8ec021f2750a473ff4a8f3883bc9fdf5feafa84 by @localai-b in https://github.com/mudler/LocalAI/pull/10202
- 31e82494c0a3913c919c1027fa70500fbf4c07dd by @localai-b in https://github.com/mudler/LocalAI/pull/10191
- e270af73b94c9a5c37ec516230219ed4580e1db6 by @localai-b in https://github.com/mudler/LocalAI/pull/10212
- b3d56d0ba1bd437886079e339118e8e75bb79ee7 by @localai-b in https://github.com/mudler/LocalAI/pull/10211
- 9e3b928fd8c9d14dbf15a8768b9fdd7e5c721d66 by @localai-b in https://github.com/mudler/LocalAI/pull/10210
- c463029c205c2ec8d7ab6c0df4a3f52979091286 by @localai-b in https://github.com/mudler/LocalAI/pull/10189
- f7838a306687f22c281d29c250f879a4ab3df2d7 by @localai-b in https://github.com/mudler/LocalAI/pull/10177
- 512d07cb08f234b704b5a5959aa9e2d4c466eeb0 by @localai-b in https://github.com/mudler/LocalAI/pull/10224
- 2768b6251548b78b6610e95edad13f888ad95982 by @localai-b in https://github.com/mudler/LocalAI/pull/10219
- 19bdfe22d255d5b4dff39d449318b9bc5ea2317f by @localai-b in https://github.com/mudler/LocalAI/pull/10222
- 97cad527d247edefc904e6c40c4cf5ee78bed055 by @localai-b in https://github.com/mudler/LocalAI/pull/10221
- df7638d8229a243af8a4b5a8ae557e0d74e0a0ae by @localai-b in https://github.com/mudler/LocalAI/pull/10220
- e6f8112f3ba126eed3ff5b30cdd08085414a7516 by @localai-b in https://github.com/mudler/LocalAI/pull/10233
- 91bafb5acd5a6cf00b1e55ef68bf40ddd207bee7 by @localai-b in https://github.com/mudler/LocalAI/pull/10234
- 039e20a2db9e87b2477c76cc04905f3e1acad77f by @localai-b in https://github.com/mudler/LocalAI/pull/10223
- c29f6653a516a3001d923944dad8892072cc7334 by @localai-b in https://github.com/mudler/LocalAI/pull/10236

Other Changes
🙌 New Contributors
Enjoy!
Full Changelog: mudler/LocalAI@v4.3.0...v4.4.0
Configuration
📅 Schedule: (UTC)
🚦 Automerge: Enabled.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR has been generated by Renovate Bot.