mtmd: refactor video subproc handling by ngxson · Pull Request #24316 · ggml-org/llama.cpp

ngxson · 2026-06-08T19:43:24Z

Overview

Refactor mtmd_helper_video and add an RAII wrapper, subprocess_handle, to make it a bit safer to work with.
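For readers unfamiliar with the pattern, here is a minimal sketch of what an RAII wrapper around a C stream can look like. It is not the PR's subprocess_handle: the class name owned_file and its methods are hypothetical, chosen only to show the "close at most once" idea that such a wrapper makes hard to get wrong.

```cpp
#include <cassert>
#include <cstdio>
#include <utility>

// Hypothetical illustration only; NOT the subprocess_handle from this PR.
// Owns a FILE* and guarantees it is closed at most once.
class owned_file {
public:
    explicit owned_file(std::FILE * f = nullptr) : f_(f) {}
    ~owned_file() { close(); }

    // Non-copyable: two owners of the same FILE* would both try to close it.
    owned_file(const owned_file &) = delete;
    owned_file & operator=(const owned_file &) = delete;

    // Movable: ownership is transferred and the source is left empty.
    owned_file(owned_file && other) noexcept
        : f_(std::exchange(other.f_, nullptr)) {}
    owned_file & operator=(owned_file && other) noexcept {
        if (this != &other) {
            close();
            f_ = std::exchange(other.f_, nullptr);
        }
        return *this;
    }

    // Returns true if a close was actually performed, false if already closed.
    bool close() {
        if (f_ == nullptr) {
            return false;
        }
        std::fclose(f_);
        f_ = nullptr; // nulling the pointer makes a second close a no-op
        return true;
    }

    std::FILE * get() const { return f_; }

private:
    std::FILE * f_;
};
```

With a wrapper like this, an early close by the code that feeds the stream and the later close in the destructor cannot both reach fclose(), because the pointer is cleared on the first call.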

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: yes

mudler

I've tested this, and it also fixes the issue that #24313 was addressing. Thanks @ngxson


* chore(llama-cpp): bump to 8f83d6c for mtmd video input support

  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(llama-cpp): forward video input to mtmd (template + non-template paths)

  Wire request->videos() into grpc-server.cpp mirroring the existing image and audio handling: a video_data build + non-template files extraction, and input_video chat chunks on the tokenizer-template path. allow_video is auto-set at model load by the vendored upstream chat_params.

  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): add video attachment support to the chat UI

  Mirror the image/audio attachment path for video: emit video_url content parts, accept video/* in the picker, keep video files as base64, show a film icon badge, and render attached video inline with a <video> player.

  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(llama-cpp): patch mtmd video stdin double-close (heap crash)

  Upstream mtmd video input (ggml-org/llama.cpp#24269) double-fcloses the ffmpeg/ffprobe stdin FILE: feed_stdin() fclose()s the FILE returned by subprocess_stdin() (which is sp->stdin_file), then subprocess_destroy() fclose()s the same pointer again -> heap corruption that aborts the backend on any base64 input_video request (the CLI --video file path is unaffected). Vendor a one-line fix (null sp->stdin_file after fclose) via prepare.sh's patches/ until upstream merges it. Verified e2e with gemma-4-e2b-it-qat-q4_0: video frames decode via ffmpeg and the model answers correctly (red clip -> 'Red', blue -> 'Blue').

  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* chore(llama-cpp): re-pin to upstream #24316, drop vendored stdin patch

  Upstream replaced the ad-hoc video stdin handling with a proper RAII refactor (ggml-org/llama.cpp#24316, "mtmd: refactor video subproc handling"), which includes the same `sp->stdin_file = nullptr` guard our patch added (plus join-before-destroy ordering). Re-pin LLAMA_VERSION to that branch head and drop patches/0001 - it's now redundant. Verified e2e with gemma-4-e2b-it-qat-q4_0: no crash, video frames decode and the model answers correctly (red clip -> "Red", blue -> "Blue").

  NOTE: #24316 is not yet merged, so this pins to its branch-head commit (28ca1e60). Re-pin to the squash-merge commit on master once it lands, otherwise `git fetch` may lose the commit after the branch is deleted.

  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
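The "join-before-destroy ordering" mentioned in the commit message can be sketched as follows. This is an assumption-laden illustration, not the upstream code: feeder_session and its log are hypothetical stand-ins for a helper thread that writes to the child's stdin and for the subprocess teardown. The point is only the order of operations in the destructor.

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Hypothetical illustration only; names are not taken from llama.cpp.
// A background writer (standing in for the thread that feeds stdin) must be
// joined before the resource it uses (standing in for the subprocess) is
// torn down, otherwise the writer could touch a destroyed handle.
class feeder_session {
public:
    explicit feeder_session(std::vector<std::string> & log) : log_(log) {
        // Stand-in for "write the video bytes to the child's stdin".
        writer_ = std::thread([this] { log_.push_back("writer finished"); });
    }

    ~feeder_session() {
        // 1. Wait for the writer so nothing is still using the handle.
        if (writer_.joinable()) {
            writer_.join();
        }
        // 2. Only now release the process-side resources.
        log_.push_back("process destroyed");
    }

    feeder_session(const feeder_session &) = delete;
    feeder_session & operator=(const feeder_session &) = delete;

private:
    std::vector<std::string> & log_;
    std::thread writer_;
};
```

Doing the join explicitly in the destructor keeps the ordering visible in one place instead of relying on member declaration order.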

Noeda · 2026-06-09T01:15:16Z

            cmd.push_back("-ss");
            cmd.push_back(seek_buf);
        }



I think either in this PR or a follow-up PR you might want to try adding -nostdin to ffmpeg. The test clip in the repo, test-3.mp4, encodes fine without it, but a longer clip I tested gets stuck midway, with the ffmpeg processes just sitting there and nothing progressing.

I think it might be because my clip is larger than the test clip. test-3.mp4 from the llama.cpp repo is 647K but my clip is about 11 megabytes. Without -nostdin, test-3.mp4 still works fine but my larger test clip doesn't.

Suggested change

cmd.push_back("-nostdin");

https://ffmpeg.org/ffmpeg.html (search for -nostdin, no convenient anchor URL).

-stdin
Enable interaction on standard input. On by default unless standard input is used as an input. To explicitly disable interaction you need to specify -nostdin.

Disabling interaction on standard input is useful, for example, if ffmpeg is in the background process group. Roughly the same result can be achieved with ffmpeg ... < /dev/null but it requires a shell.

I am wondering if ffmpeg is being dumdum and not recognizing the cache:pipe:0 as "standard input is being used as source". Maybe something there tries to race reading stdin in some bad interaction way, didn't bother tracing syscalls or anything.
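To make the suggestion concrete, here is a minimal sketch of building the input side of an ffmpeg argument list with -nostdin included. The function name build_ffmpeg_input_args and the "%.3f" seek formatting are assumptions for illustration; the real command in mtmd-helper.cpp carries more arguments that are not reproduced here. Only -nostdin, -ss, -i and cache:pipe:0 come from the discussion above.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper, not the function in mtmd-helper.cpp.
// Builds only the input-side arguments; output options are left to the caller.
std::vector<std::string> build_ffmpeg_input_args(double seek_seconds) {
    std::vector<std::string> cmd;
    cmd.push_back("ffmpeg");
    // Disable interactive reads on stdin, since stdin carries the video data.
    cmd.push_back("-nostdin");
    if (seek_seconds > 0.0) {
        char seek_buf[32];
        std::snprintf(seek_buf, sizeof(seek_buf), "%.3f", seek_seconds);
        cmd.push_back("-ss");
        cmd.push_back(seek_buf);
    }
    cmd.push_back("-i");
    cmd.push_back("cache:pipe:0"); // read the input from stdin through the cache protocol
    return cmd;
}
```

Passing the arguments as a vector rather than a shell string matches the cmd.push_back style in the diff context and avoids needing a shell for the `< /dev/null` alternative the ffmpeg documentation mentions.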

GitHub is rejecting the .mp4 I used for testing, so I uploaded it here: https://submarination.space/public/subnautica2_test_clip_2026_06_08.mp4 (~11mb, 42 seconds of audioless clip of some Subnautica 2 gameplay)

Confirmed: I got the same problem with the clip, and -nostdin fixed it.

OPerepadia · 2026-06-09T09:32:09Z

I can confirm this PR fixed the corrupted double-linked list error that appeared when attaching a video

Co-authored-by: Mikko Juola <mikjuo@gmail.com>

* upstream/HEAD: (329 commits)
  vendor : update LibreSSL to 4.3.2 (ggml-org#24397)
  Remove padding and multiple D2D copies for MTP (ggml-org#24086)
  chat: fix LFM2/LFM2.5 ignoring json_schema (ggml-org#24377)
  CUDA: Fix ssm_scan_f32 data-races (ggml-org#24360)
  ci : bump komac version (ggml-org#24396)
  speculative : fix "ngram-map-k4v" name in logging (ggml-org#24253)
  webui: implement pinned conversations support (ggml-org#21387)
  graph: Fix granite speech model inference by applying embedding scale when deepstack is not used (ggml-org#24357)
  ci : fix windows release (ggml-org#24369)
  ui: add opt-in run_javascript frontend tool (ggml-org#24244)
  mtmd: build_vit batching (ggml-org#24352)
  vulkan: reduce iq1 shared memory usage for mul_mm (ggml-org#24287)
  vulkan: add `v_dot2_f32_f16` support in matrix-matrix multiplication and Flash Attention (ggml-org#24123)
  ui: Fix excessive style recalculation on hover (ggml-org#24243)
  mtmd: refactor video subproc handling (ggml-org#24316)
  server: log prompts to directory (ggml-org#22031)
  ui: fix mobile chat form overflow and bust stale bundle cache (ggml-org#24158)
  ggml : add GGML_OP_COL2IM_1D (ggml-org#24206)
  server : do not clear slots without unified KV cache (ggml-org#24190)
  models : fix plamo2 attention_key/value_length regression (ggml-org#24317)
  ...

mtmd: refactor video subproc handling

28ca1e6

ngxson requested a review from a team as a code owner June 8, 2026 19:43

github-actions Bot added the examples label Jun 8, 2026

ngxson mentioned this pull request Jun 8, 2026

mtmd: fix double-close of ffmpeg/ffprobe stdin in video helper #24313

Closed

CISC approved these changes Jun 8, 2026

mudler approved these changes Jun 8, 2026

localai-bot mentioned this pull request Jun 8, 2026

feat(llama-cpp): video input support (mtmd #24269) mudler/LocalAI#10216

Merged

ngxson added the merge ready label Jun 8, 2026

Noeda reviewed Jun 9, 2026

ggerganov approved these changes Jun 9, 2026

ngxson removed the merge ready label Jun 9, 2026

Update tools/mtmd/mtmd-helper.cpp

79e8a4d

Co-authored-by: Mikko Juola <mikjuo@gmail.com>

ngxson requested review from CISC and ggerganov June 9, 2026 09:52

ggerganov merged commit 9682e35 into master Jun 9, 2026
25 checks passed

thomas-0816 mentioned this pull request Jun 9, 2026

Eval bug: Video transcribe crash, Qwen3-Omni-30B-A3B #24322

Closed

warshanks mentioned this pull request Jun 10, 2026

Misc. bug: mtmd video input hangs on Windows — probe() deadlocks on faststart MP4, decode emits 0 frames when MOOV at end #24429

Open

ngxson deleted the xsn/video_refactor_subproc branch June 13, 2026 12:00

ggerganov merged 2 commits into master from xsn/video_refactor_subproc
