mtmd: Expose helper_decode_image_chunk (#13366)
Conversation
I think this can be made simpler: in the application code, you can handle the embedding copy as I said. This way, you can even have a C++ struct with a `std::vector<float>`, which makes memory management much easier. The mtmd API already provides enough functions for you to do that, so I think we should not extend it further.
A struct in your app could look like this:
```cpp
struct my_image {
    std::vector<float> embeddings; // the encoded embeddings
    mtmd_input_chunk * chunk;      // the chunk containing mtmd_image_tokens
};
```
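For illustration, filling that struct might look like this; a minimal sketch, assuming the existing `mtmd_input_chunk_get_tokens_image`, `mtmd_encode`, `mtmd_get_output_embd`, and `mtmd_image_tokens_get_n_tokens` calls, with `llama_model_n_embd` giving the embedding width:

```cpp
#include <stdexcept>
#include <vector>
#include "llama.h"
#include "mtmd.h"

// uses the my_image struct defined above
my_image encode_and_cache(mtmd_context * mctx, const llama_model * model, mtmd_input_chunk * chunk) {
    const mtmd_image_tokens * img_tokens = mtmd_input_chunk_get_tokens_image(chunk);
    if (mtmd_encode(mctx, img_tokens) != 0) {
        throw std::runtime_error("mtmd_encode failed");
    }
    // the returned pointer is owned by mctx and only valid until the next
    // encode call, hence the copy into the std::vector
    const float * embd = mtmd_get_output_embd(mctx);
    const size_t n_floats = mtmd_image_tokens_get_n_tokens(img_tokens)
                          * (size_t) llama_model_n_embd(model);
    my_image out;
    out.embeddings.assign(embd, embd + n_floats);
    out.chunk = chunk; // the chunk's lifetime stays with the application
    return out;
}
```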
Nice, thanks! 💯 💯
Btw @mattjcly, one nice-to-have thing that I'm thinking about: currently `mtmd_helper_decode_image_chunk` runs non-stop, even though it actually splits the work into smaller batches under the hood.
This can lead to a poor UX where the user hits the "stop" button in the UI, but `mtmd_helper_decode_image_chunk` still tries to decode the whole image, which may take some extra seconds to finish.
I'm thinking about another version of `mtmd_helper_decode_image_chunk` (ofc will add it in another PR) that supports interruptibility. Maybe we could expose the `i_batch` and `n_batch` to the public API. Do you have any other ideas?
Edit: another idea could be to add a helper that does pre/post batch preparation; then you can `llama_decode(prepared_image_batch)` in the user code, but this may still look quite cumbersome 😞
I like this - I think that 1) having a point where the decoding can be stopped in between batches would be great, and 2) having a way, as a user, to get progress information during image decoding in the multi-batch case (other than just the current log) would be great.
Interesting. How would you envision this as the method of supporting interruptibility from the client side? Just trying to understand more.
The most intuitive way is to provide application code with the notion of "a list of batches" instead of a one-do-all API call. Pseudocode looks like this:

```
list_batches = mtmd_generate_decode_batches()
for batch in list_batches:
    llama_decode(batch)
```

Then if you want the interruptibility:

```
list_batches = mtmd_generate_decode_batches()
for batch in list_batches:
    if check_user_interrupt():
        break  # stop the decode
    llama_decode(batch)
```

I'm thinking along this line; maybe this will be implemented as a cpp-only API to make it easier to manage batch allocation.
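To make the batch-list idea concrete, here is a rough C++ sketch of what a cpp-only version could look like; `mtmd_prepared_batches` and the surrounding names are hypothetical, not existing mtmd API:

```cpp
#include <cstdio>
#include <functional>
#include <vector>
#include "llama.h"

// Hypothetical: what a cpp-only "list of batches" API might hand back.
// The llama_batch views would point into storage owned by this struct.
struct mtmd_prepared_batches {
    std::vector<llama_batch> batches; // one view per n_batch-sized slice
};

// The application owns the decode loop, so it can stop between batches
// and surface progress to the UI.
bool decode_interruptible(llama_context * lctx,
                          mtmd_prepared_batches & prepared,
                          const std::function<bool()> & user_interrupted) {
    const size_t n = prepared.batches.size();
    for (size_t i = 0; i < n; i++) {
        if (user_interrupted()) {
            return false; // stop the decode between batches
        }
        if (llama_decode(lctx, prepared.batches[i]) != 0) {
            return false; // decode error
        }
        fprintf(stderr, "image decode progress: %zu/%zu batches\n", i + 1, n);
    }
    return true;
}
```

This also gives the client the per-batch progress hook asked about above.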
* origin/master: (39 commits)
  - server : vision support via libmtmd (ggml-org#12898)
  - sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (ggml-org#12858)
  - metal : optimize MoE for large batches (ggml-org#13388)
  - CUDA: FA support for Deepseek (Ampere or newer) (ggml-org#13306)
  - llama : do not crash if there is no CPU backend (ggml-org#13395)
  - CUDA: fix crash on large batch size for MoE models (ggml-org#13384)
  - imatrix : Add --parse-special for enabling parsing of special tokens in imatrix calculation (ggml-org#13389)
  - llama-run: add support for downloading models from ModelScope (ggml-org#13370)
  - mtmd : fix batch_view for m-rope (ggml-org#13397)
  - llama : one-off chat template fix for Mistral-Small-2503 (ggml-org#13398)
  - rpc : add rpc_msg_set_tensor_hash_req (ggml-org#13353)
  - vulkan: Allow up to 4096 elements for mul_mat_id row_ids (ggml-org#13326)
  - server : (webui) rename has_multimodal --> modalities (ggml-org#13393)
  - ci : limit write permission to only the release step + fixes (ggml-org#13392)
  - mtmd : Expose helper_decode_image_chunk (ggml-org#13366)
  - server : (webui) fix a very small misalignment (ggml-org#13387)
  - server : (webui) revamp the input area, plus many small UI improvements (ggml-org#13365)
  - convert : support rope_scaling type and rope_type (ggml-org#13349)
  - mtmd : fix the calculation of n_tokens for smolvlm (ggml-org#13381)
  - context : allow cache-less context for embeddings (ggml-org#13108)
  - ...
New API
Decoding-only helper
`mtmd_helper_decode_image_chunk`: Split out from `mtmd_helper_eval_chunk_single`. Same logic as before, but exposing it as a standalone function enables clients to run `mtmd_encode` at some prior time, cache those embeddings, and then send them in later to `mtmd_helper_decode_image_chunk` to decode the embeddings without having to re-encode the image (expensive). A sketch of this flow follows the list below.

Edit: removed the below APIs that were in the original PR
Output embedding copy
`mtmd_get_output_embd_copy`: Allows the client to embed with `mtmd_encode`, then get a copy of the embeddings to hold onto past the lifetime of the embeddings within the `mtmd_context`. Useful for caching these embeddings and sending them into `mtmd_helper_decode_image` later.

mtmd_image_tokens management functions

`mtmd_image_tokens_copy`: Allows clients to get a copy of the `mtmd_image_tokens` from an `mtmd_input_chunk`, for later use to send alongside pre-computed embeddings to `mtmd_helper_decode_image`.
`mtmd_image_tokens_free`: For use to free an `mtmd_image_tokens *`, as can be received from `mtmd_image_tokens_copy`.
`image_tokens_ptr` (made public, existed privately in `mtmd.cpp` before): Enables auto memory management of `mtmd_image_tokens *`.

@ngxson I'm thinking that maybe there's a way to avoid the need to expose new API for `mtmd_image_tokens`, since I feel like the rationale "for later use to send alongside pre-computed embeddings" behind `mtmd_image_tokens_copy` could potentially be weak, and the API of `mtmd_helper_decode_image` could be reworked to not need this object in full? But it also seemed like the simplest conversion to enable decoupled embedding + decoding.
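Putting the pieces together, a sketch of the decoupled encode/decode flow this PR enables, assuming the `my_image` caching struct from the review discussion and the `mtmd_helper_decode_image_chunk` signature added here:

```cpp
#include <vector>
#include "llama.h"
#include "mtmd.h"

// my_image as sketched in the review discussion above
struct my_image {
    std::vector<float>  embeddings; // cached output of an earlier mtmd_encode
    mtmd_input_chunk  * chunk;      // the chunk containing mtmd_image_tokens
};

// Decode previously encoded embeddings; no mtmd_encode call here, so the
// expensive vision encode step is skipped entirely.
int32_t decode_cached_image(mtmd_context * mctx,
                            llama_context * lctx,
                            my_image      & img,
                            llama_pos       n_past,
                            llama_seq_id    seq_id,
                            int32_t         n_batch,
                            llama_pos     * new_n_past) {
    return mtmd_helper_decode_image_chunk(mctx, lctx, img.chunk,
                                          img.embeddings.data(),
                                          n_past, seq_id, n_batch, new_n_past);
}
```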