(This issue is mostly for transparency; no action should be taken by other maintainers or contributors.)

I plan to address this issue along with the new `mtmd_batch` API.

`process_chunk` is called twice, once on the target context (`ctx_tgt`) and again on the draft context (`ctx_dft`), which encodes the same image twice (cc @am17an for visibility):
```cpp
while (slot.prompt.n_tokens() < slot.task->n_tokens() && input_tokens[slot.prompt.n_tokens()] == LLAMA_TOKEN_NULL) {
    // process the image
    size_t n_tokens_out = 0;
    int32_t res = input_tokens.process_chunk(ctx_tgt, mctx, slot.prompt.n_tokens(), slot.prompt.tokens.pos_next(), slot.id, n_tokens_out);
    if (res != 0) {
        SLT_ERR(slot, "failed to process image, res = %d\n", res);
        send_error(slot, "failed to process image", ERROR_TYPE_SERVER);
        slot.release();
        continue;
    }

    if (ctx_dft && llama_get_ctx_other(ctx_dft.get()) != ctx_tgt) {
        // TODO: in the future, figure out how to infuse target embeddings to the images
        // for now, we skip this for simplicity
        // maybe we simply need to call `common_speculative_process()` on the mtmd batches in the `process_chunk` above?
        // [TAG_MTMD_DRAFT_PROCESSING]
        res = input_tokens.process_chunk(ctx_dft.get(), mctx, slot.prompt.n_tokens(), slot.prompt.tokens.pos_next(), slot.id, n_tokens_out);
        if (res != 0) {
            GGML_ABORT("failed to process multi-modal data on draft context\n");
        }
    }
    // ... (excerpt ends here; the loop body continues)
```
Snippet source: `llama.cpp/tools/server/server-context.cpp`, lines 2979 to 2999 in 76da245.