
fix: Add --jinja flag to granite-vision-3.2 test#24323

Closed
gabe-l-hart wants to merge 1 commit into ggml-org:master from gabe-l-hart:GraniteVisionMTMDTests


@gabe-l-hart

Collaborator

Overview

Add the missing --jinja flag to the granite-vision-3.2 test in mtmd/test.sh: #23545 (comment)

Branch: GraniteVisionMTMDTests
AI-usage: none
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
@gabe-l-hart gabe-l-hart requested a review from a team as a code owner June 8, 2026 22:20
@gabe-l-hart gabe-l-hart requested a review from ngxson June 8, 2026 22:21
@ngxson

ngxson commented Jun 9, 2026

Collaborator

The response you mentioned doesn't seem quite right:

The Newspaper is a British newspaper.

The correct answer must contain the words "new york", as Granite's did before the change. If the output doesn't contain "new york", the test won't pass.
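A minimal sketch of the pass/fail check described above, assuming the test simply searches the model output for "new york" regardless of letter case. `check_response` is a hypothetical helper for illustration, not the actual code in mtmd/test.sh:

```shell
#!/usr/bin/env bash
# Hypothetical helper: NOT the actual logic in mtmd/test.sh.
# Succeeds (exit status 0) if the model output mentions "new york" in any letter case.
check_response() {
    # grep -i: case-insensitive match; -q: print nothing, report via exit status only
    printf '%s\n' "$1" | grep -iq "new york"
}

# The answer after the fix passes; the "British newspaper" answer does not.
check_response "the new york times" && echo "PASS"
check_response "The Newspaper is a British newspaper." || echo "FAIL"
```

Under this assumption, "the new york times" passes and "The Newspaper is a British newspaper." fails, which matches the behavior described in the review.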

@gabe-l-hart

Collaborator Author

@ngxson thanks for being diligent as always, and sorry for not giving this more attention. I'll get to the bottom of it and make sure we have a proper fix.

@gabe-l-hart

Collaborator Author

OK, the missing flag wasn't the actual root cause of the issue. I've confirmed that #24357 fixes the behavior:

$ ./build/bin/llama-mtmd-cli -hf ibm-research/granite-vision-3.2-2b-GGUF:Q4_K_M --image ./tools/mtmd/test-1.jpeg --temp 0 -n 128 --flash-attn on -p 'what is the publisher name of the newspaper?' 
0.02.853.058 I common_init_result: fitting params to device memory ...
0.02.853.075 I common_init_result: (for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on)
0.03.546.024 I common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
0.03.576.776 I mtmd_cli_context: chat template example:
<|system|>
You are a helpful assistant<|user|>
Hello<|assistant|>
Hi there<|user|>
How are you?<|assistant|>
0.03.578.991 W load_tensors: ffn up/down are swapped
0.09.578.116 I main: loading model: /Users/ghart/.cache/huggingface/hub/models--ibm-research--granite-vision-3.2-2b-GGUF/snapshots/f5bf90fdfd834d3ea56ce8bd506e94ab9b5bcd5f/granite-vision-3.2-2b-Q4_K_M.gguf
0.09.578.122 W WARN: This is an experimental CLI for testing multimodal capability.
0.09.578.123 W       For normal use cases, please use the standard llama-cli


the new york times

@gabe-l-hart gabe-l-hart closed this Jun 9, 2026
@gabe-l-hart gabe-l-hart deleted the GraniteVisionMTMDTests branch June 9, 2026 15:31


3 participants