model: add Janus Pro for image understanding by ravenouse · Pull Request #16906 · ggml-org/llama.cpp

ravenouse · 2025-10-31T23:24:02Z

This pull request adds support for the Janus-Pro 1B and Janus-Pro 7B models to llama.cpp.

It covers image understanding only (image input → text output).
Image generation is not covered by this PR.

Usage & Current Progress

Convert models to GGUF files:

# Convert the base Janus-Pro 1B model
python convert_hf_to_gguf.py deepseek-community/Janus-Pro-1B \
    --remote \
    --outfile janus-pro-1b-f16.gguf \
    --outtype f16

# Convert the mmproj component
python convert_hf_to_gguf.py deepseek-community/Janus-Pro-1B \
    --remote \
    --outfile mmproj-janus-pro-1b-f16.gguf \
    --outtype f16 \
    --mmproj

Pre-converted GGUF files are available here: https://huggingface.co/Ericwang/Janus-Pro-1B-GGUF
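A quick sanity check on the converted files can be sketched as below. This is not part of the PR; it only relies on the GGUF header layout (the 4-byte magic `GGUF` followed by a `uint32` version) and assumes little-endian files, which is what the conversion produces by default. The helper name `read_gguf_version` is hypothetical.

```python
import struct
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def read_gguf_version(path):
    """Return the GGUF format version, or raise ValueError if the magic is wrong.

    Assumes a little-endian file (an assumption, not something this PR states).
    """
    with Path(path).open("rb") as f:
        header = f.read(8)  # 4-byte magic + uint32 version
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    (version,) = struct.unpack("<I", header[4:8])
    return version


if __name__ == "__main__":
    # File names taken from the conversion commands above.
    for name in ("janus-pro-1b-f16.gguf", "mmproj-janus-pro-1b-f16.gguf"):
        try:
            print(name, "GGUF version", read_gguf_version(name))
        except (OSError, ValueError) as err:
            print(name, "check failed:", err)
```

This only confirms that both outputs are GGUF containers; it does not validate tensors or metadata.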

Run the model:

# Build the project:
cmake -B build
cmake --build build --target llama-mtmd-cli

./build/bin/llama-mtmd-cli \
    -m janus-pro-1b-f16.gguf \
    --mmproj mmproj-janus-pro-1b-f16.gguf \
    --chat-template deepseek
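For a one-shot run on a single image, the command above can be extended with an image path and a prompt. The sketch below only builds the argument list. The `--image` and `-p` flags come from `llama-mtmd-cli`'s general usage rather than from this PR, and the helper name and the `dog.jpg` path are hypothetical.

```python
import shlex


def mtmd_cli_command(model, mmproj, image, prompt,
                     binary="./build/bin/llama-mtmd-cli"):
    """Build the argument list for a one-shot llama-mtmd-cli run on one image."""
    return [
        binary,
        "-m", model,
        "--mmproj", mmproj,
        "--chat-template", "deepseek",  # same template as in the command above
        "--image", image,               # assumption: general mtmd-cli flag
        "-p", prompt,                   # assumption: general mtmd-cli flag
    ]


if __name__ == "__main__":
    cmd = mtmd_cli_command(
        "janus-pro-1b-f16.gguf",
        "mmproj-janus-pro-1b-f16.gguf",
        "dog.jpg",  # hypothetical local image path
        "Describe this image.",
    )
    # Print a copy-pasteable shell command; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Building a list instead of a single string keeps prompts with spaces or quotes intact when the command is later passed to `subprocess.run`.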

References

Janus-Pro 1B model card:
https://huggingface.co/deepseek-community/Janus-Pro-1B

Janus-Pro 7B model card:
https://huggingface.co/deepseek-community/Janus-Pro-7B

Configurations:
https://huggingface.co/deepseek-community/Janus-Pro-1B/blob/main/config.json
https://huggingface.co/deepseek-community/Janus-Pro-7B/blob/main/config.json

HF Implementation:
https://github.com/huggingface/transformers/tree/main/src/transformers/models/janus

ravenouse · 2025-10-31T23:37:40Z

Tested with this image: https://www.pixelstalk.net/wp-content/uploads/2016/04/Golden-retriever-dogs-high-definition-wallpapers.jpg

gguf-py/gguf/tensor_mapping.py

convert_hf_to_gguf.py

tools/mtmd/clip.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>

ravenouse · 2025-11-02T07:34:56Z

Hi @CISC and @ngxson,

Thank you for the thorough review and valuable feedback.
I've addressed all the comments. I also re-ran the conversion and inference workflows, and both are working as expected.

Ready for another look when you have a moment. Thanks a lot!

tools/mtmd/clip.cpp

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>

ravenouse · 2025-11-02T18:31:41Z

Thanks again for the review!

I've updated the code and retested it with the following image:
https://1.bp.blogspot.com/-tLB0HRLcOp4/Tj4Pvhsq6vI/AAAAAAAAAG8/h6ahy6g4GJI/s1600/Llama_lying_down.jpg

PS: The force push was made to fix a formatting typo in the commit message.

ngxson

Looking good! Merging once the CI passes

CISC · 2025-11-02T19:22:22Z

Fix all the whitespace errors though. :)
https://github.com/ggml-org/llama.cpp/actions/runs/19016845485/job/54305935425?pr=16906

tools/mtmd/clip.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

ravenouse · 2025-11-03T16:02:53Z

Hi @CISC and @ngxson,

Thank you so much for the quick review and support to finalize this PR. Truly appreciated!

* Add support for Janus Pro
* Update gguf-py/gguf/tensor_mapping.py
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update gguf-py/gguf/tensor_mapping.py
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Address reviewer suggestions
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Add JANUS_PRO constant
* Update clip model handling
  Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
* Update tools/mtmd/clip.cpp
  Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Refactor JANUS_PRO handling in clip.cpp
  Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
* Update tools/mtmd/clip.cpp
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* em whitespace

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

Add support for Janus Pro

5471f50

ravenouse requested review from CISC and ngxson as code owners October 31, 2025 23:24

github-actions bot added the examples and python labels Oct 31, 2025

DajanaV mentioned this pull request Nov 1, 2025

UPSTREAM PR #16906: model: add Janus Pro for image understanding auroralabs-loci/llama.cpp#32

Closed

CISC reviewed Nov 1, 2025

gguf-py/gguf/tensor_mapping.py (outdated, resolved)

gguf-py/gguf/tensor_mapping.py (outdated, resolved)

convert_hf_to_gguf.py (outdated, resolved)

ngxson reviewed Nov 1, 2025

tools/mtmd/clip.cpp (outdated, resolved)

tools/mtmd/clip.cpp (outdated, resolved)

tools/mtmd/clip.cpp (outdated, resolved)

tools/mtmd/clip.cpp (outdated, resolved)

ravenouse and others added 5 commits November 1, 2025 09:05

Update gguf-py/gguf/tensor_mapping.py

01bd163

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

Update gguf-py/gguf/tensor_mapping.py

d6069df

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

Address reviewer suggestions

d92205e

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

Add JANUS_PRO constant

e260b0e

Update clip model handling

5794785

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>

ngxson reviewed Nov 2, 2025

tools/mtmd/clip.cpp (outdated, resolved)

tools/mtmd/clip.cpp (outdated, resolved)

tools/mtmd/clip.cpp (outdated, resolved)

ravenouse and others added 2 commits November 2, 2025 10:05

Update tools/mtmd/clip.cpp

9601dc8

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

Refactor JANUS_PRO handling in clip.cpp

38ff44f

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>

ravenouse force-pushed the januspro branch from 4ba66ee to 38ff44f November 2, 2025 18:24

Merge branch 'master' into januspro

5b35faa

ngxson approved these changes Nov 2, 2025

CISC approved these changes Nov 2, 2025

CISC requested changes Nov 2, 2025

tools/mtmd/clip.cpp (resolved)

CISC reviewed Nov 2, 2025

tools/mtmd/clip.cpp (outdated, resolved)

ngxson and others added 2 commits November 2, 2025 21:14

Update tools/mtmd/clip.cpp

c06440f

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

em whitespace

63f7cf3

ngxson requested a review from CISC November 2, 2025 20:18

CISC approved these changes Nov 2, 2025

ngxson merged commit 6b9a524 into ggml-org:master Nov 2, 2025
75 of 79 checks passed

ngxson merged 11 commits into ggml-org:master from ravenouse:januspro

3 participants
