model : add support for Phi4ForCausalLMV #20168
Conversation
ngxson left a comment:
I believe this can be simplified further. Phi-4's SigLIP is not much of a breakthrough in terms of architecture; it reuses a lot of things that already exist in the code base, so there is no need to add many new code paths for it.
ngxson left a comment:
I'm approving this PR because this model is trivial to support.
I may be a bit harsh here, but I want to make it clear: I don't recommend contributions where the author cannot properly respond to trivial questions (proof: #20168 (comment) and #20168 (comment)). This shows the author put too little effort into their own work.
Many other contributors are willing to spend time understanding the code, even when it's AI-generated, and we welcome that kind of contribution. What we don't encourage is the type of PR where more than half of the work ends up being done by the reviewers.
Thanks for your honesty @ngxson. I was genuinely trying to help, and I understand this likely wasted more of your time than if I hadn't contributed to the project using AI to help me.
Working now?
Yes, still working here.
* Add support for Phi4ForCausalLMV.
* Fix Phi-4 vision parity (correcting SigLIP2 patch-kernel export layout) and match HF NaFlex resize behavior in mtmd.
* Rename constants + fix tokenizer label
* Clean-ups.
* Fix GGUF export.
* Set tokenizer.ggml.pre explicitly.
* Default vocab name rather than forcing it.
* Clean-ups.
* Fix indent.
* Fix subscriptable error.
* Remove overcomplicated code path
* Clean-ups.

---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
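The NaFlex resize behavior referenced above, as I understand it, picks a target image size whose patch grid fits a patch budget while roughly preserving aspect ratio, with both sides snapped to multiples of the patch size. A minimal sketch with made-up defaults (`patch_size=16` and `max_num_patches=256` are illustrative, not necessarily the model's actual values):

```python
import math

def naflex_target_size(height, width, patch_size=16, max_num_patches=256):
    """Sketch of NaFlex-style resizing: choose (h, w) so that the
    patch grid (h/p) * (w/p) stays within max_num_patches while
    roughly preserving aspect ratio; sides snap to multiples of p."""
    # Scale so that the patch grid roughly fills the budget.
    scale = math.sqrt(max_num_patches * patch_size ** 2 / (height * width))

    def snap(x):
        return max(patch_size, round(x * scale / patch_size) * patch_size)

    h, w = snap(height), snap(width)
    # Rounding up can overshoot the budget; shrink the longer side.
    while (h // patch_size) * (w // patch_size) > max_num_patches:
        if h >= w:
            h -= patch_size
        else:
            w -= patch_size
    return h, w
```

For a 480x640 input this yields a grid of at most 256 patches with both sides divisible by 16; the real mtmd implementation should be consulted for the exact rounding rules.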
* 'master' of github.com:ggml-org/llama.cpp: (33 commits)
  - convert : better mtp check and fix return [no ci] (ggml-org#20419)
  - vulkan: fix SSM_CONV PP scaling with large ubatch sizes (ggml-org#20379)
  - New conversations now auto-select the first loaded model (ggml-org#20403)
  - ggml-virtgpu: Fix some build commands (ggml-org#20341)
  - metal : avoid divisions in bin kernel (ggml-org#20426)
  - ci: Setup self-hosted CI for Intel Linux Vulkan backend (ggml-org#20154)
  - vulkan: fix l2_norm epsilon handling (ggml-org#20350)
  - vulkan: fix OOB check in flash_attn_mask_opt (ggml-org#20296)
  - vulkan: Fix ErrorOutOfHostMemory on Intel GPU when loading large models with --no-mmap (ggml-org#20059)
  - opencl: use larger workgroup size for get_rows (ggml-org#20316)
  - opencl: add cumsum op (ggml-org#18981)
  - hip: compile debug builds with -O2 on hip to avoid a compiler bug (ggml-org#20392)
  - common/parser: add GigaChatV3/3.1 models support (ggml-org#19931)
  - model : add support for Phi4ForCausalLMV (ggml-org#20168)
  - graph : add optional scale parameter to build_lora_mm [no ci] (ggml-org#20427)
  - common : fix --n-cpu-moe, --cpu-moe for models with fused gate + up (ggml-org#20416)
  - ggml-webgpu: Add supports for `GGML_OP_REPEAT` (ggml-org#20230)
  - llama : enable chunked fused GDN path (ggml-org#20340)
  - llama : whitespace cleanup (ggml-org#20422)
  - ggml : add NVFP4 quantization type support (ggml-org#19769)
  - ...
Add support for microsoft/Phi-4-reasoning-vision-15B.
This reuses the existing Phi-3 text path for the decoder and adds an mmproj/mtmd path for the Phi SigLIP2 vision encoder.
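Reusing the Phi-3 text path typically amounts to mapping the new HF architecture name onto the already-existing converter. The sketch below shows only the general registry pattern; the class and attribute names are hypothetical and are not the actual convert_hf_to_gguf.py code:

```python
# Hypothetical registry sketch, modeled loosely on the decorator-based
# registration style used by HF-to-GGUF conversion scripts.
MODEL_REGISTRY = {}

def register(*hf_class_names):
    """Map one or more HF architecture names to a converter class."""
    def deco(cls):
        for name in hf_class_names:
            MODEL_REGISTRY[name] = cls
        return cls
    return deco

@register("Phi3ForCausalLM")
class Phi3Converter:
    gguf_arch = "phi3"  # the existing text path

# The new vision model's decoder reuses the same converter wholesale:
# the new HF class name simply resolves to a subclass of the Phi-3 one.
@register("Phi4ForCausalLMV")
class Phi4VConverter(Phi3Converter):
    pass
```

The point of the pattern is that supporting the decoder requires no new tensor-mapping code, only a new name pointing at the existing class.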
I uploaded some converted weights for testing https://huggingface.co/dranger003/Phi-4-reasoning-vision-15B-GGUF.
The model generates coherent text and proper image descriptions, including OCR (see below).
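The "patch-kernel export layout" fix in the commit list concerns the vision tower's patch embedding: a SigLIP-style Conv2d kernel must be flattened in a consistent order when exported as a plain matmul weight, or every patch embedding comes out wrong. A minimal sketch with toy shapes (the dimensions here are illustrative, not the real model's):

```python
import numpy as np

# Toy sizes for illustration only (not the real model's dimensions).
embed_dim, channels, p = 8, 3, 16
rng = np.random.default_rng(0)
w_conv = rng.standard_normal((embed_dim, channels, p, p)).astype(np.float32)

# Export the Conv2d kernel as a matmul weight: flatten (c, ph, pw)
# in C order so it matches how the input patch is flattened.
w_mat = w_conv.reshape(embed_dim, channels * p * p)

# Applying the flat weight to one flattened patch must match the
# convolution evaluated on that patch.
patch = rng.standard_normal((channels, p, p)).astype(np.float32)
out_conv = (w_conv * patch).sum(axis=(1, 2, 3))
out_mat = w_mat @ patch.reshape(-1)
assert np.allclose(out_conv, out_mat, atol=1e-4)
```

If the kernel and the patch are flattened in different orders, the assertion fails, which is the kind of parity bug the commit message describes fixing.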
DISCLAIMER: GPT-5.4 was used to help with this PR.
EDIT: More information about this model here: https://www.microsoft.com/en-us/research/blog/phi-4-reasoning-vision-and-the-lessons-of-training-a-multimodal-reasoning-model/