
[Feature] Phi-4-MM support #6544

@lifuhuang

Description


Update

We now support text, vision, and audio.

Repeated MMMU benchmark runs score between 53.6 and 55.5, consistent with the benchmark reported in the original paper (55).

Known limitations (see the Execution Plan below for the full list):

  1. Image tokens: Phi4MM supports two image token conventions (<|image1|> and <|endoftext10|>); currently we only support the latter. If you use the default chat template, it will automatically pick the supported one.
  2. Audio capabilities: audio was initially not supported at all. Fixed with Feat: Support audio in Phi4-mm model #8048.
  3. LoRA / Image quality: Phi4MM depends on LoRA for full image capability, but there are some compatibility issues with the native SGL LoRA solution. We are working on solving this by refactoring and generalizing SGL's LoRA capabilities. Fixed with Refactor LoRA handling to support adapter tensors in fused format #6585, Fix incorrect LoRA weight loading for fused gate_up_proj #6734, and Support LoRA in TestOpenAIVisionServer and fix fused kv_proj loading bug. #6861.
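To illustrate limitation 1, the two conventions differ only in which placeholder string marks the image position in the prompt. A minimal sketch (the `build_prompt` helper and the prompt text are hypothetical; only the `<|endoftext10|>` convention is assumed to be accepted by this integration):

```python
# Supported placeholder in this integration (assumption based on limitation 1).
SUPPORTED_IMAGE_TOKEN = "<|endoftext10|>"
# Alternative convention from the model card, not supported here.
UNSUPPORTED_IMAGE_TOKEN = "<|image1|>"


def build_prompt(question: str) -> str:
    """Prepend the supported image placeholder to a user question."""
    return f"{SUPPORTED_IMAGE_TOKEN}\n{question}"


prompt = build_prompt("What is shown in this image?")
print(prompt)
```

The default chat template handles this substitution automatically; constructing the placeholder by hand is only needed when bypassing the template.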

Motivation

Supporting the Phi4 Multimodal model (https://huggingface.co/microsoft/Phi-4-multimodal-instruct) in SGL.
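Once supported, serving the model would follow SGLang's standard server launch; a sketch of the expected invocation (flags may vary by SGLang version, so treat this as an assumption rather than a confirmed command):

```shell
# Hypothetical launch sketch using SGLang's standard launch_server CLI.
python -m sglang.launch_server \
  --model-path microsoft/Phi-4-multimodal-instruct \
  --trust-remote-code
```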

Execution Plan:

