
Add LFM2-VL (Liquid Foundation Model 2 Vision-Language) support#21230

Merged
mickqian merged 4 commits into sgl-project:main from tugot17:feature/lfm2-vl-upstream
Apr 4, 2026

Conversation

@tugot17
Contributor

@tugot17 tugot17 commented Mar 23, 2026

This PR adds support for the LFM2-VL vision-language architecture, combining a SigLip2 vision encoder (NaFlex variable-resolution) with the LFM2 hybrid language model.

Example model: LFM2.5-VL-1.6B

How to run

sglang serve --model-path LiquidAI/LFM2.5-VL-1.6B

# Also tested on TP=2
sglang serve --model-path LiquidAI/LFM2.5-VL-1.6B --tp 2
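
Once the server is up, requests go through the OpenAI-compatible chat API. A minimal sketch of building such a request payload, assuming the standard chat-completions message format with `image_url` content parts (endpoint path and port depend on your deployment):

```python
# Sketch: construct an OpenAI-compatible chat-completion payload with one
# image and one text part. The model name matches the serve command above;
# the image URL is a placeholder.
import json

def build_vlm_request(prompt: str, image_url: str,
                      model: str = "LiquidAI/LFM2.5-VL-1.6B") -> dict:
    """Build a chat-completion payload with an image part and a text part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "max_tokens": 128,
    }

payload = build_vlm_request("What is in this image?", "https://example.com/cat.png")
print(json.dumps(payload, indent=2))
```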

Tests

pytest test/registered/vlm/test_vision_openai_server_a.py::TestLfm2VlServer -v -s

===================== 5 passed, 2 warnings in 72.95s (0:01:12) =====================

Numerics

All 4 prompts pass with ROUGE-L 1.0 and low logprob divergence (HF vs SGLang, teacher forcing):

[
  {
    "prompt": "What is in this image?",
    "image": "man ironing on taxi",
    "rouge_l": 1.0,
    "prefill_max_diff": 0.0141,
    "decode_max_diff": 0.0091
  },
  {
    "prompt": "Describe this image in detail but as a rap song.",
    "rouge_l": 1.0,
    "prefill_max_diff": 0.0141,
    "decode_max_diff": 0.0111
  },
  {
    "prompt": "What is in this image?",
    "image": "Statue of Liberty (high-res, 1807 prompt tokens)",
    "rouge_l": 1.0,
    "prefill_max_diff": 0.0326,
    "decode_max_diff": 0.0044
  },
  {
    "prompt": "Compare these two images. What do you see in each?",
    "image": "multi-image (2031 prompt tokens)",
    "rouge_l": 1.0,
    "prefill_max_diff": 0.0141,
    "decode_max_diff": 0.0109
  }
]
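
For readers reproducing these numbers, a hedged sketch of the two metrics: run the same token sequence through both HF and SGLang under teacher forcing, take the max absolute per-token logprob difference, and score the generated texts with LCS-based ROUGE-L F1. Function names and the whitespace tokenization here are illustrative, not the PR's actual harness:

```python
# Sketch of the teacher-forcing comparison metrics (illustrative only).

def max_logprob_diff(hf_logprobs, sgl_logprobs):
    """Max absolute per-token logprob divergence between the two runtimes."""
    return max(abs(a - b) for a, b in zip(hf_logprobs, sgl_logprobs))

def rouge_l(ref: str, hyp: str) -> float:
    """LCS-based ROUGE-L F1 over whitespace tokens."""
    r, h = ref.split(), hyp.split()
    # classic O(len(r) * len(h)) longest-common-subsequence DP
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i, rt in enumerate(r):
        for j, ht in enumerate(h):
            dp[i + 1][j + 1] = dp[i][j] + 1 if rt == ht else max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[len(r)][len(h)]
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(h), lcs / len(r)
    return 2 * prec * rec / (prec + rec)

print(rouge_l("a man ironing on a taxi", "a man ironing on a taxi"))  # identical -> 1.0
print(max_logprob_diff([-1.20, -0.50], [-1.21, -0.49]))
```

A ROUGE-L of 1.0 here means both runtimes produced token-identical outputs; the logprob diffs bound the numerical drift.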

Changes

New files:

  • python/sglang/srt/configs/lfm2_vl.py — VL config with hybrid cache support
  • python/sglang/srt/models/lfm2_vl.py — VL model (vision tower + projector + LFM2 LM)
  • python/sglang/srt/models/siglip2.py — SigLip2 vision encoder with NaFlex packed attention
  • python/sglang/srt/multimodal/processors/lfm2_vl.py — Image processor with variable-resolution tiling
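
The idea behind NaFlex-style variable resolution can be sketched as follows: instead of resizing every image to a fixed square, pick a patch grid that preserves the image's aspect ratio while staying under a token budget. This is a simplified stand-in for intuition, not the processor's actual algorithm; `patch_size` and `max_tokens` values are assumptions:

```python
# Illustrative sketch of variable-resolution patch-grid selection.
import math

def fit_patch_grid(width: int, height: int, patch_size: int = 16,
                   max_tokens: int = 1024):
    """Return (cols, rows) of patches: aspect-preserving, <= max_tokens patches."""
    cols = max(1, round(width / patch_size))
    rows = max(1, round(height / patch_size))
    if cols * rows > max_tokens:
        # shrink both dimensions by the same factor to respect the budget
        scale = math.sqrt(max_tokens / (cols * rows))
        cols = max(1, int(cols * scale))
        rows = max(1, int(rows * scale))
    return cols, rows

print(fit_patch_grid(512, 512))    # small image fits as-is: (32, 32)
print(fit_patch_grid(4096, 2048))  # large image is scaled down under budget
```

This is why the high-res Statue of Liberty case in the numerics above reaches 1807 prompt tokens: larger images get proportionally more vision tokens, up to the budget.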

Modified files:

  • python/sglang/srt/models/lfm2.py — Add get_input_embeddings(), rename inputs_embeds → input_embeds for multimodal compatibility
  • python/sglang/srt/configs/__init__.py — Register Lfm2VlConfig
  • python/sglang/srt/configs/model_config.py — Add to multimodal arch list
  • python/sglang/srt/model_executor/model_runner.py — Add hybrid model detection
  • test/registered/vlm/test_vision_openai_server_a.py — Add TestLfm2VlServer
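
A hedged sketch of why `lfm2.py` needed `get_input_embeddings()`: the multimodal wrapper embeds the text tokens first, then overwrites the image-placeholder positions with the projected vision features before the language model runs. The token id, shapes, and function name below are illustrative, not SGLang's actual internals:

```python
# Sketch: merge projected vision features into the text embedding sequence
# at image-placeholder positions (all names/ids are hypothetical).
import numpy as np

IMAGE_TOKEN_ID = 396  # hypothetical placeholder token id

def merge_multimodal(input_ids, text_embeds, image_embeds):
    """Replace embeddings at image-placeholder positions with vision features."""
    out = text_embeds.copy()
    positions = np.flatnonzero(input_ids == IMAGE_TOKEN_ID)
    assert len(positions) == image_embeds.shape[0], "placeholder/feature mismatch"
    out[positions] = image_embeds
    return out

ids = np.array([1, 396, 396, 7, 2])      # two image placeholders
text = np.zeros((5, 4))                  # embedded text tokens
vision = np.ones((2, 4))                 # projected vision features
merged = merge_multimodal(ids, text, vision)
print(merged[1])  # image positions now carry vision features
```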


@github-actions github-actions Bot added documentation Improvements or additions to documentation Multi-modal multi-modal language model labels Mar 23, 2026
Comment thread python/sglang/srt/models/lfm2_vl.py Outdated
Comment thread python/sglang/srt/multimodal/processors/lfm2_vl.py Outdated
@ispobock
Collaborator

@tugot17 could you address the comments?

@tugot17
Contributor Author

tugot17 commented Mar 26, 2026

@ispobock Yes, I will go over this tomorrow.

@ispobock
Collaborator

@tugot17 Any update on this?

@tugot17
Contributor Author

tugot17 commented Mar 30, 2026

WIP, should make an update today

@tugot17
Contributor Author

tugot17 commented Mar 30, 2026

@mickqian Addressed both comments, my correctness tests pass. WDYT?

@mickqian
Collaborator

/tag-and-rerun-ci

@tugot17
Contributor Author

tugot17 commented Apr 1, 2026

@ispobock should I add anything more or could we merge it?

@ispobock
Collaborator

ispobock commented Apr 2, 2026

We can merge it once the CI tests pass.

@tugot17
Contributor Author

tugot17 commented Apr 3, 2026

@ispobock Which of these CI failures are concerning and should still be addressed?

@mickqian mickqian merged commit b5e8c4b into sgl-project:main Apr 4, 2026
226 of 269 checks passed
sundar24295s pushed a commit to sundar24295s/sglang that referenced this pull request Apr 4, 2026
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
Fridge003 pushed a commit that referenced this pull request Apr 7, 2026
…21230)

Co-authored-by: Piotr Mazurek <piotr.mazurek@liquid.ai>
xiezhq-hermann pushed a commit to antgroup/sglang that referenced this pull request Apr 7, 2026
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026

Labels

documentation (Improvements or additions to documentation), Multi-modal (multi-modal language model), run-ci

3 participants