Skip to content

model: LFM2-VL fixes#17577

Merged
ngxson merged 13 commits into
ggml-org:masterfrom
Liquid4All:tarek/feat/upstream_17290
Nov 30, 2025
Merged

model: LFM2-VL fixes#17577
ngxson merged 13 commits into
ggml-org:masterfrom
Liquid4All:tarek/feat/upstream_17290

Conversation

@tdakhran

Copy link
Copy Markdown
Contributor

Debugging of #17290 revealed multiple issues with LFM2-VL.

This PR fixes the following issues and makes the output of llama.cpp equivalent to PyTorch

  • surround image embeddings with <|image_start|> and <|image_end|> tokens
  • use round_by_factor to calculate target width and height in "smart resize".
  • stretch the image to the width and height calculated by smart resize instead of padding
  • place image embeddings before user prompt
  • resize positional embedding with antialiasing enabled

The central issue was the resizing of positional embeddings. Siglip2 implementation in PyTorch uses F.interpolate(..., mode="bilinear", align_corners=False, antialias=True). Antialiasing only contributes during downscaling. When the image width or height is less than 256, the scaling of positional embeddings in llama.cpp produced numerically different results from PyTorch.

A new flag, ' GGML_SCALE_FLAG_ANTIALIAS', has been added for the upscale function, with implementations for CPU and CUDA.
Now outputs match:
siglip_1024
PyTorch (fp32)

For the vision tower, LF2-M2-VL uses Sigilip2 NaFlex encoders to convert input images into token sequences. Two variants are implemented:

this PR (bin/llama-mtmd-cli -m $CKPT/LFM2-VL-1.6B-F32.gguf --mmproj $CKPT/mmproj-LFM2-VL-1.6B-F32.gguf -n 64 -t 4 --image /data/playground/issue_17290/siglip_1024.png -p "OCR." --temp 0.0 --top-k 1)

For the vision tower, LF2-M2-VL uses Sigilip2 NaFlex encoders to convert input images into token sequences. Two variants are implemented:

Comment thread ggml/src/ggml-cpu/ops.cpp
Comment thread ggml/src/ggml-cpu/ops.cpp Outdated
Comment thread ggml/src/ggml-cuda/upscale.cu Outdated
Comment thread ggml/src/ggml-cuda/upscale.cu Outdated
Comment thread tools/mtmd/clip.cpp
Comment thread tools/mtmd/mtmd.h Outdated
@github-actions github-actions Bot added testing Everything test related Nvidia GPU Issues specific to Nvidia GPUs examples ggml changes relating to the ggml tensor library for machine learning labels Nov 28, 2025

@ngxson ngxson left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we probably need to update ggml_backend_*_supports_op across backend, to avoid some backend fallback to the non-antialias kernel, which will result in wrong results.

For backend that does not support this mode, it will be fallback to CPU

@SmartestWashingMachine

SmartestWashingMachine commented Nov 28, 2025

Copy link
Copy Markdown
Contributor

On my end (CPU) the outputs of fp32 and bf16 450M looked good, tested on a variety of small images (< 16/32px one side).

Also checked a few personal tuned 1.6B Q3s (which should be more sensitive) and the outputs were great - it didn't go into a repetitive "breaking" state like before!

It couldn't have been easy to figure this issue out... Thank you guys for looking into this!

@tdakhran tdakhran force-pushed the tarek/feat/upstream_17290 branch from 50ba22e to 2385ecf Compare November 30, 2025 10:27
@github-actions github-actions Bot added Vulkan Issues specific to the Vulkan backend SYCL https://en.wikipedia.org/wiki/SYCL - GPU programming language Apple Metal https://en.wikipedia.org/wiki/Metal_(API) Ascend NPU issues specific to Ascend NPUs OpenCL Issues specific to the OpenCL backend labels Nov 30, 2025
@tdakhran

Copy link
Copy Markdown
Contributor Author

@SmartestWashingMachine, great to see it work for you!

@ngxson, I rolled back changes related to default marker placement, for now -p "<__media__>OCR." has to be explicitly specified for LFM2-VL for llama-mtmd-cli, otherwise incorrect output will be produced.

Added a fallback to CPU for backends as well.

@ngxson ngxson left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mtmd changes look good to me. I'll also need a review on backend code (cpu/cuda) before merging this. CC @CISC @JohannesGaessler if you have a bit of time.

Comment thread tools/mtmd/clip.cpp

clip_image_u8 resized_img;
img_tool::resize(*img, resized_img, target_size, img_tool::RESIZE_ALGO_BILINEAR, true, pad_color);
const bool pad = (ctx->proj_type() != PROJECTOR_TYPE_LFM2);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self-note: probably kimi-vl also need padding, to be verified later

@JohannesGaessler JohannesGaessler left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CUDA changes look good to me.

@ngxson ngxson merged commit 2ba7195 into ggml-org:master Nov 30, 2025
71 of 74 checks passed
@tdakhran tdakhran deleted the tarek/feat/upstream_17290 branch November 30, 2025 22:27
Anico2 added a commit to Anico2/llama.cpp that referenced this pull request Jan 15, 2026
* Adjust to pytorch

* Add antialiasing upscale

* Increase number of patches to 1024

* Handle default marker insertion for LFM2

* Switch to flag

* Reformat

* Cuda implementation of antialias kernel

* Change placement in ops.cpp

* consistent float literals

* Pad only for LFM2

* Address PR feedback

* Rollback default marker placement changes

* Fallback to CPU implementation for antialias implementation of upscale
blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026
* Adjust to pytorch

* Add antialiasing upscale

* Increase number of patches to 1024

* Handle default marker insertion for LFM2

* Switch to flag

* Reformat

* Cuda implementation of antialias kernel

* Change placement in ops.cpp

* consistent float literals

* Pad only for LFM2

* Address PR feedback

* Rollback default marker placement changes

* Fallback to CPU implementation for antialias implementation of upscale
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
* Adjust to pytorch

* Add antialiasing upscale

* Increase number of patches to 1024

* Handle default marker insertion for LFM2

* Switch to flag

* Reformat

* Cuda implementation of antialias kernel

* Change placement in ops.cpp

* consistent float literals

* Pad only for LFM2

* Address PR feedback

* Rollback default marker placement changes

* Fallback to CPU implementation for antialias implementation of upscale
ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request May 6, 2026
* Adjust to pytorch

* Add antialiasing upscale

* Increase number of patches to 1024

* Handle default marker insertion for LFM2

* Switch to flag

* Reformat

* Cuda implementation of antialias kernel

* Change placement in ops.cpp

* consistent float literals

* Pad only for LFM2

* Address PR feedback

* Rollback default marker placement changes

* Fallback to CPU implementation for antialias implementation of upscale
my-other-github-account pushed a commit to my-other-github-account/llama.cpp that referenced this pull request May 15, 2026
* Adjust to pytorch

* Add antialiasing upscale

* Increase number of patches to 1024

* Handle default marker insertion for LFM2

* Switch to flag

* Reformat

* Cuda implementation of antialias kernel

* Change placement in ops.cpp

* consistent float literals

* Pad only for LFM2

* Address PR feedback

* Rollback default marker placement changes

* Fallback to CPU implementation for antialias implementation of upscale
my-other-github-account pushed a commit to my-other-github-account/llama.cpp that referenced this pull request May 15, 2026
* Adjust to pytorch

* Add antialiasing upscale

* Increase number of patches to 1024

* Handle default marker insertion for LFM2

* Switch to flag

* Reformat

* Cuda implementation of antialias kernel

* Change placement in ops.cpp

* consistent float literals

* Pad only for LFM2

* Address PR feedback

* Rollback default marker placement changes

* Fallback to CPU implementation for antialias implementation of upscale
phibya pushed a commit to ziee-ai/llama.cpp that referenced this pull request May 29, 2026
* Adjust to pytorch

* Add antialiasing upscale

* Increase number of patches to 1024

* Handle default marker insertion for LFM2

* Switch to flag

* Reformat

* Cuda implementation of antialias kernel

* Change placement in ops.cpp

* consistent float literals

* Pad only for LFM2

* Address PR feedback

* Rollback default marker placement changes

* Fallback to CPU implementation for antialias implementation of upscale
fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026
* Adjust to pytorch

* Add antialiasing upscale

* Increase number of patches to 1024

* Handle default marker insertion for LFM2

* Switch to flag

* Reformat

* Cuda implementation of antialias kernel

* Change placement in ops.cpp

* consistent float literals

* Pad only for LFM2

* Address PR feedback

* Rollback default marker placement changes

* Fallback to CPU implementation for antialias implementation of upscale
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Apple Metal https://en.wikipedia.org/wiki/Metal_(API) Ascend NPU issues specific to Ascend NPUs examples ggml changes relating to the ggml tensor library for machine learning Nvidia GPU Issues specific to Nvidia GPUs OpenCL Issues specific to the OpenCL backend SYCL https://en.wikipedia.org/wiki/SYCL - GPU programming language testing Everything test related Vulkan Issues specific to the Vulkan backend

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants