Llama3.2 vision model support #1551

Merged
hnyls2002 merged 48 commits into main from llama-3.2 on Oct 21, 2024
Conversation

hnyls2002 (Collaborator) commented Oct 1, 2024

Motivation

  • Support encoder-decoder architectures in SGLang.
  • Support the Llama vision model.
  • Support CUDA graph and prefix caching for the Llama vision model.

Note that to support CUDA graph for an encoder-decoder architecture like Llama vision (mllama), we must make encoder_lens part of the CUDA graph inputs, because full_text_row_masked_out_mask is derived from encoder_lens to skip the text-only requests in a mixed batch.

However, the current CUDA graph backend (flashinfer) seems to have trouble handling mixed batches, so for now we only accept pure image decoding batches.
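To illustrate the idea: a mask like full_text_row_masked_out_mask can be derived from encoder_lens by zeroing every decoder-token row that belongs to a request with no encoder (image) input. This is only a minimal sketch; the function name and shapes are hypothetical and not SGLang's actual implementation.

```python
def build_full_text_row_mask(encoder_lens, seq_lens):
    """Hypothetical sketch: produce one mask value per decoder token.

    Tokens of requests with encoder_len == 0 (text-only requests) are
    masked out, so their cross-attention output is zeroed in a mixed batch.
    """
    mask = []
    for enc_len, seq_len in zip(encoder_lens, seq_lens):
        # All seq_len tokens of this request share the same mask value.
        mask.extend([1.0 if enc_len > 0 else 0.0] * seq_len)
    return mask

# Request 1 is text-only (encoder_len == 0), so its 3 tokens are masked out.
print(build_full_text_row_mask([6, 0, 4], [2, 3, 1]))
# → [1.0, 1.0, 0.0, 0.0, 0.0, 1.0]
```

Since this mask depends on encoder_lens, capturing it correctly requires encoder_lens to be a CUDA graph input rather than a compile-time constant.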

Todo in the following PRs:

  • Split attention backends: sliding_window, single_attention, cross_attention
  • Optimize encoder cache location indexing and reduce memory usage.

Modifications

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@hnyls2002 hnyls2002 marked this pull request as draft October 1, 2024 06:46
@hnyls2002 hnyls2002 force-pushed the llama-3.2 branch 2 times, most recently from 00cd46a to 2aebd9f on October 1, 2024 08:12
Comment thread python/pyproject.toml Outdated
@hnyls2002 hnyls2002 marked this pull request as ready for review October 21, 2024 03:52
Comment thread python/sglang/srt/layers/attention/triton_backend.py
Comment thread python/sglang/srt/mem_cache/memory_pool.py Outdated
Comment thread python/sglang/srt/models/qwen2_vl.py
@hnyls2002 hnyls2002 merged commit 94cde10 into main Oct 21, 2024
@hnyls2002 hnyls2002 deleted the llama-3.2 branch October 21, 2024 22:01
    def set_kv_buffer(
        self,
-       layer_id: int,
+       layer: RadixAttention,
Contributor

Why is it necessary to change the data type from int to RadixAttention here?

@zhaochenyang20 zhaochenyang20 mentioned this pull request Mar 3, 2025
timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025

4 participants