[diffusion] chore: support native attention for Mistral3 encoder#28176

Open
mickqian wants to merge 12 commits into sgl-project:main from mickqian:codex/mistral3-native-attention-20260614

Conversation

@mickqian mickqian commented Jun 14, 2026

Collaborator

Summary

  • Route Mistral3 text self-attention through LocalAttention, constrained to the torch SDPA backend.
  • Use TP-aware QKVParallelLinear and RowParallelLinear for Mistral3 attention projections.
  • Load HF q_proj, k_proj, and v_proj checkpoint weights into fused qkv_proj shards and enable the SGLang forward context for Mistral3 text encoding.
  • Preserve the HF causal/sliding attention mask construction for Mistral3 while letting component accuracy transfer packed TP parameters through each parameter's weight_loader.
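
As a rough illustration of the fused-QKV loading described in the summary, the sketch below packs per-rank slices of separate q_proj, k_proj, and v_proj weights into one fused buffer. It is not the code from this PR: the helper names `shard_rows` and `load_fused_qkv` are hypothetical, plain Python lists stand in for tensors, and the actual shard layout used by QKVParallelLinear and its weight_loader may differ.

```python
# Illustrative only: hypothetical helpers, not SGLang's API.
# A "weight" here is a list of output rows (one row per output feature).

def shard_rows(weight, tp_rank, tp_size):
    """Return the contiguous block of output rows owned by tp_rank."""
    rows = len(weight)
    if rows % tp_size != 0:
        raise ValueError("output rows must divide evenly across TP ranks")
    per_rank = rows // tp_size
    start = tp_rank * per_rank
    return weight[start:start + per_rank]


def load_fused_qkv(checkpoint, tp_rank, tp_size):
    """Pack this rank's q, k, and v slices into one fused row list.

    Returns the fused rows plus a map of name -> (row offset, row count),
    so a caller can locate each projection inside the fused buffer.
    """
    fused = []
    offsets = {}
    for name in ("q_proj", "k_proj", "v_proj"):
        shard = shard_rows(checkpoint[name], tp_rank, tp_size)
        offsets[name] = (len(fused), len(shard))
        fused.extend(shard)
    return fused, offsets


# Toy checkpoint with fewer k/v rows than q rows, as in grouped-query attention.
checkpoint = {
    "q_proj": [["q0"], ["q1"], ["q2"], ["q3"]],
    "k_proj": [["k0"], ["k1"]],
    "v_proj": [["v0"], ["v1"]],
}
fused_rank0, offsets_rank0 = load_fused_qkv(checkpoint, tp_rank=0, tp_size=2)
fused_rank1, _ = load_fused_qkv(checkpoint, tp_rank=1, tp_size=2)
```

In this toy layout each rank holds its own q rows followed by its k and v rows, so a single matrix multiply against the fused buffer yields that rank's q, k, and v outputs back to back.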

CI States

Latest PR Test (Base): ❌ Run #27526418476
Latest PR Test (Extra): ❌ Run #27526418386

@gemini-code-assist

Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@github-actions github-actions Bot added the diffusion SGLang Diffusion label Jun 14, 2026
@mickqian

Collaborator Author

/tag-and-rerun-ci

@mickqian mickqian changed the title from "[Multimodal] Support native attention for Mistral3 encoder" to "[diffusion] feat: support native attention for Mistral3 encoder" Jun 14, 2026
@mickqian mickqian changed the title from "[diffusion] feat: support native attention for Mistral3 encoder" to "[diffusion] chore: support native attention for Mistral3 encoder" Jun 14, 2026
@mickqian

Collaborator Author

/rerun-failed-ci

Collaborator Author

/rerun-failed-ci

Collaborator Author

/rerun-failed-ci

Labels

diffusion SGLang Diffusion run-ci
