feat: add sageattention #2823
Conversation
📝 Walkthrough

Support for SageAttention, a new attention implementation, has been integrated. This includes configuration schema updates, a monkeypatch for Hugging Face transformers, conditional patch application logic, and internal model loader changes to select SageAttention. Validation prevents incompatible use with sample packing and enforces GPU compute capability requirements. No public APIs were changed; all modifications are internal or configuration-related.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Config
    participant ModelLoader
    participant PatchManager
    participant Transformers
    participant SageAttention
    User->>Config: Set sage_attention=True
    Config->>Config: Validate config (disallow sample_packing + sage_attention, check GPU capability)
    ModelLoader->>Config: Read sage_attention flag
    ModelLoader->>PatchManager: Apply SageAttention patch if enabled
    PatchManager->>Transformers: Register sage_attention_forward
    ModelLoader->>Transformers: Set attn_implementation="sage_attention"
    Transformers->>SageAttention: Use SageAttention for attention calls
```
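The GPU compute capability validation mentioned in the walkthrough can be sketched as below. This is illustrative only: the minimum `(8, 0)` threshold is an assumption (SageAttention's kernels target Ampere-class GPUs and newer), and the actual check in axolotl may use different names and limits.

```python
# Assumed minimum (major, minor) compute capability for SageAttention kernels.
MIN_CAPABILITY = (8, 0)

def supports_sage_attention(capability: tuple[int, int]) -> bool:
    """Return True if the GPU's (major, minor) capability meets the minimum.

    Tuple comparison handles the (major, minor) ordering naturally.
    """
    return capability >= MIN_CAPABILITY

# In real code the capability would come from torch.cuda.get_device_capability().
```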
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/axolotl/monkeypatch/attention/sageattn.py (1)
41-112: Well-implemented attention forward function with minor style improvements. The function correctly handles SageAttention's limitations, GQA/MQA support, and tensor layout transformations. The extensive validation ensures clear error messages for unsupported features.
Apply these minor style improvements suggested by static analysis:
```diff
 if (
     kwargs.get("output_attentions", False)
-    or kwargs.get("head_mask", None) is not None
+    or kwargs.get("head_mask") is not None
 ):
```

```diff
-    if kwargs.get("position_ids", None) is not None:
+    if kwargs.get("position_ids") is not None:
```
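The Ruff SIM910 suggestion applied in the diff above rests on the fact that `dict.get` already defaults to `None`, so the explicit second argument is redundant. A quick self-contained illustration:

```python
def demo(**kwargs):
    # Both spellings return None when the key is absent; the explicit
    # ", None" default adds nothing.
    explicit = kwargs.get("head_mask", None)
    implicit = kwargs.get("head_mask")
    return explicit, implicit
```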
📒 Files selected for processing (4)
- src/axolotl/loaders/model.py (1 hunks)
- src/axolotl/loaders/patch_manager.py (1 hunks)
- src/axolotl/monkeypatch/attention/sageattn.py (1 hunks)
- src/axolotl/utils/schemas/config.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
src/axolotl/monkeypatch/attention/sageattn.py
60-60: Use kwargs.get("head_mask") instead of kwargs.get("head_mask", None)
Replace kwargs.get("head_mask", None) with kwargs.get("head_mask")
(SIM910)
77-77: Use kwargs.get("position_ids") instead of kwargs.get("position_ids", None)
Replace kwargs.get("position_ids", None) with kwargs.get("position_ids")
(SIM910)
🔇 Additional comments (5)
src/axolotl/loaders/model.py (1)
550-554: LGTM! Consistent attention implementation pattern. The SageAttention integration follows the same pattern as other attention implementations and correctly sets both the model kwargs and config attributes.
src/axolotl/utils/schemas/config.py (2)
497-502: LGTM! Well-documented configuration field. The SageAttention configuration field follows the established pattern and includes a helpful description with a link to the source repository.
886-894: Good validation logic for incompatible features. The validator correctly prevents using SageAttention with sample packing, which aligns with the current limitations documented in the monkeypatch implementation.
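The mutual-exclusion check the reviewer praises can be sketched as follows. This is a plain-function sketch, not the actual validator (which lives in the Pydantic config schema); the field names `sage_attention` and `sample_packing` come from the PR, but the function name and error message are illustrative.

```python
def validate_attention_config(cfg: dict) -> dict:
    # SageAttention cannot consume the packed attention masks that
    # sample packing produces, so the two options are mutually exclusive.
    if cfg.get("sage_attention") and cfg.get("sample_packing"):
        raise ValueError("sage_attention is not compatible with sample_packing")
    return cfg
```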
src/axolotl/monkeypatch/attention/sageattn.py (2)
17-39: LGTM! Robust availability checking and import logic. The implementation properly handles the optional dependency with clear error messages and appropriate fallback behavior.
115-122: LGTM! Clean patch registration. The patch function properly checks for SageAttention availability and registers the forward function with transformers.
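The registration pattern referred to above can be sketched like this. In transformers, `ALL_ATTENTION_FUNCTIONS` behaves like a mapping from an `attn_implementation` string to a forward callable; a plain dict stands in for it here so the sketch is self-contained, and the function bodies are placeholders.

```python
# Stand-in for transformers' global attention registry (a mapping from
# implementation name to forward callable).
ALL_ATTENTION_FUNCTIONS = {}

def sage_attention_forward(module, query, key, value, attention_mask, **kwargs):
    raise NotImplementedError  # real kernel call omitted in this sketch

def patch_sage_attention():
    # Registering under the same key later passed as attn_implementation
    # lets models dispatch attention calls to the custom forward.
    ALL_ATTENTION_FUNCTIONS["sage_attention"] = sage_attention_forward

patch_sage_attention()
```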
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/axolotl/monkeypatch/attention/sageattn.py (2)
32-39: Consider making installation instructions more maintainable. The hardcoded commit hash in the installation command may become outdated. Consider either:
- Using a version tag instead of a commit hash
- Adding a comment to remind maintainers to keep this updated
- Referring to the official installation documentation
```diff
-"`pip install git+https://github.com/thu-ml/SageAttention.git@1718ddc06dbc694bcf3c6b49ac28c1921aa2d8bd`"
+"`pip install git+https://github.com/thu-ml/SageAttention.git` or follow installation instructions at https://github.com/thu-ml/SageAttention/"
```
58-82: Address static analysis hints and document significant limitations. The error handling is thorough, but there are two style improvements suggested by static analysis tools, and the limitations should be prominently documented.
Apply these style improvements:
```diff
-    or kwargs.get("head_mask", None) is not None
+    or kwargs.get("head_mask") is not None
```

```diff
-    if kwargs.get("position_ids", None) is not None:
+    if kwargs.get("position_ids") is not None:
```

Important: The lack of support for attention_mask and position_ids significantly limits this integration's applicability. Consider adding a prominent warning in the docstring about these constraints.
📒 Files selected for processing (1)
- src/axolotl/monkeypatch/attention/sageattn.py (1 hunks)
🔇 Additional comments (3)
src/axolotl/monkeypatch/attention/sageattn.py (3)
1-30: LGTM! Well-structured conditional import pattern. The import section properly handles the optional SageAttention dependency with clear documentation and a standard conditional import pattern.
115-123: LGTM! Proper integration with transformers attention registry. The patch function correctly registers SageAttention with transformers' global attention function registry following the established pattern.
83-112: Verify causal mask inference logic and confirm tensor layout assumptions. The GQA/MQA handling and tensor operations look correct, but the causal mask inference should be verified.
Please verify that the causal mask inference logic matches transformers' behavior:

```shell
#!/bin/bash
# Search for similar causal mask inference patterns in the transformers codebase
rg -A 5 -B 5 "is_causal.*query.*shape" --type py
rg -A 5 -B 5 "getattr.*is_causal" --type py
```

The tensor layout conversion from "HND" (batch, heads, seq_len, dim) to transformers format (batch, seq_len, heads, dim) using transpose(1, 2) appears mathematically correct.
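The causal-mask inference the reviewer asks about typically follows the sdpa-style heuristic in transformers: treat the call as causal only when there is more than one query token and no explicit mask was supplied. A stdlib-only sketch of that heuristic, operating on bare shape tuples (an assumption about the actual code, which works on tensors):

```python
def infer_is_causal(query_shape, attention_mask, module_is_causal=True):
    # Single-token decode steps (seq_len == 1) need no causal mask, and an
    # explicit attention_mask overrides causal inference entirely.
    seq_len = query_shape[2]  # (batch, heads, seq_len, head_dim) layout
    return bool(module_is_causal and attention_mask is None and seq_len > 1)
```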
@NanoCode012 what's the sage vs flash attn VRAM usage?
@winglian, weirdly I'm not seeing the VRAM savings from the benchmarks. Current early wandb results show it is about 20% faster with the same VRAM usage. However, kernel benchmarking showed it using less VRAM (at least when <32k context). More runs still need to be done.
Updated the PR from main and added more validation/docs on attention. It is a bit faster than FA in adapter mode. I added a warning that this is not recommended for FFT due to unstable loss. I did not add a test, as I didn't want to install another module by default.

Edit Feb 2026: We will merge this PR as it is an external dependency, to let users try it out. We understand that the metrics are not stable and are always open to help with fixing it.
📖 Documentation Preview: https://6912b00334daf4d5637097f2--resonant-treacle-0fd729.netlify.app (deployed on Netlify from commit 12a2f62)
Description
Adds SageAttention https://github.com/thu-ml/SageAttention/
Since it has a similar interface to sdpa_attention, I used that implementation and flash attention in transformers to cross-check.
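Given the stated similarity to sdpa_attention, the forward presumably follows transformers' attention-function calling convention. The stub below sketches that assumed signature; parameter names and the `(output, attn_weights)` return convention mirror the sdpa style, but none of this is confirmed by the PR body itself.

```python
from typing import Optional, Tuple

def sage_attention_forward(
    module,
    query,                      # assumed (batch, num_heads, seq_len, head_dim)
    key,
    value,
    attention_mask=None,
    dropout: float = 0.0,
    scaling: Optional[float] = None,
    **kwargs,
) -> Tuple[object, None]:
    # Sketch only: a real implementation would call the sageattn kernel and
    # transpose the output back to (batch, seq_len, num_heads, head_dim).
    # Returning (attn_output, None) matches the sdpa-style convention of
    # (output, attn_weights) with weights unavailable.
    raise NotImplementedError
```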
How has this been tested?
No test yet!