Skip to content

Set csgmv as default lora backend.#11488

Merged
Fridge003 merged 8 commits intomainfrom
lifu/default-lora-backend
Oct 16, 2025
Merged

Set csgmv as default lora backend.#11488
Fridge003 merged 8 commits intomainfrom
lifu/default-lora-backend

Conversation

@lifuhuang
Copy link
Copy Markdown
Collaborator

@lifuhuang lifuhuang commented Oct 12, 2025

Motivation

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Summary by CodeRabbit

  • New Features

    • Added support for the “csgmv” LoRA backend and made it the default for server launches.
    • Expanded selectable LoRA backends to include “triton” and “csgmv”.
  • Refactor

    • Made the LoRA backend parameter optional across runners and sessions, allowing omission to use defaults.
  • Tests

    • Updated LoRA test utilities and scenarios to work with optional backend selection and the new default.
    • Simplified test invocations by removing hard-coded backend arguments.
  • Chores

    • Aligned configuration defaults to the new backend behavior.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello @lifuhuang, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request updates the system's default LoRA backend to 'csgmv', moving away from 'triton'. This change is reflected in the server's configuration and command-line arguments. Additionally, the test infrastructure has been adjusted to accommodate this new default, ensuring that tests either implicitly use 'csgmv' or allow for flexible backend selection.

Highlights

  • Default LoRA Backend Change: The default LoRA backend has been switched from 'triton' to 'csgmv' in both the server launch arguments and the core server configuration.
  • Test Suite Refactoring: Numerous LoRA-related test files have been updated to remove explicit 'triton' backend specifications, allowing them to utilize the newly set default 'csgmv' backend or to accept an optional backend parameter.
  • Backend Option Expansion: The list of available LoRA backends for testing purposes has been expanded to include 'csgmv' alongside 'triton'.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Oct 12, 2025

Walkthrough

Default LoRA backend changed from "triton" to "csgmv" in server defaults and CLI. Runner and test utilities now accept an optional lora_backend (None). Tests stop passing explicit backends and adjust helpers; backend list expanded to include "csgmv". One test retains a print referencing a removed backend variable.

Changes

Cohort / File(s) Summary
LoRA backend default update (server/CLI)
benchmark/lora/launch_server.py, python/sglang/srt/server_args.py
Set default lora backend from "triton" to "csgmv" in CLI arg and ServerArgs dataclass; no control-flow changes.
Runner and test session signatures
python/sglang/test/runners.py, test/srt/lora/test_lora_update.py
Make lora_backend Optional[str] with default None in SRTRunner and LoRAUpdateTestSessionBase constructors; types updated, behavior delegates to defaults.
Test utility updates
test/srt/lora/utils.py
Add Optional import; expand BACKENDS to ["triton", "csgmv"]; make backend parameter Optional with default None in run_lora_test_one_by_one.
Tests: remove explicit backend arguments
test/srt/lora/test_lora.py, test/srt/lora/test_lora_cuda_graph.py, test/srt/lora/test_lora_eviction.py, test/srt/lora/test_lora_radix_cache.py, test/srt/lora/test_lora_tp.py
Drop local backend="triton" variables and lora_backend/backend arguments in test invocations; rely on defaults; adjust log/strings accordingly.
Test: leftover reference
test/srt/lora/test_lora_qwen3.py
Removed backend variable and argument; print statement still references backend (now undefined).

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

A rabbit taps the keys with glee,
“From triton tides to csgmv!”
Tests hop light, no flags in tow,
Defaults guide the streams that flow.
One stray print—oops, where’d you be?
Carrot commits: consistency. 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The pull request description only includes the template headings without any substantive content under Motivation, Modifications, Accuracy Tests, or Benchmarking and Profiling, and none of the checklist items have been completed. Please provide detailed descriptions for the Motivation and Modifications sections, include any accuracy test results or benchmarks if applicable, and check off the completed items in the checklist.
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title Check ✅ Passed The title succinctly describes the primary change of updating the default LoRA backend to "csgmv" and directly reflects the modifications made in the pull request.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch lifu/default-lora-backend

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4ac8e09 and 353e5534e5e8dc6e5b12bb01f91c6d41d880490e.

📒 Files selected for processing (11)
  • benchmark/lora/launch_server.py (1 hunks)
  • python/sglang/srt/server_args.py (1 hunks)
  • python/sglang/test/runners.py (1 hunks)
  • test/srt/lora/test_lora.py (2 hunks)
  • test/srt/lora/test_lora_cuda_graph.py (0 hunks)
  • test/srt/lora/test_lora_eviction.py (0 hunks)
  • test/srt/lora/test_lora_qwen3.py (0 hunks)
  • test/srt/lora/test_lora_radix_cache.py (0 hunks)
  • test/srt/lora/test_lora_tp.py (0 hunks)
  • test/srt/lora/test_lora_update.py (1 hunks)
  • test/srt/lora/utils.py (3 hunks)
💤 Files with no reviewable changes (5)
  • test/srt/lora/test_lora_cuda_graph.py
  • test/srt/lora/test_lora_radix_cache.py
  • test/srt/lora/test_lora_tp.py
  • test/srt/lora/test_lora_eviction.py
  • test/srt/lora/test_lora_qwen3.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
  • GitHub Check: unit-test-backend-1-gpu-amd (linux-mi325-gpu-1, 9)
  • GitHub Check: unit-test-backend-1-gpu-amd (linux-mi325-gpu-1, 8)
  • GitHub Check: unit-test-backend-1-gpu-amd (linux-mi325-gpu-1, 11)
  • GitHub Check: unit-test-backend-1-gpu-amd (linux-mi325-gpu-1, 4)
  • GitHub Check: unit-test-backend-1-gpu-amd (linux-mi325-gpu-1, 10)
  • GitHub Check: unit-test-backend-1-gpu-amd (linux-mi325-gpu-1, 7)
  • GitHub Check: unit-test-backend-1-gpu-amd (linux-mi325-gpu-1, 6)
  • GitHub Check: unit-test-backend-1-gpu-amd (linux-mi325-gpu-1, 5)
  • GitHub Check: unit-test-backend-1-gpu-amd (linux-mi325-gpu-1, 3)
  • GitHub Check: unit-test-backend-1-gpu-amd (linux-mi325-gpu-1, 2)
  • GitHub Check: unit-test-backend-1-gpu-amd (linux-mi325-gpu-1, 0)
  • GitHub Check: unit-test-backend-1-gpu-amd (linux-mi325-gpu-1, 1)
  • GitHub Check: bench-test-2-gpu-amd (linux-mi325-gpu-2)
  • GitHub Check: unit-test-backend-2-gpu-amd (linux-mi325-gpu-2)
  • GitHub Check: performance-test-1-gpu-part-2-amd (linux-mi325-gpu-1)
  • GitHub Check: performance-test-1-gpu-part-1-amd (linux-mi325-gpu-1)
  • GitHub Check: unit-test-sgl-kernel-amd (linux-mi325-gpu-1)
  • GitHub Check: accuracy-test-2-gpu-amd (linux-mi325-gpu-2)
  • GitHub Check: accuracy-test-1-gpu-amd (linux-mi325-gpu-1)
  • GitHub Check: mla-test-1-gpu-amd (linux-mi325-gpu-1)
  • GitHub Check: lint

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the default LoRA backend from 'triton' to 'csgmv'. The changes are applied to server arguments, benchmarks, and test runners. Most tests are updated to remove the hardcoded 'triton' backend and rely on the new default.

I've found a critical issue in test/srt/lora/test_lora_qwen3.py where removing a variable without removing its usages will lead to a NameError, breaking the test.

Additionally, I've pointed out that with these changes, the 'triton' backend is no longer being tested. I've asked for clarification on whether this is intended and suggested either removing unused code or updating tests to cover both backends if needed.

Overall, the changes are aligned with the PR's goal, but the identified issues should be addressed.

Comment thread test/srt/lora/test_lora_qwen3.py
Comment thread test/srt/lora/utils.py
@@ -50,7 +50,7 @@ def __post_init__(self):


TORCH_DTYPES = [torch.float16]
BACKENDS = ["triton"]
BACKENDS = ["triton", "csgmv"]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This BACKENDS constant seems to be unused across the test files. With the removal of the hardcoded backend variable in the tests, they will now only run with the new default LoRA backend (csgmv). This means testing for the 'triton' backend is effectively removed.

If this is intended, this constant could be removed to avoid confusion. If both backends should be tested, the tests should be updated to iterate over this BACKENDS list.

@lifuhuang lifuhuang force-pushed the lifu/default-lora-backend branch from 353e553 to 269174b Compare October 12, 2025 07:11
@lifuhuang lifuhuang force-pushed the lifu/default-lora-backend branch from b0c6c74 to 91fb881 Compare October 12, 2025 08:42
@Fridge003 Fridge003 self-assigned this Oct 13, 2025
@lifuhuang lifuhuang changed the title [WIP] Set csgmv as default lora backend. Set csgmv as default lora backend. Oct 16, 2025
Copy link
Copy Markdown
Collaborator

@Fridge003 Fridge003 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Fridge003 Fridge003 merged commit b0d20cd into main Oct 16, 2025
122 of 134 checks passed
@Fridge003 Fridge003 deleted the lifu/default-lora-backend branch October 16, 2025 04:53
@zhyncs
Copy link
Copy Markdown
Collaborator

zhyncs commented Oct 16, 2025

@lifuhuang @Fridge003 this pr breaks lora unit tests on AMD
https://github.com/sgl-project/sglang/actions/runs/18546867193/job/52866524811

May you fix this in the follow up? Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants