
Revert "Set csgmv as default lora backend. (#11488)" #11735

Merged: Fridge003 merged 3 commits into main from zhyncs/revert on Oct 17, 2025

Conversation

@zhyncs (Collaborator) commented on Oct 16, 2025

This reverts commit b0d20cd.

Motivation

ref #11488 (comment)

We can close this pull request once the issue has been resolved. I am using this PR for testing purposes, and if the issue is not fixed today, I will merge it first.

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @zhyncs, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request is a temporary measure to revert the default LoRA backend from 'csgmv' back to 'triton'. This change is being made for testing purposes to address an identified issue, with the intention to merge it if the underlying problem is not resolved promptly. The primary impact is that 'triton' will once again be the standard LoRA backend used by default throughout the system and its associated tests.

Highlights

  • Revert Default LoRA Backend: This pull request reverts the change that previously set 'csgmv' as the default LoRA backend, restoring 'triton' as the default option across server arguments, benchmark scripts, and test runners.
  • Test Configuration Updates: All relevant LoRA test files have been updated to explicitly use 'triton' as the backend or to reflect the removal of 'csgmv' from the list of available backends, ensuring consistency with the reverted default.
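The revert described in the highlights amounts to flipping a single default value. A minimal sketch of what that kind of server-argument default looks like (the `ServerArgs` dataclass and `lora_backend` field names here are assumptions for illustration, not copied from the actual diff):

```python
from dataclasses import dataclass


@dataclass
class ServerArgs:
    # Hypothetical sketch of the reverted default: "csgmv" was the default
    # introduced by #11488; this PR restores "triton".
    lora_backend: str = "triton"  # was "csgmv" prior to this revert
```

Under this sketch, any launch path that does not pass an explicit backend picks up `"triton"` again, which is why the benchmark scripts and test runners in this PR are updated in lockstep with the default.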

@gemini-code-assist (Contributor) left a review comment:

Code Review

This pull request reverts the default LoRA backend from csgmv to triton across the codebase, including server arguments, benchmarks, and test configurations. The changes appear correct and consistent with the goal of the revert.

I've added a few suggestions in the test files to improve maintainability. Instead of hardcoding the triton backend, the tests could iterate over a list of supported backends defined in test/srt/lora/utils.py. This would make it easier to re-enable tests for csgmv or add new backends in the future by just updating a single list.
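A minimal sketch of that suggestion (the `BACKENDS` name and the `test/srt/lora/utils.py` location come from the review comments; the `run_one` callback and loop shape are illustrative assumptions):

```python
# Hypothetical contents of test/srt/lora/utils.py: one central list of
# backends under test, so re-enabling "csgmv" is a single-line change.
BACKENDS = ["triton"]  # add "csgmv" back here once the issue is resolved


def run_on_all_backends(run_one, model_cases, torch_dtypes):
    """Drive a per-case test callback over every configured backend."""
    for model_case in model_cases:
        for torch_dtype in torch_dtypes:
            for backend in BACKENDS:
                run_one(model_case, torch_dtype, max_new_tokens=32, backend=backend)
```

With a helper like this, the hardcoded `backend = "triton"` lines in the individual test files collapse into a single loop over the shared list.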

for model_case in model_cases:
    for torch_dtype in TORCH_DTYPES:
        max_new_tokens = 32
        backend = "triton"
Severity: medium

To improve the test suite's flexibility, consider iterating over the BACKENDS list from utils.py instead of hardcoding "triton". This would make it easier to test multiple backends in the future by simply updating the list.

For example:

from .utils import BACKENDS
# ...

class TestLoRA(CustomTestCase):
    # ...
    def _run_lora_multiple_batch_on_model_cases(self, model_cases: List[LoRAModelCase]):
        for model_case in model_cases:
            for torch_dtype in TORCH_DTYPES:
                for backend in BACKENDS:
                    # ... rest of the test logic using `backend` variable

    model_case,
    torch_dtype,
    max_new_tokens=32,
    backend="triton",
Severity: medium

Instead of hardcoding backend="triton", you could iterate over the BACKENDS list from utils.py to make the test more maintainable and easier to extend for other backends in the future.

Example:

from .utils import BACKENDS
# ...

class TestLoRACudaGraph(CustomTestCase):
    def _run_without_cuda_graph_on_model_cases(self, model_cases: List[LoRAModelCase]):
        # ...
        for model_case in model_cases:
            # ...
            for torch_dtype in TORCH_DTYPES:
                for backend in BACKENDS:
                    run_lora_test_one_by_one(
                        # ...
                        backend=backend,
                        # ...
                    )

    model_case,
    torch_dtype,
    max_new_tokens=32,
    backend="triton",
Severity: medium

Similar to the above, consider iterating over BACKENDS from utils.py here as well to improve test flexibility and maintainability.

):
    REUSED_LORA_NAME = "lora"
    max_new_tokens = 256
    backend = "triton"
Severity: medium

To make this test more maintainable, consider iterating over the BACKENDS list from utils.py instead of hardcoding "triton". This would allow you to easily enable or disable backend tests from a central place.

Example:

from .utils import BACKENDS
# ...

    def _run_test(
        self,
        # ...
    ):
        for backend in BACKENDS:
            # ... rest of the test logic using `backend` variable

for model_case in model_cases:
    for torch_dtype in TORCH_DTYPES:
        max_new_tokens = 32
        backend = "triton"
Severity: medium

For better maintainability, I'd suggest iterating over the BACKENDS list from utils.py rather than hardcoding "triton". This makes it straightforward to add or remove backends from testing in the future.

Example:

from .utils import BACKENDS
# ...

class TestLoRAQwen3(CustomTestCase):
    def _run_lora_multiple_batch_on_model_cases(self, model_cases: List[LoRAModelCase]):
        for model_case in model_cases:
            for torch_dtype in TORCH_DTYPES:
                for backend in BACKENDS:
                    # ... rest of the test logic


torch_dtype = torch.float16
max_new_tokens = 32
backend = "triton"
Severity: medium

Instead of hardcoding backend = "triton", consider iterating over the BACKENDS list from utils.py. This would make the test suite more flexible and easier to maintain when new backends are added or existing ones are re-enabled for testing.

Example:

from .utils import BACKENDS
# ...

class TestLoRARadixCache(CustomTestCase):
    def test_lora_radix_cache(self):
        # ...
        for backend in BACKENDS:
            # ... rest of the test logic using `backend` variable

    model_case,
    torch_dtype,
    max_new_tokens=32,
    backend="triton",
Severity: medium

To improve maintainability, it would be better to iterate over the BACKENDS list from utils.py here instead of hardcoding backend="triton". This would make it easier to manage which backends are included in the test run.

Example:

from .utils import BACKENDS
# ...

class TestLoRATP(CustomTestCase):
    def _run_tp_on_model_cases(self, model_cases: List[LoRAModelCase]):
        # ...
        for model_case in model_cases:
            # ...
            for tp_size in tp_list:
                model_case.tp_size = tp_size
                for torch_dtype in TORCH_DTYPES:
                    for backend in BACKENDS:
                        run_lora_test_one_by_one(
                            # ...
                            backend=backend,
                            # ...
                        )

@Fridge003 Fridge003 merged commit da681f3 into main on Oct 17, 2025
100 of 106 checks passed
@Fridge003 Fridge003 deleted the zhyncs/revert branch October 17, 2025 17:01