
Revert "Set csgmv as default lora backend. (#11488)" #11735

Merged: Fridge003 merged 3 commits into main from zhyncs/revert on Oct 17, 2025

Conversation

@zhyncs (Collaborator) commented on Oct 16, 2025

This reverts commit b0d20cd.

Motivation

ref #11488 (comment)

We can close this pull request once the issue has been resolved. I am using this PR for testing purposes, and if the issue is not fixed today, I will merge it first.

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @zhyncs, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request is a temporary measure to revert the default LoRA backend from 'csgmv' back to 'triton'. This change is being made for testing purposes to address an identified issue, with the intention to merge it if the underlying problem is not resolved promptly. The primary impact is that 'triton' will once again be the standard LoRA backend used by default throughout the system and its associated tests.

Highlights

  • Revert Default LoRA Backend: This pull request reverts the change that previously set 'csgmv' as the default LoRA backend, restoring 'triton' as the default option across server arguments, benchmark scripts, and test runners.
  • Test Configuration Updates: All relevant LoRA test files have been updated to explicitly use 'triton' as the backend or to reflect the removal of 'csgmv' from the list of available backends, ensuring consistency with the reverted default.
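The revert described in the highlights amounts to flipping a single default value. A minimal sketch of what that kind of server-argument default looks like (the `ServerArgs` dataclass and `lora_backend` field names here are assumptions for illustration, not copied from the actual diff):

```python
from dataclasses import dataclass


@dataclass
class ServerArgs:
    # Hypothetical sketch of the reverted default: "csgmv" was the default
    # introduced by #11488; this PR restores "triton".
    lora_backend: str = "triton"  # was "csgmv" prior to this revert
```

Under this sketch, any launch path that does not pass an explicit backend picks up `"triton"` again, which is why the benchmark scripts and test runners in this PR are updated in lockstep with the default.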

@gemini-code-assist (Contributor) left a review comment:

Code Review

This pull request reverts the default LoRA backend from csgmv to triton across the codebase, including server arguments, benchmarks, and test configurations. The changes appear correct and consistent with the goal of the revert.

I've added a few suggestions in the test files to improve maintainability. Instead of hardcoding the triton backend, the tests could iterate over a list of supported backends defined in test/srt/lora/utils.py. This would make it easier to re-enable tests for csgmv or add new backends in the future by just updating a single list.
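A minimal sketch of that suggestion (the `BACKENDS` name and the `test/srt/lora/utils.py` location come from the review comments; the `run_one` callback and loop shape are illustrative assumptions):

```python
# Hypothetical contents of test/srt/lora/utils.py: one central list of
# backends under test, so re-enabling "csgmv" is a single-line change.
BACKENDS = ["triton"]  # add "csgmv" back here once the issue is resolved


def run_on_all_backends(run_one, model_cases, torch_dtypes):
    """Drive a per-case test callback over every configured backend."""
    for model_case in model_cases:
        for torch_dtype in torch_dtypes:
            for backend in BACKENDS:
                run_one(model_case, torch_dtype, max_new_tokens=32, backend=backend)
```

With a helper like this, the hardcoded `backend = "triton"` lines in the individual test files collapse into a single loop over the shared list.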

for model_case in model_cases:
    for torch_dtype in TORCH_DTYPES:
        max_new_tokens = 32
        backend = "triton"
Severity: medium

To improve the test suite's flexibility, consider iterating over the BACKENDS list from utils.py instead of hardcoding "triton". This would make it easier to test multiple backends in the future by simply updating the list.

For example:

from .utils import BACKENDS
# ...

class TestLoRA(CustomTestCase):
    # ...
    def _run_lora_multiple_batch_on_model_cases(self, model_cases: List[LoRAModelCase]):
        for model_case in model_cases:
            for torch_dtype in TORCH_DTYPES:
                for backend in BACKENDS:
                    # ... rest of the test logic using `backend` variable

    model_case,
    torch_dtype,
    max_new_tokens=32,
    backend="triton",
Severity: medium

Instead of hardcoding backend="triton", you could iterate over the BACKENDS list from utils.py to make the test more maintainable and easier to extend for other backends in the future.

Example:

from .utils import BACKENDS
# ...

class TestLoRACudaGraph(CustomTestCase):
    def _run_without_cuda_graph_on_model_cases(self, model_cases: List[LoRAModelCase]):
        # ...
        for model_case in model_cases:
            # ...
            for torch_dtype in TORCH_DTYPES:
                for backend in BACKENDS:
                    run_lora_test_one_by_one(
                        # ...
                        backend=backend,
                        # ...
                    )

    model_case,
    torch_dtype,
    max_new_tokens=32,
    backend="triton",
Severity: medium

Similar to the above, consider iterating over BACKENDS from utils.py here as well to improve test flexibility and maintainability.

):
    REUSED_LORA_NAME = "lora"
    max_new_tokens = 256
    backend = "triton"
Severity: medium

To make this test more maintainable, consider iterating over the BACKENDS list from utils.py instead of hardcoding "triton". This would allow you to easily enable or disable backend tests from a central place.

Example:

from .utils import BACKENDS
# ...

    def _run_test(
        self,
        # ...
    ):
        for backend in BACKENDS:
            # ... rest of the test logic using `backend` variable

for model_case in model_cases:
    for torch_dtype in TORCH_DTYPES:
        max_new_tokens = 32
        backend = "triton"
Severity: medium

For better maintainability, I'd suggest iterating over the BACKENDS list from utils.py rather than hardcoding "triton". This makes it straightforward to add or remove backends from testing in the future.

Example:

from .utils import BACKENDS
# ...

class TestLoRAQwen3(CustomTestCase):
    def _run_lora_multiple_batch_on_model_cases(self, model_cases: List[LoRAModelCase]):
        for model_case in model_cases:
            for torch_dtype in TORCH_DTYPES:
                for backend in BACKENDS:
                    # ... rest of the test logic


torch_dtype = torch.float16
max_new_tokens = 32
backend = "triton"
Severity: medium

Instead of hardcoding backend = "triton", consider iterating over the BACKENDS list from utils.py. This would make the test suite more flexible and easier to maintain when new backends are added or existing ones are re-enabled for testing.

Example:

from .utils import BACKENDS
# ...

class TestLoRARadixCache(CustomTestCase):
    def test_lora_radix_cache(self):
        # ...
        for backend in BACKENDS:
            # ... rest of the test logic using `backend` variable

    model_case,
    torch_dtype,
    max_new_tokens=32,
    backend="triton",
Severity: medium

To improve maintainability, it would be better to iterate over the BACKENDS list from utils.py here instead of hardcoding backend="triton". This would make it easier to manage which backends are included in the test run.

Example:

from .utils import BACKENDS
# ...

class TestLoRATP(CustomTestCase):
    def _run_tp_on_model_cases(self, model_cases: List[LoRAModelCase]):
        # ...
        for model_case in model_cases:
            # ...
            for tp_size in tp_list:
                model_case.tp_size = tp_size
                for torch_dtype in TORCH_DTYPES:
                    for backend in BACKENDS:
                        run_lora_test_one_by_one(
                            # ...
                            backend=backend,
                            # ...
                        )

@Fridge003 Fridge003 merged commit da681f3 into main on Oct 17, 2025
100 of 106 checks passed
@Fridge003 Fridge003 deleted the zhyncs/revert branch October 17, 2025 17:01