Revert "Set csgmv as default lora backend. (#11488)" #11735
Conversation
This reverts commit b0d20cd.
Summary of Changes

Hello @zhyncs, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request is a temporary measure to revert the default LoRA backend from csgmv back to triton. The change is being made for testing purposes while an identified issue is investigated, with the intention to merge it if the underlying problem is not resolved promptly. The primary impact is that triton will once again be the standard LoRA backend used by default throughout the system and its associated tests.
Code Review
This pull request reverts the default LoRA backend from csgmv to triton across the codebase, including server arguments, benchmarks, and test configurations. The changes appear correct and consistent with the goal of the revert.
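The revert itself is mechanical: each place that declares the backend default switches csgmv back to triton. As a rough sketch only (the `ServerArgs` shape and `lora_backend` field name here are illustrative assumptions, not the actual sglang definitions):

```python
import dataclasses


@dataclasses.dataclass
class ServerArgs:
    # Default reverted: briefly "csgmv", now "triton" again.
    lora_backend: str = "triton"


args = ServerArgs()
print(args.lora_backend)
```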
I've added a few suggestions in the test files to improve maintainability. Instead of hardcoding the triton backend, the tests could iterate over a list of supported backends defined in test/srt/lora/utils.py. This would make it easier to re-enable tests for csgmv or add new backends in the future by just updating a single list.
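The diff itself does not show such a list, so the following is only a sketch of what a shared `BACKENDS` constant in test/srt/lora/utils.py could look like (the names and file layout combine the reviewer's suggestion with assumptions, not existing code):

```python
# Central registry of LoRA backends the tests exercise. Re-adding "csgmv"
# here would re-enable it across all tests at once.
BACKENDS = ["triton"]


def iter_backend_cases(torch_dtypes):
    """Yield (dtype, backend) pairs so each test covers every listed backend."""
    for dtype in torch_dtypes:
        for backend in BACKENDS:
            yield dtype, backend


print(list(iter_backend_cases(["float16"])))  # -> [('float16', 'triton')]
```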
```python
for model_case in model_cases:
    for torch_dtype in TORCH_DTYPES:
        max_new_tokens = 32
        backend = "triton"
```
To improve the test suite's flexibility, consider iterating over the BACKENDS list from utils.py instead of hardcoding "triton". This would make it easier to test multiple backends in the future by simply updating the list.
For example:

```python
from .utils import BACKENDS
# ...
class TestLoRA(CustomTestCase):
    # ...
    def _run_lora_multiple_batch_on_model_cases(self, model_cases: List[LoRAModelCase]):
        for model_case in model_cases:
            for torch_dtype in TORCH_DTYPES:
                for backend in BACKENDS:
                    # ... rest of the test logic using `backend` variable
```

```python
model_case,
torch_dtype,
max_new_tokens=32,
backend="triton",
```
Instead of hardcoding backend="triton", you could iterate over the BACKENDS list from utils.py to make the test more maintainable and easier to extend for other backends in the future.
Example:
```python
from .utils import BACKENDS
# ...
class TestLoRACudaGraph(CustomTestCase):
    def _run_without_cuda_graph_on_model_cases(self, model_cases: List[LoRAModelCase]):
        # ...
        for model_case in model_cases:
            # ...
            for torch_dtype in TORCH_DTYPES:
                for backend in BACKENDS:
                    run_lora_test_one_by_one(
                        # ...
                        backend=backend,
                        # ...
                    )
```

```python
model_case,
torch_dtype,
max_new_tokens=32,
backend="triton",
```
```python
):
    REUSED_LORA_NAME = "lora"
    max_new_tokens = 256
    backend = "triton"
```
To make this test more maintainable, consider iterating over the BACKENDS list from utils.py instead of hardcoding "triton". This would allow you to easily enable or disable backend tests from a central place.
Example:

```python
from .utils import BACKENDS
# ...
def _run_test(
    self,
    # ...
):
    for backend in BACKENDS:
        # ... rest of the test logic using `backend` variable
```

```python
for model_case in model_cases:
    for torch_dtype in TORCH_DTYPES:
        max_new_tokens = 32
        backend = "triton"
```
For better maintainability, I'd suggest iterating over the BACKENDS list from utils.py rather than hardcoding "triton". This makes it straightforward to add or remove backends from testing in the future.
Example:

```python
from .utils import BACKENDS
# ...
class TestLoRAQwen3(CustomTestCase):
    def _run_lora_multiple_batch_on_model_cases(self, model_cases: List[LoRAModelCase]):
        for model_case in model_cases:
            for torch_dtype in TORCH_DTYPES:
                for backend in BACKENDS:
                    # ... rest of the test logic
```

```python
torch_dtype = torch.float16
max_new_tokens = 32
backend = "triton"
```
Instead of hardcoding backend = "triton", consider iterating over the BACKENDS list from utils.py. This would make the test suite more flexible and easier to maintain when new backends are added or existing ones are re-enabled for testing.
Example:

```python
from .utils import BACKENDS
# ...
class TestLoRARadixCache(CustomTestCase):
    def test_lora_radix_cache(self):
        # ...
        for backend in BACKENDS:
            # ... rest of the test logic using `backend` variable
```

```python
model_case,
torch_dtype,
max_new_tokens=32,
backend="triton",
```
To improve maintainability, it would be better to iterate over the BACKENDS list from utils.py here instead of hardcoding backend="triton". This would make it easier to manage which backends are included in the test run.
Example:

```python
from .utils import BACKENDS
# ...
class TestLoRATP(CustomTestCase):
    def _run_tp_on_model_cases(self, model_cases: List[LoRAModelCase]):
        # ...
        for model_case in model_cases:
            # ...
            for tp_size in tp_list:
                model_case.tp_size = tp_size
                for torch_dtype in TORCH_DTYPES:
                    for backend in BACKENDS:
                        run_lora_test_one_by_one(
                            # ...
                            backend=backend,
                            # ...
                        )
```
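All of the suggestions above boil down to the same loop; pairing it with unittest's `subTest` keeps one failing backend from masking the others and names the backend in the failure report. A self-contained sketch of the pattern (`BACKENDS` and the assertion body are illustrative, not the real suite):

```python
import unittest

BACKENDS = ["triton"]  # illustrative central list; "csgmv" can be re-added later


class TestLoRABackendLoop(unittest.TestCase):
    def test_all_backends(self):
        for backend in BACKENDS:
            # subTest labels each iteration, so a failure report shows
            # which backend broke instead of stopping at the first error.
            with self.subTest(backend=backend):
                self.assertIn(backend, {"triton", "csgmv"})


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestLoRABackendLoop)
result = unittest.TestResult()
suite.run(result)
print(result.wasSuccessful())
```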
Motivation
ref #11488 (comment)
We can close this pull request once the issue has been resolved. I am using this PR for testing purposes, and if the issue is not fixed today, I will merge it first.
Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist