add sgl_kernel.fast_topk_v2 to top_k benchmark #2461
yzh119 merged 1 commit into flashinfer-ai:main
Conversation
Summary of Changes (Gemini Code Assist): This pull request adds a comparison against SGLang's fast_topk_v2 kernel to the top-k benchmark, so the two implementations can be timed side by side.
📝 Walkthrough: SGLang comparison functionality is integrated into the top-k benchmark, with conditional execution when the `--compare-sglang` flag is set and the `sgl_kernel` package is available.
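The conditional execution presumably hinges on an optional-import guard; the flag name `HAS_SGL_KERNEL` matches the one visible in the quoted diffs, but this exact wiring is a sketch, not the PR's verbatim code:

```python
# Sketch of an optional-import guard for the SGLang kernel package.
# If sgl_kernel is not installed, the benchmark simply skips the
# SGLang comparison instead of crashing at import time.
try:
    import sgl_kernel  # noqa: F401

    HAS_SGL_KERNEL = True
except ImportError:
    HAS_SGL_KERNEL = False
```

This pattern keeps the benchmark runnable in environments where only FlashInfer is installed.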
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: 1 passed, 1 warning, 1 inconclusive
Code Review
This pull request adds a benchmark for SGLang's fast_topk_v2 kernel to the existing top-k benchmark script. The changes correctly add the new benchmark logic and update the output printing to include the SGLang results. My review focuses on improving code maintainability by replacing hardcoded magic numbers related to SGLang's constraints with named constants. This will make the code cleaner and easier to update in the future.
```python
        result["speedup_vs_torch"] = torch_ms / fi_ms

    # SGLang comparison (only supports k=2048 and float32)
    if compare_sglang and HAS_SGL_KERNEL and k == 2048 and dtype == torch.float32:
```
The values 2048 and torch.float32 are hardcoded. These seem to be specific constraints from the SGLang kernel. To improve readability and maintainability, it's better to define them as constants at the top of the file. This would also allow reusing them in other parts of the code where the same constraints apply.
For example, you could define at the top of the file:

```python
SGLANG_SUPPORTED_K = 2048
SGLANG_SUPPORTED_DTYPE = torch.float32
```

And then use these constants here.
```diff
-    if compare_sglang and HAS_SGL_KERNEL and k == 2048 and dtype == torch.float32:
+    if compare_sglang and HAS_SGL_KERNEL and k == SGLANG_SUPPORTED_K and dtype == SGLANG_SUPPORTED_DTYPE:
```
```python
    print("=" * 100)
    print(f"top_k: Basic radix-based top-k selection (dtype={dtype_str})")
    if args.compare_sglang:
        print("NOTE: SGLang only supports k=2048 and float32")
```
The hardcoded value 2048 should be replaced with the SGLANG_SUPPORTED_K constant for better maintainability. You could also consider deriving the 'float32' string from the SGLANG_SUPPORTED_DTYPE constant for full consistency.
```diff
-        print("NOTE: SGLang only supports k=2048 and float32")
+        print(f"NOTE: SGLang only supports k={SGLANG_SUPPORTED_K} and float32")
```
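The review's further suggestion, deriving the printed dtype name from the constant, could look like the following sketch. A string stand-in is used for the dtype here so the snippet runs without torch; in the real script, `str(torch.float32)` renders as `"torch.float32"` in the same way:

```python
# Hypothetical sketch: keep the NOTE line in sync with the constants so
# the message cannot drift from the actual constraint check.
SGLANG_SUPPORTED_K = 2048
SGLANG_SUPPORTED_DTYPE = "torch.float32"  # stand-in for torch.float32

# str(torch.float32) == "torch.float32", so taking the last dotted
# component yields the short name used in the printed NOTE.
dtype_name = str(SGLANG_SUPPORTED_DTYPE).split(".")[-1]
note = f"NOTE: SGLang only supports k={SGLANG_SUPPORTED_K} and {dtype_name}"
```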
```python
        )
        if "sglang_us" in result:
            line += f" {result['sglang_us']:>10.2f}us {result['speedup_vs_sglang']:>9.2f}x"
        elif args.compare_sglang and k == 2048:
```
Actionable comments posted: 1
```python
        if "sglang_us" in result:
            line += f" {result['sglang_us']:>10.2f}us {result['speedup_vs_sglang']:>9.2f}x"
        elif args.compare_sglang and k == 2048:
            line += " (SGLang error)"
        print(line)
```
Misleading "(SGLang error)" message when dtype is not float32.
When `--compare-sglang --dtype fp16` is used with `k=2048`, SGLang won't run (since it only supports float32), but this condition will print "(SGLang error)", implying a failure rather than an unsupported configuration.
The condition should also check for dtype == torch.float32 to correctly distinguish between unsupported configurations and actual runtime errors.
🔧 Proposed fix

```diff
         if "sglang_us" in result:
             line += f" {result['sglang_us']:>10.2f}us {result['speedup_vs_sglang']:>9.2f}x"
-        elif args.compare_sglang and k == 2048:
+        elif args.compare_sglang and k == 2048 and dtype == torch.float32:
             line += " (SGLang error)"
         print(line)
```

Note: The same pattern appears at lines 372-373 and 416-417 for the other benchmark sections. Consider applying the same fix there for consistency.
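The three-way distinction the review draws, kernel timed, real runtime failure, or merely unsupported configuration, can be sketched as a small classifier. This is a hypothetical helper for illustration, not code from the PR:

```python
# Hypothetical helper: classify the SGLang column for one benchmark row.
# "ok"      -> SGLang ran and was timed (result carries "sglang_us")
# "error"   -> the config was eligible (k=2048, float32) but no timing
#              was produced, so something genuinely failed at runtime
# "skipped" -> unsupported k or dtype; not an error, just not comparable
def sglang_status(result, compare_sglang, k, dtype_is_f32):
    if "sglang_us" in result:
        return "ok"
    if compare_sglang and k == 2048 and dtype_is_f32:
        return "error"
    return "skipped"
```

With this helper, the "(SGLang error)" suffix would only ever be printed for rows classified as `"error"`.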
yzh119 left a comment:
@huangzhilin-hzl thanks for the contribution, here is the result I get for fast_topk_v2 on B200:
====================================================================================================
top_k_page_table_transform: Fused top-k + page table gather (dtype=FP32)
NOTE: SGLang only supports k=2048 and float32
====================================================================================================
batch seq_len k | FlashInfer SGLang Speedup
------------------------------------------------------------------------------------------
1 4096 256 | 8.38us
1 4096 512 | 7.62us
1 4096 1024 | 8.67us
1 4096 2048 | 5.34us 5.17us 0.97x
1 4096 4096 | 4.29us
1 16384 256 | 12.46us
1 16384 512 | 11.87us
1 16384 1024 | 14.67us
1 16384 2048 | 15.07us 15.39us 1.02x
1 16384 4096 | 20.90us
1 65536 256 | 31.30us
1 65536 512 | 31.62us
1 65536 1024 | 32.16us
1 65536 2048 | 33.76us 40.88us 1.21x
1 65536 4096 | 35.46us
1 131072 256 | 36.32us
1 131072 512 | 36.32us
1 131072 1024 | 36.86us
1 131072 2048 | 38.38us 68.00us 1.77x
1 131072 4096 | 40.75us
1 262144 256 | 40.75us
1 262144 512 | 40.99us
1 262144 1024 | 41.22us
1 262144 2048 | 42.85us 124.90us 2.91x
1 262144 4096 | 45.22us
1 524288 256 | 43.07us
1 524288 512 | 43.14us
1 524288 1024 | 43.23us
1 524288 2048 | 43.84us 213.49us 4.87x
1 524288 4096 | 45.22us
16 4096 256 | 8.54us
16 4096 512 | 9.50us
16 4096 1024 | 8.80us
16 4096 2048 | 7.97us 7.84us 0.98x
16 4096 4096 | 4.48us
16 16384 256 | 12.90us
16 16384 512 | 12.90us
16 16384 1024 | 14.85us
16 16384 2048 | 15.01us 15.52us 1.03x
16 16384 4096 | 20.99us
16 65536 256 | 24.35us
16 65536 512 | 29.58us
16 65536 1024 | 30.53us
16 65536 2048 | 33.60us 41.15us 1.22x
16 65536 4096 | 39.87us
16 131072 256 | 40.46us
16 131072 512 | 41.20us
16 131072 1024 | 51.65us
16 131072 2048 | 51.78us 69.22us 1.34x
16 131072 4096 | 45.63us
16 262144 256 | 45.44us
16 262144 512 | 45.62us
16 262144 1024 | 46.21us
16 262144 2048 | 47.54us 126.29us 2.66x
16 262144 4096 | 49.71us
16 524288 256 | 87.39us
16 524288 512 | 87.84us
16 524288 1024 | 88.67us
16 524288 2048 | 89.50us 216.21us 2.42x
16 524288 4096 | 92.43us
64 4096 256 | 9.60us
64 4096 512 | 9.12us
64 4096 1024 | 9.73us
64 4096 2048 | 8.48us 8.35us 0.98x
64 4096 4096 | 4.70us
64 16384 256 | 13.18us
64 16384 512 | 13.65us
64 16384 1024 | 15.26us
64 16384 2048 | 16.10us 16.61us 1.03x
64 16384 4096 | 21.76us
64 65536 256 | 25.50us
64 65536 512 | 30.46us
64 65536 1024 | 31.30us
64 65536 2048 | 34.85us 42.24us 1.21x
64 65536 4096 | 41.60us
64 131072 256 | 42.08us
64 131072 512 | 42.03us
64 131072 1024 | 52.34us
64 131072 2048 | 53.52us 71.42us 1.33x
64 131072 4096 | 90.30us
64 262144 256 | 72.22us
64 262144 512 | 79.20us
64 262144 1024 | 79.63us
64 262144 2048 | 96.35us 145.25us 1.51x
64 262144 4096 | 146.66us
64 524288 256 | 157.60us
64 524288 512 | 158.18us
64 524288 1024 | 170.53us
64 524288 2048 | 171.41us 354.24us 2.07x
64 524288 4096 | 229.20us
256 4096 256 | 16.61us
256 4096 512 | 17.70us
256 4096 1024 | 17.22us
256 4096 2048 | 15.10us 10.53us 0.70x
256 4096 4096 | 7.47us
256 16384 256 | 25.73us
256 16384 512 | 27.20us
256 16384 1024 | 29.10us
256 16384 2048 | 32.26us 25.31us 0.78x
256 16384 4096 | 42.08us
256 65536 256 | 51.63us
256 65536 512 | 62.42us
256 65536 1024 | 64.74us
256 65536 2048 | 73.12us 64.42us 0.88x
256 65536 4096 | 153.02us
256 131072 256 | 88.61us
256 131072 512 | 90.22us
256 131072 1024 | 108.10us
256 131072 2048 | 112.61us 125.34us 1.11x
256 131072 4096 | 261.50us
256 262144 256 | 173.25us
256 262144 512 | 186.21us
256 262144 1024 | 188.06us
256 262144 2048 | 225.66us 231.12us 1.02x
256 262144 4096 | 422.05us
256 524288 256 | 320.86us
256 524288 512 | 322.24us
256 524288 1024 | 349.15us
256 524288 2048 | 352.86us 397.81us 1.13x
256 524288 4096 | 835.82us

A faster version of topk will be landing in flashinfer soon.
Do you have any specific timings for the top-k optimization? And is the performance gain also expected to apply to top_k_page_table_transform? I'm trying to integrate it into SGLang, but I'm finding that for seq_len < 65536, top_k_page_table_transform is only beneficial during the prefill phase; in the decode phase, it actually hurts performance.
Hi @huangzhilin-hzl, is there a specific timeline you and SGLang are trying to aim for? We are sorting out some final optimizations, such as variable-seqlen support, before landing a mature version, but I'm curious whether there is a need to accelerate our plan.
@bkryu Thanks for asking! There's no urgent timeline on my end. I'm just benchmarking FlashInfer's top_k_page_table_transform against sgl-kernel to explore potential performance improvements for SGLang. Here are my H20 benchmarks for prefill and decode:
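Measurements like the ones discussed above generally come from a warmup-then-time loop. A minimal host-side sketch follows; note that for real GPU kernels the loop must synchronize the device (e.g. `torch.cuda.synchronize()` or CUDA events) before reading the clock, or it measures only launch overhead:

```python
import time


def bench_us(fn, warmup=5, iters=20):
    """Return mean wall-clock microseconds per call of fn.

    Host-side sketch only: for CUDA kernels, add a device
    synchronization before each perf_counter() read, or switch
    to CUDA-event timing for accurate kernel durations.
    """
    for _ in range(warmup):
        fn()  # warm caches, trigger any lazy compilation
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters * 1e6
```

This is an assumed shape for the measurement loop, not the benchmark script's verbatim code.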
## 📌 Description

add sgl_kernel.fast_topk_v2 to top_k benchmark

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).

## Summary by CodeRabbit

- **Tests**
  - Added SGLang comparison integration to top-k benchmarking with timing measurements and performance speedup metrics.
  - Enhanced benchmark reporting to display SGLang results alongside existing comparisons.
  - Improved result table formatting with conditional column expansion and dynamic layout to accommodate additional metrics.
  - Maintained backward compatibility with existing benchmark features and error handling.

Co-authored-by: 墨楼 <huangzhilin.hzl@antgroup.com>