
[Refactor] Benchmark Phase 1: extract utils and datasets from bench_serving #19077

Merged
hnyls2002 merged 6 commits into sgl-project:main from Ratish1:benchmark-phase1-refactor on Feb 21, 2026

Conversation

@Ratish1 Ratish1 (Collaborator) commented Feb 20, 2026

Motivation

This PR implements Phase 1 of issue #10177 by extracting benchmark utility and dataset logic from `python/sglang/bench_serving.py` into a new package structure under `python/sglang/benchmark/`.

Modifications

  • Added python/sglang/benchmark/utils.py for benchmark utility helpers.
  • Added python/sglang/benchmark/datasets/ with dataset-specific modules:
    • common.py
    • sharegpt.py
    • random.py
    • custom.py
    • openai_dataset.py
    • image.py
    • mmmu.py
    • mooncake.py
    • generated_shared_prefix.py
    • __init__.py with the loader mapping and get_dataset(...) (see the dispatch sketch after this list).
  • Refactored python/sglang/bench_serving.py to import and re-export the moved symbols and to drop the moved implementations.
  • Verified that `python3 -m sglang.bench_serving --help` and `python3 -m sglang.bench_offline_throughput --help` still run.
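
For reviewers, here is a minimal Python sketch of the loader dispatch that the new datasets/__init__.py provides. The names (DATASET_MAPPING, get_dataset, the loader classes) come from this PR's description, but the `load` signature and bodies below are assumptions rather than the actual implementation:

```python
# Sketch of the dispatch pattern in benchmark/datasets/__init__.py.
# Names follow the PR description; signatures are assumed.
from typing import Any, Dict, List, Type


class BaseDatasetLoader:
    """Common interface that every dataset loader implements."""

    def load(self, args: Any, tokenizer: Any) -> List[Any]:
        raise NotImplementedError


class ShareGPTDatasetLoader(BaseDatasetLoader):
    def load(self, args: Any, tokenizer: Any) -> List[Any]:
        # The real sampling logic lives in benchmark/datasets/sharegpt.py.
        return []


DATASET_MAPPING: Dict[str, Type[BaseDatasetLoader]] = {
    "sharegpt": ShareGPTDatasetLoader,
    # "random": RandomDatasetLoader, "image": ImageDatasetLoader, etc.
}


def get_dataset(args: Any, tokenizer: Any) -> List[Any]:
    """Look up the loader registered for args.dataset_name and delegate."""
    try:
        loader_cls = DATASET_MAPPING[args.dataset_name]
    except KeyError:
        raise ValueError(f"Unknown dataset: {args.dataset_name!r}") from None
    return loader_cls().load(args, tokenizer)
```

Under this shape, bench_serving.py only ever calls get_dataset(args, tokenizer) and needs no per-dataset imports.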

Accuracy Tests

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist gemini-code-assist (Contributor) commented

Summary of Changes

Hello @Ratish1, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request undertakes a significant refactoring effort for the benchmarking infrastructure. The primary goal is to improve the organization and maintainability of the codebase by separating concerns. Core utility functions and all dataset-specific logic, previously intertwined within bench_serving.py, have been moved into a new, well-structured benchmark package. This change makes the bench_serving.py file much cleaner and easier to understand, while also providing a clearer structure for extending benchmark capabilities in the future.

Highlights

  • Benchmark Utilities Extraction: Extracted general benchmark utility functions from bench_serving.py into a new dedicated python/sglang/benchmark/utils.py module.
  • Dataset Logic Modularization: Moved all dataset-related logic and sampling functions from bench_serving.py into a new python/sglang/benchmark/datasets/ package, enhancing modularity and organization.
  • Centralized Dataset Loading: Introduced a get_dataset function within the new datasets package, which acts as a centralized loader for various benchmark datasets, simplifying dataset management in bench_serving.py.
  • Codebase Cleanup: Significantly reduced the size and complexity of bench_serving.py by removing numerous functions and imports that are now handled by the new benchmark package.


Changelog
  • python/sglang/bench_serving.py
    • Removed various utility functions including remove_prefix, remove_suffix, parse_custom_headers, get_model, get_tokenizer, get_processor, get_dataset, download_and_cache_hf_file, download_and_cache_file, is_file_valid_json, get_mooncake_request_over_time, sample_mmmu_requests, sample_sharegpt_requests, sample_openai_requests, sample_custom_requests, compute_random_lens, sample_random_requests, parse_image_resolution, create_mm_data_row, sample_image_requests, get_available_tokens, gen_prompt, gen_mm_prompt, get_gen_prefix_cache_path, sample_generated_shared_prefix_requests, and set_ulimit.
    • Removed imports for io, pickle, resource, functools.lru_cache, json.JSONDecodeError, pybase64, datasets.load_dataset, PIL.Image, transformers.AutoProcessor, transformers.PreTrainedTokenizer, transformers.PreTrainedTokenizerFast.
    • Added imports for DatasetRow, get_dataset, get_mooncake_request_over_time from sglang.benchmark.datasets.
    • Added imports for get_tokenizer, parse_custom_headers, remove_prefix, set_ulimit from sglang.benchmark.utils.
  • python/sglang/benchmark/__init__.py
    • Added a new __init__.py file to define the benchmark package.
    • Exported various utility functions and the datasets subpackage for external access.
  • python/sglang/benchmark/datasets/__init__.py
    • Added a new __init__.py file to define the datasets subpackage.
    • Defined ShareGPTDatasetLoader, RandomDatasetLoader, ImageDatasetLoader, GeneratedSharedPrefixDatasetLoader, MMMUDatasetLoader, MooncakeDatasetLoader, CustomDatasetLoader, and OpenAIDatasetLoader classes to encapsulate dataset loading logic.
    • Created a DATASET_MAPPING dictionary to map dataset names to their respective loaders.
    • Implemented get_dataset_loader and get_dataset functions for centralized dataset retrieval.
    • Exported all relevant dataset-related functions and classes.
  • python/sglang/benchmark/datasets/common.py
    • Added a new file to define common structures and constants for datasets.
    • Defined ASSISTANT_SUFFIX, SHAREGPT_REPO_ID, SHAREGPT_FILENAME, and MOONCAKE_DATASET_URL constants.
    • Introduced the DatasetRow dataclass to standardize request data representation (sketched after this changelog).
    • Defined BaseDatasetLoader abstract base class for consistent dataset loading interfaces.
  • python/sglang/benchmark/datasets/custom.py
    • Added a new file containing the sample_custom_requests function for loading custom JSONL datasets.
  • python/sglang/benchmark/datasets/generated_shared_prefix.py
    • Added a new file containing functions for generating shared prefix requests, including get_available_tokens, gen_prompt, gen_mm_prompt, get_gen_prefix_cache_path, and sample_generated_shared_prefix_requests.
  • python/sglang/benchmark/datasets/image.py
    • Added a new file containing functions for handling image-related dataset sampling, including parse_image_resolution, create_mm_data_row, and sample_image_requests.
  • python/sglang/benchmark/datasets/mmmu.py
    • Added a new file containing the sample_mmmu_requests function for sampling from the MMMU dataset.
  • python/sglang/benchmark/datasets/mooncake.py
    • Added a new file containing load_mooncake_requests and get_mooncake_request_over_time functions for Mooncake dataset handling.
  • python/sglang/benchmark/datasets/openai_dataset.py
    • Added a new file containing the sample_openai_requests function for loading OpenAI-compatible chat completion requests.
  • python/sglang/benchmark/datasets/random.py
    • Added a new file containing compute_random_lens and sample_random_requests functions for generating random requests.
  • python/sglang/benchmark/datasets/sharegpt.py
    • Added a new file containing the sample_sharegpt_requests function for sampling from the ShareGPT dataset.
  • python/sglang/benchmark/utils.py
    • Added a new file to house general utility functions for benchmarking.
    • Included remove_prefix, remove_suffix, parse_custom_headers, get_model, get_tokenizer, get_processor, download_and_cache_hf_file, download_and_cache_file, is_file_valid_json, and set_ulimit (a few of these are sketched below).
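
To make the changelog concrete, here is a hypothetical sketch of the DatasetRow dataclass from common.py; the field names are assumptions inferred from how the datasets are described, not confirmed by this page:

```python
# Hypothetical shape of DatasetRow (benchmark/datasets/common.py);
# the actual fields in the PR may differ.
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class DatasetRow:
    prompt: str                             # text sent to the server
    prompt_len: int                         # tokenized prompt length
    output_len: int                         # requested completion length
    image_data: Optional[List[Any]] = None  # multimodal payload, if any
```

Because every sample_*_requests path returns a list of DatasetRow, the request loop in bench_serving.py can stay dataset-agnostic.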
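
Likewise, a sketch of a few of the small helpers gathered into benchmark/utils.py; the behavior is inferred from the function names, so treat the module itself as authoritative:

```python
# Assumed behavior of helpers in benchmark/utils.py, inferred from
# their names; not a verbatim copy of the module.
import resource  # Unix-only; used for the file-descriptor limit


def remove_prefix(text: str, prefix: str) -> str:
    """Backport of str.removeprefix for older Python versions."""
    return text[len(prefix):] if text.startswith(prefix) else text


def remove_suffix(text: str, suffix: str) -> str:
    """Backport of str.removesuffix for older Python versions."""
    if suffix and text.endswith(suffix):
        return text[: -len(suffix)]
    return text


def set_ulimit(target_soft_limit: int = 65535) -> None:
    """Raise the open-file soft limit so thousands of concurrent
    benchmark connections do not fail with EMFILE."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft < target_soft_limit:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target_soft_limit, hard))
        except ValueError:
            pass  # insufficient permissions; keep the current limit
```
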
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist (Bot) left a comment


Code Review

This pull request refactors the benchmark serving logic by extracting utilities and dataset loaders into a dedicated python/sglang/benchmark/ package. This is a positive step towards better code organization and modularity. My feedback focuses on improving type hint accuracy and removing minor redundancies in the newly added modules to ensure better maintainability and compatibility with static analysis tools.

Comment thread python/sglang/benchmark/datasets/common.py Outdated
Comment thread python/sglang/benchmark/datasets/image.py Outdated
Comment thread python/sglang/benchmark/datasets/image.py
@Ratish1 Ratish1 marked this pull request as ready for review February 20, 2026 17:46
Comment thread python/sglang/benchmark/datasets/sharegpt.py Outdated
hnyls2002

This comment was marked as outdated.

Comment thread python/sglang/benchmark/datasets/__init__.py Outdated
Comment thread python/sglang/benchmark/utils.py Outdated
@hnyls2002 hnyls2002 (Collaborator) commented Feb 20, 2026

The first step should only cover splitting the old bench_serving.py into different files based on the dataset. We can discuss the follow-up designs later.

@Ratish1

Co-authored-by: Xuchun Shang <107600043+xucsh@users.noreply.github.com>
@hnyls2002 hnyls2002 force-pushed the benchmark-phase1-refactor branch from 00f0c5d to a2d1b48 on February 21, 2026 20:36
@hnyls2002 hnyls2002 (Collaborator) commented
/tag-and-rerun-ci

@hnyls2002 hnyls2002 (Collaborator) commented
Merging this PR as all stage-a and stage-b tests passed (the piecewise cudagraph test failure is unrelated).

@hnyls2002 hnyls2002 merged commit f158869 into sgl-project:main Feb 21, 2026
210 of 227 checks passed
@Ratish1 Ratish1 deleted the benchmark-phase1-refactor branch February 22, 2026 05:15
magicYang1573 pushed a commit to magicYang1573/sglang that referenced this pull request Mar 9, 2026
…erving (sgl-project#19077)

Co-authored-by: Xuchun Shang <107600043+xucsh@users.noreply.github.com>
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026
…erving (sgl-project#19077)

Co-authored-by: Xuchun Shang <107600043+xucsh@users.noreply.github.com>
