
[Refactor] Benchmark Phase 1: extract utils and datasets from bench_serving #19077

Merged
hnyls2002 merged 6 commits into sgl-project:main from Ratish1:benchmark-phase1-refactor on Feb 21, 2026

Conversation

@Ratish1 Ratish1 (Collaborator) commented Feb 20, 2026

Motivation

This PR implements Phase 1 of issue #10177 by extracting benchmark utility and dataset logic from `python/sglang/bench_serving.py` into a new package structure under `python/sglang/benchmark/`.

Modifications

  • Added python/sglang/benchmark/utils.py for benchmark utility helpers.
  • Added python/sglang/benchmark/datasets/ with dataset-specific modules:
    • common.py
    • sharegpt.py
    • random.py
    • custom.py
    • openai_dataset.py
    • image.py
    • mmmu.py
    • mooncake.py
    • generated_shared_prefix.py
    • __init__.py with the loader mapping and get_dataset(...) (see the dispatch sketch after this list).
  • Refactored python/sglang/bench_serving.py to import and re-export the moved symbols and to drop the moved implementations.
  • Verified that `python3 -m sglang.bench_serving --help` and `python3 -m sglang.bench_offline_throughput --help` still run.
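
For reviewers, here is a minimal Python sketch of the loader dispatch that the new datasets/__init__.py provides. The names (DATASET_MAPPING, get_dataset, the loader classes) come from this PR's description, but the `load` signature and bodies below are assumptions rather than the actual implementation:

```python
# Sketch of the dispatch pattern in benchmark/datasets/__init__.py.
# Names follow the PR description; signatures are assumed.
from typing import Any, Dict, List, Type


class BaseDatasetLoader:
    """Common interface that every dataset loader implements."""

    def load(self, args: Any, tokenizer: Any) -> List[Any]:
        raise NotImplementedError


class ShareGPTDatasetLoader(BaseDatasetLoader):
    def load(self, args: Any, tokenizer: Any) -> List[Any]:
        # The real sampling logic lives in benchmark/datasets/sharegpt.py.
        return []


DATASET_MAPPING: Dict[str, Type[BaseDatasetLoader]] = {
    "sharegpt": ShareGPTDatasetLoader,
    # "random": RandomDatasetLoader, "image": ImageDatasetLoader, etc.
}


def get_dataset(args: Any, tokenizer: Any) -> List[Any]:
    """Look up the loader registered for args.dataset_name and delegate."""
    try:
        loader_cls = DATASET_MAPPING[args.dataset_name]
    except KeyError:
        raise ValueError(f"Unknown dataset: {args.dataset_name!r}") from None
    return loader_cls().load(args, tokenizer)
```

Under this shape, bench_serving.py only ever calls get_dataset(args, tokenizer) and needs no per-dataset imports.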

Accuracy Tests

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist gemini-code-assist (Contributor) commented

Summary of Changes

Hello @Ratish1, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request undertakes a significant refactoring effort for the benchmarking infrastructure. The primary goal is to improve the organization and maintainability of the codebase by separating concerns. Core utility functions and all dataset-specific logic, previously intertwined within bench_serving.py, have been moved into a new, well-structured benchmark package. This change makes the bench_serving.py file much cleaner and easier to understand, while also providing a clearer structure for extending benchmark capabilities in the future.

Highlights

  • Benchmark Utilities Extraction: Extracted general benchmark utility functions from bench_serving.py into a new dedicated python/sglang/benchmark/utils.py module.
  • Dataset Logic Modularization: Moved all dataset-related logic and sampling functions from bench_serving.py into a new python/sglang/benchmark/datasets/ package, enhancing modularity and organization.
  • Centralized Dataset Loading: Introduced a get_dataset function within the new datasets package, which acts as a centralized loader for various benchmark datasets, simplifying dataset management in bench_serving.py.
  • Codebase Cleanup: Significantly reduced the size and complexity of bench_serving.py by removing numerous functions and imports that are now handled by the new benchmark package.


Changelog
  • python/sglang/bench_serving.py
    • Removed various utility functions including remove_prefix, remove_suffix, parse_custom_headers, get_model, get_tokenizer, get_processor, get_dataset, download_and_cache_hf_file, download_and_cache_file, is_file_valid_json, get_mooncake_request_over_time, sample_mmmu_requests, sample_sharegpt_requests, sample_openai_requests, sample_custom_requests, compute_random_lens, sample_random_requests, parse_image_resolution, create_mm_data_row, sample_image_requests, get_available_tokens, gen_prompt, gen_mm_prompt, get_gen_prefix_cache_path, sample_generated_shared_prefix_requests, and set_ulimit.
    • Removed imports for io, pickle, resource, functools.lru_cache, json.JSONDecodeError, pybase64, datasets.load_dataset, PIL.Image, transformers.AutoProcessor, transformers.PreTrainedTokenizer, transformers.PreTrainedTokenizerFast.
    • Added imports for DatasetRow, get_dataset, get_mooncake_request_over_time from sglang.benchmark.datasets.
    • Added imports for get_tokenizer, parse_custom_headers, remove_prefix, set_ulimit from sglang.benchmark.utils.
  • python/sglang/benchmark/__init__.py
    • Added a new __init__.py file to define the benchmark package.
    • Exported various utility functions and the datasets subpackage for external access.
  • python/sglang/benchmark/datasets/__init__.py
    • Added a new __init__.py file to define the datasets subpackage.
    • Defined ShareGPTDatasetLoader, RandomDatasetLoader, ImageDatasetLoader, GeneratedSharedPrefixDatasetLoader, MMMUDatasetLoader, MooncakeDatasetLoader, CustomDatasetLoader, and OpenAIDatasetLoader classes to encapsulate dataset loading logic.
    • Created a DATASET_MAPPING dictionary to map dataset names to their respective loaders.
    • Implemented get_dataset_loader and get_dataset functions for centralized dataset retrieval.
    • Exported all relevant dataset-related functions and classes.
  • python/sglang/benchmark/datasets/common.py
    • Added a new file to define common structures and constants for datasets.
    • Defined ASSISTANT_SUFFIX, SHAREGPT_REPO_ID, SHAREGPT_FILENAME, and MOONCAKE_DATASET_URL constants.
    • Introduced the DatasetRow dataclass to standardize request data representation (sketched after this changelog).
    • Defined BaseDatasetLoader abstract base class for consistent dataset loading interfaces.
  • python/sglang/benchmark/datasets/custom.py
    • Added a new file containing the sample_custom_requests function for loading custom JSONL datasets.
  • python/sglang/benchmark/datasets/generated_shared_prefix.py
    • Added a new file containing functions for generating shared prefix requests, including get_available_tokens, gen_prompt, gen_mm_prompt, get_gen_prefix_cache_path, and sample_generated_shared_prefix_requests.
  • python/sglang/benchmark/datasets/image.py
    • Added a new file containing functions for handling image-related dataset sampling, including parse_image_resolution, create_mm_data_row, and sample_image_requests.
  • python/sglang/benchmark/datasets/mmmu.py
    • Added a new file containing the sample_mmmu_requests function for sampling from the MMMU dataset.
  • python/sglang/benchmark/datasets/mooncake.py
    • Added a new file containing load_mooncake_requests and get_mooncake_request_over_time functions for Mooncake dataset handling.
  • python/sglang/benchmark/datasets/openai_dataset.py
    • Added a new file containing the sample_openai_requests function for loading OpenAI-compatible chat completion requests.
  • python/sglang/benchmark/datasets/random.py
    • Added a new file containing compute_random_lens and sample_random_requests functions for generating random requests.
  • python/sglang/benchmark/datasets/sharegpt.py
    • Added a new file containing the sample_sharegpt_requests function for sampling from the ShareGPT dataset.
  • python/sglang/benchmark/utils.py
    • Added a new file to house general utility functions for benchmarking.
    • Included remove_prefix, remove_suffix, parse_custom_headers, get_model, get_tokenizer, get_processor, download_and_cache_hf_file, download_and_cache_file, is_file_valid_json, and set_ulimit (a few of these are sketched below).
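
To make the changelog concrete, here is a hypothetical sketch of the DatasetRow dataclass from common.py; the field names are assumptions inferred from how the datasets are described, not confirmed by this page:

```python
# Hypothetical shape of DatasetRow (benchmark/datasets/common.py);
# the actual fields in the PR may differ.
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class DatasetRow:
    prompt: str                             # text sent to the server
    prompt_len: int                         # tokenized prompt length
    output_len: int                         # requested completion length
    image_data: Optional[List[Any]] = None  # multimodal payload, if any
```

Because every sample_*_requests path returns a list of DatasetRow, the request loop in bench_serving.py can stay dataset-agnostic.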
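
Likewise, a sketch of a few of the small helpers gathered into benchmark/utils.py; the behavior is inferred from the function names, so treat the module itself as authoritative:

```python
# Assumed behavior of helpers in benchmark/utils.py, inferred from
# their names; not a verbatim copy of the module.
import resource  # Unix-only; used for the file-descriptor limit


def remove_prefix(text: str, prefix: str) -> str:
    """Backport of str.removeprefix for older Python versions."""
    return text[len(prefix):] if text.startswith(prefix) else text


def remove_suffix(text: str, suffix: str) -> str:
    """Backport of str.removesuffix for older Python versions."""
    if suffix and text.endswith(suffix):
        return text[: -len(suffix)]
    return text


def set_ulimit(target_soft_limit: int = 65535) -> None:
    """Raise the open-file soft limit so thousands of concurrent
    benchmark connections do not fail with EMFILE."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft < target_soft_limit:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target_soft_limit, hard))
        except ValueError:
            pass  # insufficient permissions; keep the current limit
```
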
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist (Bot) left a comment


Code Review

This pull request refactors the benchmark serving logic by extracting utilities and dataset loaders into a dedicated python/sglang/benchmark/ package. This is a positive step towards better code organization and modularity. My feedback focuses on improving type hint accuracy and removing minor redundancies in the newly added modules to ensure better maintainability and compatibility with static analysis tools.

Comment thread python/sglang/benchmark/datasets/common.py Outdated
Comment thread python/sglang/benchmark/datasets/image.py Outdated
Comment thread python/sglang/benchmark/datasets/image.py
@Ratish1 Ratish1 marked this pull request as ready for review February 20, 2026 17:46
Comment thread python/sglang/benchmark/datasets/sharegpt.py Outdated
hnyls2002

This comment was marked as outdated.

Comment thread python/sglang/benchmark/datasets/__init__.py Outdated
Comment thread python/sglang/benchmark/utils.py Outdated
@hnyls2002 hnyls2002 (Collaborator) commented Feb 20, 2026

The first step should only cover splitting the old bench_serving.py into different files based on the dataset. We can discuss the follow-up designs later.

@Ratish1

Co-authored-by: Xuchun Shang <107600043+xucsh@users.noreply.github.com>
@hnyls2002 hnyls2002 force-pushed the benchmark-phase1-refactor branch from 00f0c5d to a2d1b48 on February 21, 2026 20:36
@hnyls2002 hnyls2002 (Collaborator) commented
/tag-and-rerun-ci

@hnyls2002 hnyls2002 (Collaborator) commented
Merging this PR as all stage-a and stage-b tests passed (the piecewise cudagraph test failure is unrelated).

@hnyls2002 hnyls2002 merged commit f158869 into sgl-project:main Feb 21, 2026
210 of 227 checks passed
@Ratish1 Ratish1 deleted the benchmark-phase1-refactor branch February 22, 2026 05:15
magicYang1573 pushed a commit to magicYang1573/sglang that referenced this pull request Mar 9, 2026
…erving (sgl-project#19077)

Co-authored-by: Xuchun Shang <107600043+xucsh@users.noreply.github.com>
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026
…erving (sgl-project#19077)

Co-authored-by: Xuchun Shang <107600043+xucsh@users.noreply.github.com>
