
Upload nightly test metrics to GH artifacts #17696

Merged
Kangyan-Zhou merged 2 commits into main from nightly_test_artifacts on Jan 25, 2026

Conversation

@Kangyan-Zhou
Collaborator

Motivation

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello @Kangyan-Zhou, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request enhances the continuous integration pipeline by introducing a robust system for collecting, processing, and consolidating performance metrics from nightly benchmarks. The new scripts ensure that detailed performance data, including latency and throughput, is systematically gathered from various benchmark runs and then merged into a single artifact. This streamlined approach will provide better insights into performance trends and regressions over time, making it easier to monitor the impact of code changes.

Highlights

  • New Metric Collection Script: Introduced scripts/ci/save_metrics.py to collect and format performance metrics from nightly benchmark results, transforming raw benchmark data into a standardized schema.
  • New Metric Consolidation Script: Added scripts/ci/merge_metrics.py to consolidate per-partition metric JSON files into a single, comprehensive JSON file, including run-level metadata (see the schema sketch after this list).
  • Automated Metrics Upload: These new scripts facilitate the automated upload of nightly test metrics as GitHub artifacts, improving performance tracking and visibility.
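For concreteness, here is a minimal sketch of the consolidated artifact's shape, inferred from the fields assembled in the review threads below. Every value is illustrative, and the entry under results mirrors the per-benchmark fields visible in save_metrics.py; this is not a schema the PR itself publishes.

import json

# Illustrative only: field names come from the code quoted in this PR's
# review threads; the values are made up for the example.
consolidated = {
    "run_id": "1234567890",
    "run_date": "2026-01-25T00:00:00+00:00",
    "commit_sha": "344eeae",
    "branch": "main",
    "results": [
        {
            "latency_ms": 1234.5,
            "input_throughput": 5000.0,
            "output_throughput": 800.0,
            "overall_throughput": 5800.0,
            "ttft_ms": 42.0,
        }
    ],
}
print(json.dumps(consolidated, indent=2))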

Contributor

@gemini-code-assist bot left a comment

Code Review

This pull request introduces two new scripts for collecting and merging test metrics, which is a great addition for CI monitoring. The code is well-structured and includes good error handling. I've provided a few suggestions to improve maintainability by reducing code duplication and simplifying some logic. Overall, this is a solid contribution.

Comment thread scripts/ci/merge_metrics.py Outdated
Comment on lines +23 to +32
def find_partition_files(input_dir: str) -> list[str]:
    """Find all partition metric files in the input directory."""
    patterns = [
        os.path.join(input_dir, "metrics-*.json"),
        os.path.join(input_dir, "**/metrics-*.json"),
    ]
    files = []
    for pattern in patterns:
        files.extend(glob.glob(pattern, recursive=True))
    return list(set(files))
Contributor

medium

The logic for finding partition files can be simplified. The **/metrics-*.json pattern with recursive=True is sufficient to find files in both the input directory and its subdirectories, making the other pattern and the subsequent deduplication with set() unnecessary.

def find_partition_files(input_dir: str) -> list[str]:
    """Find all partition metric files in the input directory."""
    pattern = os.path.join(input_dir, "**/metrics-*.json")
    return glob.glob(pattern, recursive=True)
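
To make the reviewer's point concrete, here is a small self-contained check (the temporary-directory layout and file names are hypothetical) showing that ** with recursive=True matches zero or more directory levels, so a single pattern finds both top-level and nested files with no duplicates:

import glob
import os
import tempfile

# Hypothetical layout: one metrics file at the top level, one in a subdirectory.
with tempfile.TemporaryDirectory() as d:
    os.makedirs(os.path.join(d, "shard-0"))
    for rel in ("metrics-0.json", os.path.join("shard-0", "metrics-1.json")):
        open(os.path.join(d, rel), "w").close()
    # "**" matches zero or more directories, so both files appear exactly once.
    print(sorted(glob.glob(os.path.join(d, "**/metrics-*.json"), recursive=True)))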

Comment on lines +45 to +107
def merge_metrics(
    input_dir: str,
    output_file: str,
    run_id: str,
    commit_sha: str,
    branch: str | None = None,
) -> bool:
    """Merge all partition metrics into a consolidated file."""
    # Find all partition files
    partition_files = find_partition_files(input_dir)
    print(f"Found {len(partition_files)} partition file(s)")

    if not partition_files:
        print("No partition metrics files found")
        # Create an empty consolidated file
        consolidated = {
            "run_id": run_id,
            "run_date": datetime.now(timezone.utc).isoformat(),
            "commit_sha": commit_sha,
            "branch": branch,
            "results": [],
        }
        try:
            os.makedirs(os.path.dirname(output_file) or ".", exist_ok=True)
            with open(output_file, "w", encoding="utf-8") as f:
                json.dump(consolidated, f, indent=2)
            print(f"Created empty consolidated file: {output_file}")
            return True
        except OSError as e:
            print(f"Error writing empty consolidated file: {e}")
            return False

    # Load all partition files
    all_results = []
    for filepath in sorted(partition_files):
        print(f"  Reading: {filepath}")
        metrics = load_partition_metrics(filepath)
        if metrics and "results" in metrics:
            all_results.extend(metrics["results"])

    print(f"Total results collected: {len(all_results)}")

    # Create consolidated structure
    consolidated = {
        "run_id": run_id,
        "run_date": datetime.now(timezone.utc).isoformat(),
        "commit_sha": commit_sha,
        "branch": branch,
        "results": all_results,
    }

    # Ensure output directory exists and write output
    try:
        output_dir = os.path.dirname(output_file)
        if output_dir:
            os.makedirs(output_dir, exist_ok=True)
        with open(output_file, "w", encoding="utf-8") as f:
            json.dump(consolidated, f, indent=2)
        print(f"Saved consolidated metrics to: {output_file}")
        return True
    except OSError as e:
        print(f"Error writing consolidated file: {e}")
        return False
Contributor

medium

This function can be refactored to improve clarity and reduce code duplication. Specifically:

  1. The timestamp datetime.now() is fetched in two separate branches. It's better to fetch it once at the beginning for consistency.
  2. The logic for creating the output directory and writing the JSON file is duplicated. This can be consolidated into a single block at the end.

Here's a suggested refactoring that addresses these points:

def merge_metrics(
    input_dir: str,
    output_file: str,
    run_id: str,
    commit_sha: str,
    branch: str | None = None,
) -> bool:
    """Merge all partition metrics into a consolidated file."""
    run_date = datetime.now(timezone.utc).isoformat()

    # Find all partition files
    partition_files = find_partition_files(input_dir)
    print(f"Found {len(partition_files)} partition file(s)")

    all_results = []
    if not partition_files:
        print("No partition metrics files found")
    else:
        # Load all partition files
        for filepath in sorted(partition_files):
            print(f"  Reading: {filepath}")
            metrics = load_partition_metrics(filepath)
            if metrics and "results" in metrics:
                all_results.extend(metrics["results"])
        print(f"Total results collected: {len(all_results)}")

    # Create consolidated structure
    consolidated = {
        "run_id": run_id,
        "run_date": run_date,
        "commit_sha": commit_sha,
        "branch": branch,
        "results": all_results,
    }

    # Ensure output directory exists and write output
    try:
        os.makedirs(os.path.dirname(output_file) or ".", exist_ok=True)
        with open(output_file, "w", encoding="utf-8") as f:
            json.dump(consolidated, f, indent=2)

        if not partition_files:
            print(f"Created empty consolidated file: {output_file}")
        else:
            print(f"Saved consolidated metrics to: {output_file}")
        return True
    except OSError as e:
        print(f"Error writing consolidated file: {e}")
        return False
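
A hedged usage sketch of the refactored function follows; the directory layout, run identifiers, and exit-code handling here are illustrative, not the PR's actual CI wiring:

# Hypothetical call site; merge_metrics is the function suggested above.
ok = merge_metrics(
    input_dir="downloaded-artifacts",
    output_file="consolidated/nightly_metrics.json",
    run_id="1234567890",
    commit_sha="344eeae",
    branch="main",
)
raise SystemExit(0 if ok else 1)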

Comment thread scripts/ci/save_metrics.py Outdated
Comment on lines +23 to +40
def find_result_files(search_dirs: list[str]) -> list[str]:
    """Find all results_*.json files in the given directories."""
    result_files = []
    for search_dir in search_dirs:
        if os.path.exists(search_dir):
            pattern = os.path.join(search_dir, "results_*.json")
            files = glob.glob(pattern)
            result_files.extend(files)

            # Also search in subdirectories (profile directories)
            subdir_pattern = os.path.join(search_dir, "*", "results_*.json")
            result_files.extend(glob.glob(subdir_pattern))

    # Also search in current directory and test directory
    for pattern in ["results_*.json", "test/results_*.json"]:
        result_files.extend(glob.glob(pattern))

    return list(set(result_files))  # Remove duplicates
Contributor

medium

This function can be simplified and made more robust. Using glob with a **/results_*.json pattern and recursive=True will handle searching in subdirectories more cleanly. The hardcoded paths at the end are redundant given the default search_dirs in main. Using a set to collect files will also prevent duplicates from overlapping search paths.

def find_result_files(search_dirs: list[str]) -> list[str]:
    """Find all results_*.json files in the given directories."""
    result_files = set()
    for search_dir in search_dirs:
        if os.path.exists(search_dir):
            pattern = os.path.join(search_dir, "**/results_*.json")
            result_files.update(glob.glob(pattern, recursive=True))
    return list(result_files)
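
The duplicate risk the reviewer mentions is easy to reproduce: when two search dirs overlap (one inside the other), a list would record the same path twice, while a set keeps it once. A minimal demonstration with a hypothetical layout:

import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    sub = os.path.join(d, "profile")
    os.makedirs(sub)
    open(os.path.join(sub, "results_run1.json"), "w").close()
    found = set()
    for search_dir in (d, sub):  # overlapping: sub lives inside d
        found.update(glob.glob(os.path.join(search_dir, "**/results_*.json"), recursive=True))
    print(found)  # a single entry despite two overlapping search dirs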

Comment thread scripts/ci/save_metrics.py Outdated
Comment on lines +66 to +70
"latency_ms": (latency or 0) * 1000 if latency is not None else None,
"input_throughput": result.get("input_throughput"),
"output_throughput": result.get("output_throughput"),
"overall_throughput": result.get("overall_throughput"),
"ttft_ms": (last_ttft or 0) * 1000 if last_ttft is not None else None,
Contributor

medium

The expressions for calculating latency_ms and ttft_ms can be simplified. The or 0 is redundant because of the if ... is not None check.

        "latency_ms": latency * 1000 if latency is not None else None,
        "input_throughput": result.get("input_throughput"),
        "output_throughput": result.get("output_throughput"),
        "overall_throughput": result.get("overall_throughput"),
        "ttft_ms": last_ttft * 1000 if last_ttft is not None else None,

Comment on lines +103 to +165
def save_metrics(
    gpu_config: str,
    partition: int,
    run_id: str,
    output_file: str,
    search_dirs: list[str],
) -> bool:
    """Collect metrics and save to output file."""
    # Find all result files
    result_files = find_result_files(search_dirs)
    print(f"Found {len(result_files)} result file(s)")

    if not result_files:
        print("No benchmark result files found")
        # Create an empty metrics file to indicate the job ran but had no results
        metrics = {
            "run_id": run_id,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "gpu_config": gpu_config,
            "partition": partition,
            "results": [],
        }
        try:
            os.makedirs(os.path.dirname(output_file) or ".", exist_ok=True)
            with open(output_file, "w", encoding="utf-8") as f:
                json.dump(metrics, f, indent=2)
            print(f"Created empty metrics file: {output_file}")
            return True
        except OSError as e:
            print(f"Error writing empty metrics file: {e}")
            return False

    # Parse all result files
    all_results = []
    for filepath in result_files:
        print(f"  Reading: {filepath}")
        results = parse_result_file(filepath)
        all_results.extend(results)

    print(f"Total benchmark results: {len(all_results)}")

    # Group by model/variant
    grouped = group_results_by_model(all_results, gpu_config, partition)

    # Create metrics structure
    metrics = {
        "run_id": run_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "gpu_config": gpu_config,
        "partition": partition,
        "results": grouped,
    }

    # Ensure output directory exists and write output
    try:
        os.makedirs(os.path.dirname(output_file) or ".", exist_ok=True)
        with open(output_file, "w", encoding="utf-8") as f:
            json.dump(metrics, f, indent=2)
        print(f"Saved metrics to: {output_file}")
        return True
    except OSError as e:
        print(f"Error writing metrics file: {e}")
        return False
Contributor

medium

This function can be refactored to improve clarity and reduce code duplication, similar to merge_metrics.py. Specifically:

  1. The timestamp datetime.now() is fetched in two separate branches. It's better to fetch it once at the beginning for consistency.
  2. The logic for creating the output directory and writing the JSON file is duplicated. This can be consolidated into a single block at the end.

Here's a suggested refactoring:

def save_metrics(
    gpu_config: str,
    partition: int,
    run_id: str,
    output_file: str,
    search_dirs: list[str],
) -> bool:
    """Collect metrics and save to output file."""
    timestamp = datetime.now(timezone.utc).isoformat()

    # Find all result files
    result_files = find_result_files(search_dirs)
    print(f"Found {len(result_files)} result file(s)")

    grouped = []
    if not result_files:
        print("No benchmark result files found")
    else:
        # Parse all result files
        all_results = []
        for filepath in sorted(result_files):
            print(f"  Reading: {filepath}")
            results = parse_result_file(filepath)
            all_results.extend(results)
        print(f"Total benchmark results: {len(all_results)}")

        # Group by model/variant
        grouped = group_results_by_model(all_results, gpu_config, partition)

    # Create metrics structure
    metrics = {
        "run_id": run_id,
        "timestamp": timestamp,
        "gpu_config": gpu_config,
        "partition": partition,
        "results": grouped,
    }

    # Ensure output directory exists and write output
    try:
        os.makedirs(os.path.dirname(output_file) or ".", exist_ok=True)
        with open(output_file, "w", encoding="utf-8") as f:
            json.dump(metrics, f, indent=2)

        if not result_files:
            print(f"Created empty metrics file: {output_file}")
        else:
            print(f"Saved metrics to: {output_file}")
        return True
    except OSError as e:
        print(f"Error writing metrics file: {e}")
        return False
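
For symmetry with merge_metrics, a hedged call-site sketch; the GPU-config string, partition index, and paths are hypothetical, chosen only so the output name matches the metrics-*.json pattern that merge_metrics.py globs for:

# Hypothetical invocation of the refactored function.
ok = save_metrics(
    gpu_config="8-gpu-h100",
    partition=0,
    run_id="1234567890",
    output_file="metrics/metrics-8-gpu-h100-0.json",
    search_dirs=["benchmark_output"],
)
raise SystemExit(0 if ok else 1)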

Kangyan-Zhou merged commit 344eeae into main on Jan 25, 2026
79 of 86 checks passed
Kangyan-Zhou deleted the nightly_test_artifacts branch on January 25, 2026 at 22:35
Chen-0210 pushed a commit to Chen-0210/sglang that referenced this pull request Jan 30, 2026
Johnsonms pushed a commit to Johnsonms/sglang that referenced this pull request Feb 14, 2026