ci: adding errors to Github summary #14778
Conversation
Summary of Changes

Hello @dougyster, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request improves the debugging experience for Continuous Integration (CI) test failures. It modifies the test execution utility to capture and process the standard output and error streams from test runs. By extracting and formatting critical error information, the changes ensure that detailed failure summaries are automatically posted to the GitHub Actions step summary, allowing developers to quickly identify and address issues without digging through extensive logs.
Code Review
This pull request introduces a valuable feature for CI by adding a summary of test failures to the GitHub step summary, which will greatly aid in debugging. The implementation is mostly solid, capturing stdout/stderr and parsing them for errors. I've identified a potential bug in the error extraction logic that could cause some errors to be missed. I've also included a few suggestions to improve performance and maintainability.
```python
for j in range(start, min(i + 3, len(lines))):
    if len(error_lines) < max_lines:
        error_lines.append(lines[j])
i += 3
```
Advancing the loop counter by 3 (`i += 3`) causes the loop to skip checking lines `i+1` and `i+2`. If there are errors on these consecutive lines, they will be missed. To fix this, increment `i` by 1. This may introduce duplicate lines from overlapping contexts, which should be handled separately (see my other comment).
```diff
-i += 3
+i += 1
```
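To make the skipping concrete, here is a small runnable sketch contrasting the two increments. The log lines and the `collect` helper are hypothetical, not code from the PR; the helper only records matched lines and omits the PR's context-window collection.

```python
import re

# Hypothetical log with three consecutive error lines
lines = [
    "ok",
    "ValueError: bad input",
    "TypeError: wrong type",
    "AssertionError: mismatch",
    "ok",
]

def collect(step):
    """Collect matching lines, advancing by `step` after each match."""
    found = []
    i = 0
    while i < len(lines):
        if re.search(r"\w*Error:", lines[i]):
            found.append(lines[i])
            i += step  # step=3 jumps past the next two lines
        else:
            i += 1
    return found

print(collect(3))  # only the first error is matched
print(collect(1))  # all three consecutive errors are matched
```

With `step=3` the match at index 1 jumps straight to index 4, so the errors on indices 2 and 3 are never examined.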
```python
# Show key error lines
summary_lines.append("**Key Error Lines:**\n")
summary_lines.append("```python\n")
for line in failure.error_lines[:30]:  # Limit to first 30 lines
```
The magic number 30 is used here and on line 65 to limit the number of error lines in the summary. It's good practice to define this as a constant (e.g., `MAX_SUMMARY_ERROR_LINES = 30`) at the top of the function or module. This improves readability and makes it easier to change the value in one place.
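A hedged sketch of that refactor; the constant and helper names are illustrative, not from the PR:

```python
# One named, documented place to tune how much of the log reaches the summary
MAX_SUMMARY_ERROR_LINES = 30

def key_error_lines(error_lines):
    """Return at most MAX_SUMMARY_ERROR_LINES lines for the step summary."""
    return error_lines[:MAX_SUMMARY_ERROR_LINES]

print(len(key_error_lines(["boom"] * 100)))  # long logs are truncated to 30
print(key_error_lines(["a", "b"]))           # short logs pass through unchanged
```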
```python
error_patterns = [
    r"Traceback \(most recent call last\):",
    r"\w*Error:",  # Catches AssertionError, RuntimeError, ValueError, etc.
    r"\w*Exception:",
    r"FAILED",
    r"ERROR",
    r"Failed to",
    r"raise \w+Error",
]

# Find lines matching error patterns and collect context
i = 0
while i < len(lines) and len(error_lines) < max_lines:
    line = lines[i]

    # Check if this line matches an error pattern
    if any(re.search(pattern, line, re.IGNORECASE) for pattern in error_patterns):
```
For performance, it's better to compile the regex patterns once at the start of the function rather than recompiling them for every line of the log inside the `any()` expression. This avoids significant overhead, especially for large logs. The compiled patterns can then be used directly.
```diff
-error_patterns = [
-    r"Traceback \(most recent call last\):",
-    r"\w*Error:",  # Catches AssertionError, RuntimeError, ValueError, etc.
-    r"\w*Exception:",
-    r"FAILED",
-    r"ERROR",
-    r"Failed to",
-    r"raise \w+Error",
-]
-# Find lines matching error patterns and collect context
-i = 0
-while i < len(lines) and len(error_lines) < max_lines:
-    line = lines[i]
-    # Check if this line matches an error pattern
-    if any(re.search(pattern, line, re.IGNORECASE) for pattern in error_patterns):
+# Pattern to identify error-related lines
+error_patterns = [
+    re.compile(p, re.IGNORECASE)
+    for p in [
+        r"Traceback \(most recent call last\):",
+        r"\w*Error:",  # Catches AssertionError, RuntimeError, ValueError, etc.
+        r"\w*Exception:",
+        r"FAILED",
+        r"ERROR",
+        r"Failed to",
+        r"raise \w+Error",
+    ]
+]
+# Find lines matching error patterns and collect context
+i = 0
+while i < len(lines) and len(error_lines) < max_lines:
+    line = lines[i]
+    # Check if this line matches an error pattern
+    if any(pattern.search(line) for pattern in error_patterns):
```
```python
        else:
            i += 1

    return error_lines
```
To handle duplicate lines that may arise from overlapping error contexts (especially after fixing the loop increment from `i += 3` to `i += 1`), you can deduplicate the `error_lines` list while preserving order before returning. A concise way to do this in Python 3.7+ is `list(dict.fromkeys(error_lines))`.
```diff
-    return error_lines
+    return list(dict.fromkeys(error_lines))
```
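A quick runnable illustration of the order-preserving dedup idiom; the sample lines are hypothetical. It works because Python 3.7+ dicts preserve insertion order, so `dict.fromkeys` keeps the first occurrence of each line.

```python
# Hypothetical error lines with a duplicate from an overlapping context window
error_lines = [
    "Traceback (most recent call last):",
    "ValueError: bad input",
    "ValueError: bad input",
    "Traceback (most recent call last):",
]

# dict.fromkeys keeps the first occurrence of each key, in insertion order
deduped = list(dict.fromkeys(error_lines))
print(deduped)
# → ['Traceback (most recent call last):', 'ValueError: bad input']
```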
See github summary here: https://github.com/sgl-project/sglang/actions/runs/20090373572
/tag-and-rerun-ci
Motivation
Adding errors to the GitHub summary table for nightly accuracy tests.
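As background, a minimal sketch of how failure rows can be appended to the summary: GitHub Actions exposes the summary file through the `GITHUB_STEP_SUMMARY` environment variable, and any Markdown appended to it is rendered on the run page. The helper name and failure data below are illustrative, not the PR's actual code.

```python
import os
import tempfile

def write_failure_summary(failures, summary_path):
    """Append a Markdown table of (test, error) rows to the step-summary file."""
    with open(summary_path, "a") as f:
        f.write("| Test | Error |\n|---|---|\n")
        for name, error in failures:
            f.write(f"| `{name}` | {error} |\n")

# In CI the Actions runner sets GITHUB_STEP_SUMMARY; fall back to a temp file
# so the sketch runs anywhere.
path = os.environ.get("GITHUB_STEP_SUMMARY")
if not path:
    fd, path = tempfile.mkstemp(suffix=".md")
    os.close(fd)

write_failure_summary(
    [("test_mgsm_en_all_models", "AssertionError: accuracy below threshold")], path
)
with open(path) as f:
    print(f.read())
```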
Modifications
Adding an error column, and error capture in the try/except block, in `test_mmmu_vlm_models` and `test_mgsm_en_all_models`.

Accuracy Tests
N/A
Benchmarking and Profiling
N/A
Checklist