Skip to content

Implement simplest dump comparator v2#19274

Merged
fzyzcjy merged 32 commits intosgl-project:mainfrom
fzyzcjy:ac8408/0
Feb 25, 2026
Merged

Implement simplest dump comparator v2#19274
fzyzcjy merged 32 commits intosgl-project:mainfrom
fzyzcjy:ac8408/0

Conversation

@fzyzcjy
Copy link
Copy Markdown
Collaborator

@fzyzcjy fzyzcjy commented Feb 24, 2026

Motivation

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

…arator

These customization endpoints for manual cross-parallelism comparison
will be replaced by the dims annotation system. Removes:
- TensorDimDesc dataclass and _get_tensor_dim_descs()
- _split_einops_pattern() and _get_einops_dim_index()
- einops import and tensor_dim_desc/baseline_token_slice params
  from check_tensor_pair()
- test_einops_pattern test case
Drop the remaining customization endpoints that will be replaced by
the dims annotation system:
- LocationInfo dataclass and _get_location_info_of_target_pass_id()
- _comparison_preprocessor() no-op hook
- Simplify main() to use only offset-based step mapping
- Clean up unused imports (re, dataclass, typing)
These imports are only used by main(), and moving them avoids
ModuleNotFoundError when tests import helper functions in environments
without polars installed.
The test_main E2E test calls main() which requires polars for metadata
loading. Skip gracefully in environments without polars.
Split the monolithic check_tensor_pair() into structured components:

- Add data types: TensorStats, TensorInfo, DiffInfo, TensorComparisonInfo
- Add compare_tensors(): pure computation, returns TensorComparisonInfo
- Add print_comparison(): all printing in one place
- Extract _compute_tensor_stats(), _compute_diff(), _print_diff()
- Update main() to load tensors itself, call compare_tensors + print_comparison
- Remove check_tensor_pair() and _compute_and_print_diff()
- Remove side-effect print from _try_unify_shape()
- Replace test_compute_and_print_diff with test_compute_diff
  (checks DiffInfo fields instead of dict)
- Add test_compute_tensor_stats
- Add test_compare_tensors (new public API)
Extract utility functions from dump_comparator.py into the new
comparator package, removing the _ prefix:
- argmax_coord, compute_smaller_dtype, try_unify_shape,
  calc_rel_diff, load_object
TensorStats, TensorInfo, DiffInfo, TensorComparisonInfo for
separating computation results from printing logic.
compare_tensors returns TensorComparisonInfo with no side effects.
Internal helpers: _compute_tensor_stats, _compute_diff.
printer.py: print_comparison consumes TensorComparisonInfo,
with helpers _print_stats_comparison and _print_diff.
__init__.py re-exports compare_tensors and print_comparison.
entrypoint.py: main() orchestrates metadata reading, row matching,
tensor loading, and calls compare_tensors + print_comparison.
__main__.py: CLI entry point with argparse.
Move 6 existing tests with updated imports to comparator package.
Add new UTs: test_compute_tensor_stats (incl. large tensor),
test_compute_diff (identical + known offset), test_compare_tensors
(normal, shape mismatch, dtype mismatch), test_print_comparison.
- comparator/test_utils.py: utils function tests
- comparator/tensor_comparison/test_compare.py: compare_tensors,
  _compute_diff, _compute_tensor_stats tests
- comparator/tensor_comparison/test_printer.py: print_comparison tests
- comparator/test_entrypoint.py: E2E test with main()
- entrypoint.py: split into main() (CLI) + run(args) (programmatic),
  _parse_args() holds argument definitions, all imports at top level
- __main__.py: thin wrapper calling main()
- test: call run(args) instead of main(args)
- Plain classes instead of unittest.TestCase/CustomTestCase
- assert + pytest.approx instead of self.assertEqual/assertAlmostEqual
- tmp_path fixture instead of tempfile.TemporaryDirectory
- Global imports at top level
- sys.exit(pytest.main([__file__])) entry point
Construct TensorComparisonInfo with deterministic values and assert
exact output for normal case. Assert key substrings for shape
mismatch, downcast, shape unification, and sample printing.
- test_printer: all 5 tests now use full snapshot assertions (== exact
  string), add test_none_quantiles for skipped quantile branch
- test_entrypoint: add capsys assertions (Check count, Skip, rel_diff),
  add filter/no-baseline/step-range scenarios, fix exp_name bug
- test_compare: add quantile value checks, rel_diff known value,
  shape unification, sample generation/suppression tests
- test_utils: add 1D/2D argmax, same-shape noop, trailing mismatch,
  dict-with-value loading, similar/negated tensor rel_diff, unknown dtype
The step offset logic (baseline_step = step - start_id + baseline_start_id)
was confusing and unused in practice. Now baseline_step simply equals the
target step, which is the common case. A clearer --step-offset can be
added later if needed.
Extract _make_tensor_info in test_printer to eliminate 12 repetitive
TensorInfo constructions, and _make_dumper in test_entrypoint to
centralize DumperConfig boilerplate used in 3 places.
- compare.py: extract _quantile_or_none(), remove functools.partial
  and 4 repeated quantile blocks
- printer.py: derive stat_names from dataclasses.fields(TensorStats)
  instead of hardcoded list
- test_entrypoint.py: add num_steps param to _create_dumps, rewrite
  test_step_range to reuse it
- test_printer.py: add comment explaining intentional snapshot
  string duplication
TensorStats, TensorInfo, DiffInfo, TensorComparisonInfo are now
immutable. Restructure compare_tensors to construct TensorInfo after
sample values are known, eliminating post-construction mutation.
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@fzyzcjy fzyzcjy merged commit d7578ce into sgl-project:main Feb 25, 2026
57 of 65 checks passed
magicYang1573 pushed a commit to magicYang1573/sglang that referenced this pull request Mar 9, 2026
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant