feat(cargo): aggregate test output into single line #85

Merged
pszymkowiak merged 1 commit into rtk-ai:master from FlorianBruniaux:feat/cargo-test-aggregate
Feb 12, 2026
Conversation

@FlorianBruniaux
Collaborator

Fixes #83

Problem

cargo test currently shows 24+ summary lines even when all tests pass. For LLM consumption, we only need to know IF something failed, not see 24 identical "ok" lines.

Before (24 lines):

✓ test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
✓ test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
... (x24)

After (1 line):

✓ cargo test: 137 passed (24 suites, 1.45s)

Solution

Add AggregatedTestResult struct that:

  • Parses test summary lines with a regex (compiled once and cached in a OnceLock)
  • Merges multiple summaries when all tests pass
  • Formats compactly: N passed, M ignored, P filtered out (X suites, Ys)
  • Falls back gracefully if parsing fails
  • Preserves full details when failures occur (no aggregation)
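
As a rough illustration of the parse-and-merge behavior listed above, here is a minimal sketch. It is not the PR's code: `Summary`, `parse_summary`, and `aggregate` are hypothetical names, and plain string splitting stands in for the regex so the example needs only the standard library. Any line that fails to parse, or any suite with failures, yields `None`, which a caller could treat as the signal to keep the original output.

```rust
/// Hypothetical, simplified stand-in for the PR's `AggregatedTestResult`.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Summary {
    pub passed: u32,
    pub failed: u32,
    pub ignored: u32,
    pub measured: u32,
    pub filtered_out: u32,
    pub suites: u32,
    pub duration_secs: f64,
}

/// Parse one `test result: ...` line. Returns `None` on anything unexpected
/// so the caller can fall back to the unaggregated output.
pub fn parse_summary(line: &str) -> Option<Summary> {
    let marker = "test result: ";
    let idx = line.find(marker)?;
    let rest = &line[idx + marker.len()..];
    // Split off the status word ("ok" or "FAILED") from the counters.
    let (_status, counts) = rest.split_once(". ")?;
    let mut s = Summary { suites: 1, ..Summary::default() };
    for part in counts.split(';') {
        let part = part.trim();
        if let Some(t) = part.strip_prefix("finished in ") {
            s.duration_secs = t.strip_suffix('s')?.parse().ok()?;
            continue;
        }
        let (num, label) = part.split_once(' ')?;
        let n: u32 = num.parse().ok()?;
        match label {
            "passed" => s.passed = n,
            "failed" => s.failed = n,
            "ignored" => s.ignored = n,
            "measured" => s.measured = n,
            "filtered out" => s.filtered_out = n,
            _ => return None,
        }
    }
    Some(s)
}

/// Merge all summaries. Returns `None` if any line fails to parse or any
/// suite reports failures, in which case the full output should be kept.
pub fn aggregate<'a>(lines: impl IntoIterator<Item = &'a str>) -> Option<Summary> {
    let mut total = Summary::default();
    for line in lines {
        let s = parse_summary(line)?;
        if s.failed > 0 {
            return None;
        }
        total.passed += s.passed;
        total.ignored += s.ignored;
        total.measured += s.measured;
        total.filtered_out += s.filtered_out;
        total.suites += s.suites;
        total.duration_secs += s.duration_secs;
    }
    // No summary lines at all means there is nothing to aggregate.
    (total.suites > 0).then_some(total)
}
```

Returning `Option` keeps the fallback decision with the caller: on `None`, the filter can emit the summary lines unchanged.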

Format examples

| Case | Output |
|------|--------|
| All pass | `✓ cargo test: 268 passed (1 suite, 0.03s)` |
| With ignored | `✓ cargo test: 63 passed, 5 ignored (2 suites, 0.70s)` |
| With filtered | `✓ cargo test: 0 passed, 268 filtered out (1 suite, 0.00s)` |
| Failures | `FAILURES (1):\n═══\n[full details preserved]` |
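
The passing-case formats above could be produced by something like the following sketch. The function name `format_compact` and its flat parameter list are assumptions made for illustration; the PR implements this as part of `AggregatedTestResult`, and its exact code is not reproduced here.

```rust
/// Hypothetical formatter for the compact one-line summary.
/// Zero-valued `ignored` and `filtered_out` counts are omitted, and
/// "suite" is pluralized unless exactly one suite ran.
pub fn format_compact(
    passed: u32,
    ignored: u32,
    filtered_out: u32,
    suites: u32,
    duration_secs: f64,
) -> String {
    let mut parts = vec![format!("{} passed", passed)];
    if ignored > 0 {
        parts.push(format!("{} ignored", ignored));
    }
    if filtered_out > 0 {
        parts.push(format!("{} filtered out", filtered_out));
    }
    let suite_word = if suites == 1 { "suite" } else { "suites" };
    format!(
        "✓ cargo test: {} ({} {}, {:.2}s)",
        parts.join(", "),
        suites,
        suite_word,
        duration_secs
    )
}
```

For example, `format_compact(63, 5, 0, 2, 0.70)` yields the "With ignored" row of the table.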

Implementation details

  • Single file changed: src/cargo_cmd.rs (+278 lines)
  • No new dependencies: Uses existing regex crate
  • One-time regex compilation: the pattern is compiled once and cached in a OnceLock
  • Backward compatible: Fallback to original behavior if regex fails
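
To show what `OnceLock` contributes here, the sketch below caches a value that is built only on first use. It is an illustration under assumptions: a `String` stands in for the compiled `regex::Regex` so the example has no external dependencies, and `summary_prefix` and `INIT_COUNT` are hypothetical names that do not appear in the PR.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the initializer runs (for demonstration only).
pub static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Returns a lazily built, process-wide value. In the PR the cached value
/// would be a compiled regex; a `String` stands in for it here.
pub fn summary_prefix() -> &'static String {
    static PREFIX: OnceLock<String> = OnceLock::new();
    PREFIX.get_or_init(|| {
        // Runs at most once, even if called from several threads.
        INIT_COUNT.fetch_add(1, Ordering::SeqCst);
        String::from("test result: ")
    })
}
```

Every call after the first returns the same reference without re-running the closure, which is why the pattern suits values that are costly to construct.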

Tests

  • ✅ 6 new tests + 1 modified
  • ✅ All 268 tests pass
  • ✅ Covers: multi-suite, failures, zero tests, ignored/filtered, singular/plural, regex fallback

Edge cases handled

  • --nocapture flag works correctly
  • cargo test specific_test shows filtered count
  • ✅ Doc-tests + unit tests + integration tests all aggregate
  • ✅ Malformed output falls back gracefully

cc @bdarcus - this addresses your request for more compact test output 🎯

Checklist

  • Code formatted (cargo fmt --all)
  • Clippy clean (cargo clippy --all-targets)
  • All tests pass (cargo test)
  • Manual testing on RTK project itself
  • Edge cases verified

Problem: `cargo test` shows 24+ summary lines even when all pass.
An LLM only needs to know IF something failed, not 24x "ok".

Before (24 lines):
```
✓ test result: ok. 2 passed; 0 failed; ...
✓ test result: ok. 0 passed; 0 failed; ...
... (x24)
```

After (1 line):
```
✓ cargo test: 137 passed (24 suites, 1.45s)
```

Changes:
- Add AggregatedTestResult struct with regex parsing
- Merge multiple test summaries when all pass
- Format: "N passed, M ignored, P filtered out (X suites, Ys)"
- Fallback to original behavior if parsing fails
- Failures still show full details (no aggregation)

Tests: 6 new + 1 modified, covering all cases:
- Multi-suite aggregation
- Single suite (singular "suite")
- Zero tests
- With ignored/filtered out
- Failures → no aggregation (detail preserved)
- Regex fallback

Closes rtk-ai#83

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings February 12, 2026 17:07
Copilot AI left a comment

Pull request overview

This PR implements compact aggregation of cargo test output to make it more suitable for LLM consumption. Instead of showing 24+ identical "ok" summary lines when all tests pass, it now shows a single aggregated line. The implementation is clean, well-tested, and includes graceful fallback behavior.

Changes:

  • Added AggregatedTestResult struct to parse and aggregate test summary lines across multiple test suites
  • Modified filter_cargo_test function to use aggregation when all tests pass, preserving detailed output when failures occur
  • Added 6 new comprehensive tests and updated 1 existing test to validate the new behavior


if self.ignored > 0 {
    parts.push(format!("{} ignored", self.ignored));
}
Copilot AI Feb 12, 2026

The measured field is parsed and tracked in the struct but never displayed in the compact output format. If benchmark tests are run, the measured count would be silently omitted from the output. Consider adding logic to include measured tests in the output when self.measured > 0, similar to how ignored and filtered_out are handled.

Suggested change
-   }
+   }
+   if self.measured > 0 {
+       parts.push(format!("{} measured", self.measured));
+   }

self.filtered_out += other.filtered_out;
self.suites += other.suites;
self.duration_secs += other.duration_secs;
self.has_duration = self.has_duration && other.has_duration;
Copilot AI Feb 12, 2026

The has_duration flag uses AND logic during merge, which means if any single suite lacks duration information, the aggregated result will not display duration even though most suites have it. This could result in losing timing information unnecessarily. Consider using OR logic (self.has_duration || other.has_duration) and only including partial timing information in the output, or tracking which suites have duration separately.

Suggested change
-   self.has_duration = self.has_duration && other.has_duration;
+   self.has_duration = self.has_duration || other.has_duration;
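
The trade-off between the two merge rules can be seen in a small sketch. `Timing`, `merge_and`, and `merge_or` are hypothetical names that reduce the struct to its two timing fields; this is not the PR's code.

```rust
/// Hypothetical reduction of the aggregated result to its timing fields.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Timing {
    pub duration_secs: f64,
    pub has_duration: bool,
}

/// AND merge (the PR's behavior): a single suite without a duration
/// suppresses the total, so no partial figure is ever shown.
pub fn merge_and(a: Timing, b: Timing) -> Timing {
    Timing {
        duration_secs: a.duration_secs + b.duration_secs,
        has_duration: a.has_duration && b.has_duration,
    }
}

/// OR merge (the suggested change): the total is shown if any suite
/// reported a duration, so it may be a partial sum.
pub fn merge_or(a: Timing, b: Timing) -> Timing {
    Timing {
        duration_secs: a.duration_secs + b.duration_secs,
        has_duration: a.has_duration || b.has_duration,
    }
}
```

Merging a timed suite (1.25s) with an untimed one gives no duration under AND, and 1.25s under OR. The OR figure understates the real total, so the choice is between a missing figure and a possibly incomplete one.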

Comment on lines +382 to +384
regex::Regex::new(
r"test result: (\w+)\.\s+(\d+) passed;\s+(\d+) failed;\s+(\d+) ignored;\s+(\d+) measured;\s+(\d+) filtered out(?:;\s+finished in ([\d.]+)s)?"
).unwrap()
Copilot AI Feb 12, 2026

The regex crate is used via fully qualified path regex::Regex::new() without an explicit import statement. While this works, it's inconsistent with other files in the codebase (e.g., src/deps.rs, src/filter.rs, src/grep_cmd.rs) which use use regex::Regex;. Consider adding use regex::Regex; at the top of the file for consistency.

@pszymkowiak pszymkowiak merged commit 06b1049 into rtk-ai:master Feb 12, 2026
8 checks passed
ahundt pushed a commit to ahundt/rtk that referenced this pull request Feb 23, 2026