Add per-target timing to benchmark runner#55
Conversation
Log elapsed time for each target, show durations in the summary table, write timing.json alongside results for programmatic analysis. Closes #51 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdded per-target and total timing to Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #55 +/- ##
=======================================
Coverage 83.06% 83.06%
=======================================
Files 10 10
Lines 1346 1346
=======================================
Hits 1118 1118
Misses 228 228
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/run_full_benchmark.py`:
- Around line 272-308: The timing.json currently sets "started_at" after all
targets run; use the earlier run_start to record the actual run start time:
convert run_start (captured before the loop) into an ISO 8601 UTC timestamp and
assign that to timing_data["started_at"] instead of datetime.now(...), keeping
total_seconds computed from total_elapsed; update the timing_data construction
(symbol: timing_data) to reference the start time derived from run_start and
ensure timezone is UTC and format matches .isoformat().
- Around line 156-168: Add explicit wall-clock start and finish timestamps
around the subprocess run so each target logs when it began and when it
completed/timeout; before setting t0 capture start_ts = datetime.now() and print
a "Started at {start_ts.isoformat()}" message (include cmd), and on both the
TimeoutExpired branch and the normal completion branch capture finish_ts =
datetime.now() and print "Finished at {finish_ts.isoformat()}" (alongside the
existing duration and cmd). Update the prints around variables cmd, t0, elapsed
and reuse _format_duration for duration output so both start and finish
timestamps (and durations) are logged for the function that runs the subprocess.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 71f9a0d8-1daf-4074-a3ea-a76d1d52f4f8
📒 Files selected for processing (1)
scripts/run_full_benchmark.py
Record the actual run start time (not post-run time) in timing.json. Log wall-clock start/finish timestamps for each target so long runs (e.g. 48min Kimi targets) show when they began and ended. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
scripts/run_full_benchmark.py (1)
300-316: 🧹 Nitpick | 🔵 TrivialConsider adding
finished_attotiming.json.The structure meets the PR objectives, and
started_atis now captured correctly. However, downstream consumers might benefit from an explicitfinished_attimestamp rather than having to compute it fromstarted_at + total_seconds.💡 Optional enhancement
timing_data = { "model": model, "started_at": run_start_ts.isoformat(), + "finished_at": datetime.now(timezone.utc).isoformat(), "total_seconds": round(total_elapsed, 1), "targets": {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/run_full_benchmark.py` around lines 300 - 316, The timing.json lacks an explicit finished_at; compute the end timestamp (either capture run_end_ts when the run completes or derive run_end_ts = run_start_ts + timedelta(seconds=total_elapsed)) and add "finished_at": run_end_ts.isoformat() into the timing_data dict alongside "started_at" and "total_seconds" (update references to run_start_ts, total_elapsed, and timing_data/timing_path accordingly so timing_path.write_text writes the new field).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/run_full_benchmark.py`:
- Line 177: The main() function lacks a return type annotation; update its
signature to declare it returns an int (e.g., change def main(): to def main()
-> int:) and ensure all return paths within main (the places returning 0 or 1)
remain consistent with the int type annotation so type checkers and readers see
the intended return type.
- Around line 147-174: In _run, the timeout branch currently prints "Finished:"
which can be misleading; update the subprocess.TimeoutExpired except block to
print a clearer label (e.g., "Timed out at:" or "Timeout at:") instead of
"Finished:" so the console output matches the earlier "TIMEOUT" message — adjust
the print that uses finish_ts.strftime('%H:%M:%S UTC') in the exception handler
within _run to use the new label.
---
Duplicate comments:
In `@scripts/run_full_benchmark.py`:
- Around line 300-316: The timing.json lacks an explicit finished_at; compute
the end timestamp (either capture run_end_ts when the run completes or derive
run_end_ts = run_start_ts + timedelta(seconds=total_elapsed)) and add
"finished_at": run_end_ts.isoformat() into the timing_data dict alongside
"started_at" and "total_seconds" (update references to run_start_ts,
total_elapsed, and timing_data/timing_path accordingly so timing_path.write_text
writes the new field).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 673e3cdb-94ef-4a5d-8cf7-7d8df4d59999
📒 Files selected for processing (1)
scripts/run_full_benchmark.py
…json Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
results/timing.jsonwith model, timestamps, and per-target seconds for programmatic analysisCloses #51
Example summary output
Test plan
ruff check .passes🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Improvements