fix: normalize rollout timestamps before deriving started_at/ended_at#556
Conversation
Claude Code, Codex, and ATIF ingesters called min()/max() on raw timestamp strings. That assumes lexicographic order matches chronological order, which breaks for ISO 8601 timestamps with mixed UTC offsets or precisions (e.g. 2025-01-01T00:30:00+01:00 is earlier than 2025-01-01T00:00:00Z but sorts later as a string). Introduce min_max_timestamps() in the shared rollout utils that parses each value as ISO 8601 (naive values treated as UTC, unparseable values skipped) and picks the chronological extremes, returning them in their original string form. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Code Review: PR #556 — fix: normalize rollout timestamps before deriving started_at/ended_atSummaryThis PR fixes a correctness bug in the agent rollout ingesters (Claude Code, Codex, ATIF) where The fix introduces two helpers in Files changed: 5 (3 ingesters, 1 utility module, 1 new test file) FindingsCorrectness
Design
Style & Conventions
Tests
Potential Concerns (non-blocking)
VerdictApprove. This is a clean, well-scoped bug fix with good test coverage. The implementation is correct, follows project conventions, and doesn't introduce unnecessary complexity. No blocking issues found. |
Greptile SummaryFixes incorrect
|
| Filename | Overview |
|---|---|
| packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/utils.py | Adds parse_iso8601() and min_max_timestamps() helpers; naive timestamps are coerced to UTC before comparison, and winners are returned as their original strings. |
| packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/atif.py | Replaces raw min/max string comparisons with min_max_timestamps(); straightforward, correct swap with no behavioral surprises. |
| packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/claude_code.py | Same min/max → min_max_timestamps() swap as atif.py; clean and correct. |
| packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/codex.py | Preserves the session_meta["timestamp"] short-circuit for started_at while adopting min_max_timestamps(); semantics are unchanged relative to the old code. |
| packages/data-designer-engine/tests/engine/resources/agent_rollout/test_utils.py | New parameterized test covering empty input, mixed offsets, mixed precisions, naive-vs-aware, and unparseable skipping — good coverage of the interesting edge cases. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[timestamps: list of str] --> B{any items?}
B -- No --> C[return None, None]
B -- Yes --> D[for each timestamp string]
D --> E[parse_iso8601\nreplace Z → +00:00\nfromisoformat]
E -- ValueError --> F[skip entry]
E -- Success --> G{tzinfo is None?}
G -- Yes --> H[replace tzinfo=UTC]
G -- No --> I[keep as-is]
H --> J[append datetime, original_str]
I --> J
F --> D
J --> K{all skipped?}
K -- Yes --> C
K -- No --> L[min by datetime → earliest string\nmax by datetime → latest string]
L --> M[return earliest, latest]
Reviews (2): Last reviewed commit: "Merge branch 'main' into worktree-issue-..." | Re-trigger Greptile
looks like this file has the same |
nabinchha
left a comment
There was a problem hiding this comment.
Thanks for putting this together, @eric-tramel — clean fix for a subtle but real correctness issue.
Summary
This PR introduces proper ISO 8601 parsing for rollout timestamp comparisons across the Claude Code, Codex, and ATIF ingesters. The previous min()/max() on raw strings would silently produce incorrect started_at/ended_at values when timestamps had different UTC offsets or sub-second precisions. The new min_max_timestamps() helper parses before comparing, returns original strings to preserve the output contract, and gracefully skips unparseable values.
Findings
Suggestions — Take it or leave it
packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/utils.py:125-134 — Consider a single-pass min/max
- What:
min_max_timestampsiteratesparsedtwice (once formin, once formax). For typical rollout sizes this is negligible, but a single loop would be marginally cleaner. - Why: Minor efficiency and arguably more readable for a utility that explicitly computes both extremes.
- Suggestion: Optional — a single-pass reduction:
from functools import reduce
def _min_max_reduce(acc, pair):
mn, mx = acc
return (pair if pair[0] < mn[0] else mn, pair if pair[0] > mx[0] else mx)
# ...
first = parsed[0]
earliest, latest = reduce(_min_max_reduce, parsed[1:], (first, first))
return earliest[1], latest[1]Totally fine as-is for the data sizes involved — just a thought.
packages/data-designer-engine/tests/engine/resources/agent_rollout/test_utils.py — Could add a single-element test case
- What: There's no explicit test for a single valid timestamp (e.g.,
["2025-01-01T00:00:00Z"]→ same value for both earliest and latest). - Why: The code handles it correctly by construction (
minandmaxof a 1-element list), but an explicit case documents that contract. - Suggestion: Add a
pytest.param(["2025-01-01T00:00:00Z"], ("2025-01-01T00:00:00Z", "2025-01-01T00:00:00Z"), id="single-element")case.
What Looks Good
- Correct abstraction level: The helper lives in the shared
utils.pywhere all three ingesters already import from, and the function signature (list[str] → tuple[str | None, str | None]) is minimal and obvious. - Output contract preserved: Returning the original string form is the right call — downstream consumers don't need to re-serialize, and existing tests or integrations won't notice the change.
- Excellent test coverage: The parametrized test hits exactly the interesting cases (mixed offsets, mixed precision, naive-vs-aware, unparseable skipping, all-unparseable). The test IDs are descriptive and make failures immediately readable.
- Clean integration: The changes to the three ingesters are minimal and mechanical — import the helper, call it, use the result. The Codex short-circuit for
session_meta["timestamp"]is correctly preserved.
Verdict
Ship it (with nits) — Only minor suggestions above. The fix is correct, well-tested, and cleanly integrated. Ready to merge.
This review was generated by an AI assistant.
📋 Summary
Parse ISO 8601 timestamps before comparing them in the Claude Code, Codex, and ATIF rollout ingesters. Raw string
min()/max()can order chronologically-earlier timestamps as later when offsets or precisions differ (e.g.2025-01-01T00:30:00+01:00vs2025-01-01T00:00:00Z), which could produce incorrectstarted_at/ended_aton otherwise valid rollouts.🔗 Related Issue
Fixes #497
🔄 Changes
min_max_timestamps()+parse_iso8601()helpers inagent_rollout/utils.py. Naive timestamps are treated as UTC; unparseable values are silently skipped (matching the existing tolerance ofcoerce_optional_str). Winners are returned in their original string form, so the output contract is unchanged.min(timestamps)/max(timestamps)for the helper inclaude_code.py,codex.py, andatif.py. Codex'ssession_meta["timestamp"]short-circuit forstarted_atis preserved.🧪 Testing
make testpasses (engine suite, 1867 tests; pre-existing interface failures unrelated to this change)✅ Checklist