Skip to content

fix: normalize rollout timestamps before deriving started_at/ended_at#556

Merged
eric-tramel merged 2 commits into
mainfrom
worktree-issue-497-timestamp-normalization
May 7, 2026
Merged

fix: normalize rollout timestamps before deriving started_at/ended_at#556
eric-tramel merged 2 commits into
mainfrom
worktree-issue-497-timestamp-normalization

Conversation

@eric-tramel

Copy link
Copy Markdown
Contributor

📋 Summary

Parse ISO 8601 timestamps before comparing them in the Claude Code, Codex, and ATIF rollout ingesters. Raw string min()/max() can order chronologically-earlier timestamps as later when offsets or precisions differ (e.g. 2025-01-01T00:30:00+01:00 vs 2025-01-01T00:00:00Z), which could produce incorrect started_at / ended_at on otherwise valid rollouts.

🔗 Related Issue

Fixes #497

🔄 Changes

  • Add min_max_timestamps() + parse_iso8601() helpers in agent_rollout/utils.py. Naive timestamps are treated as UTC; unparseable values are silently skipped (matching the existing tolerance of coerce_optional_str). Winners are returned in their original string form, so the output contract is unchanged.
  • Swap min(timestamps) / max(timestamps) for the helper in claude_code.py, codex.py, and atif.py. Codex's session_meta["timestamp"] short-circuit for started_at is preserved.
  • Add a parameterized unit test covering the four interesting cases: mixed offsets, mixed precisions, naive-vs-aware, and unparseable-skipping.

🧪 Testing

  • make test passes (engine suite, 1867 tests; pre-existing interface failures unrelated to this change)
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

✅ Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

Claude Code, Codex, and ATIF ingesters called min()/max() on raw
timestamp strings. That assumes lexicographic order matches chronological
order, which breaks for ISO 8601 timestamps with mixed UTC offsets or
precisions (e.g. 2025-01-01T00:30:00+01:00 is earlier than
2025-01-01T00:00:00Z but sorts later as a string).

Introduce min_max_timestamps() in the shared rollout utils that parses
each value as ISO 8601 (naive values treated as UTC, unparseable values
skipped) and picks the chronological extremes, returning them in their
original string form.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
@eric-tramel eric-tramel requested a review from a team as a code owner April 17, 2026 14:13
@github-actions

Copy link
Copy Markdown
Contributor

Code Review: PR #556 — fix: normalize rollout timestamps before deriving started_at/ended_at

Summary

This PR fixes a correctness bug in the agent rollout ingesters (Claude Code, Codex, ATIF) where started_at and ended_at were derived by calling min()/max() on raw ISO 8601 timestamp strings. Lexicographic comparison produces wrong results when timestamps have differing UTC offsets or sub-second precisions — e.g. "2025-01-01T00:30:00+01:00" sorts after "2025-01-01T00:00:00Z" as a string, but is chronologically 30 minutes earlier.

The fix introduces two helpers in utils.pyparse_iso8601() and min_max_timestamps() — that parse timestamps into datetime objects before comparison, then return the winners in their original string form so the output contract is unchanged. All three ingesters are updated to use the new helper. A parameterized test covers the key edge cases.

Files changed: 5 (3 ingesters, 1 utility module, 1 new test file)
Lines: +88 / -7

Findings

Correctness

  • Fix is sound. Parsing to datetime before comparison correctly handles mixed offsets, mixed precisions, and naive-vs-aware timestamps. The Z+00:00 replacement before fromisoformat() is the standard workaround for Python < 3.11 compatibility and is safe here since Z only appears as a trailing timezone designator in ISO 8601.
  • Output contract preserved. min_max_timestamps() returns the original string values, not stringified datetime objects, so downstream consumers are unaffected.
  • Codex short-circuit preserved. In codex.py, the session_meta.get("timestamp") fallback for started_at is correctly maintained (started_at=coerce_optional_str(session_meta.get("timestamp")) or earliest).

Design

  • Centralised helper is the right call. The same parsing logic was duplicated across three files; moving it to utils.py eliminates the duplication cleanly.
  • Silent skip of unparseable values is consistent. The PR description notes this mirrors the existing tolerance of coerce_optional_str. If all timestamps are unparseable, (None, None) is returned, which matches the previous behavior of an empty list producing None. No log warning is emitted for skipped values — this is a minor observation, not a blocker, since the ingestion layer already tolerates missing timestamps.

Style & Conventions

  • from __future__ import annotations present in both new/modified files — compliant.
  • SPDX license header present in the new test file.
  • Absolute imports only — no relative imports introduced.
  • Type annotations on all new functions and variables.
  • Naming: earliest/latest in codex.py avoids shadowing and reads naturally.

Tests

  • Coverage is good. The parameterized test (test_utils.py) covers 6 cases:
    1. Empty list → (None, None)
    2. Mixed offsets where lexicographic ordering disagrees with chronological
    3. Mixed sub-second precision
    4. Naive timestamp treated as UTC compared against aware
    5. Unparseable values skipped, valid ones retained
    6. Only unparseable values → (None, None)
  • No negative-offset test case. A case like "2025-01-01T00:00:00-05:00" vs "2025-01-01T04:00:00Z" (same instant, different representation) would exercise negative offsets, but this is not a gap since fromisoformat handles negative offsets identically to positive ones — nice-to-have at most.

Potential Concerns (non-blocking)

  1. value.replace("Z", "+00:00") — This replaces all occurrences of the character Z in the string, not just a trailing one. In practice, ISO 8601 timestamps won't contain a capital Z anywhere except as the UTC designator at the end, so this is safe. A more defensive approach would be value[:-1] + "+00:00" if value.endswith("Z") else value, but the current approach matches widespread convention and is fine.

  2. No logging on unparseable skip — If a rollout file contains garbled timestamps, the helper silently discards them. For debugging production ingestion issues, a logger.debug on skip could be useful, but this is a future enhancement, not a requirement for this fix.

Verdict

Approve. This is a clean, well-scoped bug fix with good test coverage. The implementation is correct, follows project conventions, and doesn't introduce unnecessary complexity. No blocking issues found.

@greptile-apps

greptile-apps Bot commented Apr 17, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

Fixes incorrect started_at/ended_at derivation caused by lexicographic string comparison of ISO 8601 timestamps with differing UTC offsets or sub-second precisions. The fix introduces parse_iso8601() / min_max_timestamps() helpers in utils.py and swaps the raw min/max calls in all three ingesters.

  • parse_iso8601() normalises the "Z" suffix for Python < 3.11 compatibility, coerces naive datetimes to UTC, and returns None for unparseable values so callers can skip them silently.
  • min_max_timestamps() builds a (datetime, original_str) pair list, runs min/max by the parsed datetime, and returns the original strings unchanged — preserving the output contract.
  • codex.py retains its existing session_meta[\"timestamp\"] priority for started_at, now falling back to earliest from the helper instead of a lexicographic min.

Confidence Score: 5/5

Safe to merge — the change is a targeted, well-tested correctness fix with no breaking output-contract changes.

All three ingester changes are straightforward swaps; the new helper correctly normalises UTC offsets and naive datetimes before comparing, and returns original strings so downstream consumers see no type change. The test suite covers all meaningful edge cases including the lex-vs-chrono mismatch that motivated the fix. No regressions were identified.

No files require special attention.

Important Files Changed

Filename Overview
packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/utils.py Adds parse_iso8601() and min_max_timestamps() helpers; naive timestamps are coerced to UTC before comparison, and winners are returned as their original strings.
packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/atif.py Replaces raw min/max string comparisons with min_max_timestamps(); straightforward, correct swap with no behavioral surprises.
packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/claude_code.py Same min/maxmin_max_timestamps() swap as atif.py; clean and correct.
packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/codex.py Preserves the session_meta["timestamp"] short-circuit for started_at while adopting min_max_timestamps(); semantics are unchanged relative to the old code.
packages/data-designer-engine/tests/engine/resources/agent_rollout/test_utils.py New parameterized test covering empty input, mixed offsets, mixed precisions, naive-vs-aware, and unparseable skipping — good coverage of the interesting edge cases.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[timestamps: list of str] --> B{any items?}
    B -- No --> C[return None, None]
    B -- Yes --> D[for each timestamp string]
    D --> E[parse_iso8601\nreplace Z → +00:00\nfromisoformat]
    E -- ValueError --> F[skip entry]
    E -- Success --> G{tzinfo is None?}
    G -- Yes --> H[replace tzinfo=UTC]
    G -- No --> I[keep as-is]
    H --> J[append datetime, original_str]
    I --> J
    F --> D
    J --> K{all skipped?}
    K -- Yes --> C
    K -- No --> L[min by datetime → earliest string\nmax by datetime → latest string]
    L --> M[return earliest, latest]
Loading

Reviews (2): Last reviewed commit: "Merge branch 'main' into worktree-issue-..." | Re-trigger Greptile

@andreatgretel

Copy link
Copy Markdown
Contributor

packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/pi_coding_agent.py:203-204

started_at=min(timestamps) if timestamps else None,
ended_at=max(timestamps) if timestamps else None,

looks like this file has the same min(timestamps) / max(timestamps) pattern that was fixed in the other three ingesters - worth including in this PR since the helper already exists?

@nabinchha nabinchha left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this together, @eric-tramel — clean fix for a subtle but real correctness issue.

Summary

This PR introduces proper ISO 8601 parsing for rollout timestamp comparisons across the Claude Code, Codex, and ATIF ingesters. The previous min()/max() on raw strings would silently produce incorrect started_at/ended_at values when timestamps had different UTC offsets or sub-second precisions. The new min_max_timestamps() helper parses before comparing, returns original strings to preserve the output contract, and gracefully skips unparseable values.

Findings

Suggestions — Take it or leave it

packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/utils.py:125-134 — Consider a single-pass min/max

  • What: min_max_timestamps iterates parsed twice (once for min, once for max). For typical rollout sizes this is negligible, but a single loop would be marginally cleaner.
  • Why: Minor efficiency and arguably more readable for a utility that explicitly computes both extremes.
  • Suggestion: Optional — a single-pass reduction:
from functools import reduce

def _min_max_reduce(acc, pair):
    mn, mx = acc
    return (pair if pair[0] < mn[0] else mn, pair if pair[0] > mx[0] else mx)

# ...
first = parsed[0]
earliest, latest = reduce(_min_max_reduce, parsed[1:], (first, first))
return earliest[1], latest[1]

Totally fine as-is for the data sizes involved — just a thought.

packages/data-designer-engine/tests/engine/resources/agent_rollout/test_utils.py — Could add a single-element test case

  • What: There's no explicit test for a single valid timestamp (e.g., ["2025-01-01T00:00:00Z"] → same value for both earliest and latest).
  • Why: The code handles it correctly by construction (min and max of a 1-element list), but an explicit case documents that contract.
  • Suggestion: Add a pytest.param(["2025-01-01T00:00:00Z"], ("2025-01-01T00:00:00Z", "2025-01-01T00:00:00Z"), id="single-element") case.

What Looks Good

  • Correct abstraction level: The helper lives in the shared utils.py where all three ingesters already import from, and the function signature (list[str] → tuple[str | None, str | None]) is minimal and obvious.
  • Output contract preserved: Returning the original string form is the right call — downstream consumers don't need to re-serialize, and existing tests or integrations won't notice the change.
  • Excellent test coverage: The parametrized test hits exactly the interesting cases (mixed offsets, mixed precision, naive-vs-aware, unparseable skipping, all-unparseable). The test IDs are descriptive and make failures immediately readable.
  • Clean integration: The changes to the three ingesters are minimal and mechanical — import the helper, call it, use the result. The Codex short-circuit for session_meta["timestamp"] is correctly preserved.

Verdict

Ship it (with nits) — Only minor suggestions above. The fix is correct, well-tested, and cleanly integrated. Ready to merge.


This review was generated by an AI assistant.

@eric-tramel eric-tramel merged commit 8d4d593 into main May 7, 2026
49 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Normalize rollout timestamps before deriving started_at/ended_at

3 participants