fix: normalize rollout timestamps before deriving started_at/ended_at by eric-tramel · Pull Request #556 · NVIDIA-NeMo/DataDesigner

eric-tramel · 2026-04-17T14:13:42Z

📋 Summary

Parse ISO 8601 timestamps before comparing them in the Claude Code, Codex, and ATIF rollout ingesters. Raw string min()/max() can order chronologically-earlier timestamps as later when offsets or precisions differ (e.g. 2025-01-01T00:30:00+01:00 vs 2025-01-01T00:00:00Z), which could produce incorrect started_at / ended_at on otherwise valid rollouts.

🔗 Related Issue

Fixes #497

🔄 Changes

Add min_max_timestamps() + parse_iso8601() helpers in agent_rollout/utils.py. Naive timestamps are treated as UTC; unparseable values are silently skipped (matching the existing tolerance of coerce_optional_str). Winners are returned in their original string form, so the output contract is unchanged.
Swap min(timestamps) / max(timestamps) for the helper in claude_code.py, codex.py, and atif.py. Codex's session_meta["timestamp"] short-circuit for started_at is preserved.
Add a parameterized unit test covering the four interesting cases: mixed offsets, mixed precisions, naive-vs-aware, and unparseable-skipping.

🧪 Testing

make test passes (engine suite, 1867 tests; pre-existing interface failures unrelated to this change)
Unit tests added/updated
E2E tests added/updated (if applicable)

✅ Checklist

Follows commit message conventions
Commits are signed off (DCO)
Architecture docs updated (if applicable)

Claude Code, Codex, and ATIF ingesters called min()/max() on raw timestamp strings. That assumes lexicographic order matches chronological order, which breaks for ISO 8601 timestamps with mixed UTC offsets or precisions (e.g. 2025-01-01T00:30:00+01:00 is earlier than 2025-01-01T00:00:00Z but sorts later as a string). Introduce min_max_timestamps() in the shared rollout utils that parses each value as ISO 8601 (naive values treated as UTC, unparseable values skipped) and picks the chronological extremes, returning them in their original string form. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>

github-actions · 2026-04-17T14:15:23Z

Code Review: PR #556 — fix: normalize rollout timestamps before deriving started_at/ended_at

Summary

This PR fixes a correctness bug in the agent rollout ingesters (Claude Code, Codex, ATIF) where started_at and ended_at were derived by calling min()/max() on raw ISO 8601 timestamp strings. Lexicographic comparison produces wrong results when timestamps have differing UTC offsets or sub-second precisions — e.g. "2025-01-01T00:30:00+01:00" sorts after "2025-01-01T00:00:00Z" as a string, but is chronologically 30 minutes earlier.

The fix introduces two helpers in utils.py — parse_iso8601() and min_max_timestamps() — that parse timestamps into datetime objects before comparison, then return the winners in their original string form so the output contract is unchanged. All three ingesters are updated to use the new helper. A parameterized test covers the key edge cases.

Files changed: 5 (3 ingesters, 1 utility module, 1 new test file)
Lines: +88 / -7

Findings

Correctness

Fix is sound. Parsing to datetime before comparison correctly handles mixed offsets, mixed precisions, and naive-vs-aware timestamps. The Z → +00:00 replacement before fromisoformat() is the standard workaround for Python < 3.11 compatibility and is safe here since Z only appears as a trailing timezone designator in ISO 8601.
Output contract preserved. min_max_timestamps() returns the original string values, not stringified datetime objects, so downstream consumers are unaffected.
Codex short-circuit preserved. In codex.py, the session_meta.get("timestamp") fallback for started_at is correctly maintained (started_at=coerce_optional_str(session_meta.get("timestamp")) or earliest).

Design

Centralised helper is the right call. The same parsing logic was duplicated across three files; moving it to utils.py eliminates the duplication cleanly.
Silent skip of unparseable values is consistent. The PR description notes this mirrors the existing tolerance of coerce_optional_str. If all timestamps are unparseable, (None, None) is returned, which matches the previous behavior of an empty list producing None. No log warning is emitted for skipped values — this is a minor observation, not a blocker, since the ingestion layer already tolerates missing timestamps.

Style & Conventions

from __future__ import annotations present in both new/modified files — compliant.
SPDX license header present in the new test file.
Absolute imports only — no relative imports introduced.
Type annotations on all new functions and variables.
Naming: earliest/latest in codex.py avoids shadowing and reads naturally.

Tests

Coverage is good. The parameterized test (test_utils.py) covers 6 cases:
1. Empty list → (None, None)
2. Mixed offsets where lexicographic ordering disagrees with chronological
3. Mixed sub-second precision
4. Naive timestamp treated as UTC compared against aware
5. Unparseable values skipped, valid ones retained
6. Only unparseable values → (None, None)
No negative-offset test case. A case like "2025-01-01T00:00:00-05:00" vs "2025-01-01T04:00:00Z" (same instant, different representation) would exercise negative offsets, but this is not a gap since fromisoformat handles negative offsets identically to positive ones — nice-to-have at most.

Potential Concerns (non-blocking)

value.replace("Z", "+00:00") — This replaces all occurrences of the character Z in the string, not just a trailing one. In practice, ISO 8601 timestamps won't contain a capital Z anywhere except as the UTC designator at the end, so this is safe. A more defensive approach would be value[:-1] + "+00:00" if value.endswith("Z") else value, but the current approach matches widespread convention and is fine.
No logging on unparseable skip — If a rollout file contains garbled timestamps, the helper silently discards them. For debugging production ingestion issues, a logger.debug on skip could be useful, but this is a future enhancement, not a requirement for this fix.

Verdict

Approve. This is a clean, well-scoped bug fix with good test coverage. The implementation is correct, follows project conventions, and doesn't introduce unnecessary complexity. No blocking issues found.

greptile-apps · 2026-04-17T14:16:16Z

Greptile Summary

Fixes incorrect started_at/ended_at derivation caused by lexicographic string comparison of ISO 8601 timestamps with differing UTC offsets or sub-second precisions. The fix introduces parse_iso8601() / min_max_timestamps() helpers in utils.py and swaps the raw min/max calls in all three ingesters.

parse_iso8601() normalises the "Z" suffix for Python < 3.11 compatibility, coerces naive datetimes to UTC, and returns None for unparseable values so callers can skip them silently.
min_max_timestamps() builds a (datetime, original_str) pair list, runs min/max by the parsed datetime, and returns the original strings unchanged — preserving the output contract.
codex.py retains its existing session_meta[\"timestamp\"] priority for started_at, now falling back to earliest from the helper instead of a lexicographic min.

Confidence Score: 5/5

Safe to merge — the change is a targeted, well-tested correctness fix with no breaking output-contract changes.

All three ingester changes are straightforward swaps; the new helper correctly normalises UTC offsets and naive datetimes before comparing, and returns original strings so downstream consumers see no type change. The test suite covers all meaningful edge cases including the lex-vs-chrono mismatch that motivated the fix. No regressions were identified.

No files require special attention.

Important Files Changed

Filename	Overview
packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/utils.py	Adds `parse_iso8601()` and `min_max_timestamps()` helpers; naive timestamps are coerced to UTC before comparison, and winners are returned as their original strings.
packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/atif.py	Replaces raw `min`/`max` string comparisons with `min_max_timestamps()`; straightforward, correct swap with no behavioral surprises.
packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/claude_code.py	Same `min`/`max` → `min_max_timestamps()` swap as `atif.py`; clean and correct.
packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/codex.py	Preserves the `session_meta["timestamp"]` short-circuit for `started_at` while adopting `min_max_timestamps()`; semantics are unchanged relative to the old code.
packages/data-designer-engine/tests/engine/resources/agent_rollout/test_utils.py	New parameterized test covering empty input, mixed offsets, mixed precisions, naive-vs-aware, and unparseable skipping — good coverage of the interesting edge cases.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[timestamps: list of str] --> B{any items?}
    B -- No --> C[return None, None]
    B -- Yes --> D[for each timestamp string]
    D --> E[parse_iso8601\nreplace Z → +00:00\nfromisoformat]
    E -- ValueError --> F[skip entry]
    E -- Success --> G{tzinfo is None?}
    G -- Yes --> H[replace tzinfo=UTC]
    G -- No --> I[keep as-is]
    H --> J[append datetime, original_str]
    I --> J
    F --> D
    J --> K{all skipped?}
    K -- Yes --> C
    K -- No --> L[min by datetime → earliest string\nmax by datetime → latest string]
    L --> M[return earliest, latest]

_{Reviews (2): Last reviewed commit: "Merge branch 'main' into worktree-issue-..." | Re-trigger Greptile}

andreatgretel · 2026-04-17T14:54:59Z

packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/pi_coding_agent.py:203-204
started_at=min(timestamps) if timestamps else None,
ended_at=max(timestamps) if timestamps else None,

looks like this file has the same min(timestamps) / max(timestamps) pattern that was fixed in the other three ingesters - worth including in this PR since the helper already exists?

nabinchha

Thanks for putting this together, @eric-tramel — clean fix for a subtle but real correctness issue.

Summary

This PR introduces proper ISO 8601 parsing for rollout timestamp comparisons across the Claude Code, Codex, and ATIF ingesters. The previous min()/max() on raw strings would silently produce incorrect started_at/ended_at values when timestamps had different UTC offsets or sub-second precisions. The new min_max_timestamps() helper parses before comparing, returns original strings to preserve the output contract, and gracefully skips unparseable values.

Findings

Suggestions — Take it or leave it

packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/utils.py:125-134 — Consider a single-pass min/max

What: min_max_timestamps iterates parsed twice (once for min, once for max). For typical rollout sizes this is negligible, but a single loop would be marginally cleaner.
Why: Minor efficiency and arguably more readable for a utility that explicitly computes both extremes.
Suggestion: Optional — a single-pass reduction:

from functools import reduce

def _min_max_reduce(acc, pair):
    mn, mx = acc
    return (pair if pair[0] < mn[0] else mn, pair if pair[0] > mx[0] else mx)

# ...
first = parsed[0]
earliest, latest = reduce(_min_max_reduce, parsed[1:], (first, first))
return earliest[1], latest[1]

Totally fine as-is for the data sizes involved — just a thought.

packages/data-designer-engine/tests/engine/resources/agent_rollout/test_utils.py — Could add a single-element test case

What: There's no explicit test for a single valid timestamp (e.g., ["2025-01-01T00:00:00Z"] → same value for both earliest and latest).
Why: The code handles it correctly by construction (min and max of a 1-element list), but an explicit case documents that contract.
Suggestion: Add a pytest.param(["2025-01-01T00:00:00Z"], ("2025-01-01T00:00:00Z", "2025-01-01T00:00:00Z"), id="single-element") case.

What Looks Good

Correct abstraction level: The helper lives in the shared utils.py where all three ingesters already import from, and the function signature (list[str] → tuple[str | None, str | None]) is minimal and obvious.
Output contract preserved: Returning the original string form is the right call — downstream consumers don't need to re-serialize, and existing tests or integrations won't notice the change.
Excellent test coverage: The parametrized test hits exactly the interesting cases (mixed offsets, mixed precision, naive-vs-aware, unparseable skipping, all-unparseable). The test IDs are descriptive and make failures immediately readable.
Clean integration: The changes to the three ingesters are minimal and mechanical — import the helper, call it, use the result. The Codex short-circuit for session_meta["timestamp"] is correctly preserved.

Verdict

Ship it (with nits) — Only minor suggestions above. The fix is correct, well-tested, and cleanly integrated. Ready to merge.

This review was generated by an AI assistant.

eric-tramel requested a review from a team as a code owner April 17, 2026 14:13

eric-tramel temporarily deployed to agentic-ci April 17, 2026 14:13 — with GitHub Actions Inactive

github-actions Bot mentioned this pull request Apr 20, 2026

Agentic CI: Issue & PR Triage Tracker #562

Open

nabinchha approved these changes Apr 28, 2026

View reviewed changes

Merge branch 'main' into worktree-issue-497-timestamp-normalization

83f94b0

eric-tramel merged commit 8d4d593 into main May 7, 2026
49 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: normalize rollout timestamps before deriving started_at/ended_at#556

fix: normalize rollout timestamps before deriving started_at/ended_at#556
eric-tramel merged 2 commits into
mainfrom
worktree-issue-497-timestamp-normalization

eric-tramel commented Apr 17, 2026

Uh oh!

github-actions Bot commented Apr 17, 2026

Uh oh!

greptile-apps Bot commented Apr 17, 2026 •

edited

Loading

Confidence Score: 5/5

Flowchart

Uh oh!

andreatgretel commented Apr 17, 2026

Uh oh!

nabinchha left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

eric-tramel commented Apr 17, 2026

📋 Summary

🔗 Related Issue

🔄 Changes

🧪 Testing

✅ Checklist

Uh oh!

github-actions Bot commented Apr 17, 2026

Code Review: PR #556 — fix: normalize rollout timestamps before deriving started_at/ended_at

Summary

Findings

Correctness

Design

Style & Conventions

Tests

Potential Concerns (non-blocking)

Verdict

Uh oh!

greptile-apps Bot commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

andreatgretel commented Apr 17, 2026

Uh oh!

nabinchha left a comment

Choose a reason for hiding this comment

Summary

Findings

Suggestions — Take it or leave it

What Looks Good

Verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

greptile-apps Bot commented Apr 17, 2026 •

edited

Loading