Skip to content

docs: add agent rollout ingestion docs entry point#499

Merged
eric-tramel merged 5 commits into
mainfrom
codex/agent-rollout-docs
Apr 8, 2026
Merged

docs: add agent rollout ingestion docs entry point#499
eric-tramel merged 5 commits into
mainfrom
codex/agent-rollout-docs

Conversation

@eric-tramel

@eric-tramel eric-tramel commented Apr 7, 2026

Copy link
Copy Markdown
Contributor

📋 Summary

This PR creates a dedicated documentation entry point for agent rollout ingestion so rollout-specific guidance can live outside the broader seed-datasets page. It also adds breadcrumbs from the existing rollout-related docs so users can consistently find that guide before the detailed content pass lands.

🔗 Related Issue

N/A

🔄 Changes

  • add docs/concepts/agent-rollout-ingestion.md as the new canonical page scaffold for rollout ingestion docs
  • add the new page to the Concepts navigation in mkdocs.yml
  • add breadcrumbs from Seed Datasets, Message Traces, the rollout distillation recipe page, and the rollout recipe card to the new page
  • keep the existing rollout overview content in place for now so the follow-up docs pass can expand the dedicated page without removing current guidance

🧪 Testing

  • uv run --group docs mkdocs build
  • make test passes (not run; docs-only change)
  • Unit tests added/updated (N/A — docs-only change)
  • E2E tests added/updated (N/A — docs-only change)
  • uv run --group docs mkdocs build --strict passes (blocked by pre-existing repo docs warnings unrelated to this PR)

✅ Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Architecture docs updated (N/A — docs-only change)

@eric-tramel eric-tramel requested a review from a team as a code owner April 7, 2026 14:08
@greptile-apps

greptile-apps Bot commented Apr 7, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR creates a dedicated docs/concepts/agent-rollout-ingestion.md page for AgentRolloutSeedSource documentation and wires it into the existing site via mkdocs.yml and breadcrumb links from four related pages (seed-datasets.md, traces.md, cards.md, and agent_rollout_distillation.md). The rollout-specific content previously inlined in seed-datasets.md is replaced with a concise pointer to the new dedicated guide.

Confidence Score: 5/5

Docs-only reorganization with no logic changes; safe to merge.

All six changed files are documentation. The new page is well-structured, relative links between pages are correct, mkdocs.yml navigation placement is appropriate, and the content moved out of seed-datasets.md is faithfully reproduced and expanded in the dedicated guide. No P0 or P1 findings.

No files require special attention.

Vulnerabilities

No security concerns identified.

Important Files Changed

Filename Overview
docs/concepts/agent-rollout-ingestion.md New canonical page for AgentRolloutSeedSource; contains Quick Start tabs, normalized field compatibility table, and two complete code examples. Content is accurate and well-structured.
docs/concepts/seed-datasets.md Rollout-specific inline content replaced with a pointer admonition to the new dedicated guide; "Custom Filesystem Readers" tip relocated to the DirectorySeedSource section where it belongs semantically.
docs/concepts/traces.md Single "See Also" bullet added linking to the new Agent Rollout Ingestion page; correct relative path used.
docs/recipes/cards.md Adds an "Ingestion Guide" button to the rollout distillation recipe card pointing to the new concepts page; relative path is correct.
docs/recipes/trace_ingestion/agent_rollout_distillation.md Adds an info admonition near the top pointing to the new ingestion guide; relative path ../../concepts/agent-rollout-ingestion.md is correct from this depth.
mkdocs.yml New page added to Concepts nav immediately after Seed Datasets; placement and file path are correct.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    SD[concepts/seed-datasets.md] -->|"!!! info pointer"| ARI[concepts/agent-rollout-ingestion.md]
    TR[concepts/traces.md] -->|"See Also link"| ARI
    RC[recipes/cards.md] -->|"Ingestion Guide button"| ARI
    ARD[recipes/trace_ingestion/agent_rollout_distillation.md] -->|"!!! info pointer"| ARI
    NAV[mkdocs.yml Concepts nav] --> ARI
    ARI --> QS["Quick Start\n(Claude Code / Codex / Hermes / ATIF)"]
    ARI --> NFC["Normalized Field\nCompatibility Table"]
    ARI --> EX1["Example: Summarize a Random Turn"]
    ARI --> EX2["Example: Tool Interaction Review Dataset"]
    ARI --> REL["Related Guides (back-links)"]
Loading

Reviews (3): Last reviewed commit: "Merge branch 'main' into codex/agent-rol..." | Re-trigger Greptile

@eric-tramel eric-tramel marked this pull request as draft April 7, 2026 14:16
@eric-tramel eric-tramel self-assigned this Apr 7, 2026
Add a dedicated concepts entry for agent rollout ingestion and
link existing rollout-related docs back to it so the detailed
guide can be filled in separately.

Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
@eric-tramel eric-tramel force-pushed the codex/agent-rollout-docs branch from 53fc050 to b5e68ea Compare April 7, 2026 16:42
@eric-tramel eric-tramel marked this pull request as ready for review April 7, 2026 22:10
@andreatgretel andreatgretel added the agent-review Trigger agentic CI review label Apr 8, 2026
@github-actions

github-actions Bot commented Apr 8, 2026

Copy link
Copy Markdown
Contributor

Summary

This docs-only PR creates a dedicated entry point for agent rollout ingestion documentation (docs/concepts/agent-rollout-ingestion.md) and cross-links it from four existing pages. It consolidates rollout-specific detail that was previously embedded in the Seed Datasets page into a standalone guide with tabbed quick-start snippets, a normalized field compatibility table, and two worked examples (random turn summarization, tool interaction dataset). The Seed Datasets page is trimmed to a short pointer, and mkdocs.yml adds the new page to the Concepts nav.

PR: #499docs: add agent rollout ingestion docs entry point
Author: Eric W. Tramel
Base: maincodex/agent-rollout-docs
Scope: 6 files changed (+283 / −43), all under docs/ and mkdocs.yml


Findings

Accuracy — API references vs. source code

All referenced classes and enums exist in the codebase and match their documented behavior:

Docs reference Source location Status
AgentRolloutSeedSource config/seed_source.py Exists
AgentRolloutFormat.{CLAUDE_CODE,CODEX,HERMES_AGENT,ATIF} config/seed_source.py:198-202 All four variants present
ExpressionColumnConfig config/column_configs.py Exists
LLMTextColumnConfig config/column_configs.py Exists
LLMStructuredColumnConfig config/column_configs.py Exists
CustomColumnConfig config/column_configs.py Exists
@custom_column_generator decorator config/custom_column.py Exists, including side_effect_columns param
allow_resize field config/base.py:39 Exists on base SingleColumnConfig
DataDesignerConfigBuilder config/config_builder.py Exists
recursive param on FileSystemSeedSource config/seed_source.py:107 Exists

Default paths match the code:

Format Docs claim Code (seed_source.py) Match?
Claude Code ~/.claude/projects, *.jsonl get_claude_code_default_path()~/.claude/projects, *.jsonl Yes
Codex ~/.codex/sessions, *.jsonl get_codex_default_path()~/.codex/sessions, *.jsonl Yes
Hermes Agent ~/.hermes/sessions, *.json* get_hermes_agent_default_path()~/.hermes/sessions, *.json* Yes
ATIF requires explicit path (None, "*.json") Yes

Links — internal cross-references

All internal links resolve correctly:

  • seed-datasets.mdagent-rollout-ingestion.md (same directory) — valid
  • traces.mdagent-rollout-ingestion.md (same directory) — valid
  • cards.md../concepts/agent-rollout-ingestion.md — valid
  • agent_rollout_distillation.md../../concepts/agent-rollout-ingestion.md — valid (verified via realpath)
  • agent-rollout-ingestion.mdseed-datasets.md, traces.md, ../recipes/trace_ingestion/agent_rollout_distillation.md — all valid
  • The FileSystemSeedReader Plugins link (../plugins/filesystem_seed_reader.md) in seed-datasets.md — valid, target file exists

Link — external reference (low severity)

  • The ATIF tab links to https://harborframework.com/docs/trajectory-format. This URL appears only in this PR and is not referenced elsewhere in the codebase. The domain/path could not be verified in this review. If Harbor's ATIF docs live at a different URL, this will be a broken link. Recommendation: confirm this URL is correct before merge, or replace with a more stable reference.

Navigation placement

The new page sits at Concepts > Agent Rollout Ingestion, immediately after Seed Datasets. This is logical since it's a specialization of seed datasets.

Content consolidation

The original seed-datasets.md had ~40 lines of rollout-specific content (format defaults, code examples, full field list). The PR correctly moves this detail to the dedicated page and replaces it with a compact admonition pointing readers there. The quick-start snippet and the "Trace Distillation" tip are preserved so the Seed Datasets page remains self-contained for casual readers.

Normalized field table

The compatibility table is well-structured and comprehensive. The is_sidechain claims were verified against the engine parsers — ATIF, Codex, and Hermes all hardcode False; Claude Code reads isSidechain from raw records. This matches the docs.

Code examples

  1. Random turn summarization — Clean and minimal. Uses ExpressionColumnConfig with Jinja {{ messages | random }}, which is a standard Jinja2 filter. Straightforward.

  2. Tool interaction exploder — More complex. Uses @custom_column_generator with side_effect_columns and allow_resize=True on CustomColumnConfig. Both features exist in the codebase. The example is well-commented and demonstrates a realistic use case. One minor note: the example builds context_messages as a running list and copies it via list(context_messages) — this is correct and avoids mutation issues.

Minor observations (non-blocking)

  1. "Documentation in progress" admonition removed: The scaffold page (commit 1) included a !!! note "Documentation in progress" block. Commit 2 correctly removes it as the content is filled in. Good housekeeping.

  2. Custom Filesystem Readers tip moved: In seed-datasets.md, the FileSystemSeedReader Plugins tip was moved from below the rollout section to above it (under DirectorySeedSource). This is a better placement since FileSystemSeedReader is relevant to all filesystem-based seed sources, not just rollouts.

  3. Model alias: Both examples use nvidia/nemotron-3-nano-30b-a3b, which is consistent with other docs in the repo.


Verdict

Approve — This is a clean, well-structured docs PR. All API references are accurate against the current codebase, internal links resolve correctly, the nav placement is logical, and the content consolidation from seed-datasets.md is well-handled. The two code examples are correct and demonstrate realistic usage patterns.

The only item to verify before merge is the external Harbor ATIF documentation link (harborframework.com/docs/trajectory-format), which could not be confirmed during this review.

@github-actions github-actions Bot removed the agent-review Trigger agentic CI review label Apr 8, 2026
@eric-tramel eric-tramel merged commit 5f04e5d into main Apr 8, 2026
49 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants