Add Hermes Agent session transcript support to AgentRolloutSeedSource #496

@eric-tramel

Description

Priority Level

Medium (Nice to have)

Is your feature request related to a problem? Please describe.

The existing AgentRolloutSeedSource supports two vendor-specific formats (Claude Code and Codex). Hermes Agent stores session transcripts under ~/.hermes/sessions, and those session artifacts preserve the full agent loop with structured message roles, tool_calls, tool_call_id, and reasoning fields. That makes Hermes session transcripts a natural ingestion target for trace distillation, analysis, and training-data preparation without requiring a custom seed reader.

Describe the solution you'd like

Add a built-in Hermes Agent rollout format focused on session transcript ingestion rather than Hermes' ShareGPT trajectory export format.

The handler should support the session artifacts Hermes writes under ~/.hermes/sessions, including:

  • Gateway session transcripts stored as per-session .jsonl files
  • CLI session logs stored as session_*.json files containing top-level session metadata plus a messages array

The format should normalize both shapes into Data Designer's standard agent rollout schema, with Hermes-specific metadata stored in source_meta.
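Here is a rough sketch of the normalization step. It maps Hermes messages into a chat-style message list and moves session-level metadata into source_meta. It assumes each Hermes message is a dict whose role, content, tool_calls, tool_call_id, and reasoning keys can be copied through unchanged. The helper names and the output record shape are hypothetical and are not Data Designer's actual rollout schema.

```python
from typing import Any, Dict, List

# Assumption: these Hermes message keys map one-to-one onto the target schema.
# role, tool_calls, tool_call_id, and reasoning are named in the issue; content is assumed.
MESSAGE_FIELDS = ("role", "content", "tool_calls", "tool_call_id", "reasoning")


def normalize_hermes_messages(raw_messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Copy the assumed standard fields and skip records that carry no role."""
    normalized = []
    for msg in raw_messages:
        if "role" not in msg:
            # Not a chat message (e.g. a metadata record); leave it out.
            continue
        normalized.append({key: msg[key] for key in MESSAGE_FIELDS if key in msg})
    return normalized


def to_rollout_record(
    session_id: str,
    raw_messages: List[Dict[str, Any]],
    session_meta: Dict[str, Any],
) -> Dict[str, Any]:
    """Hypothetical record shape; the real AgentRolloutSeedSource schema may differ."""
    return {
        "session_id": session_id,
        "messages": normalize_hermes_messages(raw_messages),
        # Hermes-specific metadata stays out of the messages and goes here.
        "source_meta": {**session_meta, "format": "hermes_agent"},
    }
```

Unknown per-message keys are dropped instead of passed through. This keeps the messages payload uniform across vendors, and anything Hermes-specific can still be kept in source_meta.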

High level changes

  1. Add HERMES_AGENT to AgentRolloutFormat enum
  2. Implement HermesAgentRolloutFormatHandler
  3. Register the handler with BUILTIN_AGENT_ROLLOUT_FORMAT_HANDLERS
  4. Default the format path to ~/.hermes/sessions
  5. Parse Hermes gateway session_meta records and tool-definition metadata into source_meta
  6. Normalize Hermes session messages into the standard messages payload used by AgentRolloutSeedSource
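Steps 1 through 4 could fit together roughly as follows. The sketch makes three assumptions. It uses a stand-in AgentRolloutFormat enum, and the names and values of the existing members are guesses. It uses a plain dict in place of BUILTIN_AGENT_ROLLOUT_FORMAT_HANDLERS, whose real structure is not described in this issue. Its hypothetical iter_session_files method assumes session files sit directly under the sessions directory.

```python
from enum import Enum
from pathlib import Path
from typing import Dict, Iterator, Optional, Type


class AgentRolloutFormat(str, Enum):
    # Stand-in enum: the existing members' real names and values may differ.
    CLAUDE_CODE = "claude_code"
    CODEX = "codex"
    HERMES_AGENT = "hermes_agent"  # step 1: the new member


class HermesAgentRolloutFormatHandler:
    """Step 2 (sketch): a handler that knows where Hermes sessions live."""

    rollout_format = AgentRolloutFormat.HERMES_AGENT
    # Step 4: default path, kept unexpanded until files are listed.
    default_path = Path("~/.hermes/sessions")
    # Gateway transcripts and CLI session logs (assumed flat layout).
    patterns = ("*.jsonl", "session_*.json")

    def iter_session_files(self, root: Optional[Path] = None) -> Iterator[Path]:
        """Yield matching session files in a stable, sorted order."""
        base = (root if root is not None else self.default_path).expanduser()
        found = set()
        for pattern in self.patterns:
            found.update(p for p in base.glob(pattern) if p.is_file())
        yield from sorted(found)


# Step 3: a dict stand-in for the built-in registry; the real one may differ.
BUILTIN_AGENT_ROLLOUT_FORMAT_HANDLERS: Dict[AgentRolloutFormat, Type] = {
    AgentRolloutFormat.HERMES_AGENT: HermesAgentRolloutFormatHandler,
}
```

The "~" is expanded only when files are listed, not when the class is defined. This ties the default to whichever user runs the pipeline.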

Example

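from data_designer import DataDesigner, AgentRolloutSeedSource, AgentRolloutFormat

dd = DataDesigner()
config = dd.config_builder()
# Seed the dataset from local Hermes session transcripts.
config.with_seed_dataset(
    AgentRolloutSeedSource(
        format=AgentRolloutFormat.HERMES_AGENT,  # new enum member proposed in this issue
        path="~/.hermes/sessions",  # matches the proposed default path
    )
)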

Describe alternatives you've considered

We could target Hermes' ShareGPT trajectory export format instead, but the session transcript format is a better fit for AgentRolloutSeedSource because it is closer to the existing Claude Code / Codex rollout handlers: one session per file, structured tool-call fields, and richer trace metadata.

Additional context

Hermes session storage currently has two related file conventions under ~/.hermes/sessions:

  • Gateway transcripts: *.jsonl
  • CLI session logs: session_*.json

A single Hermes handler can likely support both via file-shape auto-detection.
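One way to do file-shape auto-detection is to inspect the file's content instead of relying only on its name. A CLI session log parses as a single JSON object with a messages array. A multi-line gateway transcript does not parse as one JSON document, so it is read line by line. The sketch assumes that gateway metadata lines are marked with role == "session_meta". The actual Hermes marker may differ, and load_hermes_session is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Tuple


def load_hermes_session(path: Path) -> Tuple[List[Dict[str, Any]], Dict[str, Any]]:
    """Return (messages, session_meta) for either Hermes file shape."""
    text = path.read_text(encoding="utf-8")
    try:
        doc = json.loads(text)
    except json.JSONDecodeError:
        doc = None  # Several JSON values (or none): fall through to JSONL parsing.

    # CLI session log: top-level metadata plus a "messages" array.
    if isinstance(doc, dict) and isinstance(doc.get("messages"), list):
        meta = {k: v for k, v in doc.items() if k != "messages"}
        return doc["messages"], meta

    # Gateway transcript: one JSON object per line.
    messages: List[Dict[str, Any]] = []
    meta: Dict[str, Any] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        # Assumption: metadata lines carry role == "session_meta".
        if record.get("role") == "session_meta":
            meta.update({k: v for k, v in record.items() if k != "role"})
        else:
            messages.append(record)
    return messages, meta
```

A one-line gateway transcript parses as a single JSON object. It has no messages array, though, so it still falls through to the line-by-line branch and is read correctly.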

References:

Metadata

Assignees: No one assigned

Labels: enhancement (New feature or request)
