Original Request
hermes有telemetry和analytics吗,特别是 time/delay cost, token cost, money cost and failure/complete rate?
English Translation: Does Hermes have telemetry and analytics, especially for time/delay cost, token cost, money cost, and failure/complete rate?
Agent's Two Cents (could be wrong)
Everything below is the AI agent's best guess based on the current codebase.
Take with a grain of salt — the original request above is the only thing that came from a human.
Problem / Motivation
Hermes already tracks some usage/accounting data, but observability is fragmented. Today there are pieces of token accounting, estimated cost, session duration, and /insights, but there is no unified telemetry layer that answers the obvious operator questions: how long requests actually take, where latency comes from, how much they cost, and what percentage of runs end successfully vs timeout/abort/reset.
This matters because performance, reliability, and spend are now first-order product concerns. Without a canonical telemetry model, regressions become anecdotal and downstream users end up reverse-engineering behavior from logs and SQLite rows.
What We Checked
- run_agent.py already accumulates session_prompt_tokens, session_completion_tokens, session_total_tokens, session_api_calls, and session_estimated_cost_usd.
- run_agent.py logs per-API-call latency via logger.info(... latency=%.1fs ...), but this is log-level observability, not queryable analytics.
- gateway/run.py already exposes /usage and /insights.
- agent/insights.py already reports sessions/messages/tool calls, token totals, estimated cost, active time, average session duration, model/platform/tool breakdowns, and activity patterns.
- hermes_state.py already stores session end_reason, token counts, cost fields, and timestamps.
- What still seems missing: a canonical outcome taxonomy plus analytics for success/failure/completion rate; end-to-end latency breakdowns (TTFT, tool latency, queue/wait time, per-turn wall time); and one consistent telemetry surface across CLI, gateway, cron, delegated runs, and API server.
Proposed Solution
Introduce a first-class Hermes telemetry/analytics subsystem with a canonical event schema and rollups for:
- Latency: per-turn wall time, per-API-call latency, TTFT, tool-call latency, idle/wait time, optional queueing delay
- Usage: input/output/cache/reasoning tokens, tool-call counts, message counts
- Cost: estimated and actual cost when available, with clear status/source
- Outcome: completed / timed_out / interrupted / reset / compressed / errored / unknown
- Breakdowns: by source platform, model, provider, toolset, command path, and session/job type
The initial UX can be modest: improve /insights, add a machine-readable export/CLI/API surface, and store normalized telemetry rows in the session DB. Fancy dashboards can come later.
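To make the categories above concrete, here is a minimal sketch of what one normalized per-turn record might look like. Every name in it (TurnTelemetry, Outcome, the individual field names, the cost_source values) is hypothetical and does not exist in the codebase today; only the outcome labels are taken from the list above.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class Outcome(str, Enum):
    """Outcome labels from the proposal; the final taxonomy is still open."""
    COMPLETED = "completed"
    TIMED_OUT = "timed_out"
    INTERRUPTED = "interrupted"
    RESET = "reset"
    COMPRESSED = "compressed"
    ERRORED = "errored"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class TurnTelemetry:
    """Hypothetical normalized record for a single agent turn."""
    session_id: str
    source: str                 # assumed labels: "cli", "gateway", "cron", "delegate"
    model: str
    provider: str
    wall_time_s: float          # end-to-end wall time for the turn
    api_latency_s: float        # time spent waiting on model API calls
    ttft_s: Optional[float]     # time to first token; None when not measured
    tool_latency_s: float       # time spent inside tool calls
    input_tokens: int
    output_tokens: int
    cost_usd: Optional[float]   # None when no cost figure is available
    cost_source: str            # assumed values: "estimated", "actual", "unavailable"
    outcome: Outcome = Outcome.UNKNOWN

    def to_json(self) -> str:
        # Outcome subclasses str, so json serializes it as its plain value.
        return json.dumps(asdict(self), sort_keys=True)
```

Keeping cost_usd and ttft_s optional is deliberate in this sketch: a missing measurement is recorded as missing rather than as zero, so rollups can skip it instead of averaging it in.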
Dependencies & Potential Blockers
- Cross-cutting change touching the agent loop, session persistence, gateway, cron, delegated runs, and possibly API/plugin hooks.
- Outcome semantics need a stable definition first; otherwise the numbers will be garbage with a nicer font.
- Backward compatibility matters because sessions already stores partial accounting fields.
- No major external infrastructure blocker is required for a first local/SQLite-backed version.
How to Validate
- Run comparable conversations from CLI, Slack/Telegram gateway, cron, and delegated-task paths; confirm telemetry is recorded consistently for all of them.
- Verify that a normal successful run increments the "completed" bucket.
- Verify that inactivity timeout, manual interruption, session reset, and compression continuation produce distinct outcome classifications.
- Confirm /insights (or a new export command) can report:
  - average and p95 latency
  - token totals
  - estimated cost totals
  - completion/failure rates by source and model
- Confirm existing /usage and /insights behavior does not regress for old sessions with sparse data.
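The reporting checks above could be backed by a rollup along these lines. This is a sketch under stated assumptions, not how InsightsEngine works today: the row shape (dicts with model, latency_s, and outcome keys) and both function names are hypothetical, and p95 uses the nearest-rank method for simplicity.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile; returns None for empty input."""
    if not values:
        return None
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(rows):
    """Group hypothetical telemetry rows by model and compute rollups.

    Each row is a dict with 'model', 'outcome', and an optional
    'latency_s' (None or absent for old sessions with sparse data).
    """
    by_model = defaultdict(list)
    for row in rows:
        by_model[row["model"]].append(row)

    summary = {}
    for model, items in by_model.items():
        # Rows without a latency measurement are skipped, not counted as zero.
        latencies = [r["latency_s"] for r in items if r.get("latency_s") is not None]
        completed = sum(1 for r in items if r["outcome"] == "completed")
        summary[model] = {
            "runs": len(items),
            "avg_latency_s": sum(latencies) / len(latencies) if latencies else None,
            "p95_latency_s": percentile(latencies, 95),
            "completion_rate": completed / len(items),
        }
    return summary
```

Rows from older sessions still count toward runs and completion_rate even when latency is missing, which is one way to keep sparse historical data from distorting the latency figures.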
Best Validation Path
Best default path: add a deterministic integration test matrix over SessionDB + InsightsEngine + a small set of synthetic session transcripts/end reasons, then run one real smoke test per runtime path (CLI, gateway, cron, delegate) and assert that each produces the same normalized telemetry fields.
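A deterministic version of that matrix might start from an in-memory SQLite database. The sketch below does not use the real SessionDB API; the telemetry table, its columns, and the helper names are assumptions made purely for illustration.

```python
import sqlite3

# Hypothetical normalized fields every runtime path is expected to record.
REQUIRED_FIELDS = {"source", "outcome", "wall_time_s", "total_tokens", "cost_usd"}


def make_db():
    """Create an in-memory database with an assumed telemetry table."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute(
        "CREATE TABLE telemetry ("
        "source TEXT NOT NULL, outcome TEXT NOT NULL, "
        "wall_time_s REAL, total_tokens INTEGER, cost_usd REAL)"
    )
    return conn


def record(conn, **fields):
    """Insert one row, rejecting any path that omits a normalized field."""
    missing = REQUIRED_FIELDS - fields.keys()
    if missing:
        raise ValueError(f"missing telemetry fields: {sorted(missing)}")
    conn.execute(
        "INSERT INTO telemetry VALUES "
        "(:source, :outcome, :wall_time_s, :total_tokens, :cost_usd)",
        fields,
    )


def outcome_counts(conn):
    """Return {(source, outcome): count} for assertions in tests."""
    rows = conn.execute(
        "SELECT source, outcome, COUNT(*) AS n FROM telemetry "
        "GROUP BY source, outcome ORDER BY source, outcome"
    ).fetchall()
    return {(r["source"], r["outcome"]): r["n"] for r in rows}
```

A test can then loop over the runtime paths, record one synthetic session per path and end reason, and assert that outcome_counts returns the same shape for each of them.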
Best Human Demo
A before-vs-after terminal demo is the cleanest proof:
- run 3-4 scripted sessions that intentionally end in different ways (complete, timeout, interrupt, compression continuation)
- run /insights 7 or a new hermes insights --json
- show one screen with latency, cost, and outcome breakdowns that were previously impossible to answer
Scope Estimate
large
Key Files/Modules Likely Involved
run_agent.py
hermes_state.py
agent/insights.py
gateway/run.py
gateway/session.py
Architecture Diagram
+-------------------+        +-------------------+
| CLI / Gateway /   |        | Cron / API server |
| Delegate / MCP    |        | / batch runners   |
+---------+---------+        +---------+---------+
          |                            |
          +------------+  +------------+
                       v  v
                 +--------------+
                 | AIAgent loop |
                 | run_agent.py |
                 +------+-------+
                        |
           +------------+-------------+
           |                          |
           v                          v
+--------------------+     +----------------------+
| tool / model events|     | outcome transitions  |
| latency, tokens,   |     | completed / timeout /|
| cost, model, etc.  |     | interrupt / reset... |
+----------+---------+     +----------+-----------+
            \                        /
             \                      /
              v                    v
             +---------------------+
             | normalized telemetry|
             | event + session     |
             | schema              |
             +----------+----------+
                        |
                        v
             +----------------------+
             | SessionDB / rollups  |
             | hermes_state.py      |
             +----------+-----------+
                        |
          +-------------+----------------+
          |                              |
          v                              v
+--------------------+     +-----------------------+
| InsightsEngine     |     | JSON/API export /     |
| human summaries    |     | downstream analysis   |
+--------------------+     +-----------------------+
Rough Implementation Sketch
- Define a canonical telemetry schema for event-level and session-level rollups.
- Define a canonical outcome taxonomy and map existing end_reason values into it.
- Centralize telemetry emission in the agent/runtime paths instead of scattering ad hoc counters.
- Extend SessionDB schema and migration logic for normalized fields and/or event tables.
- Teach InsightsEngine to compute outcome rates, latency percentiles, and cost/usage breakdowns.
- Expose the results in /insights, CLI output, and ideally a machine-readable export.
- Add integration tests that intentionally exercise different termination paths.
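The end_reason mapping step could be as small as a lookup table with a safe default. In the sketch below, compression, session_reset, and session_switch are the end_reason values named in this issue; the remaining keys ("completed", "timeout", "interrupted", "error") and the choice to fold session_switch into reset are assumptions, since the real values and semantics are still undecided.

```python
# Hypothetical mapping from stored end_reason strings to the proposed taxonomy.
END_REASON_TO_OUTCOME = {
    "completed": "completed",      # assumed end_reason value
    "timeout": "timed_out",        # assumed end_reason value
    "interrupted": "interrupted",  # assumed end_reason value
    "error": "errored",            # assumed end_reason value
    "compression": "compressed",
    "session_reset": "reset",
    "session_switch": "reset",     # assumption: treated like a reset for now
}


def classify(end_reason):
    """Map a raw end_reason to a canonical outcome, defaulting to 'unknown'.

    Old sessions with a NULL or unrecognized end_reason fall into 'unknown'
    rather than being counted as failures or completions.
    """
    if end_reason is None:
        return "unknown"
    return END_REASON_TO_OUTCOME.get(end_reason.strip().lower(), "unknown")
```

Routing every unrecognized value to unknown keeps new or misspelled end reasons visible in the rollups instead of silently inflating another bucket.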
Open Questions
- Should this be session-rollup only first, or should Hermes store event-level telemetry from day one?
- How should "complete" be defined for interactive chats where the user simply stops talking?
- Should compression, session_reset, and session_switch count as neutral transitions or failed/completed outcomes?
- Is there appetite for OpenTelemetry/Langfuse-style sinks later, or should v1 stay local-only?
Potential Risks or Gotchas
- This is broad enough that piecemeal contributor patches are likely to fight each other or calcify the wrong schema.
- Downstream or locally carried patches are especially risky here because every runtime path must agree on semantics, and drift will make analytics actively misleading.
- Because of the cross-cutting blast radius, this feels better suited to an author/core-maintainer-led design pass than a casual contributor feature branch.
- If the project ships metrics before defining outcome semantics, people will trust numbers that do not mean what they think they mean.
Related Issues