Problem or Use Case
Problem:
Hermes currently lacks production observability for agent operations. When running Hermes in production environments (gateway mode with Telegram/Discord/Slack, or batch processing), there's no way to:
- Trace multi-step agent executions across tool calls and subagent delegations
- Monitor token usage, latency, and error rates across sessions
- Debug failing subagent workflows (hierarchical parent-child relationships are invisible)
- Track costs and performance per user/session in multi-tenant deployments
- Export telemetry data to external observability platforms
This makes it difficult to operate Hermes at scale, debug complex agent chains, or optimize costs.
Use Case:
This feature adds a Langfuse integration that provides:
- Production Monitoring: Trace every agent execution with full context (tools called, subagents spawned, token usage, errors)
- Cost Attribution: Track LLM costs per user, session, or conversation in gateway deployments
- Debugging: Visualize hierarchical subagent executions to identify where complex workflows fail
- Performance Optimization: Identify slow tool calls or expensive model invocations
- Operational Visibility: Dashboards and alerts for production Hermes deployments
The implementation is opt-in via environment variables and gracefully degrades when Langfuse is not configured, ensuring no impact on existing users.
Proposed Solution
- Configuration (Zero CLI flags, environment-based)
# ~/.hermes/.env
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com # or self-hosted URL
Optional config in ~/.hermes/config.yaml:
observability:
  enabled: true     # Auto-detected if env vars present
  sample_rate: 1.0  # 0.0-1.0, for high-volume filtering
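The auto-detection and sampling rules above could be resolved roughly as follows. This is a minimal sketch, not the proposed implementation: `ObservabilityConfig`, `resolve_config`, and `should_sample` are hypothetical names, and the precedence (both keys must be present, and an explicit `enabled: false` wins over them) is an assumption.

```python
import os
import random
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED_ENV = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")


@dataclass(frozen=True)
class ObservabilityConfig:
    enabled: bool
    sample_rate: float
    host: str


def resolve_config(yaml_section: Optional[Mapping] = None,
                   env: Optional[Mapping[str, str]] = None) -> ObservabilityConfig:
    """Combine the optional `observability:` section with LANGFUSE_* env vars."""
    section = dict(yaml_section or {})
    env = os.environ if env is None else env
    has_keys = all(env.get(name) for name in REQUIRED_ENV)
    # Assumption: `enabled` defaults to "keys present"; it cannot be forced on
    # without keys, but it can be forced off.
    enabled = bool(section.get("enabled", has_keys)) and has_keys
    # Clamp to the documented 0.0-1.0 range.
    rate = min(1.0, max(0.0, float(section.get("sample_rate", 1.0))))
    # Assumption: fall back to the cloud URL when LANGFUSE_HOST is unset.
    host = env.get("LANGFUSE_HOST", "https://cloud.langfuse.com")
    return ObservabilityConfig(enabled, rate, host)


def should_sample(config: ObservabilityConfig, rng=random.random) -> bool:
    """Decide once per trace whether it is recorded."""
    return config.enabled and rng() < config.sample_rate
```

Making the sampling decision once per trace, rather than per span, keeps sampled traces complete: either the whole hierarchy is recorded or none of it is.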
- Core Implementation
- @observe decorator (agent/observability.py): Wraps agent methods to auto-create Langfuse traces/spans with input/output capture
- Thread-safe context management: Uses contextvars to maintain trace context across async boundaries, and copies that context explicitly into subagent threads, since worker threads do not inherit it by default
- Automatic trace hierarchy: Parent agent trace → child span per tool call → nested span per subagent delegation
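To illustrate how a decorator plus `contextvars` can produce the parent/child hierarchy described above, here is a stand-alone sketch. It does not call the Langfuse SDK: `Span`, `RECORDED`, and this `observe` are hypothetical stand-ins that record spans in memory, and only synchronous functions are handled.

```python
import contextvars
import functools
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Optional

# The span currently executing in this context (None at the root).
_current_span = contextvars.ContextVar("current_span", default=None)


@dataclass
class Span:
    name: str
    trace_id: str
    parent_id: Optional[str]
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    input: Any = None
    output: Any = None
    error: Optional[str] = None
    duration_s: float = 0.0


RECORDED: list = []  # stand-in for an exporter that would send spans to a backend


def observe(name=None):
    """Wrap a function so each call becomes a span nested under the caller's span."""
    def decorator(func):
        span_name = name or func.__name__

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            parent = _current_span.get()
            span = Span(
                name=span_name,
                # A root call starts a new trace; nested calls reuse the parent's.
                trace_id=parent.trace_id if parent else uuid.uuid4().hex,
                parent_id=parent.span_id if parent else None,
                input={"args": args, "kwargs": kwargs},
            )
            token = _current_span.set(span)
            start = time.perf_counter()
            try:
                span.output = func(*args, **kwargs)
                return span.output
            except Exception as exc:
                span.error = repr(exc)
                raise
            finally:
                span.duration_s = time.perf_counter() - start
                _current_span.reset(token)  # restore the parent as current
                RECORDED.append(span)
        return wrapper
    return decorator


@observe()
def run_tool(query):
    return query.upper()


@observe(name="agent.run")
def run_agent(message):
    return run_tool(message)


result = run_agent("hi")
```

After the call, `RECORDED` holds two spans sharing one `trace_id`, with the `run_tool` span's `parent_id` pointing at the `agent.run` span.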
- Gateway Integration
When LANGFUSE_* env vars are present:
- Each user message starts a new Langfuse trace with session_id and user_id tags
- All tool executions captured as child spans
- Subagent delegations create nested traces linked to parent via parent_trace_id
- Gateway status command (/status) shows observability health
- Subagent Hierarchical Tracing
Thread-safe parent-child relationships:
# Parent agent starts trace
# Subagent receives trace_id via context propagation
# Subagent creates child span linked to parent
# Full hierarchy visible in Langfuse UI
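The propagation step can be sketched with the standard library alone. A `ThreadPoolExecutor` worker typically does not see the submitting thread's context variables, so the sketch copies the context for each subagent. `current_trace_id`, `run_subagent`, and `delegate` are hypothetical names, not existing Hermes APIs.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical context variable carrying the active trace ID.
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)


def run_subagent(task):
    # The subagent reads the parent's trace ID from the propagated context.
    return {"task": task, "parent_trace_id": current_trace_id.get()}


def delegate(tasks):
    """Run subagents in worker threads, each with its own copy of the caller's context."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [
            # copy_context() snapshots the caller's variables; Context.run
            # executes the subagent inside that snapshot on the worker thread.
            pool.submit(contextvars.copy_context().run, run_subagent, task)
            for task in tasks
        ]
        return [future.result() for future in futures]


current_trace_id.set("trace-123")
results = delegate(["summarize", "search"])
```

Each task gets a separate copy because a single `Context` object cannot be entered by two threads at once. Changes a subagent makes to its copy do not leak back into the parent.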
- Langfuse Skill
New skill at .agents/skills/langfuse/ providing:
- CLI-based Langfuse API access via /langfuse slash command
- Reference guides for instrumentation, prompt migration, SDK upgrades
- User feedback capture workflows
- Skill feedback submission to maintainers
- Graceful Degradation
- If langfuse package not installed: observability code no-ops with debug log
- If env vars not set: feature disabled, no tracing work performed
- Existing users unaffected unless explicitly enabled
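One way to get this degradation behavior is a no-op tracer selected at startup. The sketch below is an assumption about structure, not the proposed code: `NoOpTracer`, `get_tracer`, the `factory` argument, and the logger name are all hypothetical.

```python
import importlib.util
import logging
import os

logger = logging.getLogger("hermes.observability")  # hypothetical logger name


class NoOpTracer:
    """Accepts the same calls as a real tracer and discards them."""
    enabled = False

    def start_span(self, name, **attributes):
        return self

    def end(self, **attributes):
        return None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # never swallow exceptions from the traced code


def get_tracer(factory, env=None):
    """Return a real tracer only when Langfuse is both configured and installed.

    `factory` is a hypothetical callable that builds the Langfuse-backed tracer
    from the environment; it is only invoked when both checks pass.
    """
    env = os.environ if env is None else env
    if not (env.get("LANGFUSE_PUBLIC_KEY") and env.get("LANGFUSE_SECRET_KEY")):
        return NoOpTracer()  # not configured: feature disabled
    if importlib.util.find_spec("langfuse") is None:
        logger.debug("langfuse package not installed; observability disabled")
        return NoOpTracer()
    return factory(env)
```

Call sites can then use the tracer unconditionally, for example `with tracer.start_span("tool"):`, without checking whether observability is enabled.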
CLI Behavior:
hermes doctor # Checks Langfuse connectivity if configured
hermes gateway status # Shows observability: connected/disabled
Verification:
Users verify it's working by:
- Setting env vars
- Running any agent task
- Checking Langfuse dashboard for traces with full tool/subagent hierarchy
Alternatives Considered
Current state (limited built-in observability):
Hermes already has basic trajectory saving and session storage, but these provide limited granularity: no token-level tracking, no hierarchical subagent visibility, no real-time dashboards, and no production monitoring capabilities. This PR addresses those gaps.
OpenTelemetry generic tracing
- Approach: Use OTel SDK for vendor-agnostic observability
- Why rejected: Loses LLM-native features (token counting, prompt versioning, model comparisons); requires separate collector infrastructure
- Why the proposed solution is better: Langfuse is purpose-built for LLM agents, with native generations, sessions, and user tracking
Log-based + external ingestion
- Approach: Structured JSON logs ingested by ELK/Loki
- Why rejected: Post-hoc analysis only; cannot correlate concurrent subagent hierarchies in real-time; complex querying
- Why the proposed solution is better: Live hierarchical traces with immediate drill-down in the Langfuse UI
Why this approach:
- Builds on existing foundation: Complements (doesn't replace) current session/trajectory features
- Established tooling: Langfuse is a widely used open-source platform for LLM observability
- Major upgrade: Transforms basic logging into production-grade distributed tracing with cost attribution
Feature Type
New tool
Scope
Large (new module or significant refactor)
Contribution