
[Feature]: Add Langfuse tracing for subagents and gateway sessions #1501

@Aureliusf

Description


Problem or Use Case

Problem:
Hermes currently lacks production observability for agent operations. When running Hermes in production environments (gateway mode with Telegram/Discord/Slack, or batch processing), there's no way to:

  • Trace multi-step agent executions across tool calls and subagent delegations
  • Monitor token usage, latency, and error rates across sessions
  • Debug failing subagent workflows (hierarchical parent-child relationships are invisible)
  • Track costs and performance per user/session in multi-tenant deployments
  • Export telemetry data to external observability platforms

This makes it difficult to operate Hermes at scale, debug complex agent chains, or optimize costs.

Use Case:
This feature enables Langfuse integration for comprehensive observability:

  • Production Monitoring: Trace every agent execution with full context (tools called, subagents spawned, token usage, errors)
  • Cost Attribution: Track LLM costs per user, session, or conversation in gateway deployments
  • Debugging: Visualize hierarchical subagent executions to identify where complex workflows fail
  • Performance Optimization: Identify slow tool calls or expensive model invocations
  • Operational Visibility: Dashboards and alerts for production Hermes deployments

The implementation is opt-in via environment variables and gracefully degrades when Langfuse is not configured, ensuring no impact on existing users.

Proposed Solution

  1. Configuration (no CLI flags; environment-based)
# ~/.hermes/.env
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com  # or self-hosted URL

Optional config in ~/.hermes/config.yaml:

observability:
  enabled: true  # Auto-detected if env vars present
  sample_rate: 1.0  # fraction of traces to record (0.0-1.0); lower it for high-volume deployments
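A minimal sketch of how the two configuration sources above could be merged. `ObservabilityConfig`, `resolve_config`, and `should_sample` are hypothetical names, not existing Hermes code. The sketch assumes the `observability:` section has already been parsed from config.yaml into a dict, and it treats `enabled: true` without keys as still disabled, since nothing could be exported.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Optional

DEFAULT_HOST = "https://cloud.langfuse.com"


@dataclass(frozen=True)
class ObservabilityConfig:
    enabled: bool
    sample_rate: float
    host: str


def resolve_config(
    env: Mapping[str, str], yaml_cfg: Optional[Mapping[str, Any]] = None
) -> ObservabilityConfig:
    """Merge LANGFUSE_* env vars with the parsed `observability:` section (hypothetical helper)."""
    section = dict((yaml_cfg or {}).get("observability") or {})
    has_keys = bool(env.get("LANGFUSE_PUBLIC_KEY")) and bool(env.get("LANGFUSE_SECRET_KEY"))
    # Auto-detect: default to enabled when both keys exist; an explicit
    # `enabled: false` turns tracing off, but `enabled: true` cannot work without keys.
    enabled = bool(section.get("enabled", has_keys)) and has_keys
    sample_rate = float(section.get("sample_rate", 1.0))
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError(f"observability.sample_rate must be in [0.0, 1.0], got {sample_rate}")
    host = env.get("LANGFUSE_HOST") or DEFAULT_HOST
    return ObservabilityConfig(enabled=enabled, sample_rate=sample_rate, host=host)


def should_sample(cfg: ObservabilityConfig, rng: Callable[[], float] = random.random) -> bool:
    """Decide per trace whether to record it; random.random() returns a value in [0.0, 1.0)."""
    return cfg.enabled and rng() < cfg.sample_rate
```

In use, `resolve_config(os.environ, parsed_yaml)` would run once at startup, and `should_sample(cfg)` once per incoming user message, so a trace is kept or dropped as a whole rather than losing individual spans.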

  2. Core Implementation
  • @observe decorator (agent/observability.py): Wraps agent methods to auto-create Langfuse traces/spans with input/output capture
  • Thread-safe context management: Uses contextvars to maintain trace context across async boundaries and subagent threads (asyncio tasks inherit the context automatically; worker threads must be handed an explicit copy)
  • Automatic trace hierarchy: Parent agent trace → child span per tool call → nested span per subagent delegation
  3. Gateway Integration
  When LANGFUSE_* env vars are present:
  • Each user message starts a new Langfuse trace with session_id and user_id tags
  • All tool executions captured as child spans
  • Subagent delegations create nested traces linked to parent via parent_trace_id
  • Gateway status command (/status) shows observability health
  4. Subagent Hierarchical Tracing
  Thread-safe parent-child relationships:
  # Parent agent starts trace
  # Subagent receives trace_id via context propagation
  # Subagent creates child span linked to parent
  # Full hierarchy visible in Langfuse UI
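The propagation steps above hide one detail worth making explicit: `contextvars` values flow into asyncio tasks automatically, but in standard CPython builds threads started by `threading` or `ThreadPoolExecutor` do not inherit the submitting thread's context, so the parent's span has to be copied in. A minimal stdlib sketch, with hypothetical names (`start_span`, `run_subagent`, `delegate`) and an in-memory `SPANS` list in place of a Langfuse exporter:

```python
import contextvars
import uuid
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Tuple

# (trace_id, span_id) of the span executing in the current context, or None.
current_span = contextvars.ContextVar("current_span", default=None)

# (trace_id, span_id, parent_span_id, name) records; stand-in for a Langfuse exporter.
SPANS: List[Tuple[str, str, Optional[str], str]] = []


def start_span(name: str) -> contextvars.Token:
    parent = current_span.get()
    trace_id = parent[0] if parent else uuid.uuid4().hex
    span_id = uuid.uuid4().hex
    SPANS.append((trace_id, span_id, parent[1] if parent else None, name))
    return current_span.set((trace_id, span_id))


def run_subagent(goal: str) -> str:
    token = start_span(f"subagent:{goal}")
    try:
        return f"done:{goal}"  # placeholder for the subagent's real work
    finally:
        current_span.reset(token)


def delegate(goals: List[str]) -> List[str]:
    token = start_span("parent_agent")
    try:
        with ThreadPoolExecutor(max_workers=max(1, len(goals))) as pool:
            # copy_context() is called once per task: a Context cannot be entered
            # by two threads at once, and each copy carries the parent span.
            futures = [
                pool.submit(contextvars.copy_context().run, run_subagent, goal)
                for goal in goals
            ]
            return [f.result() for f in futures]
    finally:
        current_span.reset(token)
```

Without the `copy_context().run` wrapper, each subagent would see `current_span` as `None` and start an unrelated top-level trace, which is exactly the "invisible hierarchy" problem the issue describes.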
  5. Langfuse Skill
  New skill at .agents/skills/langfuse/ providing:
  • CLI-based Langfuse API access via /langfuse slash command
  • Reference guides for instrumentation, prompt migration, SDK upgrades
  • User feedback capture workflows
  • Skill feedback submission to maintainers
  6. Graceful Degradation
  • If the langfuse package is not installed: observability code becomes a no-op and logs a debug message
  • If the env vars are not set: the feature is disabled, with zero overhead
  • Existing users are unaffected unless they explicitly enable it

CLI Behavior:
hermes doctor          # Checks Langfuse connectivity if configured
hermes gateway status  # Shows observability: connected/disabled
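The `connected`/`disabled` status line could come from one small helper shared by `hermes doctor` and `hermes gateway status`. `observability_status` and `http_health_probe` are hypothetical, and the probe assumes the Langfuse server answers `GET /api/public/health`; confirm that endpoint against your Langfuse version before relying on it. The probe is injected so the logic can be tested without network access.

```python
import urllib.request
from typing import Callable, Mapping

DEFAULT_HOST = "https://cloud.langfuse.com"


def http_health_probe(host: str, timeout: float = 5.0) -> bool:
    # Assumption: Langfuse exposes an unauthenticated health endpoint at this path.
    url = host.rstrip("/") + "/api/public/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return 200 <= resp.status < 300


def observability_status(
    env: Mapping[str, str], probe: Callable[[str], bool] = http_health_probe
) -> str:
    """Build the one-line status shown by doctor / gateway status (hypothetical)."""
    if not (env.get("LANGFUSE_PUBLIC_KEY") and env.get("LANGFUSE_SECRET_KEY")):
        return "observability: disabled"  # never touches the network
    host = env.get("LANGFUSE_HOST") or DEFAULT_HOST
    try:
        reachable = probe(host)
    except Exception as exc:  # DNS failure, timeout, HTTP 4xx/5xx
        return f"observability: error ({type(exc).__name__})"
    return "observability: connected" if reachable else "observability: unreachable"
```

A health probe only proves the host is reachable; verifying the key pair as well would need an authenticated call, such as the Python SDK client's `auth_check()` method.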

Verification:
Users verify it's working by:

  1. Setting env vars
  2. Running any agent task
  3. Checking Langfuse dashboard for traces with full tool/subagent hierarchy

Alternatives Considered

Current state (limited built-in observability):
Hermes already has basic trajectory saving and session storage, but these offer limited granularity: no token-level tracking, no hierarchical subagent visibility, no real-time dashboards, and no production monitoring capabilities. This proposal addresses those gaps.

OpenTelemetry generic tracing

  • Approach: Use OTel SDK for vendor-agnostic observability
  • Why rejected: Loses LLM-native features (token counting, prompt versioning, model comparisons); typically requires separate collector infrastructure
  • Proposed solution better: Langfuse is purpose-built for LLM agents with native generations, sessions, and user tracking

Log-based + external ingestion

  • Approach: Structured JSON logs ingested by ELK/Loki
  • Why rejected: Post-hoc analysis only; cannot correlate concurrent subagent hierarchies in real time; querying is complex
  • Proposed solution better: Live hierarchical traces with immediate drill-down in Langfuse UI

Why this approach:

  • Builds on existing foundation: Complements (doesn't replace) current session/trajectory features
  • Widely adopted: Langfuse is a popular open-source platform purpose-built for LLM observability
  • Major upgrade: Transforms basic logging into production-grade distributed tracing with cost attribution

Feature Type

New tool

Scope

Large (new module or significant refactor)

Contribution

  • I'd like to implement this myself and submit a PR
