[Feature]: Add Langfuse tracing for subagents and gateway sessions

### Problem or Use Case

**Problem:**
Hermes currently lacks production observability for agent operations. When running Hermes in production environments (gateway mode with Telegram/Discord/Slack, or batch processing), there's no way to:
- Trace multi-step agent executions across tool calls and subagent delegations
- Monitor token usage, latency, and error rates across sessions
- Debug failing subagent workflows (hierarchical parent-child relationships are invisible)
- Track costs and performance per user/session in multi-tenant deployments
- Export telemetry data to external observability platforms

This makes it difficult to operate Hermes at scale, debug complex agent chains, or optimize costs.

**Use Case:**
This feature enables Langfuse integration for comprehensive observability:
- Production Monitoring: Trace every agent execution with full context (tools called, subagents spawned, token usage, errors)
- Cost Attribution: Track LLM costs per user, session, or conversation in gateway deployments
- Debugging: Visualize hierarchical subagent executions to identify where complex workflows fail
- Performance Optimization: Identify slow tool calls or expensive model invocations
- Operational Visibility: Dashboards and alerts for production Hermes deployments

The implementation is opt-in via environment variables and gracefully degrades when Langfuse is not configured, ensuring no impact on existing users.

### Proposed Solution


1. Configuration (Zero CLI flags, environment-based)
  ```
  # ~/.hermes/.env
  LANGFUSE_PUBLIC_KEY=pk-lf-...
  LANGFUSE_SECRET_KEY=sk-lf-...
  LANGFUSE_HOST=https://cloud.langfuse.com  # or self-hosted URL
  ```
Optional config in ~/.hermes/config.yaml:
```
observability:
  enabled: true  # Auto-detected if env vars present
  sample_rate: 1.0  # 0.0-1.0, for high-volume filtering
```
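As a rough illustration of the auto-detection described above, the following sketch resolves the effective settings from the environment and the optional `observability` section. The names `ObservabilityConfig` and `resolve_config` are hypothetical, and falling back to the cloud URL when `LANGFUSE_HOST` is unset is an assumption, not a decided behavior.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Both keys must be present for auto-detection to turn tracing on.
REQUIRED_VARS = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")


@dataclass(frozen=True)
class ObservabilityConfig:
    enabled: bool
    sample_rate: float
    host: str


def resolve_config(env: Mapping[str, str],
                   yaml_section: Optional[dict] = None) -> ObservabilityConfig:
    """Combine env vars with the optional `observability:` YAML section."""
    section = yaml_section or {}
    has_keys = all(env.get(name) for name in REQUIRED_VARS)
    # An explicit `enabled: false` wins; without keys tracing stays off.
    enabled = bool(section.get("enabled", has_keys)) and has_keys
    # Clamp sample_rate into the documented 0.0-1.0 range.
    rate = min(1.0, max(0.0, float(section.get("sample_rate", 1.0))))
    # Assumption: default to the cloud host when LANGFUSE_HOST is unset.
    host = env.get("LANGFUSE_HOST", "https://cloud.langfuse.com")
    return ObservabilityConfig(enabled=enabled, sample_rate=rate, host=host)
```

Passing `os.environ` as `env` would give the runtime behavior; taking a plain mapping keeps the function easy to test.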
2. Core Implementation
- @observe decorator (agent/observability.py): Wraps agent methods to auto-create Langfuse traces/spans with input/output capture
- Thread-safe context management: Uses contextvars to maintain trace context across async boundaries and subagent threads
- Automatic trace hierarchy: Parent agent trace → child span per tool call → nested span per subagent delegation

3. Gateway Integration
When LANGFUSE_* env vars are present:
- Each user message starts a new Langfuse trace with session_id and user_id tags
- All tool executions captured as child spans
- Subagent delegations create nested spans linked to the parent trace via parent_trace_id
- Gateway status command (/status) shows observability health
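
One way the per-message trace and the `sample_rate` setting could interact is sketched below. The helper names (`should_sample`, `start_message_trace`) and the dict layout are hypothetical. Hashing the session ID so that a session is either fully traced or fully skipped is an assumption about the desired sampling behavior, not something this proposal has settled.

```python
import hashlib
import uuid
from typing import Optional


def should_sample(session_id: str, sample_rate: float) -> bool:
    """Deterministic per-session sampling: same session, same decision."""
    if sample_rate >= 1.0:
        return True
    if sample_rate <= 0.0:
        return False
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate


def start_message_trace(session_id: str, user_id: str,
                        sample_rate: float = 1.0) -> Optional[dict]:
    """Return trace metadata for one user message, or None if sampled out."""
    if not should_sample(session_id, sample_rate):
        return None
    return {
        "trace_id": uuid.uuid4().hex,
        "session_id": session_id,
        "user_id": user_id,
        "spans": [],  # tool executions would be appended as child spans
    }
```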
4. Subagent Hierarchical Tracing
Thread-safe parent-child relationships:
```
  # Parent agent starts trace
  # Subagent receives trace_id via context propagation
  # Subagent creates child span linked to parent
  # Full hierarchy visible in Langfuse UI
```
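The propagation steps above can be sketched with the standard library alone. One detail to account for: values stored in `contextvars` are not guaranteed to be visible in worker threads, so this sketch copies the parent's context explicitly with `contextvars.copy_context()` and runs each subagent inside its own copy. The variable and function names are hypothetical.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical context variables carrying the active trace and span.
trace_id_var = contextvars.ContextVar("trace_id", default=None)
parent_span_var = contextvars.ContextVar("parent_span", default=None)


def subagent_task(task):
    """Runs in a worker thread; reads the trace context it was given."""
    return {
        "task": task,
        "trace_id": trace_id_var.get(),
        "parent_span": parent_span_var.get(),
    }


def delegate(tasks):
    """Run each subagent in a thread with a copy of the caller's context."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = []
        for task in tasks:
            # One copy per task: a Context object cannot be entered by
            # two threads at the same time.
            ctx = contextvars.copy_context()
            futures.append(pool.submit(ctx.run, subagent_task, task))
        return [future.result() for future in futures]


def parent_agent():
    """Parent starts a trace, then delegates to two subagents."""
    trace_token = trace_id_var.set("trace-123")
    span_token = parent_span_var.set("span-parent")
    try:
        return delegate(["search", "summarize"])
    finally:
        parent_span_var.reset(span_token)
        trace_id_var.reset(trace_token)
```

Each subagent result carries the parent's `trace_id` and `parent_span`, which is the information needed to attach its span under the parent in the Langfuse UI.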
5. Langfuse Skill
New skill at .agents/skills/langfuse/ providing:
- CLI-based Langfuse API access via /langfuse slash command
- Reference guides for instrumentation, prompt migration, SDK upgrades
- User feedback capture workflows
- Skill feedback submission to maintainers
6. Graceful Degradation
- If langfuse package not installed: observability code no-ops with debug log
- If env vars not set: feature disabled, no tracing calls made
- Existing users unaffected unless explicitly enabled
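
A sketch of the fallback logic listed above, assuming a hypothetical `build_tracer` helper and a `NoOpTracer` stand-in. The factory that builds the real Langfuse-backed tracer is passed in as a parameter here, so the sketch makes no claims about the Langfuse SDK's own API.

```python
import importlib.util
import logging
import os

logger = logging.getLogger("hermes.observability")


class _NoOpSpan:
    """Context manager that accepts updates and does nothing."""
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # never swallow exceptions

    def update(self, **fields):
        pass


class NoOpTracer:
    """Used whenever tracing is unavailable or not configured."""
    enabled = False

    def span(self, name, **metadata):
        return _NoOpSpan()


def langfuse_available():
    """True if the langfuse package can be imported (without importing it)."""
    return importlib.util.find_spec("langfuse") is not None


def build_tracer(env, real_tracer_factory, package_available=None):
    """Return a real tracer only when keys are set and the package exists."""
    if not (env.get("LANGFUSE_PUBLIC_KEY") and env.get("LANGFUSE_SECRET_KEY")):
        return NoOpTracer()  # not configured: feature disabled
    if package_available is None:
        package_available = langfuse_available()
    if not package_available:
        logger.debug("langfuse package not installed; tracing disabled")
        return NoOpTracer()
    return real_tracer_factory(env)
```

At startup this could be called as `build_tracer(os.environ, make_langfuse_tracer)`, where `make_langfuse_tracer` is a hypothetical factory; call sites then use `with tracer.span(...)` without checking whether tracing is active.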
CLI Behavior:
```
hermes doctor          # Checks Langfuse connectivity if configured
hermes gateway status  # Shows observability: connected/disabled
```
Verification:
Users verify it's working by:
1. Setting env vars
2. Running any agent task
3. Checking Langfuse dashboard for traces with full tool/subagent hierarchy

### Alternatives Considered

Current state (limited built-in observability):
Hermes already has basic trajectory saving and session storage, but these offer limited granularity: no token-level tracking, no hierarchical subagent visibility, no real-time dashboards, and no production monitoring capabilities. This proposal addresses those gaps.

OpenTelemetry generic tracing
- Approach: Use OTel SDK for vendor-agnostic observability
- Why rejected: Loses LLM-native features (token counting, prompt versioning, model comparisons); requires separate collector infrastructure
- Proposed solution better: Langfuse is purpose-built for LLM agents with native generations, sessions, and user tracking

Log-based + external ingestion
- Approach: Structured JSON logs ingested by ELK/Loki
- Why rejected: Post-hoc analysis only; cannot correlate concurrent subagent hierarchies in real-time; complex querying
- Proposed solution better: Live hierarchical traces with immediate drill-down in Langfuse UI

Why this approach:
- Builds on existing foundation: Complements (doesn't replace) current session/trajectory features
- Widely adopted: Langfuse is a widely used open-source platform for LLM observability
- Major upgrade: Transforms basic logging into production-grade distributed tracing with cost attribution

### Feature Type

New tool

### Scope

Large (new module or significant refactor)

### Contribution

- [x] I'd like to implement this myself and submit a PR
