Implement LLM call categorization, coordination metrics suite, and orchestration tracking (DESIGN_SPEC §10.5 M4) #135

@Aureliolo

Description

Context

M3 ships proxy metrics (turns/tokens/cost per task). M4 adds call categorization — classifying each LLM call by purpose to detect orchestration overhead — and a coordination metrics suite with 5 empirically-grounded metrics for data-driven tuning of multi-agent configurations. This builds on existing CostRecord infrastructure.

Acceptance Criteria

Call Categorization

  • Each LLM call tagged with a category: productive (direct task work), coordination (delegation, status checks), system (planning, self-reflection, error recovery)
  • Category stored as a field on CostRecord (or companion model)
  • Categorization happens at the call site (engine/agent level), not in the provider layer
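
A minimal sketch of the tagging described above, assuming Python. `CallCategory`, `CategorizedCostRecord`, and `record_call` are hypothetical names used only for illustration; the real CostRecord fields are not shown in this issue, so the ones below are placeholders.

```python
from dataclasses import dataclass
from enum import Enum


class CallCategory(str, Enum):
    """Purpose of an LLM call, mirroring the three categories in this issue."""

    PRODUCTIVE = "productive"      # direct task work
    COORDINATION = "coordination"  # delegation, status checks
    SYSTEM = "system"              # planning, self-reflection, error recovery


@dataclass(frozen=True)
class CategorizedCostRecord:
    """Hypothetical stand-in for CostRecord plus the new category field."""

    task_id: str
    agent_id: str
    tokens: int
    cost_usd: float
    category: CallCategory


def record_call(
    task_id: str, agent_id: str, tokens: int, cost_usd: float, category: str
) -> CategorizedCostRecord:
    """Build a record at the call site (engine/agent level).

    The caller knows why the call was made, so it supplies the category;
    the provider layer only reports usage. Unknown categories raise ValueError.
    """
    return CategorizedCostRecord(
        task_id, agent_id, tokens, cost_usd, CallCategory(category)
    )


rec = record_call("task-1", "agent-a", 1200, 0.018, "coordination")
```

Validating the category through the enum at the call site means a typo fails immediately instead of producing an uncategorized record.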

Orchestration Overhead Ratio

  • orchestration_ratio = (coordination + system) / total, computed per task and per agent
  • Ratio available in spending summaries and task completion metadata
  • Tiered orchestration ratio alerts: info (>30%), warn (>50%), critical (>70%)
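
The ratio and alert tiers above can be sketched as follows. This assumes the ratio is computed over call counts; the issue does not say whether `total` means calls, tokens, or cost, so that choice is an assumption. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def orchestration_ratio(categories: Iterable[str]) -> float:
    """(coordination + system) / total, counted over LLM calls (assumption)."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # assumption: no calls means no overhead to report
    return (counts["coordination"] + counts["system"]) / total


def alert_level(ratio: float) -> Optional[str]:
    """Map a ratio to the tiers in this issue; thresholds are strict (>)."""
    if ratio > 0.70:
        return "critical"
    if ratio > 0.50:
        return "warn"
    if ratio > 0.30:
        return "info"
    return None


# 4 productive, 3 coordination, 3 system calls -> 6 of 10 are overhead
calls = ["productive"] * 4 + ["coordination"] * 3 + ["system"] * 3
ratio = orchestration_ratio(calls)
level = alert_level(ratio)
```

With these sample calls the ratio is 6/10, which falls in the warn tier. The same function can be run once per task and once per agent by filtering the input.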

Coordination Metrics Suite (§10.5 — new)

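  • Coordination efficiency (Ec): success_rate / (turns_mas / turns_sas), the ROI of coordination, where turns_mas and turns_sas are the reasoning turns used by the multi-agent system (MAS) and by the single-agent system (SAS) baseline
  • Coordination overhead (O%): (turns_mas - turns_sas) / turns_sas × 100%, with an optimal band of 200–300%
  • Error amplification (Ae): error_rate_mas / error_rate_sas, the error propagation factor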
  • Message density (c): inter-agent messages per reasoning turn
  • Redundancy rate (R): mean cosine similarity of agent output embeddings
  • All 5 metrics opt-in via coordination_metrics.enabled config
  • Ec and O% are cheap (turn counting); Ae requires SAS baseline; c and R require semantic analysis
  • Configurable baseline_window for establishing SAS comparison data
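
The five formulas above can be sketched in plain Python. This is an illustration under stated assumptions, not the spec's implementation: the function names are hypothetical, redundancy is taken as the mean over all pairs of agent outputs (the issue only says "mean cosine similarity"), and embeddings are assumed to be non-zero vectors of equal length.

```python
import math
from itertools import combinations
from typing import Sequence


def coordination_efficiency(success_rate: float, turns_mas: float, turns_sas: float) -> float:
    """Ec = success_rate / (turns_mas / turns_sas)."""
    return success_rate / (turns_mas / turns_sas)


def coordination_overhead_pct(turns_mas: float, turns_sas: float) -> float:
    """O% = (turns_mas - turns_sas) / turns_sas * 100."""
    return (turns_mas - turns_sas) / turns_sas * 100.0


def error_amplification(error_rate_mas: float, error_rate_sas: float) -> float:
    """Ae = error_rate_mas / error_rate_sas."""
    return error_rate_mas / error_rate_sas


def message_density(inter_agent_messages: int, reasoning_turns: int) -> float:
    """c = inter-agent messages per reasoning turn."""
    return inter_agent_messages / reasoning_turns


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def redundancy_rate(embeddings: Sequence[Sequence[float]]) -> float:
    """R = mean cosine similarity over all pairs of outputs (assumption)."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Illustrative numbers only: SAS baseline of 10 turns, MAS run of 35 turns.
ec = coordination_efficiency(success_rate=0.7, turns_mas=35, turns_sas=10)
overhead = coordination_overhead_pct(turns_mas=35, turns_sas=10)
ae = error_amplification(error_rate_mas=0.2, error_rate_sas=0.05)
density = message_density(inter_agent_messages=70, reasoning_turns=35)
redundancy = redundancy_rate([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```

In this example the overhead is 250%, inside the 200–300% band named above. Gating on coordination_metrics.enabled and building the SAS baseline from baseline_window are left out of the sketch.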

Analytics Queries

  • Query: breakdown by category for a given task
  • Query: breakdown by category for a given agent over time
  • Query: company-wide orchestration ratio
  • Query: coordination metrics (Ec, O%, Ae, c, R) per task and per agent
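
As an illustration of the category breakdown queries above, here is a sketch over in-memory rows. The row layout `(task_id, agent_id, category, cost_usd)` and both function names are hypothetical; the actual storage and query layer are not described in this issue.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, str, str, float]  # (task_id, agent_id, category, cost_usd)

# Hypothetical sample data
RECORDS: List[Row] = [
    ("task-1", "agent-a", "productive", 0.40),
    ("task-1", "agent-a", "coordination", 0.10),
    ("task-1", "agent-b", "system", 0.05),
    ("task-2", "agent-b", "productive", 0.20),
]


def breakdown_by_category(
    records: List[Row],
    *,
    task_id: Optional[str] = None,
    agent_id: Optional[str] = None,
) -> Dict[str, Dict[str, float]]:
    """Call count and cost per category, optionally filtered by task or agent."""
    totals: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"calls": 0, "cost_usd": 0.0}
    )
    for rec_task, rec_agent, category, cost in records:
        if task_id is not None and rec_task != task_id:
            continue
        if agent_id is not None and rec_agent != agent_id:
            continue
        totals[category]["calls"] += 1
        totals[category]["cost_usd"] += cost
    return dict(totals)


def company_orchestration_ratio(records: List[Row]) -> float:
    """Company-wide (coordination + system) / total, by call count (assumption)."""
    if not records:
        return 0.0
    overhead = sum(1 for row in records if row[2] in ("coordination", "system"))
    return overhead / len(records)


task_breakdown = breakdown_by_category(RECORDS, task_id="task-1")
agent_breakdown = breakdown_by_category(RECORDS, agent_id="agent-b")
company_ratio = company_orchestration_ratio(RECORDS)
```

The "over time" variant would additionally need a timestamp on each row and a bucketing step, which this sketch omits.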

Testing

  • Unit tests for categorization logic
  • Unit tests for all 5 coordination metrics calculations
  • Integration test: multi-agent task with delegation → verify category breakdown
  • Integration test: verify coordination metrics collection with opt-in config

Dependencies

Design Spec Reference

  • §10.5 — LLM Call Analytics (M4: Call Categorization + Coordination Metrics Suite)
  • §16.3 — Agent Scaling Research (Kim et al., 2025 — empirical basis for metrics)

Metadata

Assignees

No one assigned

    Labels

    • prio:medium - Should do, but not blocking
    • scope:medium - 1-3 days of work
    • spec:agent-system - DESIGN_SPEC Section 3 - Agent System
    • spec:budget - DESIGN_SPEC Section 10 - Cost & Budget Management
    • spec:providers - DESIGN_SPEC Section 9 - Model Provider Layer
    • spec:task-workflow - DESIGN_SPEC Section 6 - Task & Workflow Engine
    • type:feature - New feature implementation
    • type:test - Test coverage, test infrastructure
