Implement LLM call categorization, coordination metrics suite, and orchestration tracking (DESIGN_SPEC §10.5 M4)

## Context

M3 ships proxy metrics (turns, tokens, and cost per task). M4 adds **call categorization**, which classifies each LLM call by purpose to detect orchestration overhead. It also adds a **coordination metrics suite** of five empirically grounded metrics that compare multi-agent system (MAS) runs against a single-agent system (SAS) baseline, so multi-agent configurations can be tuned from data. Both build on the existing `CostRecord` infrastructure.

## Acceptance Criteria

### Call Categorization
- [ ] Each LLM call tagged with a category: `productive` (direct task work), `coordination` (delegation, status checks), `system` (planning, self-reflection, error recovery)
- [ ] Category stored as a field on `CostRecord` (or companion model)
- [ ] Categorization happens at the call site (engine/agent level), not in the provider layer
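As a minimal sketch of what call-site tagging could look like, the snippet below adds a category enum and stores it as a required field on a `CostRecord`-like dataclass. The `CostRecord` fields shown, the purpose labels (`"delegate"`, `"plan"`, ...), and the `categorize` helper are all hypothetical illustrations, not the project's actual API.

```python
from dataclasses import dataclass
from enum import Enum


class CallCategory(str, Enum):
    """Purpose of an LLM call, assigned at the engine/agent call site."""

    PRODUCTIVE = "productive"      # direct task work
    COORDINATION = "coordination"  # delegation, status checks
    SYSTEM = "system"              # planning, self-reflection, error recovery


@dataclass(frozen=True)
class CostRecord:
    # Hypothetical subset of fields; the project's real CostRecord may differ.
    agent_id: str
    task_id: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    category: CallCategory  # required, so every call site must pick one


# Hypothetical call-site purpose labels; the engine's real labels may differ.
_PURPOSE_TO_CATEGORY = {
    "execute_step": CallCategory.PRODUCTIVE,
    "delegate": CallCategory.COORDINATION,
    "status_check": CallCategory.COORDINATION,
    "plan": CallCategory.SYSTEM,
    "reflect": CallCategory.SYSTEM,
    "recover_error": CallCategory.SYSTEM,
}


def categorize(purpose: str) -> CallCategory:
    """Map an engine-level purpose label to a category; unknown labels raise."""
    try:
        return _PURPOSE_TO_CATEGORY[purpose]
    except KeyError:
        raise ValueError(f"unknown call purpose: {purpose!r}") from None
```

For example, a delegation call would be recorded as `CostRecord("agent-1", "task-42", 1200, 300, 0.0045, categorize("delegate"))`. Making `category` required (rather than defaulting to `productive`) means an untagged call site fails loudly instead of silently skewing the ratio. Subclassing `str` lets the category serialize as a plain string.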

### Orchestration Overhead Ratio
- [ ] `orchestration_ratio = (coordination + system) / total` computed per task and per agent
- [ ] Ratio available in spending summaries and task completion metadata
- [ ] Tiered orchestration ratio alerts: info (>30%), warn (>50%), critical (>70%)
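A minimal sketch of the ratio and the tiered alerts follows. It assumes the ratio is computed over call counts. The criteria do not say whether calls, tokens, or cost are the unit, so a token- or cost-weighted variant would sum those fields instead. The thresholds are taken from the criteria and treated as strict "greater than" bounds. `AlertLevel`, `orchestration_ratio`, and `alert_level` are hypothetical names.

```python
from collections import Counter
from enum import Enum
from typing import Iterable, Optional


class AlertLevel(str, Enum):
    INFO = "info"
    WARN = "warn"
    CRITICAL = "critical"


# Thresholds from the acceptance criteria: info >30%, warn >50%, critical >70%.
# Ordered from most to least severe so the first match wins.
_TIERS = (
    (0.70, AlertLevel.CRITICAL),
    (0.50, AlertLevel.WARN),
    (0.30, AlertLevel.INFO),
)


def orchestration_ratio(categories: Iterable[str]) -> float:
    """(coordination + system) / total, counted per call; 0.0 when there are no calls."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["coordination"] + counts["system"]) / total


def alert_level(ratio: float) -> Optional[AlertLevel]:
    """Return the most severe tier whose threshold the ratio strictly exceeds."""
    for threshold, level in _TIERS:
        if ratio > threshold:
            return level
    return None
```

With four calls categorized as productive, coordination, system, and productive, the ratio is 0.5. That value raises only an `info` alert, because the `warn` tier requires a ratio strictly above 50%.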

### Coordination Metrics Suite (§10.5 — new)
- [ ] **Coordination efficiency** (`Ec`): `success_rate / (turns_mas / turns_sas)`, the ROI of coordination
- [ ] **Coordination overhead** (`O%`): `(turns_mas - turns_sas) / turns_sas × 100%`; the optimal band is 200–300%
- [ ] **Error amplification** (`Ae`): `error_rate_mas / error_rate_sas`, the error propagation factor
- [ ] **Message density** (`c`): inter-agent messages per reasoning turn
- [ ] **Redundancy rate** (`R`): mean pairwise cosine similarity of agent output embeddings
- [ ] All 5 metrics opt-in via `coordination_metrics.enabled` config
- [ ] `Ec` and `O%` are cheap (turn counting); `Ae` requires an SAS error-rate baseline; `c` and `R` require semantic analysis
- [ ] Configurable `baseline_window` for establishing SAS comparison data
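The five formulas above can be sketched as plain functions. This is a minimal illustration, not the project's implementation. `CoordinationMetricsConfig` is a hypothetical shape for the `coordination_metrics` config section. Its `baseline_window` default is a placeholder, not a value from the spec. `R` is computed here as the mean cosine similarity over all unordered pairs of agent embeddings. The ratio-based metrics raise `ZeroDivisionError` on a zero baseline, so callers would need to check that SAS data exists first.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import Sequence


@dataclass(frozen=True)
class CoordinationMetricsConfig:
    # Hypothetical shape of the `coordination_metrics` config section.
    enabled: bool = False       # all five metrics are opt-in
    baseline_window: int = 50   # placeholder: number of recent SAS runs to average


def coordination_efficiency(success_rate: float, turns_mas: float, turns_sas: float) -> float:
    """Ec = success_rate / (turns_mas / turns_sas)."""
    return success_rate / (turns_mas / turns_sas)


def coordination_overhead_pct(turns_mas: float, turns_sas: float) -> float:
    """O% = (turns_mas - turns_sas) / turns_sas * 100."""
    return (turns_mas - turns_sas) / turns_sas * 100.0


def error_amplification(error_rate_mas: float, error_rate_sas: float) -> float:
    """Ae = error_rate_mas / error_rate_sas."""
    return error_rate_mas / error_rate_sas


def message_density(inter_agent_messages: int, reasoning_turns: int) -> float:
    """c = inter-agent messages per reasoning turn."""
    return inter_agent_messages / reasoning_turns


def _cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def redundancy_rate(embeddings: Sequence[Sequence[float]]) -> float:
    """R = mean pairwise cosine similarity of agent output embeddings (0.0 if < 2 agents)."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        return 0.0
    return sum(_cosine(u, v) for u, v in pairs) / len(pairs)
```

For instance, a task that takes 35 turns multi-agent against a 10-turn SAS baseline has `O% = 250`, inside the 200–300% band. At a 0.8 success rate it has `Ec = 0.8 / 3.5 ≈ 0.229`.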

### Analytics Queries
- [ ] Query: breakdown by category for a given task
- [ ] Query: breakdown by category for a given agent over time
- [ ] Query: company-wide orchestration ratio
- [ ] Query: coordination metrics (Ec, O%, Ae, c, R) per task and per agent
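The first three queries can be sketched over an in-memory list of categorized records. This is only an illustration of the intended grouping. `CallRecord`, its `day` field, and the daily bucketing used for "over time" are assumptions. A real implementation would more likely run equivalent `GROUP BY` queries against wherever `CostRecord`s are persisted.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date
from typing import DefaultDict, Dict, Iterable


@dataclass(frozen=True)
class CallRecord:
    # Hypothetical flattened view of a categorized CostRecord.
    task_id: str
    agent_id: str
    day: date
    category: str  # "productive" | "coordination" | "system"


def breakdown_for_task(records: Iterable[CallRecord], task_id: str) -> Dict[str, int]:
    """Call counts per category for a single task."""
    return dict(Counter(r.category for r in records if r.task_id == task_id))


def breakdown_for_agent_by_day(
    records: Iterable[CallRecord], agent_id: str
) -> Dict[date, Dict[str, int]]:
    """Call counts per category, bucketed by day, for a single agent."""
    by_day: DefaultDict[date, Counter] = defaultdict(Counter)
    for r in records:
        if r.agent_id == agent_id:
            by_day[r.day][r.category] += 1
    return {d: dict(c) for d, c in sorted(by_day.items())}


def company_orchestration_ratio(records: Iterable[CallRecord]) -> float:
    """(coordination + system) / total across every recorded call."""
    counts = Counter(r.category for r in records)
    total = sum(counts.values())
    return (counts["coordination"] + counts["system"]) / total if total else 0.0
```

The per-task and per-agent coordination metrics query would combine these groupings with the metric formulas, once SAS baseline data is available for the same task types.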

### Testing
- [ ] Unit tests for categorization logic
- [ ] Unit tests for all 5 coordination metrics calculations
- [ ] Integration test: multi-agent task with delegation → verify category breakdown
- [ ] Integration test: verify coordination metrics collection with opt-in config

## Dependencies

- #7 — Per-call cost tracking (done)
- #21 — Task lifecycle with proxy metrics (M3)
- Multi-agent execution (M4 prerequisite for coordination calls to exist)

## Design Spec Reference

- §10.5 — LLM Call Analytics (M4: Call Categorization + Coordination Metrics Suite)
- §16.3 — Agent Scaling Research (Kim et al., 2025 — empirical basis for metrics)
