Implement agent performance tracking and metrics collection

## Context

Implement the performance tracking system defined in spec 8.3 that collects and aggregates metrics for each agent. These metrics drive hiring/firing decisions, promotions, and progressive trust.

**Metrics to track:**
- `tasks_completed` / `tasks_failed` — Task completion counts
- `avg_quality_score` — Average quality rating across completed tasks
- `avg_cost_per_task` — Average LLM/tool cost per task
- `avg_completion_time` — Average time from assignment to completion
- `collaboration_score` — Effectiveness in multi-agent interactions

## Acceptance Criteria

- [ ] Performance metrics model matching spec 8.3 fields
- [ ] Automatic metric collection triggered by task completion events
- [ ] Quality scoring from code reviews, output evaluation, and peer feedback
- [ ] Cost efficiency tracking (tokens used, tool invocations, total cost per task)
- [ ] Time tracking (assignment to completion duration)
- [ ] Collaboration score computation (message quality, helpfulness ratings)
- [ ] Queryable by agent, department, project, and time range
- [ ] Rolling averages and trend detection (improving/declining)
- [ ] Metrics persistence (survives restarts)
- [ ] Unit tests for metric collection and aggregation

## Dependencies

- #45 — HR engine must be implemented
- #20 — Task management system for completion events

## Design Spec Reference

- Spec 8.3 — Agent performance metrics

---

## Design Decisions Finalized

- **D2 — Quality Scoring:** Pluggable `QualityScoringStrategy` — layered combination as initial implementation.
- **D3 — Collaboration Scoring:** Pluggable `CollaborationScoringStrategy` — automated behavioral telemetry as initial.
- **D11 — Rolling Windows:** Pluggable `MetricsWindowStrategy` protocol. Initial: multiple simultaneous windows (7d, 30d, 90d). Min 5 data points per window; below that, report "insufficient data."
- **D12 — Trend Detection:** Pluggable `TrendDetectionStrategy` protocol. Initial: Theil-Sen regression slope per window + configurable thresholds (improving/stable/declining). Min 5 data points.

**Common pattern:** All strategies use pluggable protocol interfaces with one initial implementation. Alternative strategies are documented in DESIGN_SPEC.md for future.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement agent performance tracking and metrics collection #47

Context

Acceptance Criteria

Dependencies

Design Spec Reference

Design Decisions Finalized

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Implement agent performance tracking and metrics collection #47

Description

Context

Acceptance Criteria

Dependencies

Design Spec Reference

Design Decisions Finalized

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions