Implement agent performance tracking and metrics collection #47

@Aureliolo

Context

Implement the performance tracking system defined in spec 8.3 that collects and aggregates metrics for each agent. These metrics drive hiring/firing decisions, promotions, and progressive trust.

Metrics to track:

  • tasks_completed / tasks_failed — Task completion counts
  • avg_quality_score — Average quality rating across completed tasks
  • avg_cost_per_task — Average LLM/tool cost per task
  • avg_completion_time — Average time from assignment to completion
  • collaboration_score — Effectiveness in multi-agent interactions
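
The metric list above could be modeled roughly as follows. This is a minimal sketch, not the spec 8.3 model: the field names come from the list, while `agent_id`, the optional types, the units, and the derived `success_rate` property are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentPerformanceMetrics:
    """Aggregated metrics for one agent; field names follow the issue's list."""

    agent_id: str  # hypothetical identifier, not named in the issue
    tasks_completed: int = 0
    tasks_failed: int = 0
    avg_quality_score: Optional[float] = None  # None until a task has been scored
    avg_cost_per_task: Optional[float] = None  # currency unit is an assumption
    avg_completion_time: Optional[float] = None  # seconds (assumed unit)
    collaboration_score: Optional[float] = None

    @property
    def success_rate(self) -> Optional[float]:
        """Derived convenience value (not in the issue): completed / total."""
        total = self.tasks_completed + self.tasks_failed
        return self.tasks_completed / total if total else None


metrics = AgentPerformanceMetrics(agent_id="agent-1", tasks_completed=8, tasks_failed=2)
empty = AgentPerformanceMetrics(agent_id="agent-2")
```

Using `None` rather than `0.0` for the averages keeps "no data yet" distinguishable from a genuinely low score.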

Acceptance Criteria

  • Performance metrics model matching spec 8.3 fields
  • Automatic metric collection triggered by task completion events
  • Quality scoring from code reviews, output evaluation, and peer feedback
  • Cost efficiency tracking (tokens used, tool invocations, total cost per task)
  • Time tracking (assignment to completion duration)
  • Collaboration score computation (message quality, helpfulness ratings)
  • Queryable by agent, department, project, and time range
  • Rolling averages and trend detection (improving/declining)
  • Metrics persistence (survives restarts)
  • Unit tests for metric collection and aggregation
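
As a rough illustration of the collection, aggregation, and time-range query criteria above, the sketch below records one entry per finished task and summarizes on demand. `TaskRecord`, `InMemoryMetricsStore`, and all of their fields are hypothetical names, and the in-memory list stands in for whatever persistence layer the project actually uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from statistics import fmean
from typing import Optional


@dataclass(frozen=True)
class TaskRecord:
    """One finished task; every field name here is an assumption."""

    agent_id: str
    assigned_at: datetime
    finished_at: datetime
    succeeded: bool
    quality_score: float
    cost: float


class InMemoryMetricsStore:
    """Toy store; a real one would persist records so they survive restarts."""

    def __init__(self) -> None:
        self._records: list[TaskRecord] = []

    def on_task_finished(self, record: TaskRecord) -> None:
        """Event handler: call when a task completes or fails."""
        self._records.append(record)

    def summarize(self, agent_id: str, start: datetime, end: datetime) -> dict:
        """Aggregate one agent's records with start <= finished_at < end."""
        rows = [
            r for r in self._records
            if r.agent_id == agent_id and start <= r.finished_at < end
        ]
        done = [r for r in rows if r.succeeded]
        durations = [(r.finished_at - r.assigned_at).total_seconds() for r in done]
        return {
            "tasks_completed": len(done),
            "tasks_failed": len(rows) - len(done),
            "avg_quality_score": fmean(r.quality_score for r in done) if done else None,
            "avg_cost_per_task": fmean(r.cost for r in rows) if rows else None,
            "avg_completion_time": fmean(durations) if durations else None,
        }


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
store = InMemoryMetricsStore()
store.on_task_finished(TaskRecord("agent-1", t0, t0 + timedelta(minutes=30), True, 0.9, 0.12))
store.on_task_finished(TaskRecord("agent-1", t0, t0 + timedelta(minutes=10), False, 0.0, 0.05))
summary = store.summarize("agent-1", t0, t0 + timedelta(days=1))
```

Whether failed tasks should count toward the cost average is a design choice the issue does not settle; this sketch includes them because failed work still consumes budget.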

Dependencies

Design Spec Reference

  • Spec 8.3 — Agent performance metrics

Design Decisions Finalized

  • D2 — Quality Scoring: Pluggable QualityScoringStrategy — layered combination as initial implementation.
  • D3 — Collaboration Scoring: Pluggable CollaborationScoringStrategy — automated behavioral telemetry as initial.
  • D11 — Rolling Windows: Pluggable MetricsWindowStrategy protocol. Initial: multiple simultaneous windows (7d, 30d, 90d). Min 5 data points per window; below that, report "insufficient data."
  • D12 — Trend Detection: Pluggable TrendDetectionStrategy protocol. Initial: Theil-Sen regression slope per window + configurable thresholds (improving/stable/declining). Min 5 data points.
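
D11 and D12 together can be sketched as below: filter data points to a window, require at least five, then classify the Theil-Sen slope (the median of all pairwise slopes) against a threshold. The function names, the `0.01` per-day threshold, and the assumption that a rising value means "improving" are illustrative only; a cost metric would need the opposite sign convention.

```python
from datetime import datetime, timedelta, timezone
from itertools import combinations
from statistics import median
from typing import Sequence

MIN_POINTS = 5  # minimum data points per window, from D11/D12
WINDOWS_DAYS = (7, 30, 90)  # simultaneous windows, from D11


def theil_sen_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Median of the slopes between every pair of points with distinct x."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x values")
    return median(slopes)


def classify_trend(points, now: datetime, window_days: int, threshold: float = 0.01) -> str:
    """points: iterable of (timestamp, value). threshold is slope per day (assumed)."""
    cutoff = now - timedelta(days=window_days)
    in_window = [(t, v) for t, v in points if cutoff <= t <= now]
    if len(in_window) < MIN_POINTS:
        return "insufficient data"
    xs = [(t - cutoff).total_seconds() / 86400 for t, _ in in_window]  # days
    ys = [v for _, v in in_window]
    if len(set(xs)) < 2:  # all samples share one timestamp: no slope exists
        return "insufficient data"
    slope = theil_sen_slope(xs, ys)
    if slope > threshold:
        return "improving"  # assumes higher values are better
    if slope < -threshold:
        return "declining"
    return "stable"


now = datetime(2024, 3, 1, tzinfo=timezone.utc)
# Six daily samples rising by 0.05 per day.
points = [(now - timedelta(days=6 - i), 0.5 + 0.05 * i) for i in range(6)]
trends = {days: classify_trend(points, now, days) for days in WINDOWS_DAYS}
```

Because the estimate is a median, a single outlying task shifts it far less than it would shift an ordinary least-squares fit.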

Common pattern: each strategy is a pluggable protocol interface with one initial implementation. Alternative strategies are documented in DESIGN_SPEC.md for future implementation.
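
One way the pluggable-protocol pattern might look in Python is a `typing.Protocol` plus a single concrete class. Only the name `QualityScoringStrategy` comes from D2; the `score` signature, the `WeightedLayeredScoring` class, the signal keys, and the weights are assumptions, and "layered combination" is interpreted here as a weighted mean of whichever signals are present.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class QualityScoringStrategy(Protocol):
    """Name from D2; the method signature is an assumption."""

    def score(self, signals: dict[str, float]) -> float:
        ...


class WeightedLayeredScoring:
    """Hypothetical initial strategy: weighted mean of the available signals."""

    def __init__(self, weights: dict[str, float]) -> None:
        self._weights = weights

    def score(self, signals: dict[str, float]) -> float:
        # Use only the layers that actually reported a signal, then renormalize.
        used = {name: w for name, w in self._weights.items() if name in signals}
        total = sum(used.values())
        if total == 0:
            raise ValueError("no weighted signals available")
        return sum(signals[name] * w for name, w in used.items()) / total


def evaluate(strategy: QualityScoringStrategy, signals: dict[str, float]) -> float:
    """Callers depend on the protocol only, so strategies can be swapped."""
    return strategy.score(signals)


strategy = WeightedLayeredScoring(
    {"code_review": 0.5, "output_eval": 0.3, "peer_feedback": 0.2}
)
# peer_feedback is missing here, so the other two weights are renormalized.
result = evaluate(strategy, {"code_review": 0.8, "output_eval": 0.6})
```

Structural typing means an alternative strategy does not need to inherit from anything; it only has to provide a matching `score` method.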

Metadata

Assignees

No one assigned

Labels

  • prio:high (Important, should be prioritized)
  • scope:medium (1-3 days of work)
  • spec:agent-system (DESIGN_SPEC Section 3 - Agent System)
  • spec:budget (DESIGN_SPEC Section 10 - Cost & Budget Management)
  • spec:hr (DESIGN_SPEC Section 8 - HR & Workforce Management)
  • spec:human-interaction (DESIGN_SPEC Section 13 - Human Interaction Layer)
  • spec:providers (DESIGN_SPEC Section 9 - Model Provider Layer)
  • spec:task-workflow (DESIGN_SPEC Section 6 - Task & Workflow Engine)
  • type:feature (New feature implementation)
  • type:test (Test coverage, test infrastructure)
