Implement agent performance tracking and metrics collection #47
Closed
Labels
- prio:high: Important, should be prioritized
- scope:medium: 1-3 days of work
- spec:agent-system: DESIGN_SPEC Section 3 - Agent System
- spec:budget: DESIGN_SPEC Section 10 - Cost & Budget Management
- spec:hr: DESIGN_SPEC Section 8 - HR & Workforce Management
- spec:human-interaction: DESIGN_SPEC Section 13 - Human Interaction Layer
- spec:providers: DESIGN_SPEC Section 9 - Model Provider Layer
- spec:task-workflow: DESIGN_SPEC Section 6 - Task & Workflow Engine
- type:feature: New feature implementation
- type:test: Test coverage, test infrastructure
Description
Context
Implement the performance tracking system defined in spec 8.3 that collects and aggregates metrics for each agent. These metrics drive hiring/firing decisions, promotions, and progressive trust.
Metrics to track:
- tasks_completed / tasks_failed: Task completion counts
- avg_quality_score: Average quality rating across completed tasks
- avg_cost_per_task: Average LLM/tool cost per task
- avg_completion_time: Average time from assignment to completion
- collaboration_score: Effectiveness in multi-agent interactions
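A minimal sketch of how the metric fields listed above could be modeled and derived from raw task records. The field names come from the list; `TaskRecord`, its attributes, and `aggregate` are hypothetical names used only for illustration, and the units (USD, seconds) are assumptions, since the issue does not state them.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass(frozen=True)
class TaskRecord:
    """One finished task (hypothetical shape, not taken from the spec)."""
    succeeded: bool
    quality_score: Optional[float]  # None when the task was never scored
    cost: float                     # assumed unit: USD
    duration_seconds: float         # assignment to completion


@dataclass(frozen=True)
class AgentPerformanceMetrics:
    """Fields named after the metrics listed in this issue."""
    tasks_completed: int
    tasks_failed: int
    avg_quality_score: Optional[float]
    avg_cost_per_task: Optional[float]
    avg_completion_time: Optional[float]
    collaboration_score: Optional[float]


def aggregate(
    records: Sequence[TaskRecord],
    collaboration_score: Optional[float] = None,
) -> AgentPerformanceMetrics:
    """Fold raw task records into one metrics snapshot."""
    completed = [r for r in records if r.succeeded]
    scored = [r.quality_score for r in completed if r.quality_score is not None]
    return AgentPerformanceMetrics(
        tasks_completed=len(completed),
        tasks_failed=len(records) - len(completed),
        # Averages are None rather than 0 when there is nothing to average.
        avg_quality_score=mean(scored) if scored else None,
        avg_cost_per_task=mean(r.cost for r in records) if records else None,
        avg_completion_time=(
            mean(r.duration_seconds for r in completed) if completed else None
        ),
        collaboration_score=collaboration_score,
    )
```

Here the cost average covers failed tasks too, while the quality and time averages only cover completed ones. That split is a design choice of this sketch, not something the issue specifies.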
Acceptance Criteria
- Performance metrics model matching spec 8.3 fields
- Automatic metric collection triggered by task completion events
- Quality scoring from code reviews, output evaluation, and peer feedback
- Cost efficiency tracking (tokens used, tool invocations, total cost per task)
- Time tracking (assignment to completion duration)
- Collaboration score computation (message quality, helpfulness ratings)
- Queryable by agent, department, project, and time range
- Rolling averages and trend detection (improving/declining)
- Metrics persistence (survives restarts)
- Unit tests for metric collection and aggregation
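The collection, persistence, and query criteria above could fit together roughly as follows. This is a sketch under stated assumptions: `MetricsStore`, its method names, and the table schema are hypothetical, SQLite stands in for whatever persistence layer the project actually uses, and all timestamps are assumed to be timezone-aware.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional


class MetricsStore:
    """Stores one row per finished task (hypothetical schema)."""

    def __init__(self, path: str = ":memory:") -> None:
        # Pass a file path instead of ":memory:" to survive restarts.
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS task_metrics ("
            "agent_id TEXT, department TEXT, project TEXT, "
            "finished_ts REAL, succeeded INTEGER, cost REAL, duration_s REAL)"
        )

    def on_task_finished(self, *, agent_id: str, department: str, project: str,
                         assigned_at: datetime, finished_at: datetime,
                         succeeded: bool, cost: float) -> None:
        """Handler to call from a task completion event."""
        duration_s = (finished_at - assigned_at).total_seconds()
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT INTO task_metrics VALUES (?, ?, ?, ?, ?, ?, ?)",
                (agent_id, department, project, finished_at.timestamp(),
                 int(succeeded), cost, duration_s),
            )

    def summary(self, *, agent_id: Optional[str] = None,
                department: Optional[str] = None,
                project: Optional[str] = None,
                since: Optional[datetime] = None,
                until: Optional[datetime] = None) -> dict:
        """Aggregate over any combination of agent, department, project, time."""
        filters = [
            ("agent_id = ?", agent_id),
            ("department = ?", department),
            ("project = ?", project),
            ("finished_ts >= ?", since.timestamp() if since is not None else None),
            ("finished_ts < ?", until.timestamp() if until is not None else None),
        ]
        active = [(sql, value) for sql, value in filters if value is not None]
        where = " WHERE " + " AND ".join(sql for sql, _ in active) if active else ""
        row = self._db.execute(
            "SELECT COALESCE(SUM(succeeded), 0), COALESCE(SUM(1 - succeeded), 0), "
            "AVG(cost), AVG(duration_s) FROM task_metrics" + where,
            [value for _, value in active],
        ).fetchone()
        return {
            "tasks_completed": row[0],
            "tasks_failed": row[1],
            "avg_cost_per_task": row[2],    # None when no rows match
            "avg_completion_time": row[3],  # seconds; None when no rows match
        }


# Usage: two tasks for one agent, then a query limited to the first day.
t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
store = MetricsStore()
store.on_task_finished(agent_id="a1", department="eng", project="p1",
                       assigned_at=t0, finished_at=t0 + timedelta(minutes=10),
                       succeeded=True, cost=0.5)
store.on_task_finished(agent_id="a1", department="eng", project="p1",
                       assigned_at=t0 + timedelta(days=2),
                       finished_at=t0 + timedelta(days=2, minutes=20),
                       succeeded=False, cost=1.5)
first_day = store.summary(agent_id="a1", until=t0 + timedelta(days=1))
```

Storing raw per-task rows rather than precomputed averages keeps every filter combination answerable with one query, at the cost of re-aggregating on each read.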
Dependencies
- Implement HR engine (hiring, firing, onboarding, offboarding flows) #45 — HR engine must be implemented
- Initialize project structure with pyproject.toml and src layout #20 — Task management system for completion events
Design Spec Reference
- Spec 8.3 — Agent performance metrics
Design Decisions Finalized
- D2 (Quality Scoring): Pluggable QualityScoringStrategy, with a layered combination as the initial implementation.
- D3 (Collaboration Scoring): Pluggable CollaborationScoringStrategy, with automated behavioral telemetry as the initial implementation.
- D11 (Rolling Windows): Pluggable MetricsWindowStrategy protocol. Initial: multiple simultaneous windows (7d, 30d, 90d). Minimum 5 data points per window; below that, report "insufficient data."
- D12 (Trend Detection): Pluggable TrendDetectionStrategy protocol. Initial: Theil-Sen regression slope per window plus configurable thresholds (improving/stable/declining). Minimum 5 data points.
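For D11 and D12, the Theil-Sen slope is the median of the slopes between all pairs of points, which makes it robust to a few outlier tasks. The sketch below applies it to each of the three windows with the 5-point minimum. The function names, the default threshold of 0.01 per day, and the assumption that a higher value is better (as for a quality score) are illustrative, not taken from the spec.

```python
from datetime import datetime, timedelta, timezone
from itertools import combinations
from statistics import median
from typing import Dict, Optional, Sequence, Tuple

MIN_POINTS = 5              # from D11/D12
WINDOWS_DAYS = (7, 30, 90)  # from D11

Point = Tuple[datetime, float]  # (timestamp, metric value)


def theil_sen_slope(points: Sequence[Point]) -> Optional[float]:
    """Median pairwise slope in value units per day; None if too few points."""
    if len(points) < MIN_POINTS:
        return None
    slopes = []
    for (t1, v1), (t2, v2) in combinations(points, 2):
        dt_days = (t2 - t1).total_seconds() / 86400
        if dt_days != 0:  # skip pairs with identical timestamps
            slopes.append((v2 - v1) / dt_days)
    return median(slopes) if slopes else None


def classify(slope: Optional[float], threshold: float) -> str:
    """Map a slope to a label; assumes higher metric values are better."""
    if slope is None:
        return "insufficient data"
    if slope > threshold:
        return "improving"
    if slope < -threshold:
        return "declining"
    return "stable"


def trends(points: Sequence[Point], now: datetime,
           threshold: float = 0.01) -> Dict[str, str]:
    """Evaluate every window at once, keyed as '7d', '30d', '90d'."""
    result = {}
    for days in WINDOWS_DAYS:
        cutoff = now - timedelta(days=days)
        in_window = [p for p in points if cutoff <= p[0] <= now]
        result[f"{days}d"] = classify(theil_sen_slope(in_window), threshold)
    return result
```

For a metric where lower is better, such as cost per task, the labels would need to be flipped, for example by negating the values before calling `trends`.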
Common pattern: All strategies use pluggable protocol interfaces with one initial implementation. Alternative strategies are documented in DESIGN_SPEC.md for future work.
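The pluggable pattern could look like the sketch below, using quality scoring as the example. Only the name QualityScoringStrategy comes from this issue. The method signature, the signal names, and both implementations are hypothetical, and the issue does not define "layered combination": it is read here as a weighted mean over whichever signal layers are available.

```python
from typing import Mapping, Optional, Protocol


class QualityScoringStrategy(Protocol):
    """Interface every scoring strategy satisfies (signature is assumed)."""

    def score(self, signals: Mapping[str, float]) -> Optional[float]:
        ...


class WeightedLayerScoring:
    """Assumed reading of 'layered combination': weighted mean of the
    layers present in the signals; absent layers are ignored."""

    def __init__(self, weights: Mapping[str, float]) -> None:
        if any(w <= 0 for w in weights.values()):
            raise ValueError("weights must be positive")
        self._weights = dict(weights)

    def score(self, signals: Mapping[str, float]) -> Optional[float]:
        used = {name: w for name, w in self._weights.items() if name in signals}
        if not used:
            return None  # no known layer reported a score
        total = sum(used.values())
        return sum(signals[name] * w for name, w in used.items()) / total


class FixedScore:
    """Trivial second implementation, e.g. a stand-in for unit tests."""

    def __init__(self, value: float) -> None:
        self._value = value

    def score(self, signals: Mapping[str, float]) -> Optional[float]:
        return self._value


def rate_task(strategy: QualityScoringStrategy,
              signals: Mapping[str, float]) -> Optional[float]:
    """Callers depend only on the protocol, so strategies swap freely."""
    return strategy.score(signals)


# Usage: code review counts double; the other two layers count once.
layered = WeightedLayerScoring(
    {"code_review": 2.0, "output_evaluation": 1.0, "peer_feedback": 1.0}
)
full = rate_task(layered, {"code_review": 1.0, "output_evaluation": 0.5,
                           "peer_feedback": 0.5})
```

Because `typing.Protocol` uses structural typing, neither implementation has to inherit from the interface; any class with a matching `score` method can be passed in.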