Commit 2d091ea

feat: add HR engine and performance tracking (#45, #47) (#193)
## Summary

- **HR Engine**: Full hiring pipeline (request → candidate generation → approval → instantiation), onboarding checklist management, offboarding pipeline (task reassignment → memory archival → team notification → termination), agent registry service
- **Performance Tracking**: Task metric recording, collaboration scoring (behavioral telemetry strategy), quality scoring (CI-based strategy), Theil-Sen robust trend detection, multi-window rolling metrics (7d/30d/90d)
- **Persistence**: SQLite repositories for lifecycle events, task metrics, and collaboration metrics (extracted to `hr_repositories.py`)
- **Pre-PR Review Fixes** (9 agents, 57 findings addressed):
  - Critical bug fix: trend detection now filters records per time window
  - Re-instantiation vulnerability guard
  - Narrowed `except Exception` blocks to specific types across 5+ locations
  - Added missing `logger.warning()` before raises per CLAUDE.md rules
  - Fixed `NotBlankStr` type violations, added model validators
  - Updated DESIGN_SPEC.md, CLAUDE.md, README.md

Closes #45, closes #47

## Test plan

- [x] All 5502 unit tests pass (`uv run pytest tests/ -n auto`)
- [x] mypy strict passes (`uv run mypy src/ tests/`)
- [x] ruff lint + format clean
- [x] Pre-existing flaky test `test_circuit_breaker_after_max_errors` (not related to this PR)
- [x] Coverage at 41% (pre-existing, not caused by this PR)

## Review coverage

Pre-reviewed by 9 agents: code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, docs-consistency. 57 findings triaged and implemented.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
1 parent 83b7b6c commit 2d091ea

65 files changed

Lines changed: 9419 additions & 13 deletions

CLAUDE.md

Lines changed: 2 additions & 1 deletion
@@ -50,6 +50,7 @@ src/ai_company/
 config/ # YAML company config loading and validation
 core/ # Shared domain models and base classes
 engine/ # Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, task lifecycle, recovery, shutdown, workspace isolation, and coordination error classification
+hr/ # HR engine: hiring, firing, onboarding, offboarding, agent registry, performance tracking (task metrics, collaboration scoring, trend detection)
 memory/ # Persistent agent memory (Mem0 initial, custom stack future — ADR-001), retrieval pipeline (ranking, injection, context formatting), shared org memory (org/), consolidation/archival (consolidation/)
 persistence/ # Operational data persistence — pluggable PersistenceBackend protocol, SQLite initial (§7.6)
 observability/ # Structured logging, correlation tracking, log sinks
@@ -83,7 +84,7 @@ src/ai_company/
 - **Every module** with business logic MUST have: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`
 - **Never** use `import logging` / `logging.getLogger()` / `print()` in application code
 - **Variable name**: always `logger` (not `_logger`, not `log`)
-- **Event names**: always use constants from the domain-specific module under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`, `CFO_ANOMALY_DETECTED` from `events.cfo`, `CONFLICT_DETECTED` from `events.conflict`, `MEETING_STARTED` from `events.meeting`, `CLASSIFICATION_START` from `events.classification`, `CONSOLIDATION_START` from `events.consolidation`, `ORG_MEMORY_QUERY_START` from `events.org_memory`, `API_REQUEST_STARTED` from `events.api`, `CODE_RUNNER_EXECUTE_START` from `events.code_runner`, `DOCKER_EXECUTE_START` from `events.docker`, `MCP_INVOKE_START` from `events.mcp`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`
+- **Event names**: always use constants from the domain-specific module under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`, `CFO_ANOMALY_DETECTED` from `events.cfo`, `CONFLICT_DETECTED` from `events.conflict`, `MEETING_STARTED` from `events.meeting`, `CLASSIFICATION_START` from `events.classification`, `CONSOLIDATION_START` from `events.consolidation`, `ORG_MEMORY_QUERY_START` from `events.org_memory`, `API_REQUEST_STARTED` from `events.api`, `CODE_RUNNER_EXECUTE_START` from `events.code_runner`, `DOCKER_EXECUTE_START` from `events.docker`, `MCP_INVOKE_START` from `events.mcp`, `SECURITY_EVALUATE_START` from `events.security`, `HR_HIRING_REQUEST_CREATED` from `events.hr`, `PERF_METRIC_RECORDED` from `events.performance`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`
 - **Structured kwargs**: always `logger.info(EVENT, key=value)` — never `logger.info("msg %s", val)`
 - **All error paths** must log at WARNING or ERROR with context before raising
 - **All state transitions** must log at INFO
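
The logging rules in this hunk (event constants, structured kwargs, WARNING before raising, INFO on state transitions) can be illustrated with a small sketch. Everything here is an assumption made for illustration: `StubLogger` stands in for whatever `get_logger(__name__)` returns, and the two event constants and `approve_hiring_request` are hypothetical names, not part of `ai_company.observability.events.hr`.

```python
from typing import Any


class StubLogger:
    """Hypothetical stand-in for the project's structured logger."""

    def __init__(self) -> None:
        self.records: list[tuple[str, str, dict[str, Any]]] = []

    def info(self, event: str, **kwargs: Any) -> None:
        self.records.append(("info", event, kwargs))

    def warning(self, event: str, **kwargs: Any) -> None:
        self.records.append(("warning", event, kwargs))


# Variable name is always `logger`, per the rules above.
logger = StubLogger()

# Hypothetical event constants; the real ones live under observability.events.
HR_HIRING_REQUEST_APPROVED = "hr.hiring.request_approved"
HR_HIRING_INVALID_TRANSITION = "hr.hiring.invalid_transition"


def approve_hiring_request(request_id: str, status: str) -> str:
    """Approve a pending request, logging as the rules above require."""
    if status != "pending":
        # Error path: log at WARNING with context before raising.
        logger.warning(
            HR_HIRING_INVALID_TRANSITION,
            request_id=request_id,
            status=status,
        )
        raise ValueError(f"cannot approve request in status {status!r}")
    # State transition: log at INFO with structured kwargs, no format strings.
    logger.info(
        HR_HIRING_REQUEST_APPROVED,
        request_id=request_id,
        from_status=status,
        to_status="approved",
    )
    return "approved"
```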

DESIGN_SPEC.md

Lines changed: 40 additions & 4 deletions
@@ -79,9 +79,9 @@ The MVP validates the core hypothesis: **a single agent can complete a real task
 > **How to read this spec:** Sections describe the full vision. Each section with deferred features includes an **MVP** callout box indicating what ships in M3 and what is deferred. The full design is documented upfront to inform architecture decisions — protocol interfaces are designed even for features that won't be built until later milestones.

-> **Implementation snapshot (2026-03-09):**
-> - **Done:** M0–M6 (tooling, config/core, providers, single-agent engine, multi-agent orchestration, API/CLI surface). Memory layer backend selected ([ADR-001](docs/decisions/ADR-001-memory-layer.md)). Persistence backend (§7.6) completed. Memory retrieval pipeline (#41: ranking, token-budget formatting, context injection) complete. Budget enforcement complete (BudgetEnforcer + configurable cost tiers + quota/subscription tracking). CFO cost optimization complete (CostOptimizer: anomaly detection, efficiency analysis, downgrade recommendations, routing optimization, approval decisions; ReportGenerator: multi-dimensional spending reports). Shared org memory (#125: HybridPromptRetrievalBackend, OrgFactStore, access control, factory) complete. Memory consolidation/archival (#48: ConsolidationService, SimpleConsolidationStrategy, RetentionEnforcer, ArchivalStore protocol) complete.
-> - **In progress:** M7 — Docker sandbox (#50), MCP bridge (#53), code runner implemented. Security + approval system not started.
+> **Implementation snapshot (2026-03-10):**
+> - **Done:** M0–M6 (tooling, config/core, providers, single-agent engine, multi-agent orchestration, API/CLI surface) + Docker sandbox (#50), MCP bridge (#53), code runner + HR engine (hiring/firing/onboarding/offboarding/registry) + performance tracking (task metrics, quality scoring, collaboration scoring, trend detection, rolling windows). Memory layer backend selected ([ADR-001](docs/decisions/ADR-001-memory-layer.md)). Persistence backend (§7.6) completed. Memory retrieval pipeline (#41: ranking, token-budget formatting, context injection) complete. Budget enforcement complete (BudgetEnforcer + configurable cost tiers + quota/subscription tracking). CFO cost optimization complete (CostOptimizer: anomaly detection, efficiency analysis, downgrade recommendations, routing optimization, approval decisions; ReportGenerator: multi-dimensional spending reports). Shared org memory (#125: HybridPromptRetrievalBackend, OrgFactStore, access control, factory) complete. Memory consolidation/archival (#48: ConsolidationService, SimpleConsolidationStrategy, RetentionEnforcer, ArchivalStore protocol) complete.
+> - **Remaining:** M7 security + approval system (SecOps agent, progressive trust, JWT/OAuth auth).

 ### 1.5 Configuration Philosophy
@@ -1652,6 +1652,13 @@ Strategy selection via config: `memory.retrieval.strategy: context | tool_based
 > **MVP: Not in M3–M4.** HR features (hiring, firing, performance tracking, promotions) are M5–M7. Agent workforce is configured manually via YAML in early milestones.

+> **Implementation note (M7):** Hiring pipeline (`HiringService`), offboarding pipeline
+> (`OffboardingService`), onboarding checklists (`OnboardingService`), and agent registry
+> (`AgentRegistryService`) are now implemented. Performance tracking subsystem
+> (`hr/performance/`) complete with pluggable quality scoring, collaboration scoring,
+> trend detection, and multi-window aggregation. Promotions/demotions (section 8.4)
+> remain unimplemented.
+
 ### 8.1 Hiring Process

 The HR system manages the agent workforce dynamically:
@@ -2823,7 +2830,33 @@ ai-company/
 │   │   │   ├── scorer.py # AgentTaskScorer (skill/role/seniority matching)
 │   │   │   ├── service.py # TaskRoutingService (routes subtasks to agents)
 │   │   │   └── topology_selector.py # TopologySelector (auto coordination topology)
-│   │   └── hr_engine.py # Hiring, firing, performance (M7)
+│   ├── hr/ # HR engine: hiring, firing, onboarding, offboarding, agent registry, performance tracking
+│   │   ├── __init__.py # Package exports
+│   │   ├── enums.py # HR enumerations (HiringRequestStatus, FiringReason, OnboardingStep, LifecycleEventType, TrendDirection)
+│   │   ├── errors.py # HR error hierarchy
+│   │   ├── models.py # CandidateCard, HiringRequest, FiringRequest, OnboardingChecklist, OffboardingRecord, AgentLifecycleEvent
+│   │   ├── registry.py # AgentRegistryService (agent lifecycle registry)
+│   │   ├── hiring_service.py # HiringService (request → generate candidate → approval → instantiate)
+│   │   ├── onboarding_service.py # OnboardingService (checklist management)
+│   │   ├── offboarding_service.py # OffboardingService (reassign → archive → notify → terminate)
+│   │   ├── archival_protocol.py # MemoryArchivalStrategy protocol
+│   │   ├── full_snapshot_strategy.py # FullSnapshotArchivalStrategy
+│   │   ├── reassignment_protocol.py # TaskReassignmentStrategy protocol
+│   │   ├── queue_return_strategy.py # QueueReturnReassignmentStrategy
+│   │   ├── persistence_protocol.py # HR-specific repository protocols
+│   │   └── performance/ # Performance tracking subsystem
+│   │       ├── __init__.py # Package exports
+│   │       ├── models.py # TaskMetricRecord, CollaborationMetricRecord, WindowMetrics, TrendResult, etc.
+│   │       ├── config.py # PerformanceConfig
+│   │       ├── tracker.py # PerformanceTracker service
+│   │       ├── quality_protocol.py # QualityScorer protocol
+│   │       ├── ci_quality_strategy.py # CiQualityScorer (CI-based quality scoring)
+│   │       ├── collaboration_protocol.py # CollaborationScorer protocol
+│   │       ├── behavioral_collaboration_strategy.py # BehavioralCollaborationScorer
+│   │       ├── trend_protocol.py # TrendDetector protocol
+│   │       ├── theil_sen_strategy.py # TheilSenTrendDetector (robust trend detection)
+│   │       ├── window_protocol.py # WindowAggregator protocol
+│   │       └── multi_window_strategy.py # MultiWindowAggregator (multi-window rolling metrics)
 │   ├── communication/ # Inter-agent communication
 │   │   ├── bus_memory.py # InMemoryMessageBus implementation
 │   │   ├── bus_protocol.py # MessageBus protocol interface
@@ -2921,6 +2954,7 @@ ai-company/
 │   │   ├── __init__.py # Package exports
 │   │   ├── backend.py # SQLitePersistenceBackend
 │   │   ├── repositories.py # SQLite repository implementations
+│   │   ├── hr_repositories.py # SQLite HR repositories (LifecycleEvent, TaskMetricRecord, CollaborationMetricRecord)
 │   │   └── migrations.py # Schema migrations (user_version pragma)
 │   ├── observability/ # Structured logging & correlation
 │   │   ├── __init__.py # get_logger() entry point
@@ -2944,10 +2978,12 @@ ai-company/
 │   │   │   ├── decomposition.py # DECOMPOSITION_* constants
 │   │   │   ├── execution.py # EXECUTION_* constants
 │   │   │   ├── git.py # GIT_* constants
+│   │   │   ├── hr.py # HR_* constants
 │   │   │   ├── meeting.py # MEETING_* constants
 │   │   │   ├── memory.py # MEMORY_* constants
 │   │   │   ├── org_memory.py # ORG_MEMORY_* constants
 │   │   │   ├── parallel.py # PARALLEL_* constants
+│   │   │   ├── performance.py # PERF_* constants
 │   │   │   ├── persistence.py # PERSISTENCE_* constants
 │   │   │   ├── personality.py # PERSONALITY_* constants
 │   │   │   ├── prompt.py # PROMPT_* constants

README.md

Lines changed: 5 additions & 3 deletions
@@ -10,7 +10,7 @@ AI Company lets you spin up a virtual organization staffed entirely by AI agents
 ## Current Capability Snapshot

-### Implemented (M0–M6 complete)
+### Implemented (M0–M6 complete, M7 HR partial)

 - **Company Config + Core Models** - Strong Pydantic validation, immutable config models, runtime state models
 - **Provider Layer** - LiteLLM-based provider abstraction with routing, retry, and rate limiting
@@ -28,17 +28,19 @@ AI Company lets you spin up a virtual organization staffed entirely by AI agents
 - **Human Approval Queue (M6)** - Approval submission, approve/reject with reason, list/filter by status, WebSocket notifications for approval events
 - **WebSocket Real-Time Feed (M6)** - Channel-based subscriptions (tasks, agents, budget, messages, system, approvals), per-channel payload filters, message-bus bridge
 - **Route Guards (M6)** - Role-based read/write access control (stub auth for M6; real JWT/OAuth planned for M7)
+- **HR Engine (M7)** - Hiring pipeline (request → generate candidate → approval → instantiate), onboarding checklists, offboarding pipeline (reassign → archive → notify → terminate), agent registry
+- **Performance Tracking (M7)** - Task metrics, CI-based quality scoring, behavioral collaboration scoring, Theil-Sen robust trend detection, multi-window rolling metric aggregation

 ### Not implemented yet (planned milestones)

 - **Memory Backend Adapter (M5)** - Memory protocols, retrieval pipeline, org memory, and consolidation are complete; initial Mem0 adapter backend ([ADR-001](docs/decisions/ADR-001-memory-layer.md)) pending; research backends (GraphRAG, Temporal KG) planned
 - **CLI Surface** - `cli/` package is placeholder-only
 - **Security/Approval System (M7)** - SecOps agent with rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, risk classifier, and ToolInvoker integration are implemented; real authentication (JWT/OAuth), progressive trust, and approval workflow gates are planned
-- **Advanced Product Surface** - web dashboard, HR workflows, and external integrations
+- **Advanced Product Surface** - web dashboard, external integrations

 ## Status

-**M7: Security & HR** in progress (M0–M6 all done). See [DESIGN_SPEC.md](DESIGN_SPEC.md) for the full high-level specification.
+**M7: Security & Approval** partially complete — Docker sandbox, MCP bridge, code runner, SecOps agent, HR engine + performance tracking done; authentication/approval remain. See [DESIGN_SPEC.md](DESIGN_SPEC.md) for the full high-level specification.

 ## Tech Stack

src/ai_company/communication/enums.py

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ class MessageType(StrEnum):
     STATUS_REPORT = "status_report"
     ESCALATION = "escalation"
     MEETING_CONTRIBUTION = "meeting_contribution"
+    HR_NOTIFICATION = "hr_notification"


 class MessagePriority(StrEnum):

src/ai_company/core/enums.py

Lines changed: 1 addition & 0 deletions
@@ -68,6 +68,7 @@ class AgentStatus(StrEnum):
     """Lifecycle status of an agent."""

     ACTIVE = "active"
+    ONBOARDING = "onboarding"
     ON_LEAVE = "on_leave"
     TERMINATED = "terminated"

src/ai_company/hr/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+"""HR engine — agent lifecycle management and performance tracking."""
Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+"""Memory archival strategy protocol.
+
+Defines the interface for pluggable strategies that handle
+agent memory archival during offboarding (D10).
+"""
+
+from typing import Protocol, runtime_checkable
+
+from pydantic import BaseModel, ConfigDict, Field
+
+from ai_company.core.enums import SeniorityLevel  # noqa: TC001
+from ai_company.core.types import NotBlankStr  # noqa: TC001
+from ai_company.memory.consolidation.archival import ArchivalStore  # noqa: TC001
+from ai_company.memory.org.protocol import OrgMemoryBackend  # noqa: TC001
+from ai_company.memory.protocol import MemoryBackend  # noqa: TC001
+
+
+class ArchivalResult(BaseModel):
+    """Result of a memory archival operation.
+
+    Attributes:
+        agent_id: Agent whose memories were archived.
+        total_archived: Number of memories archived.
+        promoted_to_org: Number promoted to org memory.
+        hot_store_cleaned: Whether the hot store was cleaned.
+        strategy_name: Name of the archival strategy used.
+    """
+
+    model_config = ConfigDict(frozen=True, allow_inf_nan=False)
+
+    agent_id: NotBlankStr = Field(description="Agent whose memories were archived")
+    total_archived: int = Field(ge=0, description="Memories archived")
+    promoted_to_org: int = Field(ge=0, description="Promoted to org memory")
+    hot_store_cleaned: bool = Field(description="Hot store cleaned")
+    strategy_name: NotBlankStr = Field(description="Archival strategy used")
+
+
+@runtime_checkable
+class MemoryArchivalStrategy(Protocol):
+    """Strategy for archiving agent memories during offboarding.
+
+    Implementations handle the complete memory archival pipeline:
+    retrieving from hot store, archiving to cold store, optionally
+    promoting to org memory, and cleaning up the hot store.
+    """
+
+    @property
+    def name(self) -> str:
+        """Human-readable strategy name."""
+        ...
+
+    async def archive(
+        self,
+        *,
+        agent_id: NotBlankStr,
+        memory_backend: MemoryBackend,
+        archival_store: ArchivalStore,
+        org_memory_backend: OrgMemoryBackend | None = None,
+        agent_seniority: SeniorityLevel | None = None,
+    ) -> ArchivalResult:
+        """Archive all memories for a departing agent.
+
+        Args:
+            agent_id: Agent whose memories to archive.
+            memory_backend: Hot memory store.
+            archival_store: Cold archival storage.
+            org_memory_backend: Optional org memory for promotion.
+            agent_seniority: Seniority level of the departing agent.
+                Required for org memory promotion (skipped if None).
+
+        Returns:
+            Result of the archival operation.
+
+        Raises:
+            MemoryArchivalError: If retrieval from hot store fails.
+        """
+        ...
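
To show how a `@runtime_checkable` protocol like the one above is satisfied structurally, here is a simplified, self-contained sketch. It does not use the project's types: `ArchivalResultSketch`, `ArchivalStrategySketch`, and `CopyThenClearStrategy` are hypothetical, plain dicts stand in for the hot and cold stores, and org-memory promotion is left out.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class ArchivalResultSketch:
    """Simplified stand-in for ArchivalResult (a dataclass, not a Pydantic model)."""

    agent_id: str
    total_archived: int
    promoted_to_org: int
    hot_store_cleaned: bool
    strategy_name: str


@runtime_checkable
class ArchivalStrategySketch(Protocol):
    """Reduced version of the protocol: dict-backed stores, no org memory."""

    @property
    def name(self) -> str: ...

    async def archive(
        self,
        *,
        agent_id: str,
        hot_store: dict[str, list[str]],
        cold_store: dict[str, list[str]],
    ) -> ArchivalResultSketch: ...


class CopyThenClearStrategy:
    """Hypothetical strategy: move every memory from hot to cold storage."""

    @property
    def name(self) -> str:
        return "copy_then_clear"

    async def archive(
        self,
        *,
        agent_id: str,
        hot_store: dict[str, list[str]],
        cold_store: dict[str, list[str]],
    ) -> ArchivalResultSketch:
        # Remove the agent's memories from the hot store and append them to cold.
        memories = hot_store.pop(agent_id, [])
        cold_store.setdefault(agent_id, []).extend(memories)
        return ArchivalResultSketch(
            agent_id=agent_id,
            total_archived=len(memories),
            promoted_to_org=0,  # promotion is out of scope for this sketch
            hot_store_cleaned=agent_id not in hot_store,
            strategy_name=self.name,
        )


if __name__ == "__main__":
    hot = {"agent-1": ["note a", "note b"]}
    cold: dict[str, list[str]] = {}
    strategy = CopyThenClearStrategy()
    # No inheritance needed: isinstance checks only that the members exist.
    print(isinstance(strategy, ArchivalStrategySketch))
    print(asyncio.run(strategy.archive(agent_id="agent-1", hot_store=hot, cold_store=cold)))
```

Note that `isinstance` against a runtime-checkable protocol only verifies that the named members are present; it does not check signatures or return types, so static type checking remains the real contract.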

src/ai_company/hr/enums.py

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+"""HR domain enumerations."""
+
+from enum import StrEnum
+
+
+class HiringRequestStatus(StrEnum):
+    """Status of a hiring request through the approval pipeline."""
+
+    PENDING = "pending"
+    APPROVED = "approved"
+    REJECTED = "rejected"
+    INSTANTIATED = "instantiated"
+
+
+class FiringReason(StrEnum):
+    """Reason for agent termination."""
+
+    MANUAL = "manual"
+    PERFORMANCE = "performance"
+    BUDGET = "budget"
+    PROJECT_COMPLETION = "project_completion"
+
+
+class OnboardingStep(StrEnum):
+    """Steps in the agent onboarding checklist."""
+
+    COMPANY_CONTEXT = "company_context"
+    PROJECT_BRIEFING = "project_briefing"
+    TEAM_INTRODUCTIONS = "team_introductions"
+
+
+class LifecycleEventType(StrEnum):
+    """Type of agent lifecycle event."""
+
+    HIRED = "hired"
+    ONBOARDED = "onboarded"
+    FIRED = "fired"
+    OFFBOARDED = "offboarded"
+    STATUS_CHANGED = "status_changed"
+
+
+class TrendDirection(StrEnum):
+    """Direction of a performance metric trend."""
+
+    IMPROVING = "improving"
+    STABLE = "stable"
+    DECLINING = "declining"
+    INSUFFICIENT_DATA = "insufficient_data"
