
feat: add HR engine and performance tracking (#45, #47)#193

Merged
Aureliolo merged 3 commits into main from feat/hr-engine-and-performance-tracking
Mar 10, 2026

Conversation

@Aureliolo
Owner

Summary

  • HR Engine: Full hiring pipeline (request → candidate generation → approval → instantiation), onboarding checklist management, offboarding pipeline (task reassignment → memory archival → team notification → termination), agent registry service
  • Performance Tracking: Task metric recording, collaboration scoring (behavioral telemetry strategy), quality scoring (CI-based strategy), Theil-Sen robust trend detection, multi-window rolling metrics (7d/30d/90d)
  • Persistence: SQLite repositories for lifecycle events, task metrics, and collaboration metrics (extracted to hr_repositories.py)
  • Pre-PR Review Fixes (9 agents, 57 findings addressed):
    • Critical bug fix: trend detection now filters records per time window
    • Re-instantiation vulnerability guard
    • Narrowed except Exception blocks to specific types across 5+ locations
    • Added missing logger.warning() before raises per CLAUDE.md rules
    • Fixed NotBlankStr type violations, added model validators
    • Updated DESIGN_SPEC.md, CLAUDE.md, README.md
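
The per-window filtering fix called out above can be sketched as follows (record shape and function name are illustrative, not the PR's actual API): trend inputs are restricted to the rolling window before any slopes are computed, so stale records no longer leak into every window.

```python
from collections import namedtuple
from datetime import datetime, timedelta, timezone

# Illustrative record shape; the PR's TaskMetricRecord is a Pydantic model.
Rec = namedtuple("Rec", ["timestamp", "value"])

def filter_window(records, now, window_days):
    """Keep only records whose timestamp falls inside the rolling window."""
    cutoff = now - timedelta(days=window_days)
    return [r for r in records if r.timestamp >= cutoff]
```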

Closes #45, closes #47

Test plan

  • All 5502 unit tests pass (uv run pytest tests/ -n auto)
  • mypy strict passes (uv run mypy src/ tests/)
  • ruff lint + format clean
  • Pre-existing flaky test test_circuit_breaker_after_max_errors (not related to this PR)
  • Coverage at 41% (pre-existing, not caused by this PR)

Review coverage

Pre-reviewed by 9 agents: code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, docs-consistency. 57 findings triaged and implemented.

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings March 10, 2026 08:00
@github-actions
Contributor

github-actions bot commented Mar 10, 2026

Dependency Review

✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai

coderabbitai bot commented Mar 10, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 334d218f-afad-4bc4-b69b-f996fd1203e3

📥 Commits

Reviewing files that changed from the base of the PR and between 48f28cf and f2c49f7.

📒 Files selected for processing (65)
  • CLAUDE.md
  • DESIGN_SPEC.md
  • README.md
  • src/ai_company/communication/enums.py
  • src/ai_company/core/enums.py
  • src/ai_company/hr/__init__.py
  • src/ai_company/hr/archival_protocol.py
  • src/ai_company/hr/enums.py
  • src/ai_company/hr/errors.py
  • src/ai_company/hr/full_snapshot_strategy.py
  • src/ai_company/hr/hiring_service.py
  • src/ai_company/hr/models.py
  • src/ai_company/hr/offboarding_service.py
  • src/ai_company/hr/onboarding_service.py
  • src/ai_company/hr/performance/__init__.py
  • src/ai_company/hr/performance/behavioral_collaboration_strategy.py
  • src/ai_company/hr/performance/ci_quality_strategy.py
  • src/ai_company/hr/performance/collaboration_protocol.py
  • src/ai_company/hr/performance/config.py
  • src/ai_company/hr/performance/models.py
  • src/ai_company/hr/performance/multi_window_strategy.py
  • src/ai_company/hr/performance/quality_protocol.py
  • src/ai_company/hr/performance/theil_sen_strategy.py
  • src/ai_company/hr/performance/tracker.py
  • src/ai_company/hr/performance/trend_protocol.py
  • src/ai_company/hr/performance/window_protocol.py
  • src/ai_company/hr/persistence_protocol.py
  • src/ai_company/hr/queue_return_strategy.py
  • src/ai_company/hr/reassignment_protocol.py
  • src/ai_company/hr/registry.py
  • src/ai_company/observability/events/hr.py
  • src/ai_company/observability/events/performance.py
  • src/ai_company/observability/events/persistence.py
  • src/ai_company/persistence/protocol.py
  • src/ai_company/persistence/repositories.py
  • src/ai_company/persistence/sqlite/backend.py
  • src/ai_company/persistence/sqlite/hr_repositories.py
  • src/ai_company/persistence/sqlite/migrations.py
  • src/ai_company/persistence/sqlite/repositories.py
  • tests/unit/api/conftest.py
  • tests/unit/communication/test_enums.py
  • tests/unit/core/test_enums.py
  • tests/unit/hr/__init__.py
  • tests/unit/hr/conftest.py
  • tests/unit/hr/performance/__init__.py
  • tests/unit/hr/performance/conftest.py
  • tests/unit/hr/performance/test_behavioral_collaboration_strategy.py
  • tests/unit/hr/performance/test_ci_quality_strategy.py
  • tests/unit/hr/performance/test_models.py
  • tests/unit/hr/performance/test_multi_window_strategy.py
  • tests/unit/hr/performance/test_theil_sen_strategy.py
  • tests/unit/hr/performance/test_tracker.py
  • tests/unit/hr/test_enums.py
  • tests/unit/hr/test_errors.py
  • tests/unit/hr/test_full_snapshot_strategy.py
  • tests/unit/hr/test_hiring_service.py
  • tests/unit/hr/test_models.py
  • tests/unit/hr/test_offboarding_service.py
  • tests/unit/hr/test_onboarding_service.py
  • tests/unit/hr/test_persistence.py
  • tests/unit/hr/test_queue_return_strategy.py
  • tests/unit/hr/test_registry.py
  • tests/unit/observability/test_events.py
  • tests/unit/persistence/test_migrations_v2.py
  • tests/unit/persistence/test_protocol.py

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Introduces a full HR domain: hiring pipeline, candidate generation, approvals, onboarding, offboarding, registry, and archival workflows.
    • Adds performance tracking: task metrics, quality scoring, collaboration scoring, multi-window aggregates, and trend detection.
    • Adds HR persistence (SQLite) and observability events for HR and performance workflows.
    • Adds new agent status ONBOARDING and HR notification message type.
  • Documentation

    • Updated docs and README to reflect HR Engine and Performance Tracking.
  • Tests

    • Comprehensive unit tests for HR, performance, persistence, and services.

Walkthrough

Adds a new modular HR package and performance subsystem: hiring, onboarding, offboarding, archival and registry services; pluggable protocols and strategies for reassignment, archival, quality, collaboration, windowing and trend detection; SQLite-backed HR persistence (v2 migration); observability event constants; and extensive unit tests and fixtures.

Changes

Cohort / File(s) Summary
HR Package (core services & init)
src/ai_company/hr/__init__.py, src/ai_company/hr/{hiring_service.py,onboarding_service.py,offboarding_service.py,registry.py}
Adds ai_company.hr package and core services: HiringService, OnboardingService, OffboardingService, and AgentRegistryService implementing workflows for hiring, onboarding, offboarding, and agent registration/status management.
HR Domain Models, Enums, Errors
src/ai_company/hr/{models.py,enums.py,errors.py}
Introduces frozen Pydantic domain models (CandidateCard, HiringRequest, FiringRequest, OnboardingChecklist, OffboardingRecord, AgentLifecycleEvent, performance models), enums for HR flows, and a hierarchical HR error taxonomy.
Archival & Reassignment
src/ai_company/hr/{archival_protocol.py,full_snapshot_strategy.py,queue_return_strategy.py,reassignment_protocol.py}
Adds MemoryArchivalStrategy protocol and FullSnapshotStrategy implementation; TaskReassignmentStrategy protocol and QueueReturnStrategy implementation with per-entry error handling and observability.
Performance subsystem (package & protocols)
src/ai_company/hr/performance/{__init__.py,models.py,config.py,tracker.py,quality_protocol.py,collaboration_protocol.py,window_protocol.py,trend_protocol.py}
New performance package: models, config, PerformanceTracker service, and pluggable protocols for quality scoring, collaboration scoring, windowing, and trend detection.
Performance strategies
src/ai_company/hr/performance/{ci_quality_strategy.py,behavioral_collaboration_strategy.py,multi_window_strategy.py,theil_sen_strategy.py}
Concrete strategies: CI signal quality scorer, BehavioralTelemetry collaboration scorer, MultiWindowStrategy for rolling aggregates, and Theil‑Sen trend detector.
Persistence: protocols & SQLite implementations
src/ai_company/hr/persistence_protocol.py, src/ai_company/persistence/protocol.py, src/ai_company/persistence/repositories.py, src/ai_company/persistence/sqlite/{backend.py,hr_repositories.py,migrations.py,repositories.py}
Adds HR persistence protocols, extends PersistenceBackend interface, implements SQLite repositories for lifecycle events, task metrics, collaboration metrics, updates backend wiring and bumps schema to v2 with migrations.
Observability events & enums
src/ai_company/observability/events/{hr.py,performance.py,persistence.py}, src/ai_company/communication/enums.py, src/ai_company/core/enums.py
Adds HR and performance event constants, extends persistence event constants, adds HR_NOTIFICATION MessageType and AgentStatus.ONBOARDING.
Tests & test fixtures
tests/unit/{hr,hr/performance,persistence,api,communication,core,observability}/...
Adds wide test coverage: unit tests for services, strategies, models, persistence (migrations + SQLite repos), protocols, and fixtures/fake repos to support the new HR/perf surface.
Docs & design
CLAUDE.md, DESIGN_SPEC.md, README.md
Updates design doc and README to reflect modular HR package and performance tracking additions and new observability logging constants.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant HiringService as "HiringService"
    participant ApprovalStore as "ApprovalStore (optional)"
    participant AgentRegistry as "AgentRegistryService"
    participant Onboarding as "OnboardingService (optional)"
    User->>HiringService: create_request(...)
    HiringService->>HiringService: persist request
    User->>HiringService: generate_candidate(request)
    HiringService->>HiringService: append candidate
    User->>HiringService: submit_for_approval(request,candidate_id)
    alt ApprovalStore configured
        HiringService->>ApprovalStore: create approval item
        ApprovalStore-->>HiringService: approval result (async)
    else auto-approve
        HiringService->>HiringService: mark APPROVED
    end
    User->>HiringService: instantiate_agent(request)
    HiringService->>AgentRegistry: register(agent_identity)
    AgentRegistry-->>HiringService: registered
    opt OnboardingService present
        HiringService->>Onboarding: start_onboarding(agent_id)
        Onboarding-->>HiringService: checklist
    end
    HiringService-->>User: AgentIdentity
sequenceDiagram
    actor User
    participant OffboardingService as "OffboardingService"
    participant Reassign as "TaskReassignmentStrategy"
    participant Archival as "MemoryArchivalStrategy"
    participant MessageBus as "MessageBus (optional)"
    participant AgentRegistry as "AgentRegistryService"

    User->>OffboardingService: offboard(firing_request)
    OffboardingService->>AgentRegistry: get(agent_id)
    OffboardingService->>Reassign: reassign(agent_id, active_tasks)
    Reassign-->>OffboardingService: updated_tasks
    OffboardingService->>Archival: archive(agent_id,...)
    Archival-->>OffboardingService: ArchivalResult
    alt MessageBus configured
        OffboardingService->>MessageBus: publish(notification)
    end
    OffboardingService->>AgentRegistry: unregister(agent_id)
    AgentRegistry-->>OffboardingService: removed
    OffboardingService-->>User: OffboardingRecord
sequenceDiagram
    participant Tracker as "PerformanceTracker"
    participant Quality as "QualityScoringStrategy"
    participant Collaboration as "CollaborationScoringStrategy"
    participant Window as "MetricsWindowStrategy"
    participant Trend as "TrendDetectionStrategy"

    Tracker->>Tracker: record_task_metric(record)
    Tracker->>Quality: score(agent_id, task_id, task_result, criteria)
    Quality-->>Tracker: QualityScoreResult
    Tracker->>Tracker: record_collaboration_event(record)
    Tracker->>Collaboration: score(agent_id, collab_records)
    Collaboration-->>Tracker: CollaborationScoreResult
    Tracker->>Window: compute_windows(records, now)
    Window-->>Tracker: tuple[WindowMetrics]
    Tracker->>Trend: detect(metric_name, values, window_size)
    Trend-->>Tracker: TrendResult
    Tracker-->>Tracker: assemble AgentPerformanceSnapshot

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes


@greptile-apps

greptile-apps bot commented Mar 10, 2026

Greptile Summary

This PR adds a complete HR engine (hiring pipeline, onboarding/offboarding orchestration, agent registry) and a performance-tracking subsystem (task metrics, collaboration scoring, Theil-Sen trend detection, multi-window rolling aggregates) backed by a new SQLite schema migration (V2). The overall architecture is sound — protocols are well-defined, error handling is explicit, and the previous round of pre-PR reviews has visibly cleaned up broad exception clauses and lock usage.

Key findings:

  • score_task_quality never persists scored records (tracker.py:163): The method returns a new TaskMetricRecord with quality_score set but does not write it back to self._task_metrics. All subsequent get_snapshot() calls will compute overall_quality_score=None and the quality trend will always be empty — the feature is silently broken for any caller following the natural record_task_metric → score_task_quality → get_snapshot flow.
  • Cost trend direction is semantically inverted (tracker.py:315): TheilSenTrendStrategy labels a positive slope IMPROVING for every metric. For quality_score this is correct, but for cost_usd a rising slope means rising cost, which should map to DECLINING. As-is, an agent spending more over time will be reported as improving.
  • memory_archive_id is permanently None (offboarding_service.py:144): OffboardingRecord has a memory_archive_id field for auditability, but ArchivalResult exposes no archive ID, so the field is hardcoded to None and can never be populated without a model change.
  • Wrong log event constant in _get_request (hiring_service.py:94): A "request not found" error is logged under HR_HIRING_REQUEST_CREATED, which will corrupt any monitoring alert or log query watching that event constant for successful creations.
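
One way to fix the cost-trend inversion flagged above is a per-metric direction map, so a rising slope only counts as IMPROVING when higher values are actually better. This is a sketch with assumed metric keys and a simplified enum, not the PR's actual code:

```python
from enum import Enum

class TrendDirection(Enum):
    IMPROVING = "improving"
    DECLINING = "declining"
    STABLE = "stable"

# Whether a rising slope is good, per metric (illustrative keys).
_HIGHER_IS_BETTER = {"quality_score": True, "cost_usd": False}

def direction_for(metric_name: str, slope: float, threshold: float = 1e-6) -> TrendDirection:
    """Map a raw slope to a trend direction, respecting metric semantics."""
    if abs(slope) < threshold:
        return TrendDirection.STABLE
    rising_is_good = _HIGHER_IS_BETTER.get(metric_name, True)
    # A positive slope is only an improvement when higher is better.
    if (slope > 0) == rising_is_good:
        return TrendDirection.IMPROVING
    return TrendDirection.DECLINING
```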

Confidence Score: 2/5

  • Two silent functional bugs in the performance tracker make quality scoring and cost trend direction broken by default; needs fixes before merge.
  • The hiring pipeline, registry, offboarding pipeline, SQLite repositories, and migration are all well-implemented. However, the performance tracker has two logic bugs that silently produce incorrect results: quality scores computed by score_task_quality are never reflected in snapshots or trends, and the cost_usd trend direction is inverted (rising cost is labeled IMPROVING). These are not edge cases — they affect every standard usage of the tracker.
  • src/ai_company/hr/performance/tracker.py requires the most attention for both the score persistence gap and the cost trend inversion. src/ai_company/hr/archival_protocol.py and src/ai_company/hr/offboarding_service.py need a coordinated fix for the missing archive_id.

Important Files Changed

Filename Overview
src/ai_company/hr/performance/tracker.py Two logic bugs: score_task_quality never persists scored records back to self._task_metrics, making all snapshot quality scores and quality-trend computations return None/empty; and the cost trend direction is semantically inverted (rising cost_usd is labeled IMPROVING).
src/ai_company/hr/hiring_service.py Well-structured hiring pipeline with proper guard for re-instantiation; minor style issue: _get_request logs the wrong event constant (HR_HIRING_REQUEST_CREATED) on a not-found error path.
src/ai_company/hr/offboarding_service.py Clean four-step pipeline with correctly scoped non-fatal error handling; memory_archive_id is hardcoded to None because ArchivalResult has no archive ID field, leaving the OffboardingRecord field permanently unpopulated.
src/ai_company/hr/registry.py All operations including reads now hold async with self._lock; clean implementation with correct error types and structured logging throughout.
src/ai_company/hr/performance/theil_sen_strategy.py Correct O(n²) Theil-Sen implementation with proper insufficient-data guard; issue is in how the caller (tracker.py) applies direction semantics to different metrics, not in this file itself.
src/ai_company/hr/performance/ci_quality_strategy.py Log-scaled cost efficiency with configurable cost_budget parameter; clean implementation addressing the previously flagged score-collapse issue.
src/ai_company/persistence/sqlite/hr_repositories.py Well-structured SQLite repositories with correct parameterized queries, proper bool-to-int conversion for SQLite, and explicit error narrowing; INSERT is non-idempotent by design.
src/ai_company/persistence/sqlite/migrations.py Clean incremental migration system using user_version pragma; V2 adds lifecycle_events, task_metrics, and collaboration_metrics with appropriate composite indexes.
src/ai_company/hr/archival_protocol.py ArchivalResult is missing an archive_id field, leaving OffboardingRecord.memory_archive_id permanently None; otherwise protocol definition is clean.
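
The incremental `user_version` migration pattern described for migrations.py can be sketched like this (table names and statements are illustrative, not the PR's actual v2 schema):

```python
import sqlite3

# Target version -> statements that raise the schema from version - 1.
MIGRATIONS: dict[int, list[str]] = {
    1: ["CREATE TABLE IF NOT EXISTS tasks (id TEXT PRIMARY KEY)"],
    2: [
        "CREATE TABLE IF NOT EXISTS lifecycle_events (id TEXT PRIMARY KEY, agent_id TEXT)",
        "CREATE INDEX IF NOT EXISTS idx_lifecycle_agent ON lifecycle_events(agent_id)",
    ],
}

def migrate(conn: sqlite3.Connection) -> int:
    """Apply every migration above the stored user_version, in order."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version in sorted(v for v in MIGRATIONS if v > current):
        for stmt in MIGRATIONS[version]:
            conn.execute(stmt)
        # PRAGMA values cannot be parameterized; version is a trusted int.
        conn.execute(f"PRAGMA user_version = {version}")
        conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]
```

Re-running `migrate` on an up-to-date database is a no-op, which is what makes the scheme safe to call on every startup.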

Sequence Diagram

sequenceDiagram
    participant Caller
    participant HiringService
    participant AgentRegistryService
    participant OnboardingService
    participant PerformanceTracker
    participant OffboardingService

    Note over Caller,OffboardingService: Hiring Pipeline
    Caller->>HiringService: create_request(role, dept, level, ...)
    HiringService-->>Caller: HiringRequest (PENDING)
    Caller->>HiringService: generate_candidate(request)
    HiringService-->>Caller: HiringRequest (+ CandidateCard)
    Caller->>HiringService: submit_for_approval(request, candidate_id)
    alt approval_store present
        HiringService->>ApprovalStore: add(ApprovalItem)
    else auto-approve
        HiringService-->>Caller: HiringRequest (APPROVED)
    end
    Caller->>HiringService: instantiate_agent(request)
    HiringService->>AgentRegistryService: register(identity)
    HiringService->>OnboardingService: start_onboarding(agent_id)
    HiringService-->>Caller: AgentIdentity (ACTIVE)

    Note over Caller,PerformanceTracker: Performance Tracking
    Caller->>PerformanceTracker: record_task_metric(record)
    PerformanceTracker-->>Caller: TaskMetricRecord (quality_score=None)
    Caller->>PerformanceTracker: score_task_quality(task_result, criteria)
    PerformanceTracker-->>Caller: TaskMetricRecord (quality_score=X)
    Note right of PerformanceTracker: ⚠️ stored record NOT updated
    Caller->>PerformanceTracker: get_snapshot(agent_id)
    PerformanceTracker-->>Caller: AgentPerformanceSnapshot (quality=None!)

    Note over Caller,OffboardingService: Offboarding Pipeline
    Caller->>OffboardingService: offboard(FiringRequest)
    OffboardingService->>TaskRepository: list_tasks + reassign
    OffboardingService->>ArchivalStrategy: archive(agent_id, ...)
    OffboardingService->>MessageBus: publish(HR_NOTIFICATION)
    OffboardingService->>AgentRegistryService: update_status(TERMINATED)
    OffboardingService-->>Caller: OffboardingRecord (memory_archive_id=None)

Last reviewed commit: f2c49f7


Copilot AI left a comment


Pull request overview

Adds a new HR subsystem (hiring/onboarding/offboarding/registry) plus performance tracking (metrics, scoring, rolling windows, trend detection), and wires persistence/observability support for the new domain.

Changes:

  • Introduces ai_company.hr package: lifecycle services (registry, hiring, onboarding, offboarding) + pluggable reassignment/archival strategies.
  • Introduces ai_company.hr.performance package: metric models, quality/collaboration scoring, multi-window aggregation, Theil–Sen trend detection, and a tracker service.
  • Extends SQLite persistence schema to v2 and adds SQLite repositories + persistence backend/protocol surface area for HR/performance tables and events.

Reviewed changes

Copilot reviewed 63 out of 65 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
tests/unit/persistence/test_protocol.py Extends persistence protocol compliance fakes to include HR/performance repos.
tests/unit/persistence/test_migrations_v2.py Adds unit tests validating schema v2 migrations, tables, and indexes.
tests/unit/observability/test_events.py Updates expected observability event modules to include hr and performance.
tests/unit/hr/test_registry.py Adds unit tests for AgentRegistryService behavior.
tests/unit/hr/test_queue_return_strategy.py Adds unit tests for queue-return task reassignment strategy.
tests/unit/hr/test_persistence.py Adds unit tests for HR SQLite repositories round-trips and filters.
tests/unit/hr/test_onboarding_service.py Adds unit tests for onboarding checklist lifecycle and activation.
tests/unit/hr/test_offboarding_service.py Adds unit tests for offboarding pipeline orchestration with fakes.
tests/unit/hr/test_models.py Adds unit tests for HR domain model validation and immutability.
tests/unit/hr/test_hiring_service.py Adds unit tests for hiring pipeline stages and error cases.
tests/unit/hr/test_full_snapshot_strategy.py Adds unit tests for snapshot archival strategy, including promotion behavior.
tests/unit/hr/test_errors.py Adds unit tests for HR error hierarchy.
tests/unit/hr/test_enums.py Adds unit tests for HR enums and expected values.
tests/unit/hr/performance/test_theil_sen_strategy.py Adds unit tests for Theil–Sen trend detection behavior.
tests/unit/hr/performance/test_multi_window_strategy.py Adds unit tests for multi-window aggregation and edge cases.
tests/unit/hr/performance/test_ci_quality_strategy.py Adds unit tests for CI-signal quality scoring strategy.
tests/unit/hr/performance/test_behavioral_collaboration_strategy.py Adds unit tests for behavioral telemetry collaboration scoring.
tests/unit/hr/performance/conftest.py Adds helpers/factories for performance tracking unit tests.
tests/unit/hr/performance/__init__.py Marks performance unit test package.
tests/unit/hr/conftest.py Adds shared HR fixtures and factory builders for unit tests.
tests/unit/hr/__init__.py Marks HR unit test package.
tests/unit/core/test_enums.py Updates expected AgentStatus enum member count.
tests/unit/communication/test_enums.py Updates expected MessageType enum member count.
tests/unit/api/conftest.py Extends fake persistence backend with HR/performance repositories.
src/ai_company/persistence/sqlite/repositories.py Updates module docstring to point HR repos to hr_repositories.py.
src/ai_company/persistence/sqlite/migrations.py Bumps schema to v2 and adds v2 migration statements for HR tables/indexes.
src/ai_company/persistence/sqlite/hr_repositories.py Adds SQLite repositories for lifecycle events, task metrics, collaboration metrics.
src/ai_company/persistence/sqlite/backend.py Wires new HR repositories into SQLite persistence backend.
src/ai_company/persistence/repositories.py Re-exports new HR repository protocols.
src/ai_company/persistence/protocol.py Extends PersistenceBackend protocol with HR repository properties.
src/ai_company/observability/events/persistence.py Adds structured event constants for HR persistence operations.
src/ai_company/observability/events/performance.py Adds structured event constants for performance tracking operations.
src/ai_company/observability/events/hr.py Adds structured event constants for HR domain operations.
src/ai_company/hr/registry.py Introduces an in-memory agent registry service with status updates and queries.
src/ai_company/hr/reassignment_protocol.py Adds protocol for task reassignment strategies.
src/ai_company/hr/queue_return_strategy.py Implements queue-return reassignment strategy (interrupt + clear assignee).
src/ai_company/hr/persistence_protocol.py Adds repository protocols for lifecycle events and performance metrics.
src/ai_company/hr/performance/window_protocol.py Adds protocol for rolling-window aggregation strategies.
src/ai_company/hr/performance/trend_protocol.py Adds protocol for trend detection strategies.
src/ai_company/hr/performance/tracker.py Adds in-memory performance tracker for recording/scoring/querying snapshots.
src/ai_company/hr/performance/theil_sen_strategy.py Implements Theil–Sen robust slope trend detector.
src/ai_company/hr/performance/quality_protocol.py Adds protocol for quality scoring strategies.
src/ai_company/hr/performance/multi_window_strategy.py Implements multi-window rolling aggregation (7d/30d/90d, etc.).
src/ai_company/hr/performance/models.py Adds performance tracking Pydantic models (records, results, windows, snapshots).
src/ai_company/hr/performance/config.py Adds performance configuration model (windows, thresholds, weights).
src/ai_company/hr/performance/collaboration_protocol.py Adds protocol for collaboration scoring strategies.
src/ai_company/hr/performance/ci_quality_strategy.py Implements CI-signal-based quality scoring strategy.
src/ai_company/hr/performance/behavioral_collaboration_strategy.py Implements behavioral telemetry collaboration scoring strategy.
src/ai_company/hr/performance/__init__.py Marks performance package.
src/ai_company/hr/onboarding_service.py Implements onboarding checklist management and activation on completion.
src/ai_company/hr/offboarding_service.py Implements offboarding pipeline orchestration (reassign/archive/notify/terminate).
src/ai_company/hr/models.py Adds HR Pydantic models for hiring/firing/onboarding/offboarding/lifecycle events.
src/ai_company/hr/hiring_service.py Implements hiring pipeline (request → candidates → approval → instantiation).
src/ai_company/hr/full_snapshot_strategy.py Implements full-snapshot memory archival with optional org-memory promotion.
src/ai_company/hr/errors.py Adds HR error hierarchy (hiring/offboarding/registry/performance).
src/ai_company/hr/enums.py Adds HR enums (request status, reasons, steps, event types, trend direction).
src/ai_company/hr/archival_protocol.py Adds protocol + result model for memory archival strategies.
src/ai_company/hr/__init__.py Marks HR package.
src/ai_company/core/enums.py Adds AgentStatus.ONBOARDING.
src/ai_company/communication/enums.py Adds MessageType.HR_NOTIFICATION.
README.md Updates milestone feature list to include HR engine and performance tracking.
DESIGN_SPEC.md Updates package tree documentation to reflect new HR/performance modules and persistence files.
CLAUDE.md Updates repository overview and logging conventions to include HR/performance event constants.


Comment on lines +122 to +126
```python
name_lower = name.lower()
for identity in self._agents.values():
    if str(identity.name).lower() == name_lower:
        return identity
return None
```

Copilot AI Mar 10, 2026


These iterations over self._agents.values() are not protected by self._lock. If another task registers/unregisters concurrently, Python can raise RuntimeError: dictionary changed size during iteration. Consider taking a snapshot of self._agents.values() under the lock (or holding the lock during the loop).
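
A minimal sketch of the suggested fix — snapshotting the dict's values under the lock before iterating — assuming a simplified registry shape rather than the PR's actual class:

```python
import asyncio

class AgentRegistry:
    """Illustrative registry; the PR's AgentRegistryService differs."""

    def __init__(self) -> None:
        self._agents: dict[str, object] = {}
        self._lock = asyncio.Lock()

    async def find_by_name(self, name: str):
        # Copy values under the lock so concurrent register/unregister
        # cannot mutate the dict while we iterate.
        async with self._lock:
            snapshot = list(self._agents.values())
        name_lower = name.lower()
        for identity in snapshot:
            if str(getattr(identity, "name", "")).lower() == name_lower:
                return identity
        return None
```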

```python
    Returns:
        Tuple of active agent identities.
    """
    return tuple(a for a in self._agents.values() if a.status == AgentStatus.ACTIVE)
```

Copilot AI Mar 10, 2026


list_active() reads/iterates registry state without holding self._lock, so concurrent register()/unregister() can trigger RuntimeError: dictionary changed size during iteration or return inconsistent results. Consider snapshotting under the lock (or holding the lock while building the tuple).

Comment on lines +148 to +151
```python
dept_lower = department.lower()
return tuple(
    a for a in self._agents.values() if str(a.department).lower() == dept_lower
)
```

Copilot AI Mar 10, 2026


list_by_department() iterates self._agents.values() without self._lock. Concurrent writes can raise RuntimeError: dictionary changed size during iteration. Consider snapshotting the values under the lock before filtering.

Comment on lines +331 to +338
```python
except Exception as exc:
    msg = f"Failed to instantiate agent for request {request.id!r}"
    logger.exception(
        HR_HIRING_INSTANTIATION_FAILED,
        request_id=str(request.id),
        error=str(exc),
    )
    raise HiringError(msg) from exc
```

Copilot AI Mar 10, 2026


Catching a blanket Exception here will also swallow unexpected programmer errors (e.g., AttributeError/TypeError) and re-raise them as HiringError, which can make debugging harder. Consider catching the specific exception types you expect from AgentIdentity/ModelConfig validation and registry.register() (e.g., Pydantic validation errors, AgentAlreadyRegisteredError, etc.), and let genuinely unexpected exceptions propagate.
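
A sketch of the suggested narrowing, using stand-in error types rather than the project's actual taxonomy: expected failure modes are wrapped in the domain error, while programmer errors propagate untouched.

```python
class HiringError(Exception):
    """Illustrative stand-in for the project's error taxonomy."""

class RegistryError(Exception):
    """Illustrative stand-in for e.g. AgentAlreadyRegisteredError."""

def instantiate_agent(register, payload: dict) -> str:
    # Narrow except: only failures we expect from validation and
    # registration are wrapped; TypeError/AttributeError propagate.
    try:
        name = payload["name"]
        if not name:
            raise ValueError("agent name must not be blank")
        register(name)
        return name
    except (KeyError, ValueError, RegistryError) as exc:
        raise HiringError(f"failed to instantiate agent: {exc}") from exc
```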

Comment on lines +55 to +71
```python
def __init__(
    self,
    *,
    quality_strategy: QualityScoringStrategy,
    collaboration_strategy: CollaborationScoringStrategy,
    window_strategy: MetricsWindowStrategy,
    trend_strategy: TrendDetectionStrategy,
    config: PerformanceConfig | None = None,
) -> None:
    self._quality_strategy = quality_strategy
    self._collaboration_strategy = collaboration_strategy
    self._window_strategy = window_strategy
    self._trend_strategy = trend_strategy
    self._config = config or PerformanceConfig()
    self._task_metrics: dict[str, list[TaskMetricRecord]] = {}
    self._collab_metrics: dict[str, list[CollaborationMetricRecord]] = {}
```


Copilot AI Mar 10, 2026


PerformanceConfig includes windows, improving_threshold, declining_threshold, and collaboration_weights, but this tracker only uses config.min_data_points. This can lead to confusing/ineffective configuration (changing thresholds/windows has no effect). Either wire these config values into the injected strategies (or construct default strategies from config) or remove the unused fields/config parameter from PerformanceTracker.

Comment on lines +100 to +107
```python
# Compute all pairwise slopes.
slopes: list[float] = []
for (t1, v1), (t2, v2) in combinations(values, 2):
    dt_days = (t2.timestamp() - t1.timestamp()) / _SECONDS_PER_DAY
    if abs(dt_days) < _MIN_DELTA_DAYS:
        continue
    slope = (v2 - v1) / dt_days
    slopes.append(slope)
```

Copilot AI Mar 10, 2026


Theil-Sen slope calculation depends on time ordering. combinations(values, 2) uses the incoming order, so if values isn’t sorted by timestamp you can get negative dt_days and inverted slopes, which can flip the detected trend direction. Consider sorting by timestamp (or otherwise ensuring t2 > t1) before computing slopes.
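The fix is a one-line sort before the pairwise loop. A self-contained sketch of the corrected Theil-Sen core (`_MIN_DELTA_DAYS` is assumed to be a small guard against near-zero time gaps, and the median step stands in for however the real strategy aggregates slopes):

```python
from datetime import datetime
from itertools import combinations
from statistics import median

_SECONDS_PER_DAY = 86400.0
_MIN_DELTA_DAYS = 1e-9  # assumed guard against near-zero gaps


def theil_sen_slope(values: list[tuple[datetime, float]]) -> float:
    """Median of pairwise slopes; sorting makes every dt_days non-negative."""
    ordered = sorted(values, key=lambda pair: pair[0])
    slopes: list[float] = []
    for (t1, v1), (t2, v2) in combinations(ordered, 2):
        dt_days = (t2.timestamp() - t1.timestamp()) / _SECONDS_PER_DAY
        if abs(dt_days) < _MIN_DELTA_DAYS:
            continue
        slopes.append((v2 - v1) / dt_days)
    return median(slopes)
```

Passing the points in reverse order now yields the same slope as sorted input, instead of an inverted one.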

Comment on lines +139 to +145
org_category = _CATEGORY_MAP[entry.category]
author = OrgFactAuthor(agent_id=agent_id)
write_req = OrgFactWriteRequest(
content=NotBlankStr(entry.content),
category=org_category,
)
await org_memory_backend.write(write_req, author=author)

Copilot AI Mar 10, 2026


Org memory promotion appears to be effectively disabled: OrgFactAuthor(agent_id=agent_id) will fail validation because non-human authors require both agent_id and seniority (see OrgFactAuthor validator). Since the exception is caught, promotable entries will be skipped and promoted_count will stay 0, contradicting the docstring. Pass an author with seniority (e.g., derive from AgentIdentity/registry) or change the promotion API to accept the author explicitly.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 30

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unit/communication/test_enums.py (1)

19-31: ⚠️ Potential issue | 🟡 Minor

Add assertion for the new HR_NOTIFICATION value.

The member count was updated to 10, but test_values doesn't verify the new HR_NOTIFICATION member. For consistency with other enum tests in this file (which verify all member values), add the missing assertion.

💚 Proposed fix
     def test_values(self) -> None:
         assert MessageType.TASK_UPDATE.value == "task_update"
         assert MessageType.QUESTION.value == "question"
         assert MessageType.ANNOUNCEMENT.value == "announcement"
         assert MessageType.REVIEW_REQUEST.value == "review_request"
         assert MessageType.APPROVAL.value == "approval"
         assert MessageType.DELEGATION.value == "delegation"
         assert MessageType.STATUS_REPORT.value == "status_report"
         assert MessageType.ESCALATION.value == "escalation"
         assert MessageType.MEETING_CONTRIBUTION.value == "meeting_contribution"
+        assert MessageType.HR_NOTIFICATION.value == "hr_notification"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/communication/test_enums.py` around lines 19 - 31, The test suite
added a new enum member but test_values doesn't assert it; update test_values in
tests/unit/communication/test_enums.py to include an assertion that
MessageType.HR_NOTIFICATION.value == "hr_notification" (alongside the other
MessageType assertions) so the new member is verified and the member count
matches test_member_count.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@DESIGN_SPEC.md`:
- Line 2827: The structure map entry incorrectly points to hr.enums for
AgentStatus; update the map so the AgentStatus reference points to the module
where it is actually defined (core.enums) instead of hr.enums, i.e., replace the
AgentStatus module reference in the map from hr.enums to core.enums so readers
are directed to the correct definition.

In `@src/ai_company/hr/archival_protocol.py`:
- Around line 51-70: The archive method's docstring is missing a Raises section
documenting MemoryArchivalError; update the docstring for async def archive (in
archival_protocol.py) to include a "Raises:" block that lists
MemoryArchivalError (and when it is raised, e.g., on archival failure or backend
errors) to match TaskReassignmentStrategy's style and the project's errors.py
hierarchy so callers know to handle this exception.

In `@src/ai_company/hr/full_snapshot_strategy.py`:
- Around line 65-186: The archive() method is too long and should be broken into
helpers; extract the phases into private methods to improve readability and
testability: implement a _retrieve_entries(agent_id, memory_backend) that wraps
memory_backend.retrieve and error handling, an _archive_entries(entries,
archival_store, agent_id, now) that constructs ArchivalEntry, calls
archival_store.archive and returns (archived_count, deleted_ids), a
_promote_to_org(entries, org_memory_backend, agent_id) that handles category
mapping via _CATEGORY_MAP, OrgFactAuthor/OrgFactWriteRequest and returns
promoted_count, and a _clean_hot_store(deleted_ids, memory_backend, agent_id)
that deletes ids and returns hot_store_cleaned; update archive() to call these
helpers and assemble the ArchivalResult while preserving existing logging and
exception behavior (use function names: archive, _retrieve_entries,
_archive_entries, _promote_to_org, _clean_hot_store, ArchivalEntry,
archival_store.archive, org_memory_backend.write, memory_backend.delete).
- Around line 156-169: The delete loop is re-wrapping already-validated IDs with
NotBlankStr; remove the redundant wrap and pass memory_id directly to
memory_backend.delete from the hot store cleanup loop (symbols: deleted_ids,
memory_id, memory_backend.delete, NotBlankStr, hot_store_cleaned); ensure
consistency with how deleted_ids is populated (either keep deleted_ids as
list[NotBlankStr] and pass the elements unchanged, or keep them as list[str] and
pass the raw str) and update the code that builds deleted_ids accordingly so
validation happens only once when entries are read from the backend.

In `@src/ai_company/hr/hiring_service.py`:
- Around line 318-329: The new AgentIdentity is not copying the candidate's
skills, causing AgentIdentity.skills to remain the default empty SkillSet;
update the AgentIdentity construction (AgentIdentity(...)) to set its skills
from the CandidateCard (e.g., use candidate.skills or convert candidate.skills
into the expected SkillSet type) so the hired agent retains the
CandidateCard.skills for downstream skill-based routing and gap-filling logic.
Ensure any necessary conversion or validation is applied so the types match
between CandidateCard.skills and AgentIdentity.skills.
- Around line 129-168: The generate_candidate method is mutating _requests using
the caller-supplied HiringRequest snapshot (request) which can cause
lost-updates; change it to re-load the authoritative request from self._requests
by id (self._requests[str(request.id)]) before creating the CandidateCard, then
apply changes to that fresh instance (use model_copy or a helper like
_apply_update_to_request) and write back into self._requests; encapsulate the
read-modify-write in a small helper (e.g., _get_current_request and
_save_updated_request) and use those helpers here (and similarly in the other
methods flagged at lines 170-244) to avoid overwriting concurrent updates and
preserve existing candidates/approval fields.
- Around line 327-349: The code always creates the new identity with
AgentStatus.ONBOARDING which leaves agents stuck if no onboarding service exists
or if start_onboarding fails; fix by: when instantiating the identity choose
status = AgentStatus.ONBOARDING only if self._onboarding_service is not None,
otherwise use AgentStatus.ACTIVE; additionally wrap the await
self._onboarding_service.start_onboarding(...) call in a try/except so that if
start_onboarding raises you perform compensation: call the registry
rollback/unregister method (e.g., self._registry.unregister(identity.id) or
appropriate removal), remove or revert the request entry in self._requests, and
raise a HiringError from the exception; reference symbols:
AgentStatus.ONBOARDING, AgentStatus.ACTIVE, _onboarding_service,
_registry.register, start_onboarding, self._requests,
HiringRequestStatus.INSTANTIATED.
- Line 154: The assignment for estimated_monthly_cost currently uses
"request.budget_limit_monthly or 50.0", which treats 0.0 as missing; change it
to preserve an explicit 0.0 by using a None-check (e.g., use
request.budget_limit_monthly if request.budget_limit_monthly is not None else
50.0) so estimated_monthly_cost uses the provided 0.0 but falls back to 50.0
only when budget_limit_monthly is None; update the expression where
estimated_monthly_cost is set (referencing request.budget_limit_monthly and
estimated_monthly_cost) accordingly.

In `@src/ai_company/hr/models.py`:
- Around line 270-273: Replace the plain string type on the tasks_reassigned
field with the NotBlankStr type so identifiers are validated: change the
annotation from tuple[str, ...] to tuple[NotBlankStr, ...] on the
tasks_reassigned Field, keep the default=() and description, and add an import
for NotBlankStr from core.types if it’s not already imported; ensure any type
hints/usages that rely on tasks_reassigned reflect the new NotBlankStr element
type.

In `@src/ai_company/hr/offboarding_service.py`:
- Around line 109-114: The warning logs the wrong event constant when the agent
lookup fails: replace the HR_FIRING_INITIATED constant used in logger.warning
with HR_FIRING_COMPLETE and include the error context (msg) as before; locate
the agent lookup around the self._registry.get(agent_id) call where identity is
checked, update the logger.warning call that currently references
HR_FIRING_INITIATED to use HR_FIRING_COMPLETE, keeping agent_id and error=msg,
then raise AgentNotFoundError as before.

In `@src/ai_company/hr/onboarding_service.py`:
- Around line 46-78: Check registry membership for agent_id at the start of
start_onboarding and reject missing/invalid agents before creating the
OnboardingChecklist; add a lookup (e.g., via existing registry/service used
elsewhere) and raise OnboardingError with a clear message if not found. Extract
the finalization logic that persists/completes a checklist into a small helper
(e.g., _finalize_checklist or finalize_checklist_write) and call it from
complete_step/update_status so the write-after-await path isn't embedded in the
long method. Ensure start_onboarding uses the registry check and only constructs
and stores the checklist after validation, and update
complete_step/update_status to call the new helper to persist the final
checklist.

In `@src/ai_company/hr/performance/multi_window_strategy.py`:
- Around line 111-159: The code currently computes success_rate for any non-zero
count even when the window is below the minimum sample size; change it so
success_rate is only computed and returned when has_enough is True (otherwise
set it to None). Concretely, remove or stop using the unconditional success_rate
= completed / count assignment and instead compute success_rate inside the
has_enough branch (or set success_rate = None when not has_enough), then pass
round(success_rate, 4) only when success_rate is not None in the WindowMetrics
constructor (refer to the has_enough variable, the success_rate name, and the
final WindowMetrics return block to locate the change).

In `@src/ai_company/hr/performance/quality_protocol.py`:
- Around line 25-27: Change the name property return type from plain str to
NotBlankStr to enforce non-blank strategy names: update the annotation on the
name property in the protocol (def name(self) -> NotBlankStr), add the required
import from core.types (NotBlankStr), and ensure any callers/implementations and
tests return/consume NotBlankStr-compatible values (matching existing
QualityScoreResult.strategy_name usage).

In `@src/ai_company/hr/performance/tracker.py`:
- Around line 302-324: get_collaboration_metrics currently only supports a since
filter while get_task_metrics supports both since and until; add an optional
until: AwareDatetime | None = None parameter to get_collaboration_metrics,
update the docstring to document it, and apply the same filter logic used in
get_task_metrics (filter records by recorded_at >= since and recorded_at <=
until when those args are provided) using the same _collab_metrics traversal
(preserve behavior when agent_id is provided); ensure the default remains None
for backward compatibility and update any callers/tests that expect the new
signature.
- Around line 221-272: _compute_trends contains duplicated logic for building
(timestamp, value) tuples and calling self._trend_strategy.detect for
quality_score and cost_usd; extract that into a small helper (e.g.,
_compute_metric_trend or compute_and_append_trend) that accepts metric_name
(NotBlankStr), a value extractor (lambda or attr name) and window.window_size,
builds the values tuple from window_records, calls self._trend_strategy.detect
and appends the result when values exist; update _compute_trends to call this
helper for "quality_score" (skip None quality_score) and "cost_usd" to remove
the duplicated blocks while preserving filtering, window_size param and
appending behavior.

In `@src/ai_company/hr/queue_return_strategy.py`:
- Around line 75-87: The code uses the same event constant
HR_FIRING_TASKS_REASSIGNED for both the failure warning and the success info
log, which harms observability; update the failure path in the logger.warning
call inside the except block to use a distinct constant (e.g.,
HR_FIRING_TASK_REASSIGN_FAILED) or create that constant if it doesn't exist,
keep the same structured fields (agent_id, task_id, error=msg), and leave the
raise TaskReassignmentError(msg) from exc unchanged so success logs (logger.info
with HR_FIRING_TASKS_REASSIGNED) remain distinct from failures.

In `@src/ai_company/hr/reassignment_protocol.py`:
- Around line 27-42: The protocol method reassign lacks a Raises docstring
entry; update its docstring to document that implementations (e.g.,
QueueReturnStrategy) may raise TaskReassignmentError when task state transitions
fail—add a Raises section to reassign describing TaskReassignmentError, when it
is raised (on transition failures/failed clearing of assigned_to) and any
conditions or guarantees callers can expect.

In `@src/ai_company/hr/registry.py`:
- Around line 39-47: The constructor stores a MessageBus in self._message_bus
but register(), unregister(), and update_status() only log events; integrate the
bus so lifecycle changes are published: in AgentRegistry.register,
AgentRegistry.unregister, and AgentRegistry.update_status (and any helper that
mutates self._agents) after acquiring self._lock and updating self._agents, call
self._message_bus.publish/send (guarding for None) with a concise event object
(include event type like
"agent_registered"/"agent_unregistered"/"agent_status_updated", the
AgentIdentity and new status) so callers wired with a MessageBus receive
notifications; preserve existing locking and error handling and make publishing
best-effort (log but don’t raise if bus fails).

In `@src/ai_company/persistence/sqlite/hr_repositories.py`:
- Around line 307-343: The CollaborationMetricRepository.query method lacks a
deterministic ordering; update the SQL in query (method name: query of
CollaborationMetricRepository) to append an ORDER BY clause (e.g., ORDER BY
recorded_at ASC, id ASC) after the WHERE block (or on the base SELECT when no
WHERE) so results are consistently ordered before executing self._db.execute;
keep parameters and error handling unchanged and still convert rows via
self._row_to_record.
- Around line 204-244: The TaskMetricRepository.query method (and the analogous
method in CollaborationMetricRepository) returns rows without a deterministic
order; update the SQL constructed in TaskMetricRepository.query to append an
ORDER BY clause (e.g., ORDER BY completed_at DESC, id DESC or your chosen
deterministic columns) when building the final sql string so results are
consistently ordered, ensure the clause is added after the WHERE when clauses
exist (in the same place where sql is currently appended), and keep the existing
parameter handling and error logging (PERSISTENCE_TASK_METRIC_QUERY_FAILED /
PERSISTENCE_TASK_METRIC_QUERIED) unchanged.

In `@src/ai_company/persistence/sqlite/migrations.py`:
- Around line 123-140: Add composite indexes so agent+time predicates are
covered: create "idx_tm_agent_id_completed_at" on task_metrics(agent_id,
completed_at) and "idx_cm_agent_id_recorded_at" on
collaboration_metrics(agent_id, recorded_at). Insert these CREATE INDEX IF NOT
EXISTS statements into the migrations list alongside the existing single-column
indexes (near idx_tm_agent_id / idx_tm_completed_at and idx_cm_agent_id /
idx_cm_recorded_at) so TaskMetricRepository.query and
CollaborationMetricRepository.query (which filter by agent_id plus time range)
can use the composite indexes; ensure the column order is agent_id then the
timestamp (completed_at / recorded_at).
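The suggested composite indexes, shown as migration statements over a hypothetical minimal schema (the real tables have more columns; only the index names and column order come from this comment):

```python
import sqlite3

# Hypothetical migration statements mirroring the suggested composite indexes.
MIGRATIONS = [
    "CREATE TABLE IF NOT EXISTS task_metrics ("
    " id INTEGER PRIMARY KEY, agent_id TEXT NOT NULL,"
    " completed_at TEXT NOT NULL)",
    "CREATE INDEX IF NOT EXISTS idx_tm_agent_id_completed_at"
    " ON task_metrics(agent_id, completed_at)",
    "CREATE TABLE IF NOT EXISTS collaboration_metrics ("
    " id INTEGER PRIMARY KEY, agent_id TEXT NOT NULL,"
    " recorded_at TEXT NOT NULL)",
    "CREATE INDEX IF NOT EXISTS idx_cm_agent_id_recorded_at"
    " ON collaboration_metrics(agent_id, recorded_at)",
]


def apply_migrations(conn: sqlite3.Connection) -> None:
    for stmt in MIGRATIONS:
        conn.execute(stmt)
```

With `agent_id` leading, SQLite can use one index for both the equality predicate and the time-range scan.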

In `@tests/unit/api/conftest.py`:
- Around line 116-138: The list_events implementation in
FakeLifecycleEventRepository ignores the since parameter; update list_events
(and the in-memory filtering logic in the FakeLifecycleEventRepository class) to
filter events by their timestamp/creation time: when since is provided, only
include events whose event.timestamp (or created_at if your event model uses
that attribute) is >= since; handle both datetime and numeric timestamps
consistently (convert types as needed) and ensure the result variable is updated
before returning so time-based tests behave correctly.
- Around line 162-180: The query method of FakeCollaborationMetricRepository
currently ignores the since parameter; update query (in class
FakeCollaborationMetricRepository, method query) to filter self._records by the
since timestamp when provided (e.g., keep only records where record.created_at
or record.timestamp >= since), in addition to the existing agent_id filter, and
return the filtered tuple; ensure you handle both datetime and string/ISO inputs
consistently (convert/compare as appropriate) and don't modify other method
signatures.
- Around line 140-159: The query method of FakeTaskMetricRepository currently
ignores the since and until parameters; update the
FakeTaskMetricRepository.query implementation to apply time-range filtering
after the agent_id filter by comparing each record's timestamp field (e.g.,
record.timestamp or record.created_at — use the actual timestamp attribute used
by your task metric objects) against the since and until values, only including
records where (since is None or record.timestamp >= since) and (until is None or
record.timestamp <= until); keep the existing agent_id filter and return a tuple
of the filtered list so tests relying on time-based filtering behave correctly.

In `@tests/unit/core/test_enums.py`:
- Around line 42-43: Update the test_agent_status_values test to include an
assertion for the new AgentStatus.ONBOARDING member: locate the
test_agent_status_values function and add one more assertion that checks
AgentStatus.ONBOARDING exists and equals the expected enum value using the same
assertion style as the other values in that test (i.e., mirror how
AgentStatus.ACTIVE/INACTIVE/OTHER are asserted to ensure consistency).

In `@tests/unit/hr/performance/test_behavioral_collaboration_strategy.py`:
- Around line 81-106: Update the docstring on test_all_components_none to match
the assertions: replace the incorrect "All optional components None -> neutral
5.0, confidence 0.0." with a description that reflects the actual scenario and
expected values, e.g., "Only loop_prevention present -> score 10.0, confidence
0.1." in the test_all_components_none test function (referencing the use of
make_collab_metric, NOW, and the call to strategy.score with
NotBlankStr("agent-001")).

In `@tests/unit/hr/performance/test_ci_quality_strategy.py`:
- Around line 129-157: Merge the two separate tests
test_cost_efficiency_zero_cost and test_cost_efficiency_high_cost into the
existing parametrized test_cost_efficiency_parametrized: add parameter rows for
(cost_usd=0.0, expected=10.0) and (cost_usd=15.0, expected=0.0), use the same
setup via self._make_strategy() and make_task_metric, call
strategy.score(agent_id=NotBlankStr("agent-001"),
task_id=NotBlankStr("task-001"), task_result=task_result,
acceptance_criteria=()), and assert dict(result.breakdown)["cost_efficiency"] ==
expected; then remove the now-duplicate test_cost_efficiency_zero_cost and
test_cost_efficiency_high_cost (and the other duplicate pair noted) so only the
parametrized test covers these cases.

In `@tests/unit/hr/performance/test_tracker.py`:
- Around line 253-285: The test test_snapshot_with_windows_and_trends asserts
trends are present despite only recording one TaskMetricRecord and not supplying
the time-series that TrendStrategy.detect uses; either (A) make the scenario
satisfy the trend-detection threshold by adding enough TaskMetricRecord entries
(call tracker.record_task_metric multiple times at different timestamps so
WindowMetrics.data_point_count and TrendStrategy.detect receive sufficient
samples) before calling tracker.get_snapshot, or (B) change the assertion to
expect no trends (assert snapshot.trends == ()) to verify the threshold guard;
update references in the test around
MockWindowStrategy/WindowMetrics.data_point_count, tracker.record_task_metric,
and tracker.get_snapshot accordingly.

In `@tests/unit/hr/test_offboarding_service.py`:
- Around line 38-52: The test double list_tasks in
tests/unit/hr/test_offboarding_service.py declares assigned_to: str | None but
OffboardingService calls it with NotBlankStr(agent_id); change the signature to
assigned_to: NotBlankStr | None to match the expected type, add the appropriate
import for NotBlankStr from the module where it’s defined, and update any
related type hints in that file so the fake ListTasks function signature aligns
with the OffboardingService usage (function name: list_tasks).

In `@tests/unit/persistence/test_protocol.py`:
- Around line 76-117: Add protocol-compliance tests for the new fake repos so
they assert isinstance against the repository protocols: add three tests in the
TestProtocolCompliance suite that check _FakeLifecycleEventRepository() is a
LifecycleEventRepository, _FakeTaskMetricRepository() is a TaskMetricRepository,
and _FakeCollaborationMetricRepository() is a CollaborationMetricRepository;
import the protocols (LifecycleEventRepository, TaskMetricRepository,
CollaborationMetricRepository) and place the assertions alongside the existing
protocol tests (e.g., the block around TestProtocolCompliance where other
isinstance checks live).

---

Outside diff comments:
In `@tests/unit/communication/test_enums.py`:
- Around line 19-31: The test suite added a new enum member but test_values
doesn't assert it; update test_values in tests/unit/communication/test_enums.py
to include an assertion that MessageType.HR_NOTIFICATION.value ==
"hr_notification" (alongside the other MessageType assertions) so the new member
is verified and the member count matches test_member_count.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e95b47f7-a83d-4db9-853b-3da1d0d5e8a2

📥 Commits

Reviewing files that changed from the base of the PR and between 8c39742 and 48f28cf.

📒 Files selected for processing (65)
  • CLAUDE.md
  • DESIGN_SPEC.md
  • README.md
  • src/ai_company/communication/enums.py
  • src/ai_company/core/enums.py
  • src/ai_company/hr/__init__.py
  • src/ai_company/hr/archival_protocol.py
  • src/ai_company/hr/enums.py
  • src/ai_company/hr/errors.py
  • src/ai_company/hr/full_snapshot_strategy.py
  • src/ai_company/hr/hiring_service.py
  • src/ai_company/hr/models.py
  • src/ai_company/hr/offboarding_service.py
  • src/ai_company/hr/onboarding_service.py
  • src/ai_company/hr/performance/__init__.py
  • src/ai_company/hr/performance/behavioral_collaboration_strategy.py
  • src/ai_company/hr/performance/ci_quality_strategy.py
  • src/ai_company/hr/performance/collaboration_protocol.py
  • src/ai_company/hr/performance/config.py
  • src/ai_company/hr/performance/models.py
  • src/ai_company/hr/performance/multi_window_strategy.py
  • src/ai_company/hr/performance/quality_protocol.py
  • src/ai_company/hr/performance/theil_sen_strategy.py
  • src/ai_company/hr/performance/tracker.py
  • src/ai_company/hr/performance/trend_protocol.py
  • src/ai_company/hr/performance/window_protocol.py
  • src/ai_company/hr/persistence_protocol.py
  • src/ai_company/hr/queue_return_strategy.py
  • src/ai_company/hr/reassignment_protocol.py
  • src/ai_company/hr/registry.py
  • src/ai_company/observability/events/hr.py
  • src/ai_company/observability/events/performance.py
  • src/ai_company/observability/events/persistence.py
  • src/ai_company/persistence/protocol.py
  • src/ai_company/persistence/repositories.py
  • src/ai_company/persistence/sqlite/backend.py
  • src/ai_company/persistence/sqlite/hr_repositories.py
  • src/ai_company/persistence/sqlite/migrations.py
  • src/ai_company/persistence/sqlite/repositories.py
  • tests/unit/api/conftest.py
  • tests/unit/communication/test_enums.py
  • tests/unit/core/test_enums.py
  • tests/unit/hr/__init__.py
  • tests/unit/hr/conftest.py
  • tests/unit/hr/performance/__init__.py
  • tests/unit/hr/performance/conftest.py
  • tests/unit/hr/performance/test_behavioral_collaboration_strategy.py
  • tests/unit/hr/performance/test_ci_quality_strategy.py
  • tests/unit/hr/performance/test_models.py
  • tests/unit/hr/performance/test_multi_window_strategy.py
  • tests/unit/hr/performance/test_theil_sen_strategy.py
  • tests/unit/hr/performance/test_tracker.py
  • tests/unit/hr/test_enums.py
  • tests/unit/hr/test_errors.py
  • tests/unit/hr/test_full_snapshot_strategy.py
  • tests/unit/hr/test_hiring_service.py
  • tests/unit/hr/test_models.py
  • tests/unit/hr/test_offboarding_service.py
  • tests/unit/hr/test_onboarding_service.py
  • tests/unit/hr/test_persistence.py
  • tests/unit/hr/test_queue_return_strategy.py
  • tests/unit/hr/test_registry.py
  • tests/unit/observability/test_events.py
  • tests/unit/persistence/test_migrations_v2.py
  • tests/unit/persistence/test_protocol.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: CI Pass
  • GitHub Check: Agent
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use PEP 758 except syntax: use except A, B: (no parentheses) — ruff enforces this on Python 3.14
All public functions must have type hints; mypy strict mode is enforced
Public classes and functions must have Google-style docstrings (enforced by ruff D rules)
Every module with business logic must include logging setup: from ai_company.observability import get_logger then logger = get_logger(__name__). Never use import logging or print() in application code. Variable name must always be logger
Use structured logging with event name constants: logger.info(EVENT_CONSTANT, key=value) — never use string interpolation like logger.info('msg %s', val). Import event constants directly from ai_company.observability.events.<domain>
All error paths must log at WARNING or ERROR with context before raising; all state transitions must log at INFO; DEBUG for object creation, internal flow, and entry/exit of key functions
Line length maximum is 88 characters (enforced by ruff)
Functions must be < 50 lines; files must be < 800 lines
Handle errors explicitly; never silently swallow exceptions
Validate at system boundaries: user input, external APIs, and config files. Do not validate unnecessarily at internal boundaries

Files:

  • src/ai_company/hr/performance/window_protocol.py
  • src/ai_company/persistence/protocol.py
  • tests/unit/hr/test_hiring_service.py
  • src/ai_company/observability/events/performance.py
  • tests/unit/hr/performance/test_tracker.py
  • src/ai_company/core/enums.py
  • src/ai_company/hr/performance/__init__.py
  • tests/unit/hr/performance/conftest.py
  • tests/unit/hr/test_errors.py
  • tests/unit/hr/performance/test_behavioral_collaboration_strategy.py
  • tests/unit/hr/performance/test_ci_quality_strategy.py
  • tests/unit/hr/performance/test_models.py
  • src/ai_company/persistence/sqlite/migrations.py
  • src/ai_company/hr/performance/ci_quality_strategy.py
  • tests/unit/hr/test_offboarding_service.py
  • src/ai_company/hr/queue_return_strategy.py
  • src/ai_company/hr/hiring_service.py
  • src/ai_company/hr/performance/trend_protocol.py
  • src/ai_company/communication/enums.py
  • tests/unit/hr/test_enums.py
  • src/ai_company/hr/performance/config.py
  • src/ai_company/hr/persistence_protocol.py
  • src/ai_company/hr/enums.py
  • src/ai_company/hr/performance/behavioral_collaboration_strategy.py
  • src/ai_company/observability/events/persistence.py
  • src/ai_company/hr/performance/collaboration_protocol.py
  • tests/unit/hr/performance/test_theil_sen_strategy.py
  • src/ai_company/persistence/repositories.py
  • tests/unit/hr/test_onboarding_service.py
  • src/ai_company/hr/performance/theil_sen_strategy.py
  • src/ai_company/hr/offboarding_service.py
  • src/ai_company/persistence/sqlite/backend.py
  • src/ai_company/hr/models.py
  • src/ai_company/hr/full_snapshot_strategy.py
  • src/ai_company/hr/registry.py
  • tests/unit/hr/test_models.py
  • src/ai_company/persistence/sqlite/repositories.py
  • src/ai_company/hr/errors.py
  • tests/unit/hr/test_queue_return_strategy.py
  • tests/unit/communication/test_enums.py
  • tests/unit/core/test_enums.py
  • tests/unit/hr/performance/test_multi_window_strategy.py
  • src/ai_company/hr/reassignment_protocol.py
  • src/ai_company/hr/onboarding_service.py
  • src/ai_company/hr/archival_protocol.py
  • src/ai_company/persistence/sqlite/hr_repositories.py
  • tests/unit/hr/test_registry.py
  • tests/unit/persistence/test_migrations_v2.py
  • src/ai_company/hr/__init__.py
  • tests/unit/hr/conftest.py
  • tests/unit/hr/test_full_snapshot_strategy.py
  • src/ai_company/observability/events/hr.py
  • tests/unit/api/conftest.py
  • src/ai_company/hr/performance/quality_protocol.py
  • src/ai_company/hr/performance/models.py
  • tests/unit/observability/test_events.py
  • tests/unit/hr/test_persistence.py
  • tests/unit/persistence/test_protocol.py
  • src/ai_company/hr/performance/multi_window_strategy.py
  • src/ai_company/hr/performance/tracker.py
src/ai_company/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/ai_company/**/*.py: Use Pydantic v2 with frozen models for config/identity; use separate mutable-via-copy models (with model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model
Use @computed_field for derived values instead of storing + validating redundant fields (e.g. TokenUsage.total_tokens); use NotBlankStr from core.types for all identifier/name fields—including optional (NotBlankStr | None) and tuple variants—instead of manual whitespace validators
Create new objects for immutability; never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType for read-only enforcement. For dict/list fields in frozen Pydantic models, rely on frozen=True and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization)
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task

Files:

  • src/ai_company/hr/performance/window_protocol.py
  • src/ai_company/persistence/protocol.py
  • src/ai_company/observability/events/performance.py
  • src/ai_company/core/enums.py
  • src/ai_company/hr/performance/__init__.py
  • src/ai_company/persistence/sqlite/migrations.py
  • src/ai_company/hr/performance/ci_quality_strategy.py
  • src/ai_company/hr/queue_return_strategy.py
  • src/ai_company/hr/hiring_service.py
  • src/ai_company/hr/performance/trend_protocol.py
  • src/ai_company/communication/enums.py
  • src/ai_company/hr/performance/config.py
  • src/ai_company/hr/persistence_protocol.py
  • src/ai_company/hr/enums.py
  • src/ai_company/hr/performance/behavioral_collaboration_strategy.py
  • src/ai_company/observability/events/persistence.py
  • src/ai_company/hr/performance/collaboration_protocol.py
  • src/ai_company/persistence/repositories.py
  • src/ai_company/hr/performance/theil_sen_strategy.py
  • src/ai_company/hr/offboarding_service.py
  • src/ai_company/persistence/sqlite/backend.py
  • src/ai_company/hr/models.py
  • src/ai_company/hr/full_snapshot_strategy.py
  • src/ai_company/hr/registry.py
  • src/ai_company/persistence/sqlite/repositories.py
  • src/ai_company/hr/errors.py
  • src/ai_company/hr/reassignment_protocol.py
  • src/ai_company/hr/onboarding_service.py
  • src/ai_company/hr/archival_protocol.py
  • src/ai_company/persistence/sqlite/hr_repositories.py
  • src/ai_company/hr/__init__.py
  • src/ai_company/observability/events/hr.py
  • src/ai_company/hr/performance/quality_protocol.py
  • src/ai_company/hr/performance/models.py
  • src/ai_company/hr/performance/multi_window_strategy.py
  • src/ai_company/hr/performance/tracker.py
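The frozen-model conventions above can be sketched with a minimal example. This assumes Pydantic v2; `TokenUsage.total_tokens` mirrors the guideline's own example, while `RunState` is a hypothetical runtime-state model, not project code.

```python
from pydantic import BaseModel, computed_field


class TokenUsage(BaseModel):
    """Frozen config/identity model: derived value via computed_field, not a stored field."""

    model_config = {"frozen": True}
    prompt_tokens: int
    completion_tokens: int

    @computed_field
    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


class RunState(BaseModel):
    """Separate mutable-via-copy model for runtime state that evolves."""

    model_config = {"frozen": True}
    tasks_done: int = 0


usage = TokenUsage(prompt_tokens=10, completion_tokens=5)
print(usage.total_tokens)  # 15

# Evolve runtime state by copying, never by mutating the frozen instance.
state = RunState()
state = state.model_copy(update={"tasks_done": state.tasks_done + 1})
print(state.tasks_done)  # 1
```

The key split: `TokenUsage` never changes after construction, while `RunState` produces a fresh object per transition via `model_copy(update=...)`.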
**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

DESIGN_SPEC.md is MANDATORY reading before implementing any feature. The spec is the starting point for architecture, data models, and behavior. If implementation deviates from the spec, alert the user and explain why — user decides whether to proceed or update the spec. Do not silently diverge. When spec sections are referenced (e.g. 'Section 10.2'), read that section verbatim before coding. When approved deviations occur, update DESIGN_SPEC.md to reflect the new reality

Files:

  • CLAUDE.md
  • README.md
  • DESIGN_SPEC.md
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow. Minimum coverage 80% (enforced in CI). Async mode: asyncio_mode = 'auto' — no manual @pytest.mark.asyncio needed. Per-test timeout: 30 seconds. Use pytest-xdist via -n auto for parallelism. Prefer @pytest.mark.parametrize for testing similar cases
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Tests must use test-provider, test-small-001, etc. Vendor names may only appear in: (1) DESIGN_SPEC.md provider list, (2) .claude/ files, (3) third-party import paths

Files:

  • tests/unit/hr/test_hiring_service.py
  • tests/unit/hr/performance/test_tracker.py
  • tests/unit/hr/performance/conftest.py
  • tests/unit/hr/test_errors.py
  • tests/unit/hr/performance/test_behavioral_collaboration_strategy.py
  • tests/unit/hr/performance/test_ci_quality_strategy.py
  • tests/unit/hr/performance/test_models.py
  • tests/unit/hr/test_offboarding_service.py
  • tests/unit/hr/test_enums.py
  • tests/unit/hr/performance/test_theil_sen_strategy.py
  • tests/unit/hr/test_onboarding_service.py
  • tests/unit/hr/test_models.py
  • tests/unit/hr/test_queue_return_strategy.py
  • tests/unit/communication/test_enums.py
  • tests/unit/core/test_enums.py
  • tests/unit/hr/performance/test_multi_window_strategy.py
  • tests/unit/hr/test_registry.py
  • tests/unit/persistence/test_migrations_v2.py
  • tests/unit/hr/conftest.py
  • tests/unit/hr/test_full_snapshot_strategy.py
  • tests/unit/api/conftest.py
  • tests/unit/observability/test_events.py
  • tests/unit/hr/test_persistence.py
  • tests/unit/persistence/test_protocol.py
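A small sketch of the test conventions above: `pytest.mark.parametrize` for similar cases, a marker, and generic `test-*` model names instead of vendor names. The alias-extraction logic is hypothetical, purely to give the parametrization something to assert.

```python
import pytest


@pytest.mark.unit
@pytest.mark.parametrize(
    ("model", "expected_alias"),
    [
        ("test-small-001", "small"),
        ("test-medium-001", "medium"),
        ("test-large-001", "large"),
    ],
)
def test_model_alias(model: str, expected_alias: str) -> None:
    # Generic test-provider model names only -- never real vendor names.
    alias = model.removeprefix("test-").rsplit("-", 1)[0]
    assert alias == expected_alias
```

With `asyncio_mode = 'auto'` configured, async variants of such tests need no extra `pytest.mark.asyncio` decorator.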
🧠 Learnings (8)
📚 Learning: 2026-03-09T20:32:32.549Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T20:32:32.549Z
Learning: Applies to **/*.py : Every module with business logic must include logging setup: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` or `print()` in application code. Variable name must always be `logger`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-09T20:32:32.549Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T20:32:32.549Z
Learning: Applies to **/*.py : Use structured logging with event name constants: `logger.info(EVENT_CONSTANT, key=value)` — never use string interpolation like `logger.info('msg %s', val)`. Import event constants directly from `ai_company.observability.events.<domain>`

Applied to files:

  • CLAUDE.md
  • src/ai_company/observability/events/performance.py
  • src/ai_company/observability/events/hr.py
📚 Learning: 2026-03-09T20:32:32.549Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T20:32:32.549Z
Learning: Applies to **/*.py : All error paths must log at WARNING or ERROR with context before raising; all state transitions must log at INFO; DEBUG for object creation, internal flow, and entry/exit of key functions

Applied to files:

  • CLAUDE.md
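The "log at WARNING with context before raising" rule looks like this in miniature. The `print` stands in for `logger.warning`, and the event constant and registry dict are hypothetical; only the error class name mirrors the PR's `hr.errors`.

```python
HR_AGENT_NOT_FOUND = "hr.agent.not_found"


class AgentNotFoundError(KeyError):
    """Named after the PR's hr.errors.AgentNotFoundError; stub for illustration."""


_REGISTRY: dict[str, str] = {"agent-001": "Backend Engineer"}


def get_agent(agent_id: str) -> str:
    if agent_id not in _REGISTRY:
        # Context-rich WARNING emitted *before* the raise, never silently swallowed.
        print("WARNING", HR_AGENT_NOT_FOUND, {"agent_id": agent_id})
        raise AgentNotFoundError(agent_id)
    return _REGISTRY[agent_id]


print(get_agent("agent-001"))  # Backend Engineer
```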
📚 Learning: 2026-03-09T20:32:32.549Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T20:32:32.549Z
Learning: Applies to tests/**/*.py : Use pytest markers: `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, `pytest.mark.slow`. Minimum coverage 80% (enforced in CI). Async mode: `asyncio_mode = 'auto'` — no manual `pytest.mark.asyncio` needed. Per-test timeout: 30 seconds. Use `pytest-xdist` via `-n auto` for parallelism. Prefer `pytest.mark.parametrize` for testing similar cases

Applied to files:

  • tests/unit/hr/performance/test_ci_quality_strategy.py
📚 Learning: 2026-03-09T20:32:32.549Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T20:32:32.549Z
Learning: Applies to src/ai_company/**/*.py : Use Pydantic v2 with frozen models for config/identity; use separate mutable-via-copy models (with `model_copy(update=...)`) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model

Applied to files:

  • src/ai_company/hr/performance/config.py
  • src/ai_company/hr/models.py
  • src/ai_company/hr/performance/models.py
📚 Learning: 2026-03-09T20:32:32.549Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T20:32:32.549Z
Learning: Applies to src/ai_company/**/*.py : Use `computed_field` for derived values instead of storing + validating redundant fields (e.g. `TokenUsage.total_tokens`); use `NotBlankStr` from `core.types` for all identifier/name fields—including optional (`NotBlankStr | None`) and tuple variants—instead of manual whitespace validators

Applied to files:

  • src/ai_company/hr/models.py
  • src/ai_company/hr/performance/models.py
📚 Learning: 2026-03-09T20:32:32.549Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T20:32:32.549Z
Learning: Applies to src/ai_company/**/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`

Applied to files:

  • src/ai_company/hr/reassignment_protocol.py
📚 Learning: 2026-03-09T20:32:32.549Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T20:32:32.549Z
Learning: Applies to src/ai_company/**/*.py : Create new objects for immutability; never mutate existing ones. For non-Pydantic internal collections (registries, `BaseTool`), use `copy.deepcopy()` at construction + `MappingProxyType` for read-only enforcement. For `dict`/`list` fields in frozen Pydantic models, rely on `frozen=True` and `copy.deepcopy()` at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization)

Applied to files:

  • src/ai_company/hr/performance/models.py
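The deepcopy-at-construction plus `MappingProxyType` pattern for non-Pydantic collections can be sketched as follows; the `ToolRegistry` class and its fields are illustrative, not the project's actual registry.

```python
import copy
from types import MappingProxyType


class ToolRegistry:
    def __init__(self, tools: dict[str, dict[str, str]]) -> None:
        # Deep-copy at construction so later mutations by the caller can't leak in.
        self._tools = copy.deepcopy(tools)

    @property
    def tools(self) -> MappingProxyType[str, dict[str, str]]:
        # Read-only view: item assignment on the proxy raises TypeError.
        return MappingProxyType(self._tools)


source = {"search": {"description": "web search"}}
registry = ToolRegistry(source)

# Mutating the caller's dict does not affect the registry's copy.
source["search"]["description"] = "mutated"
print(registry.tools["search"]["description"])  # web search
```

Note the proxy is shallow: it blocks writes to the top-level mapping, which is why the deep copy at construction is still needed to isolate nested values.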
🧬 Code graph analysis (43)
src/ai_company/hr/performance/window_protocol.py (5)
src/ai_company/hr/performance/models.py (2)
  • TaskMetricRecord (22-64)
  • WindowMetrics (195-249)
src/ai_company/hr/performance/theil_sen_strategy.py (1)
  • name (69-71)
src/ai_company/hr/archival_protocol.py (1)
  • name (47-49)
src/ai_company/hr/performance/collaboration_protocol.py (1)
  • name (25-27)
src/ai_company/hr/performance/multi_window_strategy.py (2)
  • name (66-68)
  • compute_windows (75-100)
src/ai_company/persistence/protocol.py (4)
src/ai_company/hr/persistence_protocol.py (3)
  • CollaborationMetricRepository (98-130)
  • LifecycleEventRepository (22-56)
  • TaskMetricRepository (60-94)
src/ai_company/persistence/sqlite/backend.py (3)
  • lifecycle_events (241-247)
  • task_metrics (250-256)
  • collaboration_metrics (259-267)
tests/unit/api/conftest.py (3)
  • lifecycle_events (228-229)
  • task_metrics (232-233)
  • collaboration_metrics (236-237)
tests/unit/persistence/test_protocol.py (3)
  • lifecycle_events (154-155)
  • task_metrics (158-159)
  • collaboration_metrics (162-163)
tests/unit/hr/performance/test_tracker.py (6)
src/ai_company/hr/performance/models.py (6)
  • CollaborationMetricRecord (67-122)
  • CollaborationScoreResult (150-172)
  • QualityScoreResult (125-147)
  • TaskMetricRecord (22-64)
  • TrendResult (175-192)
  • WindowMetrics (195-249)
src/ai_company/hr/performance/config.py (1)
  • PerformanceConfig (10-59)
src/ai_company/hr/performance/collaboration_protocol.py (2)
  • name (25-27)
  • score (29-46)
src/ai_company/hr/performance/quality_protocol.py (2)
  • name (26-28)
  • score (30-49)
src/ai_company/hr/performance/trend_protocol.py (2)
  • name (25-27)
  • detect (29-46)
src/ai_company/hr/performance/window_protocol.py (3)
  • name (27-29)
  • min_data_points (32-34)
  • compute_windows (36-51)
tests/unit/hr/performance/conftest.py (3)
src/ai_company/core/enums.py (2)
  • Complexity (247-253)
  • TaskType (227-235)
src/ai_company/core/task.py (1)
  • AcceptanceCriterion (26-42)
src/ai_company/hr/performance/models.py (2)
  • CollaborationMetricRecord (67-122)
  • TaskMetricRecord (22-64)
tests/unit/hr/test_errors.py (1)
src/ai_company/hr/errors.py (15)
  • AgentAlreadyRegisteredError (64-65)
  • AgentNotFoundError (60-61)
  • AgentRegistryError (56-57)
  • FiringError (30-31)
  • HiringApprovalRequiredError (15-16)
  • HiringError (11-12)
  • HiringRejectedError (19-20)
  • HRError (4-5)
  • InsufficientDataError (75-76)
  • InvalidCandidateError (23-24)
  • MemoryArchivalError (42-43)
  • OffboardingError (34-35)
  • OnboardingError (49-50)
  • PerformanceError (71-72)
  • TaskReassignmentError (38-39)
tests/unit/hr/performance/test_behavioral_collaboration_strategy.py (2)
src/ai_company/hr/performance/behavioral_collaboration_strategy.py (2)
  • name (58-60)
  • score (62-140)
tests/unit/hr/performance/conftest.py (1)
  • make_collab_metric (44-65)
tests/unit/hr/performance/test_ci_quality_strategy.py (1)
src/ai_company/hr/performance/ci_quality_strategy.py (3)
  • CISignalQualityStrategy (28-114)
  • name (41-43)
  • score (45-114)
tests/unit/hr/performance/test_models.py (3)
src/ai_company/core/enums.py (2)
  • Complexity (247-253)
  • TaskType (227-235)
src/ai_company/hr/enums.py (1)
  • TrendDirection (42-48)
src/ai_company/hr/performance/models.py (6)
  • AgentPerformanceSnapshot (252-287)
  • CollaborationMetricRecord (67-122)
  • CollaborationScoreResult (150-172)
  • QualityScoreResult (125-147)
  • TrendResult (175-192)
  • WindowMetrics (195-249)
src/ai_company/persistence/sqlite/migrations.py (1)
tests/unit/hr/test_persistence.py (1)
  • db (29-35)
src/ai_company/hr/performance/ci_quality_strategy.py (2)
src/ai_company/hr/performance/models.py (2)
  • QualityScoreResult (125-147)
  • TaskMetricRecord (22-64)
src/ai_company/core/task.py (1)
  • AcceptanceCriterion (26-42)
tests/unit/hr/test_offboarding_service.py (2)
src/ai_company/hr/errors.py (1)
  • AgentNotFoundError (60-61)
src/ai_company/hr/offboarding_service.py (2)
  • OffboardingService (44-228)
  • offboard (85-228)
src/ai_company/hr/queue_return_strategy.py (3)
src/ai_company/core/enums.py (1)
  • TaskStatus (198-224)
src/ai_company/hr/errors.py (1)
  • TaskReassignmentError (38-39)
src/ai_company/core/task.py (1)
  • Task (45-261)
src/ai_company/hr/hiring_service.py (8)
src/ai_company/core/agent.py (2)
  • AgentIdentity (265-323)
  • ModelConfig (149-178)
src/ai_company/core/approval.py (1)
  • ApprovalItem (24-96)
src/ai_company/core/enums.py (4)
  • ActionType (372-384)
  • AgentStatus (67-73)
  • ApprovalRiskLevel (419-425)
  • SeniorityLevel (6-21)
src/ai_company/hr/enums.py (1)
  • HiringRequestStatus (6-12)
src/ai_company/hr/models.py (2)
  • CandidateCard (29-66)
  • HiringRequest (69-141)
src/ai_company/hr/registry.py (2)
  • AgentRegistryService (29-193)
  • register (48-74)
src/ai_company/api/approval_store.py (1)
  • ApprovalStore (30-156)
src/ai_company/hr/onboarding_service.py (2)
  • OnboardingService (27-166)
  • start_onboarding (46-78)
src/ai_company/hr/performance/trend_protocol.py (3)
src/ai_company/hr/performance/models.py (1)
  • TrendResult (175-192)
src/ai_company/hr/performance/theil_sen_strategy.py (2)
  • name (69-71)
  • detect (73-143)
tests/unit/hr/performance/test_tracker.py (5)
  • name (35-36)
  • name (62-63)
  • name (93-94)
  • name (120-121)
  • detect (123-136)
tests/unit/hr/test_enums.py (1)
src/ai_company/hr/enums.py (5)
  • FiringReason (15-21)
  • HiringRequestStatus (6-12)
  • LifecycleEventType (32-39)
  • OnboardingStep (24-29)
  • TrendDirection (42-48)
src/ai_company/hr/performance/config.py (4)
src/ai_company/hr/performance/multi_window_strategy.py (1)
  • min_data_points (71-73)
src/ai_company/hr/performance/window_protocol.py (1)
  • min_data_points (32-34)
tests/unit/hr/performance/test_tracker.py (1)
  • min_data_points (97-98)
src/ai_company/tools/base.py (1)
  • description (115-117)
src/ai_company/hr/persistence_protocol.py (3)
src/ai_company/hr/enums.py (1)
  • LifecycleEventType (32-39)
src/ai_company/hr/models.py (1)
  • AgentLifecycleEvent (302-331)
src/ai_company/hr/performance/models.py (2)
  • CollaborationMetricRecord (67-122)
  • TaskMetricRecord (22-64)
src/ai_company/hr/performance/behavioral_collaboration_strategy.py (1)
src/ai_company/hr/performance/models.py (2)
  • CollaborationMetricRecord (67-122)
  • CollaborationScoreResult (150-172)
src/ai_company/hr/performance/collaboration_protocol.py (2)
src/ai_company/hr/performance/models.py (2)
  • CollaborationMetricRecord (67-122)
  • CollaborationScoreResult (150-172)
src/ai_company/hr/performance/behavioral_collaboration_strategy.py (2)
  • name (58-60)
  • score (62-140)
tests/unit/hr/performance/test_theil_sen_strategy.py (1)
src/ai_company/hr/performance/theil_sen_strategy.py (3)
  • TheilSenTrendStrategy (43-143)
  • name (69-71)
  • detect (73-143)
src/ai_company/persistence/repositories.py (1)
src/ai_company/hr/persistence_protocol.py (3)
  • CollaborationMetricRepository (98-130)
  • LifecycleEventRepository (22-56)
  • TaskMetricRepository (60-94)
tests/unit/hr/test_onboarding_service.py (5)
src/ai_company/core/enums.py (1)
  • AgentStatus (67-73)
src/ai_company/hr/enums.py (1)
  • OnboardingStep (24-29)
src/ai_company/hr/errors.py (1)
  • OnboardingError (49-50)
tests/unit/hr/conftest.py (2)
  • registry (174-176)
  • make_agent_identity (37-55)
src/ai_company/hr/registry.py (3)
  • AgentRegistryService (29-193)
  • register (48-74)
  • get (102-111)
src/ai_company/hr/performance/theil_sen_strategy.py (3)
src/ai_company/hr/enums.py (1)
  • TrendDirection (42-48)
src/ai_company/hr/performance/models.py (1)
  • TrendResult (175-192)
src/ai_company/observability/_logger.py (1)
  • get_logger (8-28)
src/ai_company/persistence/sqlite/backend.py (2)
src/ai_company/persistence/sqlite/hr_repositories.py (3)
  • SQLiteCollaborationMetricRepository (247-343)
  • SQLiteLifecycleEventRepository (45-145)
  • SQLiteTaskMetricRepository (148-244)
src/ai_company/persistence/protocol.py (3)
  • lifecycle_events (99-101)
  • task_metrics (104-106)
  • collaboration_metrics (109-111)
src/ai_company/hr/models.py (2)
src/ai_company/core/enums.py (1)
  • SeniorityLevel (6-21)
src/ai_company/hr/enums.py (4)
  • FiringReason (15-21)
  • HiringRequestStatus (6-12)
  • LifecycleEventType (32-39)
  • OnboardingStep (24-29)
src/ai_company/hr/full_snapshot_strategy.py (7)
src/ai_company/core/enums.py (2)
  • MemoryCategory (101-108)
  • OrgFactCategory (120-130)
src/ai_company/hr/archival_protocol.py (3)
  • ArchivalResult (17-34)
  • name (47-49)
  • archive (51-70)
src/ai_company/memory/consolidation/models.py (1)
  • ArchivalEntry (56-91)
src/ai_company/memory/models.py (1)
  • MemoryQuery (153-230)
src/ai_company/memory/org/models.py (2)
  • OrgFactAuthor (19-88)
  • OrgFactWriteRequest (113-124)
src/ai_company/memory/org/protocol.py (1)
  • OrgMemoryBackend (20-110)
src/ai_company/memory/protocol.py (1)
  • MemoryBackend (20-182)
src/ai_company/hr/registry.py (4)
src/ai_company/core/enums.py (1)
  • AgentStatus (67-73)
src/ai_company/hr/errors.py (2)
  • AgentAlreadyRegisteredError (64-65)
  • AgentNotFoundError (60-61)
src/ai_company/communication/bus_protocol.py (1)
  • MessageBus (20-209)
src/ai_company/core/agent.py (1)
  • AgentIdentity (265-323)
tests/unit/hr/test_models.py (4)
src/ai_company/core/enums.py (1)
  • SeniorityLevel (6-21)
src/ai_company/hr/enums.py (4)
  • FiringReason (15-21)
  • HiringRequestStatus (6-12)
  • LifecycleEventType (32-39)
  • OnboardingStep (24-29)
src/ai_company/hr/models.py (5)
  • AgentLifecycleEvent (302-331)
  • OffboardingRecord (250-299)
  • OnboardingChecklist (219-247)
  • OnboardingStepRecord (187-216)
  • is_complete (245-247)
tests/unit/hr/conftest.py (3)
  • make_candidate_card (58-83)
  • make_firing_request (121-137)
  • make_hiring_request (86-118)
tests/unit/hr/test_queue_return_strategy.py (1)
src/ai_company/hr/queue_return_strategy.py (2)
  • reassign (42-88)
  • name (38-40)
tests/unit/communication/test_enums.py (1)
src/ai_company/communication/enums.py (1)
  • MessageType (6-21)
tests/unit/core/test_enums.py (1)
src/ai_company/core/enums.py (1)
  • AgentStatus (67-73)
tests/unit/hr/performance/test_multi_window_strategy.py (1)
src/ai_company/hr/performance/multi_window_strategy.py (4)
  • MultiWindowStrategy (44-160)
  • min_data_points (71-73)
  • name (66-68)
  • compute_windows (75-100)
src/ai_company/hr/reassignment_protocol.py (2)
src/ai_company/core/task.py (1)
  • Task (45-261)
src/ai_company/hr/queue_return_strategy.py (2)
  • name (38-40)
  • reassign (42-88)
src/ai_company/hr/onboarding_service.py (5)
src/ai_company/core/enums.py (1)
  • AgentStatus (67-73)
src/ai_company/hr/enums.py (1)
  • OnboardingStep (24-29)
src/ai_company/hr/errors.py (1)
  • OnboardingError (49-50)
src/ai_company/hr/models.py (3)
  • OnboardingChecklist (219-247)
  • OnboardingStepRecord (187-216)
  • is_complete (245-247)
src/ai_company/hr/registry.py (2)
  • AgentRegistryService (29-193)
  • update_status (153-188)
src/ai_company/hr/archival_protocol.py (3)
src/ai_company/memory/consolidation/archival.py (1)
  • ArchivalStore (15-75)
src/ai_company/memory/org/protocol.py (1)
  • OrgMemoryBackend (20-110)
src/ai_company/memory/protocol.py (1)
  • MemoryBackend (20-182)
tests/unit/hr/test_registry.py (3)
src/ai_company/core/enums.py (1)
  • AgentStatus (67-73)
src/ai_company/hr/errors.py (2)
  • AgentAlreadyRegisteredError (64-65)
  • AgentNotFoundError (60-61)
tests/unit/hr/conftest.py (2)
  • registry (174-176)
  • make_agent_identity (37-55)
tests/unit/persistence/test_migrations_v2.py (1)
src/ai_company/persistence/sqlite/migrations.py (4)
  • _apply_v1 (175-178)
  • get_user_version (146-150)
  • run_migrations (195-263)
  • set_user_version (153-172)
tests/unit/hr/conftest.py (5)
src/ai_company/core/agent.py (2)
  • AgentIdentity (265-323)
  • ModelConfig (149-178)
src/ai_company/hr/enums.py (2)
  • FiringReason (15-21)
  • HiringRequestStatus (6-12)
src/ai_company/hr/models.py (3)
  • CandidateCard (29-66)
  • FiringRequest (144-184)
  • HiringRequest (69-141)
src/ai_company/hr/onboarding_service.py (1)
  • OnboardingService (27-166)
src/ai_company/hr/registry.py (1)
  • AgentRegistryService (29-193)
tests/unit/hr/test_full_snapshot_strategy.py (1)
src/ai_company/hr/full_snapshot_strategy.py (3)
  • FullSnapshotStrategy (47-186)
  • archive (65-186)
  • name (61-63)
tests/unit/api/conftest.py (2)
src/ai_company/persistence/sqlite/backend.py (3)
  • lifecycle_events (241-247)
  • task_metrics (250-256)
  • collaboration_metrics (259-267)
src/ai_company/persistence/protocol.py (3)
  • lifecycle_events (99-101)
  • task_metrics (104-106)
  • collaboration_metrics (109-111)
src/ai_company/hr/performance/quality_protocol.py (3)
src/ai_company/core/task.py (1)
  • AcceptanceCriterion (26-42)
src/ai_company/hr/performance/models.py (2)
  • QualityScoreResult (125-147)
  • TaskMetricRecord (22-64)
src/ai_company/hr/performance/ci_quality_strategy.py (2)
  • name (41-43)
  • score (45-114)
src/ai_company/hr/performance/multi_window_strategy.py (1)
src/ai_company/hr/performance/models.py (2)
  • TaskMetricRecord (22-64)
  • WindowMetrics (195-249)
src/ai_company/hr/performance/tracker.py (6)
src/ai_company/hr/performance/models.py (6)
  • AgentPerformanceSnapshot (252-287)
  • CollaborationMetricRecord (67-122)
  • CollaborationScoreResult (150-172)
  • TaskMetricRecord (22-64)
  • TrendResult (175-192)
  • WindowMetrics (195-249)
src/ai_company/hr/performance/collaboration_protocol.py (2)
  • CollaborationScoringStrategy (17-46)
  • score (29-46)
src/ai_company/hr/performance/quality_protocol.py (2)
  • QualityScoringStrategy (18-49)
  • score (30-49)
src/ai_company/hr/performance/trend_protocol.py (2)
  • TrendDetectionStrategy (17-46)
  • detect (29-46)
src/ai_company/hr/performance/window_protocol.py (3)
  • MetricsWindowStrategy (19-51)
  • compute_windows (36-51)
  • min_data_points (32-34)
src/ai_company/hr/performance/multi_window_strategy.py (2)
  • compute_windows (75-100)
  • min_data_points (71-73)
🪛 LanguageTool
CLAUDE.md

[style] ~87-~87: A comma is missing here.
Context: ...nder ai_company.observability.events (e.g. PROVIDER_CALL_START from `events.prov...

(EG_NO_COMMA)

README.md

[typographical] ~31-~31: To join two clauses or introduce examples, consider using an em dash.
Context: ...uth planned for M7) - HR Engine (M7) - Hiring pipeline (request → generate cand...

(DASH_RULE)


[typographical] ~32-~32: To join two clauses or introduce examples, consider using an em dash.
Context: ...registry - Performance Tracking (M7) - Task metrics, CI-based quality scoring, ...

(DASH_RULE)


[grammar] ~32-~32: Please add a punctuation mark at the end of paragraph.
Context: ... detection, multi-window rolling metric aggregation ### Not implemented yet (planned miles...

(PUNCTUATION_PARAGRAPH_END)


[typographical] ~36-~36: To join two clauses or introduce examples, consider using an em dash.
Context: ...ones) - Memory Backend Adapter (M5) - Memory protocols, retrieval pipeline, or...

(DASH_RULE)


[typographical] ~37-~37: To join two clauses or introduce examples, consider using an em dash.
Context: ..., Temporal KG) planned - CLI Surface - cli/ package is placeholder-only - **S...

(DASH_RULE)


[typographical] ~38-~38: To join two clauses or introduce examples, consider using an em dash.
Context: ...only - Security/Approval System (M7) - security/ package is placeholder-only;...

(DASH_RULE)


[typographical] ~39-~39: To join two clauses or introduce examples, consider using an em dash.
Context: ...Ops agent - Advanced Product Surface - web dashboard, external integrations ##...

(DASH_RULE)

DESIGN_SPEC.md Outdated
│ │ └── hr_engine.py # Hiring, firing, performance (M7)
│ ├── hr/ # HR engine: hiring, firing, onboarding, offboarding, agent registry, performance tracking
│ │ ├── __init__.py # Package exports
│ │ ├── enums.py # HR enumerations (AgentStatus, OnboardingStep, etc.)

⚠️ Potential issue | 🟡 Minor

Fix the AgentStatus module reference in the structure map.

AgentStatus is defined in src/ai_company/core/enums.py in this PR, not in src/ai_company/hr/enums.py. Leaving this entry as-is will send readers to the wrong module.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@DESIGN_SPEC.md` at line 2827, The structure map entry incorrectly points to
hr.enums for AgentStatus; update the map so the AgentStatus reference points to
the module where it is actually defined (core.enums) instead of hr.enums, i.e.,
replace the AgentStatus module reference in the map from hr.enums to core.enums
so readers are directed to the correct definition.

Comment on lines +65 to +186
async def archive(
    self,
    *,
    agent_id: NotBlankStr,
    memory_backend: MemoryBackend,
    archival_store: ArchivalStore,
    org_memory_backend: OrgMemoryBackend | None = None,
) -> ArchivalResult:
    """Archive all memories for a departing agent.

    Args:
        agent_id: Agent whose memories to archive.
        memory_backend: Hot memory store.
        archival_store: Cold archival storage.
        org_memory_backend: Optional org memory for promotion.

    Returns:
        Result of the archival operation.

    Raises:
        MemoryArchivalError: If retrieval from hot store fails.
    """
    try:
        entries = await memory_backend.retrieve(
            agent_id,
            MemoryQuery(limit=_MAX_MEMORIES_PER_ARCHIVAL),
        )
    except Exception as exc:
        msg = f"Failed to retrieve memories for agent {agent_id!r}"
        logger.error(  # noqa: TRY400
            HR_ARCHIVAL_ENTRY_FAILED,
            agent_id=agent_id,
            phase="retrieve",
            error=str(exc),
            error_type=type(exc).__name__,
        )
        raise MemoryArchivalError(msg) from exc

    now = datetime.now(UTC)
    archived_count = 0
    promoted_count = 0
    deleted_ids: list[str] = []

    for entry in entries:
        # Archive to cold storage.
        try:
            archival_entry = ArchivalEntry(
                original_id=entry.id,
                agent_id=agent_id,
                content=NotBlankStr(entry.content),
                category=entry.category,
                metadata=entry.metadata,
                created_at=entry.created_at,
                archived_at=now,
            )
            await archival_store.archive(archival_entry)
            archived_count += 1
            deleted_ids.append(str(entry.id))
        except (OSError, ValueError, ValidationError) as exc:
            logger.warning(
                HR_ARCHIVAL_ENTRY_FAILED,
                agent_id=agent_id,
                memory_id=str(entry.id),
                phase="archive",
                error=str(exc),
            )
            continue

        # Promote to org memory if eligible.
        if (
            org_memory_backend is not None
            and entry.category in _PROMOTABLE_CATEGORIES
        ):
            try:
                org_category = _CATEGORY_MAP[entry.category]
                author = OrgFactAuthor(agent_id=agent_id)
                write_req = OrgFactWriteRequest(
                    content=NotBlankStr(entry.content),
                    category=org_category,
                )
                await org_memory_backend.write(write_req, author=author)
                promoted_count += 1
            except (OSError, ValueError, KeyError) as exc:
                logger.warning(
                    HR_ARCHIVAL_ENTRY_FAILED,
                    agent_id=agent_id,
                    memory_id=str(entry.id),
                    phase="promote",
                    error=str(exc),
                )

    # Clean hot store.
    hot_store_cleaned = True
    for memory_id in deleted_ids:
        try:
            await memory_backend.delete(agent_id, NotBlankStr(memory_id))
        except (OSError, ValueError) as exc:
            hot_store_cleaned = False
            logger.warning(
                HR_ARCHIVAL_ENTRY_FAILED,
                agent_id=agent_id,
                memory_id=memory_id,
                phase="delete",
                error=str(exc),
            )

    result = ArchivalResult(
        agent_id=agent_id,
        total_archived=archived_count,
        promoted_to_org=promoted_count,
        hot_store_cleaned=hot_store_cleaned,
        strategy_name=NotBlankStr(self.name),
    )

    logger.info(
        HR_FIRING_MEMORY_ARCHIVED,
        agent_id=agent_id,
        archived=archived_count,
        promoted=promoted_count,
        cleaned=hot_store_cleaned,
    )
    return result

🧹 Nitpick | 🔵 Trivial

Consider extracting helper methods to reduce function length.

The archive() method spans ~120 lines, exceeding the 50-line guideline. The distinct phases (retrieve → archive to cold → promote to org → clean hot store) are natural candidates for extraction into private helper methods (e.g., _archive_entries(), _promote_to_org(), _clean_hot_store()). This would improve readability and testability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/hr/full_snapshot_strategy.py` around lines 65 - 186, The
archive() method is too long and should be broken into helpers; extract the
phases into private methods to improve readability and testability: implement a
_retrieve_entries(agent_id, memory_backend) that wraps memory_backend.retrieve
and error handling, an _archive_entries(entries, archival_store, agent_id, now)
that constructs ArchivalEntry, calls archival_store.archive and returns
(archived_count, deleted_ids), a _promote_to_org(entries, org_memory_backend,
agent_id) that handles category mapping via _CATEGORY_MAP,
OrgFactAuthor/OrgFactWriteRequest and returns promoted_count, and a
_clean_hot_store(deleted_ids, memory_backend, agent_id) that deletes ids and
returns hot_store_cleaned; update archive() to call these helpers and assemble
the ArchivalResult while preserving existing logging and exception behavior (use
function names: archive, _retrieve_entries, _archive_entries, _promote_to_org,
_clean_hot_store, ArchivalEntry, archival_store.archive,
org_memory_backend.write, memory_backend.delete).
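A simplified, self-contained skeleton of the suggested phase extraction. The backends and entry types are stubbed out (plain strings instead of the project's `MemoryBackend`/`ArchivalStore`); only the decomposition into the four phases is the point.

```python
import asyncio


class FullSnapshotStrategy:
    """Sketch: archive() delegates each phase to a private helper."""

    async def archive(self, agent_id: str, entries: list[str]) -> dict[str, int]:
        archived, deleted_ids = await self._archive_entries(entries)
        promoted = await self._promote_to_org(entries)
        cleaned = await self._clean_hot_store(deleted_ids)
        return {"archived": archived, "promoted": promoted, "cleaned": int(cleaned)}

    async def _archive_entries(self, entries: list[str]) -> tuple[int, list[str]]:
        # Phase 1: copy each entry to cold storage, tracking successes.
        return len(entries), list(entries)

    async def _promote_to_org(self, entries: list[str]) -> int:
        # Phase 2: promote eligible categories to org memory (none in this stub).
        return 0

    async def _clean_hot_store(self, deleted_ids: list[str]) -> bool:
        # Phase 3: delete successfully archived entries from the hot store.
        return True


result = asyncio.run(FullSnapshotStrategy().archive("agent-001", ["m1", "m2"]))
print(result)  # {'archived': 2, 'promoted': 0, 'cleaned': 1}
```

Each helper stays well under the 50-line guideline and can be unit-tested with targeted failure injection, which is the main payoff of the refactor.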

Comment on lines +156 to +169
# Clean hot store.
hot_store_cleaned = True
for memory_id in deleted_ids:
    try:
        await memory_backend.delete(agent_id, NotBlankStr(memory_id))
    except (OSError, ValueError) as exc:
        hot_store_cleaned = False
        logger.warning(
            HR_ARCHIVAL_ENTRY_FAILED,
            agent_id=agent_id,
            memory_id=memory_id,
            phase="delete",
            error=str(exc),
        )

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Check MemoryBackend.delete signature for memory_id type

rg -n --type=py -A 3 'async def delete.*agent_id.*memory_id' src/ai_company/memory/

Repository: Aureliolo/ai-company

Length of output: 46


🏁 Script executed:

#!/bin/bash
# Find MemoryBackend definition
rg -n --type=py 'class MemoryBackend|def delete.*agent_id' src/ai_company/ | head -20

Repository: Aureliolo/ai-company

Length of output: 131


🏁 Script executed:

#!/bin/bash
# Search for memory_backend usage and imports
rg -n --type=py 'memory_backend.delete' src/ai_company/ -B 2 -A 2

Repository: Aureliolo/ai-company

Length of output: 527


🏁 Script executed:

#!/bin/bash
# List files in memory directory
fd . src/ai_company/memory/ --type f -e py | head -20

Repository: Aureliolo/ai-company

Length of output: 294


🏁 Script executed:

#!/bin/bash
cat src/ai_company/memory/protocol.py

Repository: Aureliolo/ai-company

Length of output: 5317


🏁 Script executed:

#!/bin/bash
# Read the relevant sections of full_snapshot_strategy.py to see where deleted_ids comes from
head -200 src/ai_company/hr/full_snapshot_strategy.py | tail -100

Repository: Aureliolo/ai-company

Length of output: 3727


🏁 Script executed:

#!/bin/bash
# Search for where deleted_ids is defined/assigned
rg -n --type=py 'deleted_ids.*=' src/ai_company/hr/full_snapshot_strategy.py -B 3 -A 1

Repository: Aureliolo/ai-company

Length of output: 206


🏁 Script executed:

#!/bin/bash
# Check MemoryEntry model to see what type entry.id is
rg -n --type=py 'class MemoryEntry' src/ai_company/memory/ -A 20

Repository: Aureliolo/ai-company

Length of output: 1677


Remove redundant NotBlankStr wrapping in delete call.

deleted_ids stores str(entry.id) (line 119), but entry.id is already NotBlankStr from the backend response. Re-wrapping with NotBlankStr(memory_id) at line 160 re-validates unnecessarily at an internal boundary. Either keep deleted_ids as list[NotBlankStr] and pass directly, or if storing as str, pass it directly without re-wrapping—validation already occurred when the entry was retrieved from the backend.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/hr/full_snapshot_strategy.py` around lines 156 - 169, The
delete loop is re-wrapping already-validated IDs with NotBlankStr; remove the
redundant wrap and pass memory_id directly to memory_backend.delete from the hot
store cleanup loop (symbols: deleted_ids, memory_id, memory_backend.delete,
NotBlankStr, hot_store_cleaned); ensure consistency with how deleted_ids is
populated (either keep deleted_ids as list[NotBlankStr] and pass the elements
unchanged, or keep them as list[str] and pass the raw str) and update the code
that builds deleted_ids accordingly so validation happens only once when entries
are read from the backend.
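A minimal sketch of the suggested shape, with the IDs passed through unchanged. The `Entry` and `FakeBackend` types below are illustrative stand-ins, and `NotBlankStr` is reduced to a plain alias rather than the repository's validated type:

```python
import asyncio
from dataclasses import dataclass, field

# Stand-in alias for illustration; the real NotBlankStr validates at construction.
NotBlankStr = str


@dataclass
class Entry:
    id: NotBlankStr


@dataclass
class FakeBackend:
    deleted: list[NotBlankStr] = field(default_factory=list)

    async def delete(self, *, memory_id: NotBlankStr) -> None:
        self.deleted.append(memory_id)


async def cleanup(backend: FakeBackend, deleted_ids: list[NotBlankStr]) -> None:
    for memory_id in deleted_ids:
        # IDs were already validated when the entries were read from the
        # backend, so pass them through unchanged instead of re-wrapping
        # with NotBlankStr(memory_id).
        await backend.delete(memory_id=memory_id)
```

Keeping `deleted_ids` typed as `list[NotBlankStr]` from the start means validation happens exactly once, at the read boundary.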

Comment on lines +129 to +157
async def test_cost_efficiency_zero_cost(self) -> None:
"""Zero cost -> max cost efficiency score."""
strategy = self._make_strategy()
task_result = make_task_metric(cost_usd=0.0)

result = await strategy.score(
agent_id=NotBlankStr("agent-001"),
task_id=NotBlankStr("task-001"),
task_result=task_result,
acceptance_criteria=(),
)

breakdown_dict = dict(result.breakdown)
assert breakdown_dict["cost_efficiency"] == 10.0

async def test_cost_efficiency_high_cost(self) -> None:
"""Cost >= 10 -> zero cost efficiency score."""
strategy = self._make_strategy()
task_result = make_task_metric(cost_usd=15.0)

result = await strategy.score(
agent_id=NotBlankStr("agent-001"),
task_id=NotBlankStr("task-001"),
task_result=task_result,
acceptance_criteria=(),
)

breakdown_dict = dict(result.breakdown)
assert breakdown_dict["cost_efficiency"] == 0.0

🧹 Nitpick | 🔵 Trivial

Collapse the cost-efficiency assertions into the parametrized test.

test_cost_efficiency_zero_cost and test_cost_efficiency_high_cost hit the same mapping branches that test_cost_efficiency_parametrized already covers, so keeping both forms only adds drift and maintenance burden.

As per coding guidelines "Prefer @pytest.mark.parametrize for testing similar cases".

Also applies to: 159-185

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/hr/performance/test_ci_quality_strategy.py` around lines 129 -
157, Merge the two separate tests test_cost_efficiency_zero_cost and
test_cost_efficiency_high_cost into the existing parametrized
test_cost_efficiency_parametrized: add parameter rows for (cost_usd=0.0,
expected=10.0) and (cost_usd=15.0, expected=0.0), use the same setup via
self._make_strategy() and make_task_metric, call
strategy.score(agent_id=NotBlankStr("agent-001"),
task_id=NotBlankStr("task-001"), task_result=task_result,
acceptance_criteria=()), and assert dict(result.breakdown)["cost_efficiency"] ==
expected; then remove the now-duplicate test_cost_efficiency_zero_cost and
test_cost_efficiency_high_cost (and the other duplicate pair noted) so only the
parametrized test covers these cases.
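The consolidation could look like the following sketch. The linear 10-to-0 cost mapping is an assumption standing in for the strategy's real scoring, and the test body is illustrative rather than the repository's actual fixture setup:

```python
import pytest


# Illustrative stand-in for the strategy's cost-efficiency mapping:
# score 10 at zero cost, linearly down to 0 at cost_usd >= 10.
def cost_efficiency(cost_usd: float) -> float:
    return max(0.0, 10.0 - cost_usd)


@pytest.mark.parametrize(
    ("cost_usd", "expected"),
    [
        (0.0, 10.0),   # zero cost -> max cost efficiency score
        (5.0, 5.0),    # midpoint of the linear ramp
        (15.0, 0.0),   # cost >= 10 -> zero cost efficiency score
    ],
)
def test_cost_efficiency(cost_usd: float, expected: float) -> None:
    assert cost_efficiency(cost_usd) == expected
```

One table-driven test replaces both one-off tests, so new edge cases become a single added parameter row.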

Comment on lines +253 to +285
async def test_snapshot_with_windows_and_trends(self) -> None:
"""Snapshot includes windows and trends from strategies."""
window = WindowMetrics(
window_size=NotBlankStr("7d"),
data_point_count=10,
tasks_completed=8,
tasks_failed=2,
success_rate=0.8,
)
tracker = _make_tracker(
window=MockWindowStrategy(
windows=(window,),
min_data_points=5,
),
)
# Add a scored task record so trend computation has data
record = make_task_metric(
completed_at=NOW - timedelta(hours=1),
quality_score=7.5,
)
await tracker.record_task_metric(record)

snapshot = await tracker.get_snapshot(
NotBlankStr("agent-001"),
now=NOW,
)

assert len(snapshot.windows) == 1
assert snapshot.windows[0].window_size == "7d"
# Trends should be computed (quality_score + cost_usd)
assert len(snapshot.trends) == 2
assert snapshot.overall_quality_score == 7.5


⚠️ Potential issue | 🟠 Major

Don't assert trends from a single task sample.

This test only records one TaskMetricRecord, but the feature contract requires minimum data points before trend detection. The mocked WindowMetrics.data_point_count=10 does not supply the time-series values that TrendStrategy.detect(...) consumes, so len(snapshot.trends) == 2 bakes in under-sampled trend output.

Make the scenario satisfy the threshold instead
-        # Add a scored task record so trend computation has data
-        record = make_task_metric(
-            completed_at=NOW - timedelta(hours=1),
-            quality_score=7.5,
-        )
-        await tracker.record_task_metric(record)
+        # Seed at least min_data_points scored records so trend computation is valid.
+        for hours_ago, quality_score in enumerate(
+            (7.1, 7.2, 7.3, 7.4, 7.5),
+            start=1,
+        ):
+            await tracker.record_task_metric(
+                make_task_metric(
+                    task_id=f"task-{hours_ago:03d}",
+                    completed_at=NOW - timedelta(hours=hours_ago),
+                    quality_score=quality_score,
+                )
+            )
If the intent was to verify the threshold guard instead, assert `snapshot.trends == ()`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/hr/performance/test_tracker.py` around lines 253 - 285, The test
test_snapshot_with_windows_and_trends asserts trends are present despite only
recording one TaskMetricRecord and not supplying the time-series that
TrendStrategy.detect uses; either (A) make the scenario satisfy the
trend-detection threshold by adding enough TaskMetricRecord entries (call
tracker.record_task_metric multiple times at different timestamps so
WindowMetrics.data_point_count and TrendStrategy.detect receive sufficient
samples) before calling tracker.get_snapshot, or (B) change the assertion to
expect no trends (assert snapshot.trends == ()) to verify the threshold guard;
update references in the test around
MockWindowStrategy/WindowMetrics.data_point_count, tracker.record_task_metric,
and tracker.get_snapshot accordingly.

Comment on lines +38 to +52
async def list_tasks(
self,
*,
status: TaskStatus | None = None,
assigned_to: str | None = None,
project: str | None = None,
) -> tuple[Task, ...]:
result = list(self._tasks.values())
if status is not None:
result = [t for t in result if t.status == status]
if assigned_to is not None:
result = [t for t in result if t.assigned_to == assigned_to]
if project is not None:
result = [t for t in result if t.project == project]
return tuple(result)

🧹 Nitpick | 🔵 Trivial

Type hint for assigned_to parameter should be NotBlankStr | None.

The list_tasks method signature uses assigned_to: str | None = None, but looking at the OffboardingService (line 121 in offboarding_service.py), it calls list_tasks(assigned_to=NotBlankStr(agent_id)). The fake should match the expected interface more closely. However, since this is a test double and Python's duck typing allows it to work, this is minor.

🧹 Proposed fix for type consistency
     async def list_tasks(
         self,
         *,
         status: TaskStatus | None = None,
-        assigned_to: str | None = None,
+        assigned_to: NotBlankStr | None = None,
         project: str | None = None,
     ) -> tuple[Task, ...]:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/hr/test_offboarding_service.py` around lines 38 - 52, The test
double list_tasks in tests/unit/hr/test_offboarding_service.py declares
assigned_to: str | None but OffboardingService calls it with
NotBlankStr(agent_id); change the signature to assigned_to: NotBlankStr | None
to match the expected type, add the appropriate import for NotBlankStr from the
module where it’s defined, and update any related type hints in that file so the
fake ListTasks function signature aligns with the OffboardingService usage
(function name: list_tasks).

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant new "HR Engine" and "Performance Tracking" system, fundamentally expanding the capabilities of the AI company platform. The HR engine manages the entire lifecycle of AI agents, from their initial hiring and onboarding to their eventual offboarding, including task reassignment and memory archival. Complementing this, the performance tracking system provides detailed insights into agent efficiency, quality, and collaboration through various metrics and trend analysis. These additions are supported by new persistence layers and enhanced observability, making the platform more robust and manageable for operating a virtual organization of AI agents.

Highlights

  • HR Engine Implementation: Introduced a comprehensive HR engine covering the full agent lifecycle from hiring to offboarding, including candidate generation, approval, onboarding checklists, and task reassignment.
  • Performance Tracking System: Added a robust performance tracking system for agents, featuring task metric recording, collaboration scoring, quality scoring, Theil-Sen robust trend detection, and multi-window rolling metrics.
  • Dedicated Persistence for HR Data: Implemented SQLite repositories for HR lifecycle events, task metrics, and collaboration metrics, ensuring dedicated and structured storage for this new domain.
  • Enhanced Observability: Integrated new HR and performance-related event constants into the logging system for better monitoring and debugging of agent lifecycle and performance.
  • Database Schema Update: Migrated the SQLite database schema to version 2 to accommodate the new HR and performance tracking data structures.
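The Theil-Sen trend detection named in the highlights can be sketched in a few lines. This is the textbook estimator (median of all pairwise slopes), not the repository's TheilSenTrendStrategy API:

```python
from itertools import combinations
from statistics import median


def theil_sen_slope(points: list[tuple[float, float]]) -> float:
    """Median of all pairwise slopes, which makes the estimate robust to
    outlier samples that would skew an ordinary least-squares fit."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if x2 != x1
    ]
    return median(slopes)
```

With points (0,0), (1,1), (2,2), (3,3), (4,50), six of the ten pairwise slopes are 1.0, so the median slope stays 1.0 despite the single extreme outlier.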
Changelog
  • CLAUDE.md
    • Updated the directory structure description to include the new "hr/" module.
    • Expanded the logging event names section to include new HR and performance-related constants.
  • DESIGN_SPEC.md
    • Updated the project directory structure to detail the new "hr/" module and its subdirectories for performance tracking.
    • Added "hr_repositories.py" to the "persistence/sqlite" section.
    • Included "hr.py" and "performance.py" in the "observability/events" section.
  • README.md
    • Updated the "Implemented features" section to include "HR Engine" and "Performance Tracking" for M7.
    • Removed "HR workflows" from the "Not implemented yet" section.
  • src/ai_company/communication/enums.py
    • Added "HR_NOTIFICATION" to the "MessageType" enumeration.
  • src/ai_company/core/enums.py
    • Added "ONBOARDING" status to the "AgentStatus" enumeration.
  • src/ai_company/hr/__init__.py
    • Added package initialization for the HR engine.
  • src/ai_company/hr/archival_protocol.py
    • Defined the "MemoryArchivalStrategy" protocol and "ArchivalResult" model for agent memory archival during offboarding.
  • src/ai_company/hr/enums.py
    • Added HR-specific enumerations: "HiringRequestStatus", "FiringReason", "OnboardingStep", "LifecycleEventType", and "TrendDirection".
  • src/ai_company/hr/errors.py
    • Introduced a comprehensive HR-specific error hierarchy, including errors for hiring, firing, onboarding, registry, and performance.
  • src/ai_company/hr/full_snapshot_strategy.py
    • Implemented the "FullSnapshotStrategy" for archiving agent memories, including promotion of semantic and procedural memories to organizational memory.
  • src/ai_company/hr/hiring_service.py
    • Implemented the "HiringService" to orchestrate the agent hiring pipeline, covering request creation, candidate generation, approval submission, and agent instantiation.
  • src/ai_company/hr/models.py
    • Defined Pydantic models for HR domain entities: "CandidateCard", "HiringRequest", "FiringRequest", "OnboardingStepRecord", "OnboardingChecklist", "OffboardingRecord", and "AgentLifecycleEvent".
  • src/ai_company/hr/offboarding_service.py
    • Implemented the "OffboardingService" to manage the agent offboarding pipeline, including task reassignment, memory archival, and team notifications.
  • src/ai_company/hr/onboarding_service.py
    • Implemented the "OnboardingService" to manage agent onboarding checklists and automatically transition agents to active status upon completion.
  • src/ai_company/hr/performance/__init__.py
    • Added package initialization for the performance tracking subsystem.
  • src/ai_company/hr/performance/behavioral_collaboration_strategy.py
    • Implemented "BehavioralTelemetryStrategy" for scoring agent collaboration based on behavioral telemetry.
  • src/ai_company/hr/performance/ci_quality_strategy.py
    • Implemented "CISignalQualityStrategy" for scoring task quality based on CI signals like acceptance criteria, task success, and cost efficiency.
  • src/ai_company/hr/performance/collaboration_protocol.py
    • Defined the "CollaborationScoringStrategy" protocol for evaluating agent collaboration behavior.
  • src/ai_company/hr/performance/config.py
    • Defined "PerformanceConfig" for configuring the performance tracking system, including data points, windows, and trend thresholds.
  • src/ai_company/hr/performance/models.py
    • Defined Pydantic models for performance tracking: "TaskMetricRecord", "CollaborationMetricRecord", "QualityScoreResult", "CollaborationScoreResult", "TrendResult", "WindowMetrics", and "AgentPerformanceSnapshot".
  • src/ai_company/hr/performance/multi_window_strategy.py
    • Implemented "MultiWindowStrategy" for computing aggregate metrics across multiple rolling time windows (7d, 30d, 90d).
  • src/ai_company/hr/performance/quality_protocol.py
    • Defined the "QualityScoringStrategy" protocol for evaluating task completion quality.
  • src/ai_company/hr/performance/theil_sen_strategy.py
    • Implemented "TheilSenTrendStrategy" for robust non-parametric trend detection using the Theil-Sen estimator.
  • src/ai_company/hr/performance/tracker.py
    • Implemented the "PerformanceTracker" service, centralizing the recording and querying of agent performance metrics and delegating to various strategies.
  • src/ai_company/hr/performance/trend_protocol.py
    • Defined the "TrendDetectionStrategy" protocol for analyzing metric time series trends.
  • src/ai_company/hr/performance/window_protocol.py
    • Defined the "MetricsWindowStrategy" protocol for rolling-window aggregation of metrics.
  • src/ai_company/hr/persistence_protocol.py
    • Defined HR-specific persistence protocols: "LifecycleEventRepository", "TaskMetricRepository", and "CollaborationMetricRepository".
  • src/ai_company/hr/queue_return_strategy.py
    • Implemented "QueueReturnStrategy" for reassigning tasks by marking them as interrupted during offboarding.
  • src/ai_company/hr/reassignment_protocol.py
    • Defined the "TaskReassignmentStrategy" protocol for handling task reassignment during agent termination.
  • src/ai_company/hr/registry.py
    • Implemented the "AgentRegistryService" for tracking active agents, their identities, and lifecycle status transitions.
  • src/ai_company/observability/events/hr.py
    • Added new event constants for HR-related structured logging, covering hiring, firing, onboarding, and registry events.
  • src/ai_company/observability/events/performance.py
    • Added new event constants for performance tracking structured logging, covering metric recording, scoring, snapshots, and trends.
  • src/ai_company/observability/events/persistence.py
    • Added new persistence event constants for HR lifecycle events, task metrics, and collaboration metrics.
  • src/ai_company/persistence/protocol.py
    • Updated the "PersistenceBackend" protocol to include properties for the new "LifecycleEventRepository", "TaskMetricRepository", and "CollaborationMetricRepository".
  • src/ai_company/persistence/repositories.py
    • Updated the "__all__" export list to include the new HR-related repository protocols.
  • src/ai_company/persistence/sqlite/backend.py
    • Updated "SQLitePersistenceBackend" to initialize and expose the new "SQLiteLifecycleEventRepository", "SQLiteTaskMetricRepository", and "SQLiteCollaborationMetricRepository".
  • src/ai_company/persistence/sqlite/hr_repositories.py
    • Added SQLite implementations for "LifecycleEventRepository", "TaskMetricRepository", and "CollaborationMetricRepository".
  • src/ai_company/persistence/sqlite/migrations.py
    • Updated "SCHEMA_VERSION" to 2 and added "_V2_STATEMENTS" to create tables for "lifecycle_events", "task_metrics", and "collaboration_metrics".
  • src/ai_company/persistence/sqlite/repositories.py
    • Updated the module docstring to clarify the separation of HR-related repositories.
  • tests/unit/api/conftest.py
    • Added "FakeLifecycleEventRepository", "FakeTaskMetricRepository", and "FakeCollaborationMetricRepository" to support testing the new HR persistence.
    • Updated "FakePersistenceBackend" to include the new fake HR repositories.
  • tests/unit/communication/test_enums.py
    • Updated the test for "MessageType" to reflect the addition of "HR_NOTIFICATION".
  • tests/unit/core/test_enums.py
    • Updated the test for "AgentStatus" to reflect the addition of "ONBOARDING".
  • tests/unit/hr/conftest.py
    • Added helper functions ("make_agent_identity", "make_candidate_card", "make_hiring_request", "make_firing_request", "make_task") and fixtures ("registry", "onboarding_service", "hiring_service") for HR unit tests.
  • tests/unit/hr/performance/conftest.py
    • Added helper functions ("make_task_metric", "make_collab_metric", "make_acceptance_criterion") for performance tracking unit tests.
  • tests/unit/hr/performance/test_behavioral_collaboration_strategy.py
    • Added unit tests for the "BehavioralTelemetryStrategy".
  • tests/unit/hr/performance/test_ci_quality_strategy.py
    • Added unit tests for the "CISignalQualityStrategy".
  • tests/unit/hr/performance/test_models.py
    • Added unit tests for the performance tracking models ("TaskMetricRecord", "CollaborationMetricRecord", "QualityScoreResult", "CollaborationScoreResult", "TrendResult", "WindowMetrics", "AgentPerformanceSnapshot").
  • tests/unit/hr/performance/test_multi_window_strategy.py
    • Added unit tests for the "MultiWindowStrategy".
  • tests/unit/hr/performance/test_theil_sen_strategy.py
    • Added unit tests for the "TheilSenTrendStrategy".
  • tests/unit/hr/performance/test_tracker.py
    • Added unit tests for the "PerformanceTracker" service, including mock strategies.
  • tests/unit/hr/test_enums.py
    • Added unit tests for HR enumerations.
  • tests/unit/hr/test_errors.py
    • Added unit tests for the HR error hierarchy.
  • tests/unit/hr/test_full_snapshot_strategy.py
    • Added unit tests for the "FullSnapshotStrategy", including fake backends.
  • tests/unit/hr/test_hiring_service.py
    • Added unit tests for the "HiringService".
  • tests/unit/hr/test_models.py
    • Added unit tests for HR domain models.
  • tests/unit/hr/test_offboarding_service.py
    • Added unit tests for the "OffboardingService", including fake collaborators.
  • tests/unit/hr/test_onboarding_service.py
    • Added unit tests for the "OnboardingService".
  • tests/unit/hr/test_persistence.py
    • Added unit tests for the SQLite HR repository implementations, including a database fixture.
  • tests/unit/hr/test_queue_return_strategy.py
    • Added unit tests for the "QueueReturnStrategy".
  • tests/unit/hr/test_registry.py
    • Added unit tests for the "AgentRegistryService".
  • tests/unit/observability/test_events.py
    • Updated "test_all_domain_modules_discovered" to include "hr" and "performance" event modules.
  • tests/unit/persistence/test_migrations_v2.py
    • Added new unit tests to verify the v2 database migration, including table and index creation.
Activity
  • The pull request was pre-reviewed by 9 different AI agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, docs-consistency).
  • 57 findings from these pre-reviews were triaged and implemented, indicating a thorough internal review process before human review.
  • Critical bug fixes were applied, such as ensuring trend detection filters records per time window and adding a re-instantiation vulnerability guard.
  • Code quality improvements were made by narrowing "except Exception" blocks and adding missing "logger.warning()" calls as per "CLAUDE.md" rules.
  • "NotBlankStr" type violations were fixed, and model validators were added for improved data integrity.
  • Documentation files ("DESIGN_SPEC.md", "CLAUDE.md", "README.md") were updated to reflect the new features.
  • All 5502 unit tests passed, "mypy strict" passed, and "ruff lint + format" were clean.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive HR engine and performance tracking system, including a full hiring pipeline, onboarding/offboarding processes, an agent registry, and detailed performance metrics. The persistence layer has been extended with SQLite repositories for the new HR-related data.

However, two significant race conditions were identified in HiringService and OnboardingService: asynchronous operations between reading and updating state can lead to duplicate agent instantiation and inconsistent onboarding checklists. Asynchronous locks are recommended to make these state transitions atomic. Additionally, several key services (HiringService, OnboardingService, PerformanceTracker, AgentRegistryService) currently use in-memory storage, losing all state on restart, and should be refactored to use the new persistence layer. There are also opportunities for more specific exception handling.

Comment on lines +69 to +70
self._task_metrics: dict[str, list[TaskMetricRecord]] = {}
self._collab_metrics: dict[str, list[CollaborationMetricRecord]] = {}

critical

The PerformanceTracker service uses in-memory dictionaries for storing task and collaboration metrics. This is a critical issue as all performance data will be lost upon service restart. The PR includes the implementation of SQLiteTaskMetricRepository and SQLiteCollaborationMetricRepository. This service should be refactored to use these repositories for persistent storage.

*,
message_bus: MessageBus | None = None,
) -> None:
self._agents: dict[str, AgentIdentity] = {}

critical

The AgentRegistryService stores agent identities in an in-memory dictionary. This is a critical flaw, as the entire state of the company's workforce will be lost if the application restarts. The agent registry must be backed by a persistent store. The PR includes persistence layers for other HR entities, and the agent registry should be treated with the same importance.

self._registry = registry
self._approval_store = approval_store
self._onboarding_service = onboarding_service
self._requests: dict[str, HiringRequest] = {}

high

The HiringService uses an in-memory dictionary self._requests to store hiring requests. This makes the state of hiring requests volatile, as it will be lost if the service restarts. The PR correctly adds persistence repositories for HR data, but this service doesn't seem to use them. The service should be refactored to accept and use a repository for HiringRequest persistence.

registry: AgentRegistryService,
) -> None:
self._registry = registry
self._checklists: dict[str, OnboardingChecklist] = {}

high

The OnboardingService uses an in-memory dictionary self._checklists to store onboarding checklists. This state is not persistent and will be lost on service restart. This service should be updated to use a persistent repository, similar to how other parts of the system are designed.

Comment on lines +264 to +344
if request.status == HiringRequestStatus.INSTANTIATED:
msg = f"Hiring request {request.id!r} is already instantiated"
logger.warning(
HR_HIRING_INSTANTIATION_FAILED,
request_id=str(request.id),
error=msg,
)
raise HiringError(msg)
if request.status == HiringRequestStatus.REJECTED:
msg = f"Hiring request {request.id!r} was rejected"
logger.warning(
HR_HIRING_INSTANTIATION_FAILED,
request_id=str(request.id),
error=msg,
)
raise HiringRejectedError(msg)
if request.status == HiringRequestStatus.PENDING:
msg = f"Hiring request {request.id!r} requires approval"
logger.warning(
HR_HIRING_INSTANTIATION_FAILED,
request_id=str(request.id),
error=msg,
)
raise HiringApprovalRequiredError(msg)
if request.selected_candidate_id is None:
msg = f"No candidate selected on request {request.id!r}"
logger.warning(
HR_HIRING_INSTANTIATION_FAILED,
request_id=str(request.id),
error=msg,
)
raise InvalidCandidateError(msg)

candidate = next(
(
c
for c in request.candidates
if str(c.id) == request.selected_candidate_id
),
None,
)
if candidate is None:
msg = (
f"Selected candidate {request.selected_candidate_id!r} "
f"not found on request {request.id!r}"
)
logger.warning(
HR_HIRING_INSTANTIATION_FAILED,
request_id=str(request.id),
error=msg,
)
raise InvalidCandidateError(msg)

try:
identity = AgentIdentity(
name=candidate.name,
role=candidate.role,
department=candidate.department,
level=candidate.level,
model=ModelConfig(
provider=NotBlankStr("default-provider"),
model_id=NotBlankStr("default-model-001"),
),
status=AgentStatus.ONBOARDING,
hiring_date=datetime.now(UTC).date(),
)
await self._registry.register(identity)
except Exception as exc:
msg = f"Failed to instantiate agent for request {request.id!r}"
logger.exception(
HR_HIRING_INSTANTIATION_FAILED,
request_id=str(request.id),
error=str(exc),
)
raise HiringError(msg) from exc

# Update request status.
updated = request.model_copy(
update={"status": HiringRequestStatus.INSTANTIATED},
)
self._requests[str(updated.id)] = updated

security-medium medium

The instantiate_agent method is vulnerable to a race condition. It checks if a hiring request has already been instantiated at the beginning of the method, but then performs an asynchronous operation (await self._registry.register(identity)) before updating the request status to INSTANTIATED. In a concurrent environment, multiple tasks could pass the initial status check before the first one completes and updates the status, leading to multiple agents being hired for a single approved request, potentially resulting in unauthorized resource allocation and budget bypass. Additionally, the except Exception block is too broad and can hide specific issues during agent registration, such as AgentAlreadyRegisteredError. Consider catching more specific exceptions first, and use except Exception as a final catch-all if necessary.

Comment on lines +103 to +145
checklist = self._checklists.get(agent_id)
if checklist is None:
msg = f"No onboarding checklist for agent {agent_id!r}"
logger.warning(
HR_ONBOARDING_STEP_COMPLETE,
agent_id=agent_id,
error=msg,
)
raise OnboardingError(msg)

now = datetime.now(UTC)
step_found = any(s.step == step and not s.completed for s in checklist.steps)
if not step_found:
logger.debug(
HR_ONBOARDING_STEP_COMPLETE,
agent_id=agent_id,
step=step.value,
skipped="step_not_found_or_already_completed",
)
return checklist

updated_steps = tuple(
s.model_copy(
update={
"completed": True,
"completed_at": now,
"notes": notes,
},
)
if s.step == step and not s.completed
else s
for s in checklist.steps
)

updated = checklist.model_copy(update={"steps": updated_steps})

# Check if all steps are now complete.
if updated.is_complete and not checklist.is_complete:
updated = updated.model_copy(update={"completed_at": now})
await self._registry.update_status(agent_id, AgentStatus.ACTIVE)
logger.info(HR_ONBOARDING_COMPLETE, agent_id=agent_id)

self._checklists[agent_id] = updated

security-medium medium

The complete_step method contains a race condition that can lead to state inconsistency. It reads the current checklist from self._checklists, performs an asynchronous operation (await self._registry.update_status(...)) if the checklist is completed, and then saves the updated checklist back to the dictionary. If multiple steps for the same agent are completed concurrently, one update may overwrite another because each task is working with a potentially stale copy of the checklist read before the other task's update was saved. This could result in completed steps being reverted to a pending state.

Comment on lines +92 to +101
except Exception as exc:
msg = f"Failed to retrieve memories for agent {agent_id!r}"
logger.error( # noqa: TRY400
HR_ARCHIVAL_ENTRY_FAILED,
agent_id=agent_id,
phase="retrieve",
error=str(exc),
error_type=type(exc).__name__,
)
raise MemoryArchivalError(msg) from exc

medium

This except Exception block is too broad and can mask unexpected errors. The PR description mentions that broad exception blocks were narrowed, but this one seems to have been missed. It's better to catch more specific exceptions that you expect memory_backend.retrieve to raise, such as IOError, ConnectionError, or a custom MemoryRetrievalError if one exists in the project.

Suggested change
except Exception as exc:
msg = f"Failed to retrieve memories for agent {agent_id!r}"
logger.error( # noqa: TRY400
HR_ARCHIVAL_ENTRY_FAILED,
agent_id=agent_id,
phase="retrieve",
error=str(exc),
error_type=type(exc).__name__,
)
raise MemoryArchivalError(msg) from exc
except (IOError, ConnectionError) as exc: # Or other specific expected errors
msg = f"Failed to retrieve memories for agent {agent_id!r}"
logger.error( # noqa: TRY400
HR_ARCHIVAL_ENTRY_FAILED,
agent_id=agent_id,
phase="retrieve",
error=str(exc),
error_type=type(exc).__name__,
)
raise MemoryArchivalError(msg) from exc

Implement the full HR module (agent lifecycle management) and performance
tracking system for M7. Includes:

- Performance models, protocols, and 4 pluggable strategies (CI quality
  scoring, behavioral collaboration, multi-window metrics, Theil-Sen
  trend detection)
- PerformanceTracker service with in-memory storage
- HR models (CandidateCard, HiringRequest, FiringRequest, onboarding/
  offboarding records, lifecycle events)
- AgentRegistryService with asyncio.Lock-protected hot-pluggable registry
- HiringService (request → candidate → approval → instantiation)
- OnboardingService (checklist tracking, auto-activation)
- OffboardingService (task reassignment → memory archival → team notify)
- QueueReturnStrategy (D9) and FullSnapshotStrategy (D10)
- SQLite schema v2 migration (lifecycle_events, task_metrics,
  collaboration_metrics tables)
- 3 new SQLite repository implementations
- HR and performance persistence protocols
- AgentStatus.ONBOARDING and MessageType.HR_NOTIFICATION enums
- 16 HR + 6 performance observability event constants
- 286 unit tests at 91.6% coverage

Pre-reviewed by 9 agents, 57 findings addressed:
- Fix critical bug: trend detection now filters records per time window
- Fix re-instantiation vulnerability: guard against INSTANTIATED status
- Narrow except Exception blocks to specific types across 5+ locations
- Add missing logger.warning() before raises per CLAUDE.md rules
- Extract HR repos to hr_repositories.py (800-line file limit)
- Fix since/until types from object to AwareDatetime in repo queries
- Add model validators (threshold ordering, temporal order, step consistency)
- Fix NotBlankStr type violations in 5 model fields
- Update DESIGN_SPEC.md, CLAUDE.md, README.md for HR module
- Fix ActionType.HIRING → ActionType.ORG_HIRE (new enum from SecOps merge)
- Resolve rebase conflicts in CLAUDE.md, DESIGN_SPEC.md, README.md
- Merge security event constants into CLAUDE.md logging section
- Update implementation snapshot to include SecOps + Docker/MCP/HR
- Fix test_events.py to include security, hr, and performance modules
- Fix hiring_service: _get_request pattern, specific exception handling,
  budget_limit_monthly validation, configurable default_model_config,
  refactored into small helpers
- Fix offboarding_service: default hot_store_cleaned=False, termination
  raises OffboardingError, broader exception handling, refactored helpers
- Fix full_snapshot_strategy: OrgFactAuthor receives agent_seniority,
  narrowed exception handling, refactored into helpers
- Fix performance tracker: correct default strategy imports, optional
  strategy params with PerformanceConfig defaults, window label warning
- Fix ci_quality_strategy: log-scaled cost scoring replaces broken linear
- Fix multi_window_strategy: success_rate=None below min_data_points
- Fix collaboration_protocol docstring accuracy
- Fix theil_sen_strategy _median docstring for empty-list sentinel
- Lint fixes: TRY400, TC001, PLC0415, EM101, TRY003, N817, I001, E501
- Test updates for all source changes + new edge-case tests
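The Theil-Sen trend detection mentioned above estimates a slope as the median of the slopes over all point pairs, which makes it robust to outliers. A minimal sketch of the idea (not the project's TheilSenTrendStrategy):

```python
from itertools import combinations
from statistics import median


def theil_sen_slope(points: list[tuple[float, float]]) -> float:
    """Theil-Sen estimator: median of pairwise slopes.

    One wild outlier barely moves the estimate, unlike least squares.
    """
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if x2 != x1  # skip vertical pairs
    ]
    return median(slopes)


# A clean upward trend of slope 1 with one large outlier at x=3.
pts = [(0, 1.0), (1, 2.0), (2, 3.0), (3, 40.0), (4, 5.0)]
slope = theil_sen_slope(pts)  # median of pairwise slopes stays at 1.0
```

A least-squares fit over the same points would be dragged far above 1 by the outlier, which is why the robust estimator suits noisy per-task metrics.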
@Aureliolo Aureliolo force-pushed the feat/hr-engine-and-performance-tracking branch from 48f28cf to f2c49f7 Compare March 10, 2026 09:28
@Aureliolo Aureliolo merged commit 2d091ea into main Mar 10, 2026
8 of 9 checks passed
@Aureliolo Aureliolo deleted the feat/hr-engine-and-performance-tracking branch March 10, 2026 09:28
Comment on lines +315 to +324
# Cost trend.
cost_values = tuple((r.completed_at, r.cost_usd) for r in window_records)
if cost_values:
trends.append(
self._trend_strategy.detect(
metric_name=NotBlankStr("cost_usd"),
values=cost_values,
window_size=window.window_size,
)
)

Cost trend direction is semantically inverted

TheilSenTrendStrategy uses the same sign convention for every metric: positive slope → IMPROVING, negative slope → DECLINING. This is correct for quality_score (higher is better), but wrong for cost_usd where a positive slope (rising costs) should be DECLINING.

As written, an agent whose cost_usd trend is increasing over a window will be reported as IMPROVING, and a cost-efficient agent reducing spend will be reported as DECLINING. The result is that cost trend data in AgentPerformanceSnapshot is actively misleading.

The fix requires either:

  1. Passing a higher_is_better: bool flag into TrendDetectionStrategy.detect() so it can invert the direction for cost metrics, or
  2. Inverting the values before passing them (e.g., pass -cost_usd) for metrics where lower is better.
# Cost trend — negate cost so that declining cost = positive slope = IMPROVING.
cost_values = tuple((r.completed_at, -r.cost_usd) for r in window_records)
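Option 1 could look like the following sketch, where the slope-to-direction mapping inverts for lower-is-better metrics (the Trend enum and direction helper are hypothetical names, not the project's TrendDetectionStrategy):

```python
from enum import Enum


class Trend(Enum):
    IMPROVING = "improving"
    DECLINING = "declining"
    STABLE = "stable"


def direction(slope: float, *, higher_is_better: bool, eps: float = 1e-9) -> Trend:
    """Map a fitted slope to a trend, honoring the metric's polarity."""
    if abs(slope) <= eps:
        return Trend.STABLE
    # For cost-like metrics, a rising slope means things are getting worse.
    improving = slope > 0 if higher_is_better else slope < 0
    return Trend.IMPROVING if improving else Trend.DECLINING


quality = direction(0.3, higher_is_better=True)   # rising quality: improving
cost = direction(0.3, higher_is_better=False)     # rising cost: declining
```

Either option works; the flag keeps the stored values interpretable, while negating the series keeps the strategy interface unchanged.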
Path: src/ai_company/hr/performance/tracker.py, lines 315-324

Comment on lines +93 to +99
msg = f"Hiring request {request_id!r} not found"
logger.warning(
HR_HIRING_REQUEST_CREATED,
request_id=request_id,
error=msg,
)
raise HiringError(msg)

Wrong log event constant used for "not found" error

_get_request logs HR_HIRING_REQUEST_CREATED when a hiring request is not found. Any monitoring rule or structured-log query watching HR_HIRING_REQUEST_CREATED will incorrectly count this error path as a successful request creation event.

The correct constant here should be something that signals a lookup failure. If no dedicated "not found" constant exists in events.hr, using the downstream failure event (e.g. HR_HIRING_INSTANTIATION_FAILED) or creating a new HR_HIRING_REQUEST_NOT_FOUND constant would be more accurate.

Suggested change
msg = f"Hiring request {request_id!r} not found"
logger.warning(
HR_HIRING_REQUEST_CREATED,
request_id=request_id,
error=msg,
)
raise HiringError(msg)
logger.warning(
HR_HIRING_INSTANTIATION_FAILED,
request_id=request_id,
error=msg,
)
Path: src/ai_company/hr/hiring_service.py, lines 93-99

agent_name=identity.name,
firing_request_id=request.id,
tasks_reassigned=tasks_reassigned,
memory_archive_id=None,

memory_archive_id is always hardcoded to None

OffboardingRecord.memory_archive_id exists to capture the ID of the cold-storage archive created during offboarding, but it is unconditionally set to None here. The archival_result object (of type ArchivalResult) is available at this point and carries total_archived and promoted_to_org, but ArchivalResult itself does not expose an archive ID field — meaning there is currently no way to populate this field.

This is a gap between the data model (OffboardingRecord has the field; ArchivalResult does not). If the archive ID is needed for auditability or downstream references, ArchivalResult needs an archive_id: NotBlankStr | None field that implementing strategies can populate. Otherwise the field in OffboardingRecord is permanently dead and creates misleading API surface.

At minimum, consider adding archive_id: NotBlankStr | None = None to ArchivalResult so that FullSnapshotStrategy (or any other archival implementation) can return the ID of the archive record it creates.

Path: src/ai_company/hr/offboarding_service.py, line 144

Comment on lines +163 to +170
"""
result = await self._quality_strategy.score(
agent_id=agent_id,
task_id=task_id,
task_result=task_result,
acceptance_criteria=acceptance_criteria,
)
return task_result.model_copy(update={"quality_score": result.score})

score_task_quality does not update the stored record

score_task_quality computes the quality score and returns a new TaskMetricRecord with quality_score populated, but it never writes the updated record back into self._task_metrics. The in-memory store continues to hold the original record with quality_score=None.

Consequence: get_snapshot() aggregates overall_quality_score from self._task_metrics, so it will always compute None (no scored records) even after score_task_quality has been called. Trend detection on quality_score in _compute_trends is similarly broken — quality_values will always be empty because no stored record ever has a non-None quality_score.

The only workaround for callers today is to take the returned scored record and call record_task_metric(scored_record) a second time — which also duplicates the entry in the list. There is no "replace" API.

The method should write the scored record back into self._task_metrics, replacing the original entry by matching on record.id, before returning it.

Path: src/ai_company/hr/performance/tracker.py, lines 163-170

Aureliolo added a commit that referenced this pull request Mar 10, 2026
🤖 I have created a release *beep* *boop*
---


## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Aureliolo added a commit that referenced this pull request Mar 11, 2026
🤖 I have created a release *beep* *boop*
---


## [0.1.0](v0.0.0...v0.1.0) (2026-03-11)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add mandatory JWT + API key authentication
([#256](#256))
([c279cfe](c279cfe))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable output scan response policies
([#263](#263))
([b9907e8](b9907e8))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement AuditRepository for security audit log persistence
([#279](#279))
([94bc29f](94bc29f))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* resolve circular imports, bump litellm, fix release tag format
([#286](#286))
([a6659b5](a6659b5))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* bump anchore/scan-action from 6.5.1 to 7.3.2
([#271](#271))
([80a1c15](80a1c15))
* bump docker/build-push-action from 6.19.2 to 7.0.0
([#273](#273))
([dd0219e](dd0219e))
* bump docker/login-action from 3.7.0 to 4.0.0
([#272](#272))
([33d6238](33d6238))
* bump docker/metadata-action from 5.10.0 to 6.0.0
([#270](#270))
([baee04e](baee04e))
* bump docker/setup-buildx-action from 3.12.0 to 4.0.0
([#274](#274))
([5fc06f7](5fc06f7))
* bump sigstore/cosign-installer from 3.9.1 to 4.1.0
([#275](#275))
([29dd16c](29dd16c))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* **main:** release ai-company 0.1.1
([#282](#282))
([2f4703d](2f4703d))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>