feat: add HR engine and performance tracking (#45, #47) #193
Conversation
Dependency Review: ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found. Scanned files: none.
Caution: Review failed. The pull request is closed.
ℹ️ Recent review info. ⚙️ Run configuration: Organization UI Review profile: ASSERTIVE · Plan: Pro · Run ID:
📒 Files selected for processing (65)
📝 Walkthrough (Summary by CodeRabbit)
Walkthrough

Adds a new modular HR package and performance subsystem: hiring, onboarding, offboarding, archival and registry services; pluggable protocols and strategies for reassignment, archival, quality, collaboration, windowing and trend detection; SQLite-backed HR persistence (v2 migration); observability event constants; and extensive unit tests and fixtures.

Changes
Sequence Diagram(s)

sequenceDiagram
actor User
participant HiringService as "HiringService"
participant ApprovalStore as "ApprovalStore (optional)"
participant AgentRegistry as "AgentRegistryService"
participant Onboarding as "OnboardingService (optional)"
User->>HiringService: create_request(...)
HiringService->>HiringService: persist request
User->>HiringService: generate_candidate(request)
HiringService->>HiringService: append candidate
User->>HiringService: submit_for_approval(request,candidate_id)
alt ApprovalStore configured
HiringService->>ApprovalStore: create approval item
ApprovalStore-->>HiringService: approval result (async)
else auto-approve
HiringService->>HiringService: mark APPROVED
end
User->>HiringService: instantiate_agent(request)
HiringService->>AgentRegistry: register(agent_identity)
AgentRegistry-->>HiringService: registered
opt OnboardingService present
HiringService->>Onboarding: start_onboarding(agent_id)
Onboarding-->>HiringService: checklist
end
HiringService-->>User: AgentIdentity
sequenceDiagram
actor User
participant OffboardingService as "OffboardingService"
participant Reassign as "TaskReassignmentStrategy"
participant Archival as "MemoryArchivalStrategy"
participant MessageBus as "MessageBus (optional)"
participant AgentRegistry as "AgentRegistryService"
User->>OffboardingService: offboard(firing_request)
OffboardingService->>AgentRegistry: get(agent_id)
OffboardingService->>Reassign: reassign(agent_id, active_tasks)
Reassign-->>OffboardingService: updated_tasks
OffboardingService->>Archival: archive(agent_id,...)
Archival-->>OffboardingService: ArchivalResult
alt MessageBus configured
OffboardingService->>MessageBus: publish(notification)
end
OffboardingService->>AgentRegistry: unregister(agent_id)
AgentRegistry-->>OffboardingService: removed
OffboardingService-->>User: OffboardingRecord
sequenceDiagram
participant Tracker as "PerformanceTracker"
participant Quality as "QualityScoringStrategy"
participant Collaboration as "CollaborationScoringStrategy"
participant Window as "MetricsWindowStrategy"
participant Trend as "TrendDetectionStrategy"
Tracker->>Tracker: record_task_metric(record)
Tracker->>Quality: score(agent_id, task_id, task_result, criteria)
Quality-->>Tracker: QualityScoreResult
Tracker->>Tracker: record_collaboration_event(record)
Tracker->>Collaboration: score(agent_id, collab_records)
Collaboration-->>Tracker: CollaborationScoreResult
Tracker->>Window: compute_windows(records, now)
Window-->>Tracker: tuple[WindowMetrics]
Tracker->>Trend: detect(metric_name, values, window_size)
Trend-->>Tracker: TrendResult
Tracker-->>Tracker: assemble AgentPerformanceSnapshot
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs
Greptile Summary

This PR adds a complete HR engine (hiring pipeline, onboarding/offboarding orchestration, agent registry) and a performance-tracking subsystem (task metrics, collaboration scoring, Theil–Sen trend detection, multi-window rolling aggregates) backed by a new SQLite schema migration (v2). The overall architecture is sound: protocols are well-defined, error handling is explicit, and the previous round of pre-PR reviews has visibly cleaned up broad exception clauses and lock usage. Key findings:
Confidence Score: 2/5
Important Files Changed
Sequence Diagram

sequenceDiagram
participant Caller
participant HiringService
participant AgentRegistryService
participant OnboardingService
participant PerformanceTracker
participant OffboardingService
Note over Caller,OffboardingService: Hiring Pipeline
Caller->>HiringService: create_request(role, dept, level, ...)
HiringService-->>Caller: HiringRequest (PENDING)
Caller->>HiringService: generate_candidate(request)
HiringService-->>Caller: HiringRequest (+ CandidateCard)
Caller->>HiringService: submit_for_approval(request, candidate_id)
alt approval_store present
HiringService->>ApprovalStore: add(ApprovalItem)
else auto-approve
HiringService-->>Caller: HiringRequest (APPROVED)
end
Caller->>HiringService: instantiate_agent(request)
HiringService->>AgentRegistryService: register(identity)
HiringService->>OnboardingService: start_onboarding(agent_id)
HiringService-->>Caller: AgentIdentity (ACTIVE)
Note over Caller,PerformanceTracker: Performance Tracking
Caller->>PerformanceTracker: record_task_metric(record)
PerformanceTracker-->>Caller: TaskMetricRecord (quality_score=None)
Caller->>PerformanceTracker: score_task_quality(task_result, criteria)
PerformanceTracker-->>Caller: TaskMetricRecord (quality_score=X)
Note right of PerformanceTracker: ⚠️ stored record NOT updated
Caller->>PerformanceTracker: get_snapshot(agent_id)
PerformanceTracker-->>Caller: AgentPerformanceSnapshot (quality=None!)
Note over Caller,OffboardingService: Offboarding Pipeline
Caller->>OffboardingService: offboard(FiringRequest)
OffboardingService->>TaskRepository: list_tasks + reassign
OffboardingService->>ArchivalStrategy: archive(agent_id, ...)
OffboardingService->>MessageBus: publish(HR_NOTIFICATION)
OffboardingService->>AgentRegistryService: update_status(TERMINATED)
OffboardingService-->>Caller: OffboardingRecord (memory_archive_id=None)
Last reviewed commit: f2c49f7
Pull request overview
Adds a new HR subsystem (hiring/onboarding/offboarding/registry) plus performance tracking (metrics, scoring, rolling windows, trend detection), and wires persistence/observability support for the new domain.
Changes:
- Introduces `ai_company.hr` package: lifecycle services (registry, hiring, onboarding, offboarding) + pluggable reassignment/archival strategies.
- Introduces `ai_company.hr.performance` package: metric models, quality/collaboration scoring, multi-window aggregation, Theil–Sen trend detection, and a tracker service.
- Extends SQLite persistence schema to v2 and adds SQLite repositories + persistence backend/protocol surface area for HR/performance tables and events.
Reviewed changes
Copilot reviewed 63 out of 65 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/persistence/test_protocol.py | Extends persistence protocol compliance fakes to include HR/performance repos. |
| tests/unit/persistence/test_migrations_v2.py | Adds unit tests validating schema v2 migrations, tables, and indexes. |
| tests/unit/observability/test_events.py | Updates expected observability event modules to include hr and performance. |
| tests/unit/hr/test_registry.py | Adds unit tests for AgentRegistryService behavior. |
| tests/unit/hr/test_queue_return_strategy.py | Adds unit tests for queue-return task reassignment strategy. |
| tests/unit/hr/test_persistence.py | Adds unit tests for HR SQLite repositories round-trips and filters. |
| tests/unit/hr/test_onboarding_service.py | Adds unit tests for onboarding checklist lifecycle and activation. |
| tests/unit/hr/test_offboarding_service.py | Adds unit tests for offboarding pipeline orchestration with fakes. |
| tests/unit/hr/test_models.py | Adds unit tests for HR domain model validation and immutability. |
| tests/unit/hr/test_hiring_service.py | Adds unit tests for hiring pipeline stages and error cases. |
| tests/unit/hr/test_full_snapshot_strategy.py | Adds unit tests for snapshot archival strategy, including promotion behavior. |
| tests/unit/hr/test_errors.py | Adds unit tests for HR error hierarchy. |
| tests/unit/hr/test_enums.py | Adds unit tests for HR enums and expected values. |
| tests/unit/hr/performance/test_theil_sen_strategy.py | Adds unit tests for Theil–Sen trend detection behavior. |
| tests/unit/hr/performance/test_multi_window_strategy.py | Adds unit tests for multi-window aggregation and edge cases. |
| tests/unit/hr/performance/test_ci_quality_strategy.py | Adds unit tests for CI-signal quality scoring strategy. |
| tests/unit/hr/performance/test_behavioral_collaboration_strategy.py | Adds unit tests for behavioral telemetry collaboration scoring. |
| tests/unit/hr/performance/conftest.py | Adds helpers/factories for performance tracking unit tests. |
| tests/unit/hr/performance/__init__.py | Marks performance unit test package. |
| tests/unit/hr/conftest.py | Adds shared HR fixtures and factory builders for unit tests. |
| tests/unit/hr/__init__.py | Marks HR unit test package. |
| tests/unit/core/test_enums.py | Updates expected AgentStatus enum member count. |
| tests/unit/communication/test_enums.py | Updates expected MessageType enum member count. |
| tests/unit/api/conftest.py | Extends fake persistence backend with HR/performance repositories. |
| src/ai_company/persistence/sqlite/repositories.py | Updates module docstring to point HR repos to hr_repositories.py. |
| src/ai_company/persistence/sqlite/migrations.py | Bumps schema to v2 and adds v2 migration statements for HR tables/indexes. |
| src/ai_company/persistence/sqlite/hr_repositories.py | Adds SQLite repositories for lifecycle events, task metrics, collaboration metrics. |
| src/ai_company/persistence/sqlite/backend.py | Wires new HR repositories into SQLite persistence backend. |
| src/ai_company/persistence/repositories.py | Re-exports new HR repository protocols. |
| src/ai_company/persistence/protocol.py | Extends PersistenceBackend protocol with HR repository properties. |
| src/ai_company/observability/events/persistence.py | Adds structured event constants for HR persistence operations. |
| src/ai_company/observability/events/performance.py | Adds structured event constants for performance tracking operations. |
| src/ai_company/observability/events/hr.py | Adds structured event constants for HR domain operations. |
| src/ai_company/hr/registry.py | Introduces an in-memory agent registry service with status updates and queries. |
| src/ai_company/hr/reassignment_protocol.py | Adds protocol for task reassignment strategies. |
| src/ai_company/hr/queue_return_strategy.py | Implements queue-return reassignment strategy (interrupt + clear assignee). |
| src/ai_company/hr/persistence_protocol.py | Adds repository protocols for lifecycle events and performance metrics. |
| src/ai_company/hr/performance/window_protocol.py | Adds protocol for rolling-window aggregation strategies. |
| src/ai_company/hr/performance/trend_protocol.py | Adds protocol for trend detection strategies. |
| src/ai_company/hr/performance/tracker.py | Adds in-memory performance tracker for recording/scoring/querying snapshots. |
| src/ai_company/hr/performance/theil_sen_strategy.py | Implements Theil–Sen robust slope trend detector. |
| src/ai_company/hr/performance/quality_protocol.py | Adds protocol for quality scoring strategies. |
| src/ai_company/hr/performance/multi_window_strategy.py | Implements multi-window rolling aggregation (7d/30d/90d, etc.). |
| src/ai_company/hr/performance/models.py | Adds performance tracking Pydantic models (records, results, windows, snapshots). |
| src/ai_company/hr/performance/config.py | Adds performance configuration model (windows, thresholds, weights). |
| src/ai_company/hr/performance/collaboration_protocol.py | Adds protocol for collaboration scoring strategies. |
| src/ai_company/hr/performance/ci_quality_strategy.py | Implements CI-signal-based quality scoring strategy. |
| src/ai_company/hr/performance/behavioral_collaboration_strategy.py | Implements behavioral telemetry collaboration scoring strategy. |
| src/ai_company/hr/performance/__init__.py | Marks performance package. |
| src/ai_company/hr/onboarding_service.py | Implements onboarding checklist management and activation on completion. |
| src/ai_company/hr/offboarding_service.py | Implements offboarding pipeline orchestration (reassign/archive/notify/terminate). |
| src/ai_company/hr/models.py | Adds HR Pydantic models for hiring/firing/onboarding/offboarding/lifecycle events. |
| src/ai_company/hr/hiring_service.py | Implements hiring pipeline (request → candidates → approval → instantiation). |
| src/ai_company/hr/full_snapshot_strategy.py | Implements full-snapshot memory archival with optional org-memory promotion. |
| src/ai_company/hr/errors.py | Adds HR error hierarchy (hiring/offboarding/registry/performance). |
| src/ai_company/hr/enums.py | Adds HR enums (request status, reasons, steps, event types, trend direction). |
| src/ai_company/hr/archival_protocol.py | Adds protocol + result model for memory archival strategies. |
| src/ai_company/hr/__init__.py | Marks HR package. |
| src/ai_company/core/enums.py | Adds AgentStatus.ONBOARDING. |
| src/ai_company/communication/enums.py | Adds MessageType.HR_NOTIFICATION. |
| README.md | Updates milestone feature list to include HR engine and performance tracking. |
| DESIGN_SPEC.md | Updates package tree documentation to reflect new HR/performance modules and persistence files. |
| CLAUDE.md | Updates repository overview and logging conventions to include HR/performance event constants. |
src/ai_company/hr/registry.py
    name_lower = name.lower()
    for identity in self._agents.values():
        if str(identity.name).lower() == name_lower:
            return identity
    return None
These iterations over self._agents.values() are not protected by self._lock. If another task registers/unregisters concurrently, Python can raise RuntimeError: dictionary changed size during iteration. Consider taking a snapshot of self._agents.values() under the lock (or holding the lock during the loop).
src/ai_company/hr/registry.py
        Returns:
            Tuple of active agent identities.
        """
        return tuple(a for a in self._agents.values() if a.status == AgentStatus.ACTIVE)
list_active() reads/iterates registry state without holding self._lock, so concurrent register()/unregister() can trigger RuntimeError: dictionary changed size during iteration or return inconsistent results. Consider snapshotting under the lock (or holding the lock while building the tuple).
src/ai_company/hr/registry.py
    dept_lower = department.lower()
    return tuple(
        a for a in self._agents.values() if str(a.department).lower() == dept_lower
    )
list_by_department() iterates self._agents.values() without self._lock. Concurrent writes can raise RuntimeError: dictionary changed size during iteration. Consider snapshotting the values under the lock before filtering.
src/ai_company/hr/hiring_service.py
    except Exception as exc:
        msg = f"Failed to instantiate agent for request {request.id!r}"
        logger.exception(
            HR_HIRING_INSTANTIATION_FAILED,
            request_id=str(request.id),
            error=str(exc),
        )
        raise HiringError(msg) from exc
Catching a blanket Exception here will also swallow unexpected programmer errors (e.g., AttributeError/TypeError) and re-raise them as HiringError, which can make debugging harder. Consider catching the specific exception types you expect from AgentIdentity/ModelConfig validation and registry.register() (e.g., Pydantic validation errors, AgentAlreadyRegisteredError, etc.), and let genuinely unexpected exceptions propagate.
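A minimal sketch of the narrower exception handling suggested here. `HiringError` appears in the PR; `AgentAlreadyRegisteredError` and the specific tuple of caught types are assumptions about what validation and `registry.register()` can raise:

```python
class HiringError(Exception):
    """Domain error surfaced to callers of the hiring pipeline."""


class AgentAlreadyRegisteredError(Exception):
    """Stand-in for the registry's duplicate-registration error."""


def instantiate_agent(register_fn):
    try:
        return register_fn()
    # Catch only the failures expected from validation and registration;
    # programmer errors (TypeError, AttributeError, ...) propagate unchanged
    # instead of being masked as HiringError.
    except (ValueError, AgentAlreadyRegisteredError) as exc:
        raise HiringError("Failed to instantiate agent") from exc
```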
    def __init__(
        self,
        *,
        quality_strategy: QualityScoringStrategy,
        collaboration_strategy: CollaborationScoringStrategy,
        window_strategy: MetricsWindowStrategy,
        trend_strategy: TrendDetectionStrategy,
        config: PerformanceConfig | None = None,
    ) -> None:
        self._quality_strategy = quality_strategy
        self._collaboration_strategy = collaboration_strategy
        self._window_strategy = window_strategy
        self._trend_strategy = trend_strategy
        self._config = config or PerformanceConfig()
        self._task_metrics: dict[str, list[TaskMetricRecord]] = {}
        self._collab_metrics: dict[str, list[CollaborationMetricRecord]] = {}
PerformanceConfig includes windows, improving_threshold, declining_threshold, and collaboration_weights, but this tracker only uses config.min_data_points. This can lead to confusing/ineffective configuration (changing thresholds/windows has no effect). Either wire these config values into the injected strategies (or construct default strategies from config) or remove the unused fields/config parameter from PerformanceTracker.
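One way to act on this, sketched with simplified stand-ins (the real `PerformanceConfig` fields and strategy protocols differ from this toy version): construct the strategy from the config, so that changing thresholds actually has an effect.

```python
class PerformanceConfig:  # simplified stand-in for the real config model
    def __init__(self, improving_threshold=0.05, declining_threshold=-0.05):
        self.improving_threshold = improving_threshold
        self.declining_threshold = declining_threshold


def make_trend_classifier(config: PerformanceConfig):
    """Build the trend classifier from config so its thresholds are wired in."""
    def classify(slope: float) -> str:
        if slope >= config.improving_threshold:
            return "improving"
        if slope <= config.declining_threshold:
            return "declining"
        return "stable"
    return classify
```

The same factory approach would apply to windows and collaboration weights: either the tracker builds default strategies from its config, or the unused config fields are dropped.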
    # Compute all pairwise slopes.
    slopes: list[float] = []
    for (t1, v1), (t2, v2) in combinations(values, 2):
        dt_days = (t2.timestamp() - t1.timestamp()) / _SECONDS_PER_DAY
        if abs(dt_days) < _MIN_DELTA_DAYS:
            continue
        slope = (v2 - v1) / dt_days
        slopes.append(slope)
Theil-Sen slope calculation depends on time ordering. combinations(values, 2) uses the incoming order, so if values isn’t sorted by timestamp you can get negative dt_days and inverted slopes, which can flip the detected trend direction. Consider sorting by timestamp (or otherwise ensuring x2 > x1) before computing slopes.
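The suggested fix, sketched as a self-contained Theil–Sen slope estimator (function name and constants are illustrative, not the PR's implementation): sorting by timestamp first guarantees every `dt_days` is non-negative, so slope signs cannot flip with input order.

```python
from itertools import combinations
from statistics import median

_SECONDS_PER_DAY = 86400.0
_MIN_DELTA_DAYS = 1e-9


def theil_sen_slope(values):
    """values: (timestamp, value) pairs; returns the median pairwise slope
    per day, or None if no valid pair exists."""
    # Sort by timestamp so every pair satisfies t2 >= t1.
    ordered = sorted(values, key=lambda pair: pair[0])
    slopes: list[float] = []
    for (t1, v1), (t2, v2) in combinations(ordered, 2):
        dt_days = (t2.timestamp() - t1.timestamp()) / _SECONDS_PER_DAY
        if dt_days < _MIN_DELTA_DAYS:  # after sorting, dt is never negative
            continue
        slopes.append((v2 - v1) / dt_days)
    return median(slopes) if slopes else None
```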
    org_category = _CATEGORY_MAP[entry.category]
    author = OrgFactAuthor(agent_id=agent_id)
    write_req = OrgFactWriteRequest(
        content=NotBlankStr(entry.content),
        category=org_category,
    )
    await org_memory_backend.write(write_req, author=author)
Org memory promotion appears to be effectively disabled: OrgFactAuthor(agent_id=agent_id) will fail validation because non-human authors require both agent_id and seniority (see OrgFactAuthor validator). Since the exception is caught, promotable entries will be skipped and promoted_count will stay 0, contradicting the docstring. Pass an author with seniority (e.g., derive from AgentIdentity/registry) or change the promotion API to accept the author explicitly.
Actionable comments posted: 30
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/unit/communication/test_enums.py (1)
19-31: ⚠️ Potential issue | 🟡 Minor

Add assertion for the new HR_NOTIFICATION value. The member count was updated to 10, but test_values doesn't verify the new HR_NOTIFICATION member. For consistency with other enum tests in this file (which verify all member values), add the missing assertion.

💚 Proposed fix

    def test_values(self) -> None:
        assert MessageType.TASK_UPDATE.value == "task_update"
        assert MessageType.QUESTION.value == "question"
        assert MessageType.ANNOUNCEMENT.value == "announcement"
        assert MessageType.REVIEW_REQUEST.value == "review_request"
        assert MessageType.APPROVAL.value == "approval"
        assert MessageType.DELEGATION.value == "delegation"
        assert MessageType.STATUS_REPORT.value == "status_report"
        assert MessageType.ESCALATION.value == "escalation"
        assert MessageType.MEETING_CONTRIBUTION.value == "meeting_contribution"
    +   assert MessageType.HR_NOTIFICATION.value == "hr_notification"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/communication/test_enums.py` around lines 19 - 31, The test suite added a new enum member but test_values doesn't assert it; update test_values in tests/unit/communication/test_enums.py to include an assertion that MessageType.HR_NOTIFICATION.value == "hr_notification" (alongside the other MessageType assertions) so the new member is verified and the member count matches test_member_count.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@DESIGN_SPEC.md`:
- Line 2827: The structure map entry incorrectly points to hr.enums for
AgentStatus; update the map so the AgentStatus reference points to the module
where it is actually defined (core.enums) instead of hr.enums, i.e., replace the
AgentStatus module reference in the map from hr.enums to core.enums so readers
are directed to the correct definition.
In `@src/ai_company/hr/archival_protocol.py`:
- Around line 51-70: The archive method's docstring is missing a Raises section
documenting MemoryArchivalError; update the docstring for async def archive (in
archival_protocol.py) to include a "Raises:" block that lists
MemoryArchivalError (and when it is raised, e.g., on archival failure or backend
errors) to match TaskReassignmentStrategy's style and the project's errors.py
hierarchy so callers know to handle this exception.
In `@src/ai_company/hr/full_snapshot_strategy.py`:
- Around line 65-186: The archive() method is too long and should be broken into
helpers; extract the phases into private methods to improve readability and
testability: implement a _retrieve_entries(agent_id, memory_backend) that wraps
memory_backend.retrieve and error handling, an _archive_entries(entries,
archival_store, agent_id, now) that constructs ArchivalEntry, calls
archival_store.archive and returns (archived_count, deleted_ids), a
_promote_to_org(entries, org_memory_backend, agent_id) that handles category
mapping via _CATEGORY_MAP, OrgFactAuthor/OrgFactWriteRequest and returns
promoted_count, and a _clean_hot_store(deleted_ids, memory_backend, agent_id)
that deletes ids and returns hot_store_cleaned; update archive() to call these
helpers and assemble the ArchivalResult while preserving existing logging and
exception behavior (use function names: archive, _retrieve_entries,
_archive_entries, _promote_to_org, _clean_hot_store, ArchivalEntry,
archival_store.archive, org_memory_backend.write, memory_backend.delete).
- Around line 156-169: The delete loop is re-wrapping already-validated IDs with
NotBlankStr; remove the redundant wrap and pass memory_id directly to
memory_backend.delete from the hot store cleanup loop (symbols: deleted_ids,
memory_id, memory_backend.delete, NotBlankStr, hot_store_cleaned); ensure
consistency with how deleted_ids is populated (either keep deleted_ids as
list[NotBlankStr] and pass the elements unchanged, or keep them as list[str] and
pass the raw str) and update the code that builds deleted_ids accordingly so
validation happens only once when entries are read from the backend.
In `@src/ai_company/hr/hiring_service.py`:
- Around line 318-329: The new AgentIdentity is not copying the candidate's
skills, causing AgentIdentity.skills to remain the default empty SkillSet;
update the AgentIdentity construction (AgentIdentity(...)) to set its skills
from the CandidateCard (e.g., use candidate.skills or convert candidate.skills
into the expected SkillSet type) so the hired agent retains the
CandidateCard.skills for downstream skill-based routing and gap-filling logic.
Ensure any necessary conversion or validation is applied so the types match
between CandidateCard.skills and AgentIdentity.skills.
- Around line 129-168: The generate_candidate method is mutating _requests using
the caller-supplied HiringRequest snapshot (request) which can cause
lost-updates; change it to re-load the authoritative request from self._requests
by id (self._requests[str(request.id)]) before creating the CandidateCard, then
apply changes to that fresh instance (use model_copy or a helper like
_apply_update_to_request) and write back into self._requests; encapsulate the
read-modify-write in a small helper (e.g., _get_current_request and
_save_updated_request) and use those helpers here (and similarly in the other
methods flagged at lines 170-244) to avoid overwriting concurrent updates and
preserve existing candidates/approval fields.
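The read-modify-write pattern this prompt asks for, sketched with plain dicts standing in for the Pydantic models and `model_copy` (all names hypothetical): the method re-reads the stored request by id and updates that copy, so a stale caller-held snapshot cannot clobber concurrent changes.

```python
def append_candidate(requests: dict, request_id: str, candidate: dict) -> dict:
    """Re-read the authoritative request before mutating, then write back."""
    current = requests[request_id]  # authoritative version, not the caller's snapshot
    updated = {**current, "candidates": [*current["candidates"], candidate]}
    requests[request_id] = updated
    return updated
```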
- Around line 327-349: The code always creates the new identity with
AgentStatus.ONBOARDING which leaves agents stuck if no onboarding service exists
or if start_onboarding fails; fix by: when instantiating the identity choose
status = AgentStatus.ONBOARDING only if self._onboarding_service is not None,
otherwise use AgentStatus.ACTIVE; additionally wrap the await
self._onboarding_service.start_onboarding(...) call in a try/except so that if
start_onboarding raises you perform compensation: call the registry
rollback/unregister method (e.g., self._registry.unregister(identity.id) or
appropriate removal), remove or revert the request entry in self._requests, and
raise a HiringError from the exception; reference symbols:
AgentStatus.ONBOARDING, AgentStatus.ACTIVE, _onboarding_service,
_registry.register, start_onboarding, self._requests,
HiringRequestStatus.INSTANTIATED.
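A synchronous sketch of the status choice plus compensation pattern described above; the registry is a plain dict and `OnboardingFailed` stands in for `HiringError`:

```python
class OnboardingFailed(Exception):  # illustrative stand-in for HiringError
    pass


def instantiate(registry: dict, onboarding_start, identity: dict) -> dict:
    """Register the agent, then roll back (unregister) if onboarding fails.
    Without an onboarding service, activate the agent immediately so it
    cannot get stuck in an ONBOARDING state nothing will ever advance."""
    identity = {**identity,
                "status": "onboarding" if onboarding_start else "active"}
    registry[identity["id"]] = identity
    if onboarding_start is not None:
        try:
            onboarding_start(identity["id"])
        except Exception as exc:
            registry.pop(identity["id"], None)  # compensate partial state
            raise OnboardingFailed("onboarding failed") from exc
    return identity
```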
- Line 154: The assignment for estimated_monthly_cost currently uses
"request.budget_limit_monthly or 50.0", which treats 0.0 as missing; change it
to preserve an explicit 0.0 by using a None-check (e.g., use
request.budget_limit_monthly if request.budget_limit_monthly is not None else
50.0) so estimated_monthly_cost uses the provided 0.0 but falls back to 50.0
only when budget_limit_monthly is None; update the expression where
estimated_monthly_cost is set (referencing request.budget_limit_monthly and
estimated_monthly_cost) accordingly.
In `@src/ai_company/hr/models.py`:
- Around line 270-273: Replace the plain string type on the tasks_reassigned
field with the NotBlankStr type so identifiers are validated: change the
annotation from tuple[str, ...] to tuple[NotBlankStr, ...] on the
tasks_reassigned Field, keep the default=() and description, and add an import
for NotBlankStr from core.types if it’s not already imported; ensure any type
hints/usages that rely on tasks_reassigned reflect the new NotBlankStr element
type.
In `@src/ai_company/hr/offboarding_service.py`:
- Around line 109-114: The warning logs the wrong event constant when the agent
lookup fails: replace the HR_FIRING_INITIATED constant used in logger.warning
with HR_FIRING_COMPLETE and include the error context (msg) as before; locate
the agent lookup around the self._registry.get(agent_id) call where identity is
checked, update the logger.warning call that currently references
HR_FIRING_INITIATED to use HR_FIRING_COMPLETE, keeping agent_id and error=msg,
then raise AgentNotFoundError as before.
In `@src/ai_company/hr/onboarding_service.py`:
- Around line 46-78: Check registry membership for agent_id at the start of
start_onboarding and reject missing/invalid agents before creating the
OnboardingChecklist; add a lookup (e.g., via existing registry/service used
elsewhere) and raise OnboardingError with a clear message if not found. Extract
the finalization logic that persists/completes a checklist into a small helper
(e.g., _finalize_checklist or finalize_checklist_write) and call it from
complete_step/update_status so the write-after-await path isn't embedded in the
long method. Ensure start_onboarding uses the registry check and only constructs
and stores the checklist after validation, and update
complete_step/update_status to call the new helper to persist the final
checklist.
In `@src/ai_company/hr/performance/multi_window_strategy.py`:
- Around line 111-159: The code currently computes success_rate for any non-zero
count even when the window is below the minimum sample size; change it so
success_rate is only computed and returned when has_enough is True (otherwise
set it to None). Concretely, remove or stop using the unconditional success_rate
= completed / count assignment and instead compute success_rate inside the
has_enough branch (or set success_rate = None when not has_enough), then pass
round(success_rate, 4) only when success_rate is not None in the WindowMetrics
constructor (refer to the has_enough variable, the success_rate name, and the
final WindowMetrics return block to locate the change).
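The gating logic described above can be sketched as follows (names are illustrative, not the real strategy's API): the rate is only reported when the window meets the minimum sample size.

```python
def window_success_rate(completed: int, count: int, min_samples: int):
    """Return (has_enough, success_rate); rate is None below min_samples."""
    has_enough = count >= min_samples
    if not has_enough or count == 0:
        return has_enough, None
    return has_enough, round(completed / count, 4)
```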
In `@src/ai_company/hr/performance/quality_protocol.py`:
- Around line 25-27: Change the name property return type from plain str to
NotBlankStr to enforce non-blank strategy names: update the annotation on the
name property in the protocol (def name(self) -> NotBlankStr), add the required
import from core.types (NotBlankStr), and ensure any callers/implementations and
tests return/consume NotBlankStr-compatible values (matching existing
QualityScoreResult.strategy_name usage).
In `@src/ai_company/hr/performance/tracker.py`:
- Around line 302-324: get_collaboration_metrics currently only supports a since
filter while get_task_metrics supports both since and until; add an optional
until: AwareDatetime | None = None parameter to get_collaboration_metrics,
update the docstring to document it, and apply the same filter logic used in
get_task_metrics (filter records by recorded_at >= since and recorded_at <=
until when those args are provided) using the same _collab_metrics traversal
(preserve behavior when agent_id is provided); ensure the default remains None
for backward compatibility and update any callers/tests that expect the new
signature.
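The shared since/until filter could look like this sketch (dict records stand in for the Pydantic metric models; the helper name is hypothetical):

```python
def filter_by_window(records, *, since=None, until=None):
    """Keep records with since <= recorded_at <= until; None means unbounded,
    so the default call is backward compatible with a since-only filter."""
    return tuple(
        r for r in records
        if (since is None or r["recorded_at"] >= since)
        and (until is None or r["recorded_at"] <= until)
    )
```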
- Around line 221-272: _compute_trends contains duplicated logic for building
(timestamp, value) tuples and calling self._trend_strategy.detect for
quality_score and cost_usd; extract that into a small helper (e.g.,
_compute_metric_trend or compute_and_append_trend) that accepts metric_name
(NotBlankStr), a value extractor (lambda or attr name) and window.window_size,
builds the values tuple from window_records, calls self._trend_strategy.detect
and appends the result when values exist; update _compute_trends to call this
helper for "quality_score" (skip None quality_score) and "cost_usd" to remove
the duplicated blocks while preserving filtering, window_size param and
appending behavior.
In `@src/ai_company/hr/queue_return_strategy.py`:
- Around line 75-87: The code uses the same event constant
HR_FIRING_TASKS_REASSIGNED for both the failure warning and the success info
log, which harms observability; update the failure path in the logger.warning
call inside the except block to use a distinct constant (e.g.,
HR_FIRING_TASK_REASSIGN_FAILED) or create that constant if it doesn't exist,
keep the same structured fields (agent_id, task_id, error=msg), and leave the
raise TaskReassignmentError(msg) from exc unchanged so success logs (logger.info
with HR_FIRING_TASKS_REASSIGNED) remain distinct from failures.
In `@src/ai_company/hr/reassignment_protocol.py`:
- Around line 27-42: The protocol method reassign lacks a Raises docstring
entry; update its docstring to document that implementations (e.g.,
QueueReturnStrategy) may raise TaskReassignmentError when task state transitions
fail—add a Raises section to reassign describing TaskReassignmentError, when it
is raised (on transition failures/failed clearing of assigned_to) and any
conditions or guarantees callers can expect.
In `@src/ai_company/hr/registry.py`:
- Around line 39-47: The constructor stores a MessageBus in self._message_bus
but register(), unregister(), and update_status() only log events; integrate the
bus so lifecycle changes are published: in AgentRegistry.register,
AgentRegistry.unregister, and AgentRegistry.update_status (and any helper that
mutates self._agents) after acquiring self._lock and updating self._agents, call
self._message_bus.publish/send (guarding for None) with a concise event object
(include event type like
"agent_registered"/"agent_unregistered"/"agent_status_updated", the
AgentIdentity and new status) so callers wired with a MessageBus receive
notifications; preserve existing locking and error handling and make publishing
best-effort (log but don’t raise if bus fails).
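A minimal sketch of the best-effort publish pattern. The registry below is heavily simplified (the real service validates, logs, and carries full models); the event-dict shape and `_notify` helper are illustrative assumptions.

```python
import threading

class _RecordingBus:  # stand-in for communication.bus_protocol.MessageBus
    def __init__(self) -> None:
        self.published: list[dict] = []
    def publish(self, event: dict) -> None:
        self.published.append(event)

class AgentRegistry:
    def __init__(self, message_bus=None) -> None:
        self._message_bus = message_bus
        self._agents: dict[str, str] = {}
        self._lock = threading.Lock()

    def _notify(self, event_type: str, agent_id: str, status: str) -> None:
        if self._message_bus is None:  # guard: bus is optional
            return
        try:
            self._message_bus.publish(
                {"type": event_type, "agent_id": agent_id, "status": status}
            )
        except Exception:
            # best-effort: the real code would log the failure, never raise
            pass

    def register(self, agent_id: str) -> None:
        with self._lock:
            self._agents[agent_id] = "active"
        self._notify("agent_registered", agent_id, "active")

    def update_status(self, agent_id: str, status: str) -> None:
        with self._lock:
            self._agents[agent_id] = status
        self._notify("agent_status_updated", agent_id, status)

bus = _RecordingBus()
registry = AgentRegistry(message_bus=bus)
registry.register("agent-001")
registry.update_status("agent-001", "inactive")
```

Publishing outside the lock keeps the critical section small and avoids holding the lock across an external call.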
In `@src/ai_company/persistence/sqlite/hr_repositories.py`:
- Around line 307-343: The CollaborationMetricRepository.query method lacks a
deterministic ordering; update the SQL in query (method name: query of
CollaborationMetricRepository) to append an ORDER BY clause (e.g., ORDER BY
recorded_at ASC, id ASC) after the WHERE block (or on the base SELECT when no
WHERE) so results are consistently ordered before executing self._db.execute;
keep parameters and error handling unchanged and still convert rows via
self._row_to_record.
- Around line 204-244: The TaskMetricRepository.query method (and the analogous
method in CollaborationMetricRepository) returns rows without a deterministic
order; update the SQL constructed in TaskMetricRepository.query to append an
ORDER BY clause (e.g., ORDER BY completed_at DESC, id DESC or your chosen
deterministic columns) when building the final sql string so results are
consistently ordered, ensure the clause is added after the WHERE when clauses
exist (in the same place where sql is currently appended), and keep the existing
parameter handling and error logging (PERSISTENCE_TASK_METRIC_QUERY_FAILED /
PERSISTENCE_TASK_METRIC_QUERIED) unchanged.
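The ordering fix for both repositories reduces to appending one clause when building the SQL. A self-contained illustration against an in-memory SQLite table (column names mirror the migration; the query-building style is an assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_metrics (id INTEGER PRIMARY KEY, agent_id TEXT,"
    " completed_at TEXT)"
)
rows = [("agent-001", "2026-01-02"), ("agent-001", "2026-01-01"),
        ("agent-001", "2026-01-02")]
conn.executemany(
    "INSERT INTO task_metrics (agent_id, completed_at) VALUES (?, ?)", rows
)

sql = "SELECT id, completed_at FROM task_metrics"
clauses, params = ["agent_id = ?"], ["agent-001"]
if clauses:
    sql += " WHERE " + " AND ".join(clauses)
# deterministic ordering: timestamp first, id as tie-breaker for equal
# timestamps (rows with id 1 and 3 share "2026-01-02" below)
sql += " ORDER BY completed_at ASC, id ASC"
result = conn.execute(sql, params).fetchall()
```

Without the `id` tie-breaker, rows sharing a timestamp could come back in any order between runs.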
In `@src/ai_company/persistence/sqlite/migrations.py`:
- Around line 123-140: Add composite indexes so agent+time predicates are
covered: create "idx_tm_agent_id_completed_at" on task_metrics(agent_id,
completed_at) and "idx_cm_agent_id_recorded_at" on
collaboration_metrics(agent_id, recorded_at). Insert these CREATE INDEX IF NOT
EXISTS statements into the migrations list alongside the existing single-column
indexes (near idx_tm_agent_id / idx_tm_completed_at and idx_cm_agent_id /
idx_cm_recorded_at) so TaskMetricRepository.query and
CollaborationMetricRepository.query (which filter by agent_id plus time range)
can use the composite indexes; ensure the column order is agent_id then the
timestamp (completed_at / recorded_at).
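The two composite indexes can be exercised in isolation like this; the table schemas are trimmed to the columns the indexes touch, so treat this as a sketch rather than the real migration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE task_metrics (
        id INTEGER PRIMARY KEY, agent_id TEXT, completed_at TEXT
    );
    CREATE TABLE collaboration_metrics (
        id INTEGER PRIMARY KEY, agent_id TEXT, recorded_at TEXT
    );
    -- agent_id first (equality), timestamp second (range), matching the
    -- agent_id = ? AND completed_at >= ? predicates in the query methods
    CREATE INDEX IF NOT EXISTS idx_tm_agent_id_completed_at
        ON task_metrics (agent_id, completed_at);
    CREATE INDEX IF NOT EXISTS idx_cm_agent_id_recorded_at
        ON collaboration_metrics (agent_id, recorded_at);
    """
)
indexes = [
    r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index'"
    )
]
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM task_metrics"
    " WHERE agent_id = ? AND completed_at >= ?",
    ("agent-001", "2026-01-01"),
).fetchall()
```

`EXPLAIN QUERY PLAN` should report a `SEARCH ... USING INDEX idx_tm_agent_id_completed_at`, confirming the planner can serve the combined predicate from one index.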
In `@tests/unit/api/conftest.py`:
- Around line 116-138: The list_events implementation in
FakeLifecycleEventRepository ignores the since parameter; update list_events
(and the in-memory filtering logic in the FakeLifecycleEventRepository class) to
filter events by their timestamp/creation time: when since is provided, only
include events whose event.timestamp (or created_at if your event model uses
that attribute) is >= since; handle both datetime and numeric timestamps
consistently (convert types as needed) and ensure the result variable is updated
before returning so time-based tests behave correctly.
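A sketch of the `since` filter, assuming the event model exposes a datetime `timestamp` attribute (the stand-in `_Event` below is not the real model):

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class _Event:  # stand-in for the lifecycle event model
    agent_id: str
    timestamp: datetime

class FakeLifecycleEventRepository:
    def __init__(self) -> None:
        self._events: list[_Event] = []

    def add(self, event: _Event) -> None:
        self._events.append(event)

    def list_events(self, agent_id: str, since=None) -> tuple:
        result = [e for e in self._events if e.agent_id == agent_id]
        if since is not None:
            # keep only events at or after the cutoff
            result = [e for e in result if e.timestamp >= since]
        return tuple(result)

repo = FakeLifecycleEventRepository()
t0 = datetime(2026, 1, 1, tzinfo=timezone.utc)
t1 = datetime(2026, 1, 2, tzinfo=timezone.utc)
repo.add(_Event("agent-001", t0))
repo.add(_Event("agent-001", t1))
events = repo.list_events("agent-001", since=t1)
```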
- Around line 162-180: The query method of FakeCollaborationMetricRepository
currently ignores the since parameter; update query (in class
FakeCollaborationMetricRepository, method query) to filter self._records by the
since timestamp when provided (e.g., keep only records where record.created_at
or record.timestamp >= since), in addition to the existing agent_id filter, and
return the filtered tuple; ensure you handle both datetime and string/ISO inputs
consistently (convert/compare as appropriate) and don't modify other method
signatures.
- Around line 140-159: The query method of FakeTaskMetricRepository currently
ignores the since and until parameters; update the
FakeTaskMetricRepository.query implementation to apply time-range filtering
after the agent_id filter by comparing each record's timestamp field (e.g.,
record.timestamp or record.created_at — use the actual timestamp attribute used
by your task metric objects) against the since and until values, only including
records where (since is None or record.timestamp >= since) and (until is None or
record.timestamp <= until); keep the existing agent_id filter and return a tuple
of the filtered list so tests relying on time-based filtering behave correctly.
In `@tests/unit/core/test_enums.py`:
- Around line 42-43: Update the test_agent_status_values test to include an
assertion for the new AgentStatus.ONBOARDING member: locate the
test_agent_status_values function and add one more assertion that checks
AgentStatus.ONBOARDING exists and equals the expected enum value using the same
assertion style as the other values in that test (i.e., mirror how
AgentStatus.ACTIVE/INACTIVE/OTHER are asserted to ensure consistency).
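The added assertion would follow the existing style; the enum below is a simplified stand-in for `core.enums.AgentStatus` (member values are assumptions):

```python
from enum import Enum

class AgentStatus(Enum):  # simplified stand-in for core.enums.AgentStatus
    ACTIVE = "active"
    INACTIVE = "inactive"
    ONBOARDING = "onboarding"

def test_agent_status_values() -> None:
    assert AgentStatus.ACTIVE.value == "active"
    assert AgentStatus.INACTIVE.value == "inactive"
    # new assertion mirroring the existing style
    assert AgentStatus.ONBOARDING.value == "onboarding"

test_agent_status_values()
```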
In `@tests/unit/hr/performance/test_behavioral_collaboration_strategy.py`:
- Around line 81-106: Update the docstring on test_all_components_none to match
the assertions: replace the incorrect "All optional components None -> neutral
5.0, confidence 0.0." with a description that reflects the actual scenario and
expected values, e.g., "Only loop_prevention present -> score 10.0, confidence
0.1." in the test_all_components_none test function (referencing the use of
make_collab_metric, NOW, and the call to strategy.score with
NotBlankStr("agent-001")).
In `@tests/unit/hr/performance/test_ci_quality_strategy.py`:
- Around line 129-157: Merge the two separate tests
test_cost_efficiency_zero_cost and test_cost_efficiency_high_cost into the
existing parametrized test_cost_efficiency_parametrized: add parameter rows for
(cost_usd=0.0, expected=10.0) and (cost_usd=15.0, expected=0.0), use the same
setup via self._make_strategy() and make_task_metric, call
strategy.score(agent_id=NotBlankStr("agent-001"),
task_id=NotBlankStr("task-001"), task_result=task_result,
acceptance_criteria=()), and assert dict(result.breakdown)["cost_efficiency"] ==
expected; then remove the now-duplicate test_cost_efficiency_zero_cost and
test_cost_efficiency_high_cost (and the other duplicate pair noted) so only the
parametrized test covers these cases.
In `@tests/unit/hr/performance/test_tracker.py`:
- Around line 253-285: The test test_snapshot_with_windows_and_trends asserts
trends are present despite only recording one TaskMetricRecord and not supplying
the time-series that TrendStrategy.detect uses; either (A) make the scenario
satisfy the trend-detection threshold by adding enough TaskMetricRecord entries
(call tracker.record_task_metric multiple times at different timestamps so
WindowMetrics.data_point_count and TrendStrategy.detect receive sufficient
samples) before calling tracker.get_snapshot, or (B) change the assertion to
expect no trends (assert snapshot.trends == ()) to verify the threshold guard;
update references in the test around
MockWindowStrategy/WindowMetrics.data_point_count, tracker.record_task_metric,
and tracker.get_snapshot accordingly.
In `@tests/unit/hr/test_offboarding_service.py`:
- Around line 38-52: The test double list_tasks in
tests/unit/hr/test_offboarding_service.py declares assigned_to: str | None but
OffboardingService calls it with NotBlankStr(agent_id); change the signature to
assigned_to: NotBlankStr | None to match the expected type, add the appropriate
import for NotBlankStr from the module where it’s defined, and update any
related type hints in that file so the fake ListTasks function signature aligns
with the OffboardingService usage (function name: list_tasks).
In `@tests/unit/persistence/test_protocol.py`:
- Around line 76-117: Add protocol-compliance tests for the new fake repos so
they assert isinstance against the repository protocols: add three tests in the
TestProtocolCompliance suite that check _FakeLifecycleEventRepository() is a
LifecycleEventRepository, _FakeTaskMetricRepository() is a TaskMetricRepository,
and _FakeCollaborationMetricRepository() is a CollaborationMetricRepository;
import the protocols (LifecycleEventRepository, TaskMetricRepository,
CollaborationMetricRepository) and place the assertions alongside the existing
protocol tests (e.g., the block around TestProtocolCompliance where other
isinstance checks live).
---
Outside diff comments:
In `@tests/unit/communication/test_enums.py`:
- Around line 19-31: The test suite added a new enum member but test_values
doesn't assert it; update test_values in tests/unit/communication/test_enums.py
to include an assertion that MessageType.HR_NOTIFICATION.value ==
"hr_notification" (alongside the other MessageType assertions) so the new member
is verified and the member count matches test_member_count.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: e95b47f7-a83d-4db9-853b-3da1d0d5e8a2
📒 Files selected for processing (65)
CLAUDE.md, DESIGN_SPEC.md, README.md, src/ai_company/communication/enums.py, src/ai_company/core/enums.py, src/ai_company/hr/__init__.py, src/ai_company/hr/archival_protocol.py, src/ai_company/hr/enums.py, src/ai_company/hr/errors.py, src/ai_company/hr/full_snapshot_strategy.py, src/ai_company/hr/hiring_service.py, src/ai_company/hr/models.py, src/ai_company/hr/offboarding_service.py, src/ai_company/hr/onboarding_service.py, src/ai_company/hr/performance/__init__.py, src/ai_company/hr/performance/behavioral_collaboration_strategy.py, src/ai_company/hr/performance/ci_quality_strategy.py, src/ai_company/hr/performance/collaboration_protocol.py, src/ai_company/hr/performance/config.py, src/ai_company/hr/performance/models.py, src/ai_company/hr/performance/multi_window_strategy.py, src/ai_company/hr/performance/quality_protocol.py, src/ai_company/hr/performance/theil_sen_strategy.py, src/ai_company/hr/performance/tracker.py, src/ai_company/hr/performance/trend_protocol.py, src/ai_company/hr/performance/window_protocol.py, src/ai_company/hr/persistence_protocol.py, src/ai_company/hr/queue_return_strategy.py, src/ai_company/hr/reassignment_protocol.py, src/ai_company/hr/registry.py, src/ai_company/observability/events/hr.py, src/ai_company/observability/events/performance.py, src/ai_company/observability/events/persistence.py, src/ai_company/persistence/protocol.py, src/ai_company/persistence/repositories.py, src/ai_company/persistence/sqlite/backend.py, src/ai_company/persistence/sqlite/hr_repositories.py, src/ai_company/persistence/sqlite/migrations.py, src/ai_company/persistence/sqlite/repositories.py, tests/unit/api/conftest.py, tests/unit/communication/test_enums.py, tests/unit/core/test_enums.py, tests/unit/hr/__init__.py, tests/unit/hr/conftest.py, tests/unit/hr/performance/__init__.py, tests/unit/hr/performance/conftest.py, tests/unit/hr/performance/test_behavioral_collaboration_strategy.py, tests/unit/hr/performance/test_ci_quality_strategy.py, tests/unit/hr/performance/test_models.py, tests/unit/hr/performance/test_multi_window_strategy.py, tests/unit/hr/performance/test_theil_sen_strategy.py, tests/unit/hr/performance/test_tracker.py, tests/unit/hr/test_enums.py, tests/unit/hr/test_errors.py, tests/unit/hr/test_full_snapshot_strategy.py, tests/unit/hr/test_hiring_service.py, tests/unit/hr/test_models.py, tests/unit/hr/test_offboarding_service.py, tests/unit/hr/test_onboarding_service.py, tests/unit/hr/test_persistence.py, tests/unit/hr/test_queue_return_strategy.py, tests/unit/hr/test_registry.py, tests/unit/observability/test_events.py, tests/unit/persistence/test_migrations_v2.py, tests/unit/persistence/test_protocol.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: CI Pass
- GitHub Check: Agent
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: No `from __future__ import annotations` — Python 3.14 has PEP 649 native lazy annotations
Use PEP 758 except syntax: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14
All public functions must have type hints; mypy strict mode is enforced
Public classes and functions must have Google-style docstrings (enforced by ruff D rules)
Every module with business logic must include logging setup: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` or `print()` in application code. Variable name must always be `logger`
Use structured logging with event name constants: `logger.info(EVENT_CONSTANT, key=value)` — never use string interpolation like `logger.info('msg %s', val)`. Import event constants directly from `ai_company.observability.events.<domain>`
All error paths must log at WARNING or ERROR with context before raising; all state transitions must log at INFO; DEBUG for object creation, internal flow, and entry/exit of key functions
Line length maximum is 88 characters (enforced by ruff)
Functions must be < 50 lines; files must be < 800 lines
Handle errors explicitly; never silently swallow exceptions
Validate at system boundaries: user input, external APIs, and config files. Do not validate unnecessarily at internal boundaries
Files:
src/ai_company/hr/performance/window_protocol.py, src/ai_company/persistence/protocol.py, tests/unit/hr/test_hiring_service.py, src/ai_company/observability/events/performance.py, tests/unit/hr/performance/test_tracker.py, src/ai_company/core/enums.py, src/ai_company/hr/performance/__init__.py, tests/unit/hr/performance/conftest.py, tests/unit/hr/test_errors.py, tests/unit/hr/performance/test_behavioral_collaboration_strategy.py, tests/unit/hr/performance/test_ci_quality_strategy.py, tests/unit/hr/performance/test_models.py, src/ai_company/persistence/sqlite/migrations.py, src/ai_company/hr/performance/ci_quality_strategy.py, tests/unit/hr/test_offboarding_service.py, src/ai_company/hr/queue_return_strategy.py, src/ai_company/hr/hiring_service.py, src/ai_company/hr/performance/trend_protocol.py, src/ai_company/communication/enums.py, tests/unit/hr/test_enums.py, src/ai_company/hr/performance/config.py, src/ai_company/hr/persistence_protocol.py, src/ai_company/hr/enums.py, src/ai_company/hr/performance/behavioral_collaboration_strategy.py, src/ai_company/observability/events/persistence.py, src/ai_company/hr/performance/collaboration_protocol.py, tests/unit/hr/performance/test_theil_sen_strategy.py, src/ai_company/persistence/repositories.py, tests/unit/hr/test_onboarding_service.py, src/ai_company/hr/performance/theil_sen_strategy.py, src/ai_company/hr/offboarding_service.py, src/ai_company/persistence/sqlite/backend.py, src/ai_company/hr/models.py, src/ai_company/hr/full_snapshot_strategy.py, src/ai_company/hr/registry.py, tests/unit/hr/test_models.py, src/ai_company/persistence/sqlite/repositories.py, src/ai_company/hr/errors.py, tests/unit/hr/test_queue_return_strategy.py, tests/unit/communication/test_enums.py, tests/unit/core/test_enums.py, tests/unit/hr/performance/test_multi_window_strategy.py, src/ai_company/hr/reassignment_protocol.py, src/ai_company/hr/onboarding_service.py, src/ai_company/hr/archival_protocol.py, src/ai_company/persistence/sqlite/hr_repositories.py, tests/unit/hr/test_registry.py, tests/unit/persistence/test_migrations_v2.py, src/ai_company/hr/__init__.py, tests/unit/hr/conftest.py, tests/unit/hr/test_full_snapshot_strategy.py, src/ai_company/observability/events/hr.py, tests/unit/api/conftest.py, src/ai_company/hr/performance/quality_protocol.py, src/ai_company/hr/performance/models.py, tests/unit/observability/test_events.py, tests/unit/hr/test_persistence.py, tests/unit/persistence/test_protocol.py, src/ai_company/hr/performance/multi_window_strategy.py, src/ai_company/hr/performance/tracker.py
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py: Use Pydantic v2 with frozen models for config/identity; use separate mutable-via-copy models (with `model_copy(update=...)`) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model
Use `@computed_field` for derived values instead of storing + validating redundant fields (e.g. `TokenUsage.total_tokens`); use `NotBlankStr` from `core.types` for all identifier/name fields—including optional (`NotBlankStr | None`) and tuple variants—instead of manual whitespace validators
Create new objects for immutability; never mutate existing ones. For non-Pydantic internal collections (registries, `BaseTool`), use `copy.deepcopy()` at construction + `MappingProxyType` for read-only enforcement. For `dict`/`list` fields in frozen Pydantic models, rely on `frozen=True` and `copy.deepcopy()` at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization)
Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`
Files:
src/ai_company/hr/performance/window_protocol.py, src/ai_company/persistence/protocol.py, src/ai_company/observability/events/performance.py, src/ai_company/core/enums.py, src/ai_company/hr/performance/__init__.py, src/ai_company/persistence/sqlite/migrations.py, src/ai_company/hr/performance/ci_quality_strategy.py, src/ai_company/hr/queue_return_strategy.py, src/ai_company/hr/hiring_service.py, src/ai_company/hr/performance/trend_protocol.py, src/ai_company/communication/enums.py, src/ai_company/hr/performance/config.py, src/ai_company/hr/persistence_protocol.py, src/ai_company/hr/enums.py, src/ai_company/hr/performance/behavioral_collaboration_strategy.py, src/ai_company/observability/events/persistence.py, src/ai_company/hr/performance/collaboration_protocol.py, src/ai_company/persistence/repositories.py, src/ai_company/hr/performance/theil_sen_strategy.py, src/ai_company/hr/offboarding_service.py, src/ai_company/persistence/sqlite/backend.py, src/ai_company/hr/models.py, src/ai_company/hr/full_snapshot_strategy.py, src/ai_company/hr/registry.py, src/ai_company/persistence/sqlite/repositories.py, src/ai_company/hr/errors.py, src/ai_company/hr/reassignment_protocol.py, src/ai_company/hr/onboarding_service.py, src/ai_company/hr/archival_protocol.py, src/ai_company/persistence/sqlite/hr_repositories.py, src/ai_company/hr/__init__.py, src/ai_company/observability/events/hr.py, src/ai_company/hr/performance/quality_protocol.py, src/ai_company/hr/performance/models.py, src/ai_company/hr/performance/multi_window_strategy.py, src/ai_company/hr/performance/tracker.py
**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
DESIGN_SPEC.md is MANDATORY reading before implementing any feature. The spec is the starting point for architecture, data models, and behavior. If implementation deviates from the spec, alert the user and explain why — user decides whether to proceed or update the spec. Do not silently diverge. When spec sections are referenced (e.g. 'Section 10.2'), read that section verbatim before coding. When approved deviations occur, update DESIGN_SPEC.md to reflect the new reality
Files:
CLAUDE.md, README.md, DESIGN_SPEC.md
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest markers: `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, `@pytest.mark.slow`. Minimum coverage 80% (enforced in CI). Async mode: `asyncio_mode = 'auto'` — no manual `@pytest.mark.asyncio` needed. Per-test timeout: 30 seconds. Use `pytest-xdist` via `-n auto` for parallelism. Prefer `@pytest.mark.parametrize` for testing similar cases
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: `example-provider`, `example-large-001`, `example-medium-001`, `example-small-001`, `large`/`medium`/`small` as aliases. Tests must use `test-provider`, `test-small-001`, etc. Vendor names may only appear in: (1) DESIGN_SPEC.md provider list, (2) `.claude/` files, (3) third-party import paths
Files:
tests/unit/hr/test_hiring_service.py, tests/unit/hr/performance/test_tracker.py, tests/unit/hr/performance/conftest.py, tests/unit/hr/test_errors.py, tests/unit/hr/performance/test_behavioral_collaboration_strategy.py, tests/unit/hr/performance/test_ci_quality_strategy.py, tests/unit/hr/performance/test_models.py, tests/unit/hr/test_offboarding_service.py, tests/unit/hr/test_enums.py, tests/unit/hr/performance/test_theil_sen_strategy.py, tests/unit/hr/test_onboarding_service.py, tests/unit/hr/test_models.py, tests/unit/hr/test_queue_return_strategy.py, tests/unit/communication/test_enums.py, tests/unit/core/test_enums.py, tests/unit/hr/performance/test_multi_window_strategy.py, tests/unit/hr/test_registry.py, tests/unit/persistence/test_migrations_v2.py, tests/unit/hr/conftest.py, tests/unit/hr/test_full_snapshot_strategy.py, tests/unit/api/conftest.py, tests/unit/observability/test_events.py, tests/unit/hr/test_persistence.py, tests/unit/persistence/test_protocol.py
🧠 Learnings (8)
All eight learnings below share the same source: Learnt from CR, Repo: Aureliolo/ai-company PR: 0, File: CLAUDE.md:0-0, Timestamp: 2026-03-09T20:32:32.549Z.
- 📚 Applies to **/*.py: Every module with business logic must include logging setup: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` or `print()` in application code. Variable name must always be `logger`. (Applied to: CLAUDE.md)
- 📚 Applies to **/*.py: Use structured logging with event name constants: `logger.info(EVENT_CONSTANT, key=value)` — never use string interpolation like `logger.info('msg %s', val)`. Import event constants directly from `ai_company.observability.events.<domain>`. (Applied to: CLAUDE.md, src/ai_company/observability/events/performance.py, src/ai_company/observability/events/hr.py)
- 📚 Applies to **/*.py: All error paths must log at WARNING or ERROR with context before raising; all state transitions must log at INFO; DEBUG for object creation, internal flow, and entry/exit of key functions. (Applied to: CLAUDE.md)
- 📚 Applies to tests/**/*.py: Use pytest markers: `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, `pytest.mark.slow`. Minimum coverage 80% (enforced in CI). Async mode: `asyncio_mode = 'auto'` — no manual `pytest.mark.asyncio` needed. Per-test timeout: 30 seconds. Use `pytest-xdist` via `-n auto` for parallelism. Prefer `pytest.mark.parametrize` for testing similar cases. (Applied to: tests/unit/hr/performance/test_ci_quality_strategy.py)
- 📚 Applies to src/ai_company/**/*.py: Use Pydantic v2 with frozen models for config/identity; use separate mutable-via-copy models (with `model_copy(update=...)`) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model. (Applied to: src/ai_company/hr/performance/config.py, src/ai_company/hr/models.py, src/ai_company/hr/performance/models.py)
- 📚 Applies to src/ai_company/**/*.py: Use `computed_field` for derived values instead of storing + validating redundant fields (e.g. `TokenUsage.total_tokens`); use `NotBlankStr` from `core.types` for all identifier/name fields—including optional (`NotBlankStr | None`) and tuple variants—instead of manual whitespace validators. (Applied to: src/ai_company/hr/models.py, src/ai_company/hr/performance/models.py)
- 📚 Applies to src/ai_company/**/*.py: Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`. (Applied to: src/ai_company/hr/reassignment_protocol.py)
- 📚 Applies to src/ai_company/**/*.py: Create new objects for immutability; never mutate existing ones. For non-Pydantic internal collections (registries, `BaseTool`), use `copy.deepcopy()` at construction + `MappingProxyType` for read-only enforcement. For `dict`/`list` fields in frozen Pydantic models, rely on `frozen=True` and `copy.deepcopy()` at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization). (Applied to: src/ai_company/hr/performance/models.py)
🧬 Code graph analysis (43)
src/ai_company/hr/performance/window_protocol.py (5)
- src/ai_company/hr/performance/models.py (2): TaskMetricRecord (22-64), WindowMetrics (195-249)
- src/ai_company/hr/performance/theil_sen_strategy.py (1): name (69-71)
- src/ai_company/hr/archival_protocol.py (1): name (47-49)
- src/ai_company/hr/performance/collaboration_protocol.py (1): name (25-27)
- src/ai_company/hr/performance/multi_window_strategy.py (2): name (66-68), compute_windows (75-100)
src/ai_company/persistence/protocol.py (4)
- src/ai_company/hr/persistence_protocol.py (3): CollaborationMetricRepository (98-130), LifecycleEventRepository (22-56), TaskMetricRepository (60-94)
- src/ai_company/persistence/sqlite/backend.py (3): lifecycle_events (241-247), task_metrics (250-256), collaboration_metrics (259-267)
- tests/unit/api/conftest.py (3): lifecycle_events (228-229), task_metrics (232-233), collaboration_metrics (236-237)
- tests/unit/persistence/test_protocol.py (3): lifecycle_events (154-155), task_metrics (158-159), collaboration_metrics (162-163)
tests/unit/hr/performance/test_tracker.py (6)
- src/ai_company/hr/performance/models.py (6): CollaborationMetricRecord (67-122), CollaborationScoreResult (150-172), QualityScoreResult (125-147), TaskMetricRecord (22-64), TrendResult (175-192), WindowMetrics (195-249)
- src/ai_company/hr/performance/config.py (1): PerformanceConfig (10-59)
- src/ai_company/hr/performance/collaboration_protocol.py (2): name (25-27), score (29-46)
- src/ai_company/hr/performance/quality_protocol.py (2): name (26-28), score (30-49)
- src/ai_company/hr/performance/trend_protocol.py (2): name (25-27), detect (29-46)
- src/ai_company/hr/performance/window_protocol.py (3): name (27-29), min_data_points (32-34), compute_windows (36-51)
tests/unit/hr/performance/conftest.py (3)
- src/ai_company/core/enums.py (2): Complexity (247-253), TaskType (227-235)
- src/ai_company/core/task.py (1): AcceptanceCriterion (26-42)
- src/ai_company/hr/performance/models.py (2): CollaborationMetricRecord (67-122), TaskMetricRecord (22-64)
tests/unit/hr/test_errors.py (1)
- src/ai_company/hr/errors.py (15): AgentAlreadyRegisteredError (64-65), AgentNotFoundError (60-61), AgentRegistryError (56-57), FiringError (30-31), HiringApprovalRequiredError (15-16), HiringError (11-12), HiringRejectedError (19-20), HRError (4-5), InsufficientDataError (75-76), InvalidCandidateError (23-24), MemoryArchivalError (42-43), OffboardingError (34-35), OnboardingError (49-50), PerformanceError (71-72), TaskReassignmentError (38-39)
tests/unit/hr/performance/test_behavioral_collaboration_strategy.py (2)
- src/ai_company/hr/performance/behavioral_collaboration_strategy.py (2): name (58-60), score (62-140)
- tests/unit/hr/performance/conftest.py (1): make_collab_metric (44-65)
tests/unit/hr/performance/test_ci_quality_strategy.py (1)
- src/ai_company/hr/performance/ci_quality_strategy.py (3): CISignalQualityStrategy (28-114), name (41-43), score (45-114)
tests/unit/hr/performance/test_models.py (3)
- src/ai_company/core/enums.py (2): Complexity (247-253), TaskType (227-235)
- src/ai_company/hr/enums.py (1): TrendDirection (42-48)
- src/ai_company/hr/performance/models.py (6): AgentPerformanceSnapshot (252-287), CollaborationMetricRecord (67-122), CollaborationScoreResult (150-172), QualityScoreResult (125-147), TrendResult (175-192), WindowMetrics (195-249)
src/ai_company/persistence/sqlite/migrations.py (1)
- tests/unit/hr/test_persistence.py (1): db (29-35)
src/ai_company/hr/performance/ci_quality_strategy.py (2)
- src/ai_company/hr/performance/models.py (2): QualityScoreResult (125-147), TaskMetricRecord (22-64)
- src/ai_company/core/task.py (1): AcceptanceCriterion (26-42)
tests/unit/hr/test_offboarding_service.py (2)
- src/ai_company/hr/errors.py (1): AgentNotFoundError (60-61)
- src/ai_company/hr/offboarding_service.py (2): OffboardingService (44-228), offboard (85-228)
src/ai_company/hr/queue_return_strategy.py (3)
- src/ai_company/core/enums.py (1): TaskStatus (198-224)
- src/ai_company/hr/errors.py (1): TaskReassignmentError (38-39)
- src/ai_company/core/task.py (1): Task (45-261)
src/ai_company/hr/hiring_service.py (8)
- src/ai_company/core/agent.py (2): AgentIdentity (265-323), ModelConfig (149-178)
- src/ai_company/core/approval.py (1): ApprovalItem (24-96)
- src/ai_company/core/enums.py (4): ActionType (372-384), AgentStatus (67-73), ApprovalRiskLevel (419-425), SeniorityLevel (6-21)
- src/ai_company/hr/enums.py (1): HiringRequestStatus (6-12)
- src/ai_company/hr/models.py (2): CandidateCard (29-66), HiringRequest (69-141)
- src/ai_company/hr/registry.py (2): AgentRegistryService (29-193), register (48-74)
- src/ai_company/api/approval_store.py (1): ApprovalStore (30-156)
- src/ai_company/hr/onboarding_service.py (2): OnboardingService (27-166), start_onboarding (46-78)
src/ai_company/hr/performance/trend_protocol.py (3)
- src/ai_company/hr/performance/models.py (1): TrendResult (175-192)
- src/ai_company/hr/performance/theil_sen_strategy.py (2): name (69-71), detect (73-143)
- tests/unit/hr/performance/test_tracker.py (5): name (35-36), name (62-63), name (93-94), name (120-121), detect (123-136)
tests/unit/hr/test_enums.py (1)
- src/ai_company/hr/enums.py (5): FiringReason (15-21), HiringRequestStatus (6-12), LifecycleEventType (32-39), OnboardingStep (24-29), TrendDirection (42-48)
src/ai_company/hr/performance/config.py (4)
- src/ai_company/hr/performance/multi_window_strategy.py (1): min_data_points (71-73)
- src/ai_company/hr/performance/window_protocol.py (1): min_data_points (32-34)
- tests/unit/hr/performance/test_tracker.py (1): min_data_points (97-98)
- src/ai_company/tools/base.py (1): description (115-117)
src/ai_company/hr/persistence_protocol.py (3)
- src/ai_company/hr/enums.py (1): LifecycleEventType (32-39)
- src/ai_company/hr/models.py (1): AgentLifecycleEvent (302-331)
- src/ai_company/hr/performance/models.py (2): CollaborationMetricRecord (67-122), TaskMetricRecord (22-64)
src/ai_company/hr/performance/behavioral_collaboration_strategy.py (1)
- src/ai_company/hr/performance/models.py (2): CollaborationMetricRecord (67-122), CollaborationScoreResult (150-172)
src/ai_company/hr/performance/collaboration_protocol.py (2)
- src/ai_company/hr/performance/models.py (2): CollaborationMetricRecord (67-122), CollaborationScoreResult (150-172)
- src/ai_company/hr/performance/behavioral_collaboration_strategy.py (2): name (58-60), score (62-140)
tests/unit/hr/performance/test_theil_sen_strategy.py (1)
- src/ai_company/hr/performance/theil_sen_strategy.py (3): TheilSenTrendStrategy (43-143), name (69-71), detect (73-143)
src/ai_company/persistence/repositories.py (1)
- src/ai_company/hr/persistence_protocol.py (3): CollaborationMetricRepository (98-130), LifecycleEventRepository (22-56), TaskMetricRepository (60-94)
tests/unit/hr/test_onboarding_service.py (5)
- src/ai_company/core/enums.py (1): AgentStatus (67-73)
- src/ai_company/hr/enums.py (1): OnboardingStep (24-29)
- src/ai_company/hr/errors.py (1): OnboardingError (49-50)
- tests/unit/hr/conftest.py (2): registry (174-176), make_agent_identity (37-55)
- src/ai_company/hr/registry.py (3): AgentRegistryService (29-193), register (48-74), get (102-111)
src/ai_company/hr/performance/theil_sen_strategy.py (3)
- src/ai_company/hr/enums.py (1): TrendDirection (42-48)
- src/ai_company/hr/performance/models.py (1): TrendResult (175-192)
- src/ai_company/observability/_logger.py (1): get_logger (8-28)
src/ai_company/persistence/sqlite/backend.py (2)
- src/ai_company/persistence/sqlite/hr_repositories.py (3): SQLiteCollaborationMetricRepository (247-343), SQLiteLifecycleEventRepository (45-145), SQLiteTaskMetricRepository (148-244)
- src/ai_company/persistence/protocol.py (3): lifecycle_events (99-101), task_metrics (104-106), collaboration_metrics (109-111)
src/ai_company/hr/models.py (2)
- src/ai_company/core/enums.py (1): SeniorityLevel (6-21)
- src/ai_company/hr/enums.py (4): FiringReason (15-21), HiringRequestStatus (6-12), LifecycleEventType (32-39), OnboardingStep (24-29)
src/ai_company/hr/full_snapshot_strategy.py (7)
- src/ai_company/core/enums.py (2): MemoryCategory (101-108), OrgFactCategory (120-130)
- src/ai_company/hr/archival_protocol.py (3): ArchivalResult (17-34), name (47-49), archive (51-70)
- src/ai_company/memory/consolidation/models.py (1): ArchivalEntry (56-91)
- src/ai_company/memory/models.py (1): MemoryQuery (153-230)
- src/ai_company/memory/org/models.py (2): OrgFactAuthor (19-88), OrgFactWriteRequest (113-124)
- src/ai_company/memory/org/protocol.py (1): OrgMemoryBackend (20-110)
- src/ai_company/memory/protocol.py (1): MemoryBackend (20-182)
src/ai_company/hr/registry.py (4)
- src/ai_company/core/enums.py (1): AgentStatus (67-73)
- src/ai_company/hr/errors.py (2): AgentAlreadyRegisteredError (64-65), AgentNotFoundError (60-61)
- src/ai_company/communication/bus_protocol.py (1): MessageBus (20-209)
- src/ai_company/core/agent.py (1): AgentIdentity (265-323)
tests/unit/hr/test_models.py (4)
- src/ai_company/core/enums.py (1): SeniorityLevel (6-21)
- src/ai_company/hr/enums.py (4): FiringReason (15-21), HiringRequestStatus (6-12), LifecycleEventType (32-39), OnboardingStep (24-29)
- src/ai_company/hr/models.py (5): AgentLifecycleEvent (302-331), OffboardingRecord (250-299), OnboardingChecklist (219-247), OnboardingStepRecord (187-216), is_complete (245-247)
- tests/unit/hr/conftest.py (3): make_candidate_card (58-83), make_firing_request (121-137), make_hiring_request (86-118)
tests/unit/hr/test_queue_return_strategy.py (1)
- src/ai_company/hr/queue_return_strategy.py (2): reassign (42-88), name (38-40)
tests/unit/communication/test_enums.py (1)
- src/ai_company/communication/enums.py (1): MessageType (6-21)
tests/unit/core/test_enums.py (1)
- src/ai_company/core/enums.py (1): AgentStatus (67-73)
tests/unit/hr/performance/test_multi_window_strategy.py (1)
- src/ai_company/hr/performance/multi_window_strategy.py (4): MultiWindowStrategy (44-160), min_data_points (71-73), name (66-68), compute_windows (75-100)
src/ai_company/hr/reassignment_protocol.py (2)
- src/ai_company/core/task.py (1): Task (45-261)
- src/ai_company/hr/queue_return_strategy.py (2): name (38-40), reassign (42-88)
src/ai_company/hr/onboarding_service.py (5)
src/ai_company/core/enums.py (1)
AgentStatus(67-73)src/ai_company/hr/enums.py (1)
OnboardingStep(24-29)src/ai_company/hr/errors.py (1)
OnboardingError(49-50)src/ai_company/hr/models.py (3)
OnboardingChecklist(219-247)OnboardingStepRecord(187-216)is_complete(245-247)src/ai_company/hr/registry.py (2)
AgentRegistryService(29-193)update_status(153-188)
src/ai_company/hr/archival_protocol.py (3)
src/ai_company/memory/consolidation/archival.py (1)
ArchivalStore(15-75)src/ai_company/memory/org/protocol.py (1)
OrgMemoryBackend(20-110)src/ai_company/memory/protocol.py (1)
MemoryBackend(20-182)
tests/unit/hr/test_registry.py (3)
src/ai_company/core/enums.py (1)
AgentStatus(67-73)src/ai_company/hr/errors.py (2)
AgentAlreadyRegisteredError(64-65)AgentNotFoundError(60-61)tests/unit/hr/conftest.py (2)
registry(174-176)make_agent_identity(37-55)
tests/unit/persistence/test_migrations_v2.py (1)
src/ai_company/persistence/sqlite/migrations.py (4)
_apply_v1(175-178)get_user_version(146-150)run_migrations(195-263)set_user_version(153-172)
tests/unit/hr/conftest.py (5)
src/ai_company/core/agent.py (2)
AgentIdentity(265-323)ModelConfig(149-178)src/ai_company/hr/enums.py (2)
FiringReason(15-21)HiringRequestStatus(6-12)src/ai_company/hr/models.py (3)
CandidateCard(29-66)FiringRequest(144-184)HiringRequest(69-141)src/ai_company/hr/onboarding_service.py (1)
OnboardingService(27-166)src/ai_company/hr/registry.py (1)
AgentRegistryService(29-193)
tests/unit/hr/test_full_snapshot_strategy.py (1)
src/ai_company/hr/full_snapshot_strategy.py (3)
FullSnapshotStrategy(47-186)archive(65-186)name(61-63)
tests/unit/api/conftest.py (2)
src/ai_company/persistence/sqlite/backend.py (3)
lifecycle_events(241-247)task_metrics(250-256)collaboration_metrics(259-267)src/ai_company/persistence/protocol.py (3)
lifecycle_events(99-101)task_metrics(104-106)collaboration_metrics(109-111)
src/ai_company/hr/performance/quality_protocol.py (3)
src/ai_company/core/task.py (1)
AcceptanceCriterion(26-42)src/ai_company/hr/performance/models.py (2)
QualityScoreResult(125-147)TaskMetricRecord(22-64)src/ai_company/hr/performance/ci_quality_strategy.py (2)
name(41-43)score(45-114)
src/ai_company/hr/performance/multi_window_strategy.py (1)
src/ai_company/hr/performance/models.py (2)
TaskMetricRecord(22-64)WindowMetrics(195-249)
src/ai_company/hr/performance/tracker.py (6)
src/ai_company/hr/performance/models.py (6)
AgentPerformanceSnapshot(252-287)CollaborationMetricRecord(67-122)CollaborationScoreResult(150-172)TaskMetricRecord(22-64)TrendResult(175-192)WindowMetrics(195-249)src/ai_company/hr/performance/collaboration_protocol.py (2)
CollaborationScoringStrategy(17-46)score(29-46)src/ai_company/hr/performance/quality_protocol.py (2)
QualityScoringStrategy(18-49)score(30-49)src/ai_company/hr/performance/trend_protocol.py (2)
TrendDetectionStrategy(17-46)detect(29-46)src/ai_company/hr/performance/window_protocol.py (3)
MetricsWindowStrategy(19-51)compute_windows(36-51)min_data_points(32-34)src/ai_company/hr/performance/multi_window_strategy.py (2)
compute_windows(75-100)min_data_points(71-73)
🪛 LanguageTool
CLAUDE.md
[style] ~87-~87: A comma is missing here.
Context: ...nder ai_company.observability.events (e.g. PROVIDER_CALL_START from `events.prov...
(EG_NO_COMMA)
README.md
[typographical] ~31-~31: To join two clauses or introduce examples, consider using an em dash.
Context: ...uth planned for M7) - HR Engine (M7) - Hiring pipeline (request → generate cand...
(DASH_RULE)
[typographical] ~32-~32: To join two clauses or introduce examples, consider using an em dash.
Context: ...registry - Performance Tracking (M7) - Task metrics, CI-based quality scoring, ...
(DASH_RULE)
[grammar] ~32-~32: Please add a punctuation mark at the end of paragraph.
Context: ... detection, multi-window rolling metric aggregation ### Not implemented yet (planned miles...
(PUNCTUATION_PARAGRAPH_END)
[typographical] ~36-~36: To join two clauses or introduce examples, consider using an em dash.
Context: ...ones) - Memory Backend Adapter (M5) - Memory protocols, retrieval pipeline, or...
(DASH_RULE)
[typographical] ~37-~37: To join two clauses or introduce examples, consider using an em dash.
Context: ..., Temporal KG) planned - CLI Surface - cli/ package is placeholder-only - **S...
(DASH_RULE)
[typographical] ~38-~38: To join two clauses or introduce examples, consider using an em dash.
Context: ...only - Security/Approval System (M7) - security/ package is placeholder-only;...
(DASH_RULE)
[typographical] ~39-~39: To join two clauses or introduce examples, consider using an em dash.
Context: ...Ops agent - Advanced Product Surface - web dashboard, external integrations ##...
(DASH_RULE)
DESIGN_SPEC.md
Outdated
│ │ └── hr_engine.py # Hiring, firing, performance (M7)
│ ├── hr/ # HR engine: hiring, firing, onboarding, offboarding, agent registry, performance tracking
│ │ ├── __init__.py # Package exports
│ │ ├── enums.py # HR enumerations (AgentStatus, OnboardingStep, etc.)
Fix the AgentStatus module reference in the structure map.
AgentStatus is defined in src/ai_company/core/enums.py in this PR, not in src/ai_company/hr/enums.py. Leaving this entry as-is will send readers to the wrong module.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@DESIGN_SPEC.md` at line 2827, The structure map entry incorrectly points to
hr.enums for AgentStatus; update the map so the AgentStatus reference points to
the module where it is actually defined (core.enums) instead of hr.enums, i.e.,
replace the AgentStatus module reference in the map from hr.enums to core.enums
so readers are directed to the correct definition.
async def archive(
    self,
    *,
    agent_id: NotBlankStr,
    memory_backend: MemoryBackend,
    archival_store: ArchivalStore,
    org_memory_backend: OrgMemoryBackend | None = None,
) -> ArchivalResult:
    """Archive all memories for a departing agent.

    Args:
        agent_id: Agent whose memories to archive.
        memory_backend: Hot memory store.
        archival_store: Cold archival storage.
        org_memory_backend: Optional org memory for promotion.

    Returns:
        Result of the archival operation.

    Raises:
        MemoryArchivalError: If retrieval from hot store fails.
    """
    try:
        entries = await memory_backend.retrieve(
            agent_id,
            MemoryQuery(limit=_MAX_MEMORIES_PER_ARCHIVAL),
        )
    except Exception as exc:
        msg = f"Failed to retrieve memories for agent {agent_id!r}"
        logger.error(  # noqa: TRY400
            HR_ARCHIVAL_ENTRY_FAILED,
            agent_id=agent_id,
            phase="retrieve",
            error=str(exc),
            error_type=type(exc).__name__,
        )
        raise MemoryArchivalError(msg) from exc

    now = datetime.now(UTC)
    archived_count = 0
    promoted_count = 0
    deleted_ids: list[str] = []

    for entry in entries:
        # Archive to cold storage.
        try:
            archival_entry = ArchivalEntry(
                original_id=entry.id,
                agent_id=agent_id,
                content=NotBlankStr(entry.content),
                category=entry.category,
                metadata=entry.metadata,
                created_at=entry.created_at,
                archived_at=now,
            )
            await archival_store.archive(archival_entry)
            archived_count += 1
            deleted_ids.append(str(entry.id))
        except (OSError, ValueError, ValidationError) as exc:
            logger.warning(
                HR_ARCHIVAL_ENTRY_FAILED,
                agent_id=agent_id,
                memory_id=str(entry.id),
                phase="archive",
                error=str(exc),
            )
            continue

        # Promote to org memory if eligible.
        if (
            org_memory_backend is not None
            and entry.category in _PROMOTABLE_CATEGORIES
        ):
            try:
                org_category = _CATEGORY_MAP[entry.category]
                author = OrgFactAuthor(agent_id=agent_id)
                write_req = OrgFactWriteRequest(
                    content=NotBlankStr(entry.content),
                    category=org_category,
                )
                await org_memory_backend.write(write_req, author=author)
                promoted_count += 1
            except (OSError, ValueError, KeyError) as exc:
                logger.warning(
                    HR_ARCHIVAL_ENTRY_FAILED,
                    agent_id=agent_id,
                    memory_id=str(entry.id),
                    phase="promote",
                    error=str(exc),
                )

    # Clean hot store.
    hot_store_cleaned = True
    for memory_id in deleted_ids:
        try:
            await memory_backend.delete(agent_id, NotBlankStr(memory_id))
        except (OSError, ValueError) as exc:
            hot_store_cleaned = False
            logger.warning(
                HR_ARCHIVAL_ENTRY_FAILED,
                agent_id=agent_id,
                memory_id=memory_id,
                phase="delete",
                error=str(exc),
            )

    result = ArchivalResult(
        agent_id=agent_id,
        total_archived=archived_count,
        promoted_to_org=promoted_count,
        hot_store_cleaned=hot_store_cleaned,
        strategy_name=NotBlankStr(self.name),
    )

    logger.info(
        HR_FIRING_MEMORY_ARCHIVED,
        agent_id=agent_id,
        archived=archived_count,
        promoted=promoted_count,
        cleaned=hot_store_cleaned,
    )
    return result
🧹 Nitpick | 🔵 Trivial
Consider extracting helper methods to reduce function length.
The archive() method spans ~120 lines, exceeding the 50-line guideline. The distinct phases (retrieve → archive to cold → promote to org → clean hot store) are natural candidates for extraction into private helper methods (e.g., _archive_entries(), _promote_to_org(), _clean_hot_store()). This would improve readability and testability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/hr/full_snapshot_strategy.py` around lines 65 - 186, The
archive() method is too long and should be broken into helpers; extract the
phases into private methods to improve readability and testability: implement a
_retrieve_entries(agent_id, memory_backend) that wraps memory_backend.retrieve
and error handling, an _archive_entries(entries, archival_store, agent_id, now)
that constructs ArchivalEntry, calls archival_store.archive and returns
(archived_count, deleted_ids), a _promote_to_org(entries, org_memory_backend,
agent_id) that handles category mapping via _CATEGORY_MAP,
OrgFactAuthor/OrgFactWriteRequest and returns promoted_count, and a
_clean_hot_store(deleted_ids, memory_backend, agent_id) that deletes ids and
returns hot_store_cleaned; update archive() to call these helpers and assemble
the ArchivalResult while preserving existing logging and exception behavior (use
function names: archive, _retrieve_entries, _archive_entries, _promote_to_org,
_clean_hot_store, ArchivalEntry, archival_store.archive,
org_memory_backend.write, memory_backend.delete).
# Clean hot store.
hot_store_cleaned = True
for memory_id in deleted_ids:
    try:
        await memory_backend.delete(agent_id, NotBlankStr(memory_id))
    except (OSError, ValueError) as exc:
        hot_store_cleaned = False
        logger.warning(
            HR_ARCHIVAL_ENTRY_FAILED,
            agent_id=agent_id,
            memory_id=memory_id,
            phase="delete",
            error=str(exc),
        )
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Check MemoryBackend.delete signature for memory_id type
rg -n --type=py -A 3 'async def delete.*agent_id.*memory_id' src/ai_company/memory/

Repository: Aureliolo/ai-company
Length of output: 46
🏁 Script executed:
#!/bin/bash
# Find MemoryBackend definition
rg -n --type=py 'class MemoryBackend|def delete.*agent_id' src/ai_company/ | head -20

Repository: Aureliolo/ai-company
Length of output: 131
🏁 Script executed:
#!/bin/bash
# Search for memory_backend usage and imports
rg -n --type=py 'memory_backend.delete' src/ai_company/ -B 2 -A 2

Repository: Aureliolo/ai-company
Length of output: 527
🏁 Script executed:
#!/bin/bash
# List files in memory directory
fd . src/ai_company/memory/ --type f -name "*.py" | head -20

Repository: Aureliolo/ai-company
Length of output: 294
🏁 Script executed:
#!/bin/bash
cat src/ai_company/memory/protocol.py

Repository: Aureliolo/ai-company
Length of output: 5317
🏁 Script executed:
#!/bin/bash
# Read the relevant sections of full_snapshot_strategy.py to see where deleted_ids comes from
head -200 src/ai_company/hr/full_snapshot_strategy.py | tail -100

Repository: Aureliolo/ai-company
Length of output: 3727
🏁 Script executed:
#!/bin/bash
# Search for where deleted_ids is defined/assigned
rg -n --type=py 'deleted_ids.*=' src/ai_company/hr/full_snapshot_strategy.py -B 3 -A 1

Repository: Aureliolo/ai-company
Length of output: 206
🏁 Script executed:
#!/bin/bash
# Check MemoryEntry model to see what type entry.id is
rg -n --type=py 'class MemoryEntry' src/ai_company/memory/ -A 20

Repository: Aureliolo/ai-company
Length of output: 1677
Remove redundant NotBlankStr wrapping in delete call.
deleted_ids stores str(entry.id) (line 119), but entry.id is already NotBlankStr from the backend response. Re-wrapping with NotBlankStr(memory_id) at line 160 re-validates unnecessarily at an internal boundary. Either keep deleted_ids as list[NotBlankStr] and pass directly, or if storing as str, pass it directly without re-wrapping—validation already occurred when the entry was retrieved from the backend.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/hr/full_snapshot_strategy.py` around lines 156 - 169, The
delete loop is re-wrapping already-validated IDs with NotBlankStr; remove the
redundant wrap and pass memory_id directly to memory_backend.delete from the hot
store cleanup loop (symbols: deleted_ids, memory_id, memory_backend.delete,
NotBlankStr, hot_store_cleaned); ensure consistency with how deleted_ids is
populated (either keep deleted_ids as list[NotBlankStr] and pass the elements
unchanged, or keep them as list[str] and pass the raw str) and update the code
that builds deleted_ids accordingly so validation happens only once when entries
are read from the backend.
async def test_cost_efficiency_zero_cost(self) -> None:
    """Zero cost -> max cost efficiency score."""
    strategy = self._make_strategy()
    task_result = make_task_metric(cost_usd=0.0)

    result = await strategy.score(
        agent_id=NotBlankStr("agent-001"),
        task_id=NotBlankStr("task-001"),
        task_result=task_result,
        acceptance_criteria=(),
    )

    breakdown_dict = dict(result.breakdown)
    assert breakdown_dict["cost_efficiency"] == 10.0

async def test_cost_efficiency_high_cost(self) -> None:
    """Cost >= 10 -> zero cost efficiency score."""
    strategy = self._make_strategy()
    task_result = make_task_metric(cost_usd=15.0)

    result = await strategy.score(
        agent_id=NotBlankStr("agent-001"),
        task_id=NotBlankStr("task-001"),
        task_result=task_result,
        acceptance_criteria=(),
    )

    breakdown_dict = dict(result.breakdown)
    assert breakdown_dict["cost_efficiency"] == 0.0
🧹 Nitpick | 🔵 Trivial
Collapse the cost-efficiency assertions into the parametrized test.
test_cost_efficiency_zero_cost and test_cost_efficiency_high_cost hit the same mapping branches that test_cost_efficiency_parametrized already covers, so keeping both forms only adds drift and maintenance.
As per coding guidelines "Prefer @pytest.mark.parametrize for testing similar cases".
Also applies to: 159-185
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/hr/performance/test_ci_quality_strategy.py` around lines 129 -
157, Merge the two separate tests test_cost_efficiency_zero_cost and
test_cost_efficiency_high_cost into the existing parametrized
test_cost_efficiency_parametrized: add parameter rows for (cost_usd=0.0,
expected=10.0) and (cost_usd=15.0, expected=0.0), use the same setup via
self._make_strategy() and make_task_metric, call
strategy.score(agent_id=NotBlankStr("agent-001"),
task_id=NotBlankStr("task-001"), task_result=task_result,
acceptance_criteria=()), and assert dict(result.breakdown)["cost_efficiency"] ==
expected; then remove the now-duplicate test_cost_efficiency_zero_cost and
test_cost_efficiency_high_cost (and the other duplicate pair noted) so only the
parametrized test covers these cases.
async def test_snapshot_with_windows_and_trends(self) -> None:
    """Snapshot includes windows and trends from strategies."""
    window = WindowMetrics(
        window_size=NotBlankStr("7d"),
        data_point_count=10,
        tasks_completed=8,
        tasks_failed=2,
        success_rate=0.8,
    )
    tracker = _make_tracker(
        window=MockWindowStrategy(
            windows=(window,),
            min_data_points=5,
        ),
    )
    # Add a scored task record so trend computation has data
    record = make_task_metric(
        completed_at=NOW - timedelta(hours=1),
        quality_score=7.5,
    )
    await tracker.record_task_metric(record)

    snapshot = await tracker.get_snapshot(
        NotBlankStr("agent-001"),
        now=NOW,
    )

    assert len(snapshot.windows) == 1
    assert snapshot.windows[0].window_size == "7d"
    # Trends should be computed (quality_score + cost_usd)
    assert len(snapshot.trends) == 2
    assert snapshot.overall_quality_score == 7.5
Don't assert trends from a single task sample.
This test only records one TaskMetricRecord, but the feature contract requires minimum data points before trend detection. The mocked WindowMetrics.data_point_count=10 does not supply the time-series values that TrendStrategy.detect(...) consumes, so len(snapshot.trends) == 2 bakes in under-sampled trend output.
Make the scenario satisfy the threshold instead
- # Add a scored task record so trend computation has data
- record = make_task_metric(
- completed_at=NOW - timedelta(hours=1),
- quality_score=7.5,
- )
- await tracker.record_task_metric(record)
+ # Seed at least min_data_points scored records so trend computation is valid.
+ for hours_ago, quality_score in enumerate(
+ (7.1, 7.2, 7.3, 7.4, 7.5),
+ start=1,
+ ):
+ await tracker.record_task_metric(
+ make_task_metric(
+ task_id=f"task-{hours_ago:03d}",
+ completed_at=NOW - timedelta(hours=hours_ago),
+ quality_score=quality_score,
+ )
+ )

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/hr/performance/test_tracker.py` around lines 253 - 285, The test
test_snapshot_with_windows_and_trends asserts trends are present despite only
recording one TaskMetricRecord and not supplying the time-series that
TrendStrategy.detect uses; either (A) make the scenario satisfy the
trend-detection threshold by adding enough TaskMetricRecord entries (call
tracker.record_task_metric multiple times at different timestamps so
WindowMetrics.data_point_count and TrendStrategy.detect receive sufficient
samples) before calling tracker.get_snapshot, or (B) change the assertion to
expect no trends (assert snapshot.trends == ()) to verify the threshold guard;
update references in the test around
MockWindowStrategy/WindowMetrics.data_point_count, tracker.record_task_metric,
and tracker.get_snapshot accordingly.
async def list_tasks(
    self,
    *,
    status: TaskStatus | None = None,
    assigned_to: str | None = None,
    project: str | None = None,
) -> tuple[Task, ...]:
    result = list(self._tasks.values())
    if status is not None:
        result = [t for t in result if t.status == status]
    if assigned_to is not None:
        result = [t for t in result if t.assigned_to == assigned_to]
    if project is not None:
        result = [t for t in result if t.project == project]
    return tuple(result)
🧹 Nitpick | 🔵 Trivial
Type hint for assigned_to parameter should be NotBlankStr | None.
The list_tasks method signature uses assigned_to: str | None = None, but looking at the OffboardingService (line 121 in offboarding_service.py), it calls list_tasks(assigned_to=NotBlankStr(agent_id)). The fake should match the expected interface more closely. However, since this is a test double and Python's duck typing allows it to work, this is minor.
🧹 Proposed fix for type consistency
async def list_tasks(
self,
*,
status: TaskStatus | None = None,
- assigned_to: str | None = None,
+ assigned_to: NotBlankStr | None = None,
project: str | None = None,
) -> tuple[Task, ...]:

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/hr/test_offboarding_service.py` around lines 38 - 52, The test
double list_tasks in tests/unit/hr/test_offboarding_service.py declares
assigned_to: str | None but OffboardingService calls it with
NotBlankStr(agent_id); change the signature to assigned_to: NotBlankStr | None
to match the expected type, add the appropriate import for NotBlankStr from the
module where it’s defined, and update any related type hints in that file so the
fake ListTasks function signature aligns with the OffboardingService usage
(function name: list_tasks).
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant new "HR Engine" and "Performance Tracking" system, fundamentally expanding the capabilities of the AI company platform. The HR engine manages the entire lifecycle of AI agents, from their initial hiring and onboarding to their eventual offboarding, including task reassignment and memory archival. Complementing this, the performance tracking system provides detailed insights into agent efficiency, quality, and collaboration through various metrics and trend analysis. These additions are supported by new persistence layers and enhanced observability, making the platform more robust and manageable for operating a virtual organization of AI agents. Highlights
Changelog
Activity
Code Review
This pull request introduces a comprehensive HR engine and performance tracking system, including a full hiring pipeline, onboarding/offboarding processes, an agent registry, and detailed performance metrics. The persistence layer has been extended with SQLite repositories for new HR-related data. However, two significant race conditions were identified in the HiringService and OnboardingService due to asynchronous operations between reading and updating state, which can lead to duplicate agent instantiation and inconsistent onboarding checklists. It is recommended to use asynchronous locks to ensure atomic state transitions. Additionally, several key services (HiringService, OnboardingService, PerformanceTracker, AgentRegistryService) are currently using in-memory storage, leading to state loss on restart, and should be refactored to utilize the new persistence layer. There are also opportunities for more specific exception handling.
self._task_metrics: dict[str, list[TaskMetricRecord]] = {}
self._collab_metrics: dict[str, list[CollaborationMetricRecord]] = {}
The PerformanceTracker service uses in-memory dictionaries for storing task and collaboration metrics. This is a critical issue as all performance data will be lost upon service restart. The PR includes the implementation of SQLiteTaskMetricRepository and SQLiteCollaborationMetricRepository. This service should be refactored to use these repositories for persistent storage.
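The repository-injection refactor this comment asks for can be sketched like this; every name below (the TaskMetricRepository protocol shape, PerformanceTrackerSketch, dict-typed metrics) is illustrative rather than the PR's actual interface, but it shows the core idea: the tracker takes a repository at construction time and never owns a dict, so swapping in SQLiteTaskMetricRepository makes metrics survive a restart.

```python
import asyncio
from typing import Protocol


class TaskMetricRepository(Protocol):
    async def add(self, agent_id: str, metric: dict) -> None: ...
    async def list_for_agent(self, agent_id: str) -> list[dict]: ...


class InMemoryTaskMetricRepository:
    """Test double; production would inject the SQLite-backed repository."""

    def __init__(self) -> None:
        self._rows: dict[str, list[dict]] = {}

    async def add(self, agent_id: str, metric: dict) -> None:
        self._rows.setdefault(agent_id, []).append(metric)

    async def list_for_agent(self, agent_id: str) -> list[dict]:
        return list(self._rows.get(agent_id, []))


class PerformanceTrackerSketch:
    def __init__(self, task_metrics: TaskMetricRepository) -> None:
        self._task_metrics = task_metrics  # injected store, no owned dict

    async def record_task_metric(self, agent_id: str, metric: dict) -> None:
        await self._task_metrics.add(agent_id, metric)


async def main() -> int:
    repo = InMemoryTaskMetricRepository()
    tracker = PerformanceTrackerSketch(repo)
    await tracker.record_task_metric("agent-001", {"quality_score": 7.5})
    return len(await repo.list_for_agent("agent-001"))


print(asyncio.run(main()))  # prints 1
```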
    *,
    message_bus: MessageBus | None = None,
) -> None:
    self._agents: dict[str, AgentIdentity] = {}
The AgentRegistryService stores agent identities in an in-memory dictionary. This is a critical flaw, as the entire state of the company's workforce will be lost if the application restarts. The agent registry must be backed by a persistent store. The PR includes persistence layers for other HR entities, and the agent registry should be treated with the same importance.
self._registry = registry
self._approval_store = approval_store
self._onboarding_service = onboarding_service
self._requests: dict[str, HiringRequest] = {}
The HiringService uses an in-memory dictionary self._requests to store hiring requests. This makes the state of hiring requests volatile, as it will be lost if the service restarts. The PR correctly adds persistence repositories for HR data, but this service doesn't seem to use them. The service should be refactored to accept and use a repository for HiringRequest persistence.
    registry: AgentRegistryService,
) -> None:
    self._registry = registry
    self._checklists: dict[str, OnboardingChecklist] = {}
src/ai_company/hr/hiring_service.py
Outdated
if request.status == HiringRequestStatus.INSTANTIATED:
    msg = f"Hiring request {request.id!r} is already instantiated"
    logger.warning(
        HR_HIRING_INSTANTIATION_FAILED,
        request_id=str(request.id),
        error=msg,
    )
    raise HiringError(msg)
if request.status == HiringRequestStatus.REJECTED:
    msg = f"Hiring request {request.id!r} was rejected"
    logger.warning(
        HR_HIRING_INSTANTIATION_FAILED,
        request_id=str(request.id),
        error=msg,
    )
    raise HiringRejectedError(msg)
if request.status == HiringRequestStatus.PENDING:
    msg = f"Hiring request {request.id!r} requires approval"
    logger.warning(
        HR_HIRING_INSTANTIATION_FAILED,
        request_id=str(request.id),
        error=msg,
    )
    raise HiringApprovalRequiredError(msg)
if request.selected_candidate_id is None:
    msg = f"No candidate selected on request {request.id!r}"
    logger.warning(
        HR_HIRING_INSTANTIATION_FAILED,
        request_id=str(request.id),
        error=msg,
    )
    raise InvalidCandidateError(msg)

candidate = next(
    (
        c
        for c in request.candidates
        if str(c.id) == request.selected_candidate_id
    ),
    None,
)
if candidate is None:
    msg = (
        f"Selected candidate {request.selected_candidate_id!r} "
        f"not found on request {request.id!r}"
    )
    logger.warning(
        HR_HIRING_INSTANTIATION_FAILED,
        request_id=str(request.id),
        error=msg,
    )
    raise InvalidCandidateError(msg)

try:
    identity = AgentIdentity(
        name=candidate.name,
        role=candidate.role,
        department=candidate.department,
        level=candidate.level,
        model=ModelConfig(
            provider=NotBlankStr("default-provider"),
            model_id=NotBlankStr("default-model-001"),
        ),
        status=AgentStatus.ONBOARDING,
        hiring_date=datetime.now(UTC).date(),
    )
    await self._registry.register(identity)
except Exception as exc:
    msg = f"Failed to instantiate agent for request {request.id!r}"
    logger.exception(
        HR_HIRING_INSTANTIATION_FAILED,
        request_id=str(request.id),
        error=str(exc),
    )
    raise HiringError(msg) from exc

# Update request status.
updated = request.model_copy(
    update={"status": HiringRequestStatus.INSTANTIATED},
)
self._requests[str(updated.id)] = updated
The instantiate_agent method is vulnerable to a race condition. It checks if a hiring request has already been instantiated at the beginning of the method, but then performs an asynchronous operation (await self._registry.register(identity)) before updating the request status to INSTANTIATED. In a concurrent environment, multiple tasks could pass the initial status check before the first one completes and updates the status, leading to multiple agents being hired for a single approved request, potentially resulting in unauthorized resource allocation and budget bypass. Additionally, the except Exception block is too broad and can hide specific issues during agent registration, such as AgentAlreadyRegisteredError. Consider catching more specific exceptions first, and use except Exception as a final catch-all if necessary.
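One way to close the check-then-act gap is a service-level asyncio.Lock held across the status check, the registry call, and the status write, so a second caller cannot pass the INSTANTIATED check mid-flight. The sketch below is hypothetical and heavily simplified (request states as plain strings, a no-op standing in for registry.register), not the PR's HiringService.

```python
import asyncio


class HiringServiceSketch:
    def __init__(self) -> None:
        self._instantiate_lock = asyncio.Lock()
        self._requests = {"req-1": "approved"}  # simplified state machine
        self.instantiations = 0

    async def instantiate_agent(self, request_id: str) -> bool:
        # Lock covers check + await + state write, making the sequence atomic.
        async with self._instantiate_lock:
            if self._requests[request_id] == "instantiated":
                return False  # duplicate caller becomes a no-op
            await asyncio.sleep(0)  # stands in for registry.register()
            self._requests[request_id] = "instantiated"
            self.instantiations += 1
            return True


async def main() -> int:
    svc = HiringServiceSketch()
    results = await asyncio.gather(
        *(svc.instantiate_agent("req-1") for _ in range(5))
    )
    assert results.count(True) == 1  # exactly one caller wins
    return svc.instantiations


print(asyncio.run(main()))  # prints 1
```

In the real service the duplicate path would raise HiringError as today; the lock only guarantees that at most one coroutine reaches the registry call per request.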
checklist = self._checklists.get(agent_id)
if checklist is None:
    msg = f"No onboarding checklist for agent {agent_id!r}"
    logger.warning(
        HR_ONBOARDING_STEP_COMPLETE,
        agent_id=agent_id,
        error=msg,
    )
    raise OnboardingError(msg)

now = datetime.now(UTC)
step_found = any(s.step == step and not s.completed for s in checklist.steps)
if not step_found:
    logger.debug(
        HR_ONBOARDING_STEP_COMPLETE,
        agent_id=agent_id,
        step=step.value,
        skipped="step_not_found_or_already_completed",
    )
    return checklist

updated_steps = tuple(
    s.model_copy(
        update={
            "completed": True,
            "completed_at": now,
            "notes": notes,
        },
    )
    if s.step == step and not s.completed
    else s
    for s in checklist.steps
)

updated = checklist.model_copy(update={"steps": updated_steps})

# Check if all steps are now complete.
if updated.is_complete and not checklist.is_complete:
    updated = updated.model_copy(update={"completed_at": now})
    await self._registry.update_status(agent_id, AgentStatus.ACTIVE)
    logger.info(HR_ONBOARDING_COMPLETE, agent_id=agent_id)

self._checklists[agent_id] = updated
The complete_step method contains a race condition that can lead to state inconsistency. It reads the current checklist from self._checklists, performs an asynchronous operation (await self._registry.update_status(...)) if the checklist is completed, and then saves the updated checklist back to the dictionary. If multiple steps for the same agent are completed concurrently, one update may overwrite another because each task is working with a potentially stale copy of the checklist read before the other task's update was saved. This could result in completed steps being reverted to a pending state.
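One way to serialize checklist updates is a per-agent `asyncio.Lock`, so that every read-modify-write sees the latest state while updates for different agents still run in parallel. A minimal sketch with frozensets standing in for the real checklist models (all names here are hypothetical):

```python
import asyncio
from collections import defaultdict


class OnboardingSketch:
    """Hypothetical stand-in for OnboardingService (names assumed)."""

    def __init__(self) -> None:
        self._pending: dict[str, frozenset[str]] = {}
        # One lock per agent: updates to a single checklist are serialized,
        # but different agents do not contend with each other.
        self._locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    async def complete_step(self, agent_id: str, step: str) -> None:
        async with self._locks[agent_id]:
            pending = self._pending[agent_id]  # fresh read under the lock
            await asyncio.sleep(0)  # simulated async work (registry update)
            self._pending[agent_id] = pending - {step}


async def demo() -> frozenset[str]:
    svc = OnboardingSketch()
    svc._pending["agent-1"] = frozenset({"setup", "training", "review"})
    # Complete all three steps concurrently for the same agent.
    await asyncio.gather(
        *(svc.complete_step("agent-1", s) for s in ("setup", "training", "review"))
    )
    return svc._pending["agent-1"]
```

Under the lock, all three concurrent completions land; with the unguarded read-await-write pattern, tasks holding stale copies would silently revert each other's updates.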
```python
except Exception as exc:
    msg = f"Failed to retrieve memories for agent {agent_id!r}"
    logger.error(  # noqa: TRY400
        HR_ARCHIVAL_ENTRY_FAILED,
        agent_id=agent_id,
        phase="retrieve",
        error=str(exc),
        error_type=type(exc).__name__,
    )
    raise MemoryArchivalError(msg) from exc
```
This except Exception block is too broad and can mask unexpected errors. The PR description mentions that broad exception blocks were narrowed, but this one seems to have been missed. It's better to catch more specific exceptions that you expect memory_backend.retrieve to raise, such as IOError, ConnectionError, or a custom MemoryRetrievalError if one exists in the project.
Suggested change:

```python
except (IOError, ConnectionError) as exc:  # Or other specific expected errors
    msg = f"Failed to retrieve memories for agent {agent_id!r}"
    logger.error(  # noqa: TRY400
        HR_ARCHIVAL_ENTRY_FAILED,
        agent_id=agent_id,
        phase="retrieve",
        error=str(exc),
        error_type=type(exc).__name__,
    )
    raise MemoryArchivalError(msg) from exc
```
Implement the full HR module (agent lifecycle management) and performance tracking system for M7. Includes:

- Performance models, protocols, and 4 pluggable strategies (CI quality scoring, behavioral collaboration, multi-window metrics, Theil-Sen trend detection)
- PerformanceTracker service with in-memory storage
- HR models (CandidateCard, HiringRequest, FiringRequest, onboarding/offboarding records, lifecycle events)
- AgentRegistryService with asyncio.Lock-protected hot-pluggable registry
- HiringService (request → candidate → approval → instantiation)
- OnboardingService (checklist tracking, auto-activation)
- OffboardingService (task reassignment → memory archival → team notify)
- QueueReturnStrategy (D9) and FullSnapshotStrategy (D10)
- SQLite schema v2 migration (lifecycle_events, task_metrics, collaboration_metrics tables)
- 3 new SQLite repository implementations
- HR and performance persistence protocols
- AgentStatus.ONBOARDING and MessageType.HR_NOTIFICATION enums
- 16 HR + 6 performance observability event constants
- 286 unit tests at 91.6% coverage
Pre-reviewed by 9 agents, 57 findings addressed:

- Fix critical bug: trend detection now filters records per time window
- Fix re-instantiation vulnerability: guard against INSTANTIATED status
- Narrow except Exception blocks to specific types across 5+ locations
- Add missing logger.warning() before raises per CLAUDE.md rules
- Extract HR repos to hr_repositories.py (800-line file limit)
- Fix since/until types from object to AwareDatetime in repo queries
- Add model validators (threshold ordering, temporal order, step consistency)
- Fix NotBlankStr type violations in 5 model fields
- Update DESIGN_SPEC.md, CLAUDE.md, README.md for HR module
- Fix ActionType.HIRING → ActionType.ORG_HIRE (new enum from SecOps merge)
- Resolve rebase conflicts in CLAUDE.md, DESIGN_SPEC.md, README.md
- Merge security event constants into CLAUDE.md logging section
- Update implementation snapshot to include SecOps + Docker/MCP/HR
- Fix test_events.py to include security, hr, and performance modules
- Fix hiring_service: _get_request pattern, specific exception handling, budget_limit_monthly validation, configurable default_model_config, refactored into small helpers
- Fix offboarding_service: default hot_store_cleaned=False, termination raises OffboardingError, broader exception handling, refactored helpers
- Fix full_snapshot_strategy: OrgFactAuthor receives agent_seniority, narrowed exception handling, refactored into helpers
- Fix performance tracker: correct default strategy imports, optional strategy params with PerformanceConfig defaults, window label warning
- Fix ci_quality_strategy: log-scaled cost scoring replaces broken linear
- Fix multi_window_strategy: success_rate=None below min_data_points
- Fix collaboration_protocol docstring accuracy
- Fix theil_sen_strategy _median docstring for empty-list sentinel
- Lint fixes: TRY400, TC001, PLC0415, EM101, TRY003, N817, I001, E501
- Test updates for all source changes + new edge-case tests
```python
# Cost trend.
cost_values = tuple((r.completed_at, r.cost_usd) for r in window_records)
if cost_values:
    trends.append(
        self._trend_strategy.detect(
            metric_name=NotBlankStr("cost_usd"),
            values=cost_values,
            window_size=window.window_size,
        )
    )
```
**Cost trend direction is semantically inverted**

`TheilSenTrendStrategy` uses the same sign convention for every metric: positive slope → `IMPROVING`, negative slope → `DECLINING`. This is correct for `quality_score` (higher is better), but wrong for `cost_usd`, where a positive slope (rising costs) should be `DECLINING`.

As written, an agent whose `cost_usd` trend is increasing over a window will be reported as `IMPROVING`, and a cost-efficient agent reducing spend will be reported as `DECLINING`. The result is that cost trend data in `AgentPerformanceSnapshot` is actively misleading.

The fix requires either:

- Passing a `higher_is_better: bool` flag into `TrendDetectionStrategy.detect()` so it can invert the direction for cost metrics, or
- Inverting the values before passing them (e.g., pass `-cost_usd`) for metrics where lower is better.
```python
# Cost trend — negate cost so that declining cost = positive slope = IMPROVING.
cost_values = tuple((r.completed_at, -r.cost_usd) for r in window_records)
```
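The first option (a `higher_is_better` flag) can be sketched as a small direction classifier; the `TrendDirection` enum and `classify` helper below are illustrative stand-ins, not the project's actual API:

```python
from enum import Enum


class TrendDirection(Enum):
    IMPROVING = "improving"
    DECLINING = "declining"
    STABLE = "stable"


def classify(slope: float, *, higher_is_better: bool = True,
             stable_eps: float = 1e-9) -> TrendDirection:
    """Map a fitted slope to a direction, flipping lower-is-better metrics."""
    if abs(slope) <= stable_eps:
        return TrendDirection.STABLE
    if not higher_is_better:
        slope = -slope  # rising cost becomes a negative (declining) signal
    return TrendDirection.IMPROVING if slope > 0 else TrendDirection.DECLINING
```

With this, the caller passes `higher_is_better=False` for `cost_usd` and the raw metric values stay unmodified, which keeps the slope magnitude meaningful in reports.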
```python
msg = f"Hiring request {request_id!r} not found"
logger.warning(
    HR_HIRING_REQUEST_CREATED,
    request_id=request_id,
    error=msg,
)
raise HiringError(msg)
```
Wrong log event constant used for "not found" error
_get_request logs HR_HIRING_REQUEST_CREATED when a hiring request is not found. Any monitoring rule or structured-log query watching HR_HIRING_REQUEST_CREATED will incorrectly count this error path as a successful request creation event.
The correct constant here should be something that signals a lookup failure. If no dedicated "not found" constant exists in events.hr, using the downstream failure event (e.g. HR_HIRING_INSTANTIATION_FAILED) or creating a new HR_HIRING_REQUEST_NOT_FOUND constant would be more accurate.
Suggested change (replace the `logger.warning` call):

```python
logger.warning(
    HR_HIRING_INSTANTIATION_FAILED,
    request_id=request_id,
    error=msg,
)
```
```python
agent_name=identity.name,
firing_request_id=request.id,
tasks_reassigned=tasks_reassigned,
memory_archive_id=None,
```
memory_archive_id is always hardcoded to None
OffboardingRecord.memory_archive_id exists to capture the ID of the cold-storage archive created during offboarding, but it is unconditionally set to None here. The archival_result object (of type ArchivalResult) is available at this point and carries total_archived and promoted_to_org, but ArchivalResult itself does not expose an archive ID field — meaning there is currently no way to populate this field.
This is a gap between the data model (OffboardingRecord has the field; ArchivalResult does not). If the archive ID is needed for auditability or downstream references, ArchivalResult needs an archive_id: NotBlankStr | None field that implementing strategies can populate. Otherwise the field in OffboardingRecord is permanently dead and creates misleading API surface.
At minimum, consider adding archive_id: NotBlankStr | None = None to ArchivalResult so that FullSnapshotStrategy (or any other archival implementation) can return the ID of the archive record it creates.
```python
    """
    result = await self._quality_strategy.score(
        agent_id=agent_id,
        task_id=task_id,
        task_result=task_result,
        acceptance_criteria=acceptance_criteria,
    )
    return task_result.model_copy(update={"quality_score": result.score})
```
score_task_quality does not update the stored record
score_task_quality computes the quality score and returns a new TaskMetricRecord with quality_score populated, but it never writes the updated record back into self._task_metrics. The in-memory store continues to hold the original record with quality_score=None.
Consequence: get_snapshot() aggregates overall_quality_score from self._task_metrics, so it will always compute None (no scored records) even after score_task_quality has been called. Trend detection on quality_score in _compute_trends is similarly broken — quality_values will always be empty because no stored record ever has a non-None quality_score.
The only workaround for callers today is to take the returned scored record and call record_task_metric(scored_record) a second time — which also duplicates the entry in the list. There is no "replace" API.
The method should write the scored record back into self._task_metrics, replacing the original entry by matching on record.id, before returning it.
🤖 I have created a release *beep* *boop*

---

## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10)

### Features

* add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267)
* add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173)
* add design specification, license, and project setup ([8669a09](8669a09))
* add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139)
* add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39))
* add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4))
* add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779))
* add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b))
* design unified provider interface ([#86](#86)) ([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134))
* implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a))
* implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b))
* implement built-in file system tools ([#18](#18)) ([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496))
* implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2))
* implement core entity and role system models ([#69](#69)) ([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f))
* implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515))
* implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe))
* implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca))
* implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2))
* implement model routing engine ([#99](#99)) ([d3c250b](d3c250b))
* implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3))
* implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85))
* implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30)
* implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1))
* implement tool permission checking ([#16](#16)) ([833c190](833c190))
* implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout ([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749))
* wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617))

### Bug Fixes

* address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929))
* address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169)
* enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a))
* pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861))

### Performance

* harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188)

### Refactoring

* adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9))
* harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89))

### Documentation

* add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6))
* finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee))

### Tests

* add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4))
* add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4))

### CI/CD

* add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758))
* bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247))
* bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517))
* harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af))

### Maintenance

* add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337))
* add design spec context loading to research-link skill ([8ef9685](8ef9685))
* add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46))
* fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5))
* pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
🤖 I have created a release *beep* *boop*

---

## [0.1.0](v0.0.0...v0.1.0) (2026-03-11)

### Features

* add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267)
* add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173)
* add design specification, license, and project setup ([8669a09](8669a09))
* add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139)
* add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39))
* add mandatory JWT + API key authentication ([#256](#256)) ([c279cfe](c279cfe))
* add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4))
* add pluggable output scan response policies ([#263](#263)) ([b9907e8](b9907e8))
* add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779))
* add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b))
* design unified provider interface ([#86](#86)) ([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134))
* implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a))
* implement AuditRepository for security audit log persistence ([#279](#279)) ([94bc29f](94bc29f))
* implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b))
* implement built-in file system tools ([#18](#18)) ([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496))
* implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2))
* implement core entity and role system models ([#69](#69)) ([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f))
* implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515))
* implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe))
* implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca))
* implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2))
* implement model routing engine ([#99](#99)) ([d3c250b](d3c250b))
* implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3))
* implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85))
* implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30)
* implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1))
* implement tool permission checking ([#16](#16)) ([833c190](833c190))
* implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout ([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749))
* wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617))

### Bug Fixes

* address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929))
* address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169)
* enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a))
* pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108))
* resolve circular imports, bump litellm, fix release tag format ([#286](#286)) ([a6659b5](a6659b5))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861))

### Performance

* harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188)

### Refactoring

* adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9))
* harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89))

### Documentation

* add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6))
* finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee))

### Tests

* add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4))
* add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4))

### CI/CD

* add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758))
* bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247))
* bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517))
* bump anchore/scan-action from 6.5.1 to 7.3.2 ([#271](#271)) ([80a1c15](80a1c15))
* bump docker/build-push-action from 6.19.2 to 7.0.0 ([#273](#273)) ([dd0219e](dd0219e))
* bump docker/login-action from 3.7.0 to 4.0.0 ([#272](#272)) ([33d6238](33d6238))
* bump docker/metadata-action from 5.10.0 to 6.0.0 ([#270](#270)) ([baee04e](baee04e))
* bump docker/setup-buildx-action from 3.12.0 to 4.0.0 ([#274](#274)) ([5fc06f7](5fc06f7))
* bump sigstore/cosign-installer from 3.9.1 to 4.1.0 ([#275](#275)) ([29dd16c](29dd16c))
* harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af))

### Maintenance

* add /worktree skill for parallel worktree
management ([#171](#171)) ([951e337](951e337)) * add design spec context loading to research-link skill ([8ef9685](8ef9685)) * add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705)) * add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023)) * add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577)) * bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86)) * bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c)) * bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46)) * fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5)) * **main:** release ai-company 0.1.1 ([#282](#282)) ([2f4703d](2f4703d)) * pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002)) * post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>
Summary
* … `hr_repositories.py`)
* narrowed `except Exception` blocks to specific types across 5+ locations
* added `logger.warning()` before raises per CLAUDE.md rules
* fixed `NotBlankStr` type violations, added model validators

Closes #45, closes #47
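A minimal sketch of the exception-narrowing and log-before-raise items above. The table name `hr_agents`, the function `fetch_agent_row`, and the error type `HRRepositoryError` are illustrative assumptions, not the PR's actual code; only the pattern (catch a specific exception type instead of `Exception`, and `logger.warning()` before every raise) reflects the summary.

```python
import logging
import sqlite3

logger = logging.getLogger(__name__)


class HRRepositoryError(Exception):
    """Hypothetical domain error raised by the repository layer."""


def fetch_agent_row(conn: sqlite3.Connection, agent_id: str) -> tuple:
    """Fetch one agent row, failing loudly with a domain error."""
    try:
        row = conn.execute(
            "SELECT id, name FROM hr_agents WHERE id = ?", (agent_id,)
        ).fetchone()
    except sqlite3.OperationalError as exc:  # narrowed from `except Exception`
        # Warn before raising, per the CLAUDE.md rule noted in the summary.
        logger.warning("hr_agents lookup failed for %s: %s", agent_id, exc)
        raise HRRepositoryError(str(exc)) from exc
    if row is None:
        logger.warning("no hr_agents row for agent %s", agent_id)
        raise HRRepositoryError(f"unknown agent: {agent_id}")
    return row
```

Catching `sqlite3.OperationalError` rather than `Exception` keeps programming errors (e.g. a `TypeError` in the calling code) visible instead of silently converting them into repository failures.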
Test plan
* tests pass (`uv run pytest tests/ -n auto`)
* strict mypy passes (`uv run mypy src/ tests/`)
* `test_circuit_breaker_after_max_errors` (not related to this PR)

Review coverage
Pre-reviewed by 9 agents: code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, docs-consistency. 57 findings triaged and implemented.
🤖 Generated with Claude Code