feat: add progressive trust and promotion/demotion subsystems#196
Dependency Review: ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found. Scanned files: none.
Caution: Review failed — the pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID:
📒 Files selected for processing (56)
📝 Walkthrough (Summary by CodeRabbit)

Introduces two new subsystems: a promotion/demotion engine (configs, protocols, evaluators, approval/model-mapping strategies, orchestrator) and a progressive trust framework (configs, enums, protocols, four strategies, orchestrator). Integrates both into the root config, registry updates, and observability event constants, with extensive unit tests.
Sequence Diagram(s)

sequenceDiagram
rect rgba(200,200,255,0.5)
participant Client
participant PromotionService
participant Registry as AgentRegistry
participant Criteria as PromotionCriteria
participant Approval as PromotionApproval
participant ModelMap as ModelMapping
end
Client->>PromotionService: evaluate_promotion(agent_id)
PromotionService->>Registry: get_identity(agent_id)
Registry-->>PromotionService: AgentIdentity
PromotionService->>Criteria: evaluate(agent_id,current→target,snapshot)
Criteria-->>PromotionService: PromotionEvaluation
PromotionService-->>Client: PromotionEvaluation
Client->>PromotionService: request_promotion(agent_id,evaluation)
PromotionService->>Approval: decide(evaluation,agent_identity)
Approval-->>PromotionService: PromotionApprovalDecision
alt auto-approved
PromotionService-->>Client: PromotionRequest(status=APPROVED)
else needs-human
PromotionService->>Approval: create approval item (approval_store)
PromotionService-->>Client: PromotionRequest(status=PENDING,approval_id)
end
Client->>PromotionService: apply_promotion(request)
PromotionService->>ModelMap: resolve_model(agent_identity,new_level)
ModelMap-->>PromotionService: new_model_id | None
PromotionService->>Registry: update_identity(agent_id, level, model_id?)
Registry-->>PromotionService: updated AgentIdentity
PromotionService-->>Client: PromotionRecord
sequenceDiagram
rect rgba(200,255,200,0.5)
participant Client
participant TrustService
participant Strategy as TrustStrategy
participant ApprovalStore
participant Registry as AgentRegistry
end
Client->>TrustService: initialize_agent(agent_id)
TrustService->>Strategy: initial_state(agent_id)
Strategy-->>TrustService: TrustState
TrustService-->>Client: TrustState
Client->>TrustService: evaluate_agent(agent_id,snapshot)
TrustService->>Strategy: evaluate(agent_id,current_state,snapshot)
Strategy-->>TrustService: TrustEvaluationResult
TrustService-->>Client: TrustEvaluationResult
Client->>TrustService: apply_trust_change(agent_id,result)
TrustService->>TrustService: _enforce_elevated_gate(result)
alt requires approval
TrustService->>ApprovalStore: create approval item
end
TrustService->>Registry: (optional) update related agent info
TrustService-->>Client: TrustChangeRecord | None
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs
Greptile Summary: This PR introduces two substantial new subsystems — a four-strategy progressive trust engine (…). The pre-review addressed all 33 previously flagged findings. Three new issues remain:
Confidence Score: 4/5
Important Files Changed
Sequence Diagram

sequenceDiagram
participant Caller
participant PromotionService
participant TrustService
participant ApprovalStore
participant Registry
Note over Caller,Registry: Promotion Flow
Caller->>PromotionService: evaluate_promotion(agent_id)
PromotionService->>Registry: get(agent_id)
PromotionService->>PromotionService: _next_level(identity.level)
PromotionService->>PromotionService: criteria.evaluate(snapshot)
PromotionService-->>Caller: PromotionEvaluation
Caller->>PromotionService: request_promotion(agent_id, evaluation)
PromotionService->>PromotionService: check eligibility + cooldown
PromotionService->>PromotionService: approval.decide(evaluation)
alt auto_approve
PromotionService-->>Caller: PromotionRequest(APPROVED)
else requires_human
PromotionService->>ApprovalStore: add(ApprovalItem)
PromotionService-->>Caller: PromotionRequest(PENDING)
end
Caller->>PromotionService: apply_promotion(request)
PromotionService->>PromotionService: _verify_approval(request)
PromotionService->>Registry: update_identity(level)
PromotionService->>TrustService: evaluate_agent(snapshot) [best-effort]
PromotionService-->>Caller: PromotionRecord
Note over Caller,TrustService: Trust Evaluation Flow
Caller->>TrustService: evaluate_agent(agent_id, snapshot)
TrustService->>TrustService: strategy.evaluate(state, snapshot)
TrustService->>TrustService: _enforce_elevated_gate(result)
TrustService-->>Caller: TrustEvaluationResult
Caller->>TrustService: apply_trust_change(agent_id, result)
TrustService->>TrustService: _enforce_elevated_gate(result) [defense-in-depth]
alt requires_human_approval
TrustService->>ApprovalStore: add(ApprovalItem)
TrustService-->>Caller: None (awaiting approval)
else auto
TrustService->>TrustService: update trust state
TrustService-->>Caller: TrustChangeRecord
end
Last reviewed commit: 6c649a6
Pull request overview
This PR implements the progressive trust system (security/trust/) and promotion/demotion subsystem (hr/promotion/) as defined in DESIGN_SPEC §11.3 and §8.4. Both subsystems use pluggable protocol patterns with multiple strategy implementations, plus a comprehensive test suite of ~199 tests covering all strategies, service methods, config validation, and edge cases.
Changes:
- Progressive Trust (security/trust/): 4 trust strategies (disabled, weighted, per-category, milestone), trust service orchestrator, config models, domain models, event constants
- Promotion/Demotion (hr/promotion/): threshold evaluator, seniority approval strategy, model mapping, service orchestrator, config models, domain models, event constants
- Infrastructure: AgentRegistryService.update_identity(), new HR/trust/promotion enums and errors, RootConfig extended with trust and promotion fields
Reviewed changes
Copilot reviewed 53 out of 53 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/ai_company/security/trust/weighted_strategy.py | Contains a critical bug in error_factor computation |
| src/ai_company/security/trust/service.py | apply_trust_change sets last_promoted_at even on demotions |
| src/ai_company/observability/events/hr.py | Duplicate `# ── Registry` section header |
| src/ai_company/security/trust/config.py | Trust config models with validation |
| src/ai_company/security/trust/milestone_strategy.py | Milestone + decay logic |
| src/ai_company/security/trust/per_category_strategy.py | Per-category trust tracks |
| src/ai_company/security/trust/disabled_strategy.py | Static trust strategy |
| src/ai_company/hr/promotion/service.py | Promotion orchestrator |
| src/ai_company/hr/promotion/threshold_evaluator.py | N-of-M criteria evaluation |
| src/ai_company/hr/promotion/seniority_approval_strategy.py | Approval decisions |
| src/ai_company/hr/promotion/seniority_model_mapping.py | Model tier resolution |
| src/ai_company/hr/registry.py | New update_identity method |
| src/ai_company/config/schema.py | RootConfig extended with trust/promotion |
| Test files | Comprehensive coverage for all new subsystems |
# Error penalty: 1 - failure_rate
error_factor = 1.0
for window in snapshot.windows:
    if window.data_point_count > 0 and window.success_rate is not None:
        error_factor = 1.0 - (1.0 - window.success_rate)
        break
The error_factor computation in _compute_score is a no-op: 1.0 - (1.0 - window.success_rate) simplifies to just window.success_rate, making error_factor identical to completion_factor. Both factors end up using window.success_rate, so the weighted trust score counts success rate twice (once under completion_rate weight and once under error_rate weight) and the intended error penalty is never applied. The formula should likely be computing an error rate factor differently — for example using window.tasks_failed / window.data_point_count — so that it actually penalizes failures distinctly from the success-rate criterion.
# ── Registry ─────────────────────────────────────────────────────

HR_REGISTRY_IDENTITY_UPDATED: Final[str] = "hr.registry.identity_updated"
The duplicate # ── Registry ───────────────────────────────────────────────────── section header was added at line 38 when HR_REGISTRY_IDENTITY_UPDATED was introduced, but there is already an identical section header at line 32 for the same "Registry" group. The new constant should either be placed under the existing # ── Registry block (lines 32–36) or the duplicate heading should be renamed to something more descriptive (e.g., # ── Registry: identity updates ─────────────────────────────────).
updated = state.model_copy(
    update={
        "global_level": result.recommended_level,
        "last_promoted_at": now,
        "trust_score": result.score,
    },
)
In apply_trust_change, the state update always sets last_promoted_at to now regardless of whether the change is a promotion or a demotion. This means a demotion will incorrectly update last_promoted_at, which could affect demotion-tracking logic or reporting (e.g., if last_promoted_at is used to compute how long since the last promotion). A separate last_demoted_at field or a conditional update based on the direction of change (promotion vs demotion) should be used.
Actionable comments posted: 28
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/ai_company/config/schema.py (1)
467-498: 🧹 Nitpick | 🔵 Trivial — Update the RootConfig docstring to include new attributes.

The docstring's Attributes section lists all configuration fields but is missing the newly added `trust` and `promotion` attributes.

Proposed docstring update:

        mcp: MCP bridge configuration.
        security: Security subsystem configuration.
    +   trust: Progressive trust configuration.
    +   promotion: Promotion/demotion configuration.
    """

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ai_company/config/schema.py` around lines 467 - 498, The RootConfig docstring is out of date and missing the new attributes; update the Attributes section of the RootConfig class docstring to include brief entries for the trust and promotion fields (e.g., "trust: Trust configuration." and "promotion: Promotion rules/configuration.") so the docstring accurately documents all top-level config fields and matches the model schema.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@DESIGN_SPEC.md`:
- Around line 2860-2869: The DESIGN_SPEC.md implementation-status callouts are
inconsistent: you added concrete packages for hr/promotion/ and security/trust/
but left earlier status notes claiming promotions/demotions and M7
security/progressive trust are unimplemented; update those earlier status
callouts to reflect the new implementations and revise the implementation-status
text around the promotion subsystem and M7 security/trust to mark them
implemented (or document the exact deviations), including the same fixes in the
other referenced section(s) noted (around the later block for 3102-3112); also
add the required user-alert language that any deviation from DESIGN_SPEC.md must
be surfaced to the user with an explanation so they can choose to proceed or
update the spec, rather than allowing silent divergence.
In `@src/ai_company/hr/promotion/config.py`:
- Around line 77-90: The seniority_model_map field is currently a plain dict (in
the config model with frozen=True), allowing post-validation mutation that
circumvents _validate_model_map_keys; change the construction to deep-copy the
incoming dict and wrap it in a read-only MappingProxyType before assignment so
callers cannot mutate it afterwards, e.g., perform copy.deepcopy on the provided
mapping during model validation/initialization (within _validate_model_map_keys
or a pre/post validator) and replace self.seniority_model_map with
MappingProxyType(deepcopied_map) after validating keys against SeniorityLevel
values; ensure the model remains frozen and that external callers only see an
immutable mapping.
In `@src/ai_company/hr/promotion/models.py`:
- Around line 180-193: The validator _validate_model_fields should also reject
no-op changes where model_changed is True but old_model_id == new_model_id;
update the logic in _validate_model_fields to (1) keep the existing checks that
model_changed=True requires both old_model_id and new_model_id not None and
model_changed=False requires both IDs be None, and (2) add a new check when
model_changed is True that raises ValueError if self.old_model_id ==
self.new_model_id (with a clear message like "model_changed=True requires
old_model_id and new_model_id to differ") so identical IDs are not accepted.
In `@src/ai_company/hr/promotion/service.py`:
- Around line 142-147: The code is logging errors using the
PROMOTION_EVALUATE_START event which is misleading; replace those usages with a
dedicated error/failure event constant (e.g. PROMOTION_EVALUATE_ERROR or
PROMOTION_EVALUATE_FAILURE) and update the logger calls that currently pass
PROMOTION_EVALUATE_START (locations around the PromotionError raises and the
logger.warning lines) to use the new constant while keeping the same metadata
(agent_id, error=msg); ensure any other occurrences noted (around lines where
PromotionError is raised) are updated consistently and add the new constant to
the module where event constants are defined.
- Around line 463-516: Move the self._approval_store None check to the top of
_create_approval (before creating approval_id, now, or instantiating
ApprovalItem) so you bail early when no store is configured; specifically, in
_create_approval check self._approval_store is not None and raise
PromotionError/log PROMOTION_APPROVAL_SUBMITTED if missing, then proceed to
generate approval_id, now and construct the ApprovalItem (class ApprovalItem)
and call self._approval_store.add.
- Around line 414-419: The call to
self._trust_service.evaluate_agent(request.agent_id, snapshot) currently
discards the returned TrustEvaluationResult; either explicitly document this
side-effect-only use or consume the result—capture the return value (e.g.,
result = await self._trust_service.evaluate_agent(...)) and then log it via the
existing logger or attach fields from result to the promotion record created in
this service so the evaluation outcome is visible; if you prefer to keep it
side-effect-only, add a concise comment next to the call stating that
evaluate_agent updates internal state and its return value is intentionally
ignored.
In `@src/ai_company/hr/promotion/threshold_evaluator.py`:
- Around line 128-192: The _evaluate_criteria method currently hardcodes three
criteria (quality_score, success_rate, tasks_completed) inline, making extension
difficult; refactor by extracting each criterion into reusable evaluators and a
registry: create small functions or callable objects (e.g.,
evaluate_quality_score, evaluate_success_rate, evaluate_tasks_completed) that
accept (snapshot, thresholds, direction) and return CriterionResult, then
replace the inline logic in _evaluate_criteria with a loop over a registered
list/dict of these evaluator callables (e.g., CRITERION_REGISTRY) so
adding/removing criteria only requires registering a new evaluator rather than
modifying _evaluate_criteria itself. Ensure names like AgentPerformanceSnapshot,
PromotionDirection, CriterionResult and thresholds keys
("quality_score","success_rate","tasks_completed") are used by the evaluators
for compatibility.
- Around line 154-177: The aggregation strategy for windowed metrics in
threshold_evaluator.py is inconsistent and undocumented: the success_rate loop
in the snapshot handling (searching snapshot.windows for the first
window.success_rate) implicitly picks the most recent non-None window, while the
tasks_completed loop uses max(...) to pick peak performance; add concise inline
comments above the success_rate aggregation (near the loop that picks the first
non-None value) stating "use most recent window value for success_rate" and
above the tasks_completed aggregation stating "use peak value across windows for
tasks_completed" so readers can understand the intended rationale; reference the
success_rate aggregation block and the total_tasks / tasks_completed max(...)
computation when adding the comments.
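The registry refactor suggested for `_evaluate_criteria` can be sketched as follows; the dict-based `metrics`/`thresholds` and the `CriterionResult` dataclass are simplified stand-ins for the real `AgentPerformanceSnapshot` and result models:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CriterionResult:
    """Simplified stand-in for the real criterion result model."""
    name: str
    met: bool


# Each evaluator takes (metrics, thresholds); the real ones would take
# (snapshot, thresholds, direction).
def evaluate_quality_score(metrics: dict, thresholds: dict) -> CriterionResult:
    return CriterionResult(
        "quality_score", metrics["quality_score"] >= thresholds["quality_score"])


def evaluate_success_rate(metrics: dict, thresholds: dict) -> CriterionResult:
    return CriterionResult(
        "success_rate", metrics["success_rate"] >= thresholds["success_rate"])


def evaluate_tasks_completed(metrics: dict, thresholds: dict) -> CriterionResult:
    return CriterionResult(
        "tasks_completed", metrics["tasks_completed"] >= thresholds["tasks_completed"])


CRITERION_REGISTRY: list[Callable[[dict, dict], CriterionResult]] = [
    evaluate_quality_score,
    evaluate_success_rate,
    evaluate_tasks_completed,
]


def evaluate_criteria(metrics: dict, thresholds: dict) -> list[CriterionResult]:
    # Adding a criterion now means registering an evaluator, not editing this loop.
    return [evaluate(metrics, thresholds) for evaluate in CRITERION_REGISTRY]
```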
In `@src/ai_company/observability/events/hr.py`:
- Around line 38-40: The file defines HR_REGISTRY_IDENTITY_UPDATED under a
duplicated "Registry" section header; remove the second header and move/keep the
constant HR_REGISTRY_IDENTITY_UPDATED within the existing Registry section (the
one around the earlier header) so there is only one "Registry" section
containing the HR_REGISTRY_IDENTITY_UPDATED constant.
In `@src/ai_company/observability/events/promotion.py`:
- Line 18: DEMOTION_APPLIED currently emits "demotion.applied" which breaks the
module's event contract (promotion.<subject>.<action>); update the constant
DEMOTION_APPLIED in promotion.py so its value follows the promotion namespace
(e.g., "promotion.demotion.applied") and adjust any consumers/tests that rely on
the old string to use the new value.
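The rename amounts to a one-line constant change (the exact new string is the reviewer's suggestion, not a confirmed choice):

```python
from typing import Final

# Renamed to follow the promotion.<subject>.<action> event contract.
DEMOTION_APPLIED: Final[str] = "promotion.demotion.applied"
```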
In `@src/ai_company/security/trust/config.py`:
- Around line 127-178: In MilestoneCriteria._validate_approval_flags add
structured logging of the validation failure before raising the ValueError:
import or use the module's logger (e.g., logger) and log a clear message
including the class name, the conflicting field values (self.auto_promote and
self.requires_human_approval) and any relevant context, then raise the same
ValueError; ensure logging uses structured fields (or key/value pairs) so
observability tools can index the values.
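A sketch of log-then-raise with structured fields, using stdlib `logging` as a stand-in for the project's `get_logger`; which flag combination actually conflicts is assumed here (both flags set):

```python
import logging

logger = logging.getLogger(__name__)  # stand-in for the project's get_logger


def validate_approval_flags(auto_promote: bool,
                            requires_human_approval: bool) -> None:
    """Log the conflicting flag values as structured fields, then raise."""
    if auto_promote and requires_human_approval:
        logger.error(
            "MilestoneCriteria approval flags conflict",
            extra={
                "auto_promote": auto_promote,
                "requires_human_approval": requires_human_approval,
            },
        )
        raise ValueError(
            "auto_promote and requires_human_approval cannot both be True"
        )
```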
In `@src/ai_company/security/trust/milestone_strategy.py`:
- Around line 211-216: The check in milestone_strategy.py uses (now -
state.last_evaluated_at).days to enforce milestone.time_active_days, but that
measures time since last evaluation rather than agent tenure; update the logic
in the function/method containing this block (referenced by
milestone.time_active_days, state.last_evaluated_at, and days_active) to compute
days_active from the agent's hire/start date (e.g., agent.hiring_date or
agent.started_at) instead of state.last_evaluated_at; ensure you handle a
missing hiring/start date (return False or treat as not old enough), compute
(now - agent.hiring_date).days, and then compare that value to
milestone.time_active_days.
In `@src/ai_company/security/trust/per_category_strategy.py`:
- Around line 161-188: In _check_category_criteria, rename the misleading local
variable total_tasks to max_tasks_completed to reflect that it uses
max(window.tasks_completed) across snapshot.windows (best single-window
performance), update all references (the for-loop accumulation and the
subsequent comparison against criteria_config.tasks_completed) to use
max_tasks_completed, and ensure any related comments/docstrings are adjusted to
match the new name.
In `@src/ai_company/security/trust/service.py`:
- Around line 197-204: The current update always sets last_promoted_at even on
demotions; change the update logic around state.model_copy so that
last_promoted_at is only set when result.recommended_level represents a higher
rank than state.global_level (i.e., an actual promotion). Implement this by
comparing levels using the same ranking map used elsewhere (e.g., _LEVEL_RANK)
or by a numeric/ordinal comparison between state.global_level and
result.recommended_level, and only include "last_promoted_at": now in the update
dict when the comparison indicates a promotion; otherwise leave last_promoted_at
unchanged while still updating global_level and trust_score.
- Around line 345-357: _infer_reason currently ignores whether an evaluation is
a promotion or demotion; update it to compare result.recommended_level against
result.current_level and return promotion reasons for increases and demotion
reasons for decreases. Concretely: in _infer_reason, after reading
strategy/result.score, compute a delta = result.recommended_level -
result.current_level; if delta > 0 keep existing promotion mappings (e.g.,
milestone -> MILESTONE_ACHIEVED, weighted/per_category -> SCORE_THRESHOLD); if
delta < 0 pick an appropriate demotion reason from TrustChangeReason (prefer
RE_VERIFICATION_FAILED or ERROR_RATE when the result provides failure/error
signals, otherwise default to TRUST_DECAY) and return that; if delta == 0 fall
back to existing MANUAL or strategy-based logic. Use the TrustEvaluationResult
fields recommended_level, current_level, strategy_name, score and any
error/failure indicators to choose the correct TrustChangeReason.
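A direction-aware `_infer_reason` could look like the sketch below; the rank map and the string stand-ins for `TrustChangeReason` members are illustrative assumptions:

```python
# Hypothetical level ordering; stand-in strings for TrustChangeReason members.
_LEVEL_RANK = {"untrusted": 0, "standard": 1, "elevated": 2}


def infer_reason(current_level: str, recommended_level: str,
                 strategy_name: str, had_failures: bool) -> str:
    """Pick a change reason that reflects the direction of the level change."""
    delta = _LEVEL_RANK[recommended_level] - _LEVEL_RANK[current_level]
    if delta > 0:  # promotion
        return ("milestone_achieved" if strategy_name == "milestone"
                else "score_threshold")
    if delta < 0:  # demotion
        return "error_rate" if had_failures else "trust_decay"
    return "manual"  # no level change
```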
In `@src/ai_company/security/trust/weighted_strategy.py`:
- Around line 147-159: The error_factor calculation is currently redundant and
equals success_rate; update the branch in weighted_strategy to use a true
failure signal: when iterating snapshot.windows (same loop guard using
window.data_point_count > 0 and window.success_rate is not None), set
error_factor = 1.0 - window.success_rate (or use window.failure_rate if that
field exists) instead of the current 1.0 - (1.0 - window.success_rate) so
completion_factor and error_factor are distinct inputs to the scoring logic.
- Around line 188-216: The loop that computes the recommended level currently
allows multi-step jumps and never evaluates downward transitions; restrict
transitions to adjacent ranks only and evaluate both promotion and demotion so
_check_human_approval remains consistent. Specifically, in the recommendation
logic (where _TRANSITION_KEYS, _TRUST_LEVEL_RANK, self._thresholds and the local
variable recommended are used) compute current_rank/from_rank/to_rank and only
consider a transition if abs(to_rank - current_rank) == 1; for upward moves
require score >= threshold.score, for downward moves require score <
threshold.score (or the appropriate comparison used by TrustThreshold), and set
recommended accordingly. This ensures _check_human_approval (which looks up
thresholds for exact from->to pairs) will see only adjacent transitions and that
demotions are evaluated.
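The adjacent-only, bidirectional recommendation loop can be sketched as below; the level names, float thresholds, and `(from, to)` keying are simplified assumptions in place of `_TRUST_LEVEL_RANK` and `TrustThreshold`:

```python
# Hypothetical levels; the real code uses _TRUST_LEVEL_RANK and
# TrustThreshold objects keyed by (from, to) transitions.
_LEVELS = ["untrusted", "standard", "elevated"]
_RANK = {level: i for i, level in enumerate(_LEVELS)}


def recommend_level(current: str, score: float,
                    thresholds: dict[tuple[str, str], float]) -> str:
    """Consider only adjacent transitions, in both directions."""
    rank = _RANK[current]
    # One step up when the upward threshold is met.
    if rank + 1 < len(_LEVELS):
        up = (current, _LEVELS[rank + 1])
        if up in thresholds and score >= thresholds[up]:
            return _LEVELS[rank + 1]
    # One step down when the score falls below the threshold that
    # guarded the promotion into the current level.
    if rank > 0:
        down = (_LEVELS[rank - 1], current)
        if down in thresholds and score < thresholds[down]:
            return _LEVELS[rank - 1]
    return current
```

Because every returned transition is an exact adjacent `(from, to)` pair, `_check_human_approval` can look up the matching threshold without ever seeing a multi-step jump.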
In `@tests/unit/hr/promotion/conftest.py`:
- Around line 39-43: The current logic makes success_rate=0 produce
tasks_failed=0 which is inconsistent; change the branch so that success_rate==0
is treated explicitly: if success_rate == 0 then if tasks_completed > 0 raise a
ValueError (invalid input), otherwise set total = max(1, 0) and compute
tasks_failed = total (so all tasks are failures); keep the existing calculation
for success_rate > 0. Update the block around variables total, success_rate,
tasks_completed, tasks_failed in tests/unit/hr/promotion/conftest.py
accordingly.
In `@tests/unit/hr/promotion/test_config.py`:
- Around line 1-13: Add a module-level 30s timeout for all tests in this file by
declaring the pytest timeout marker at module scope (e.g., set pytestmark to
pytest.mark.timeout(30)) so every test in this module uses the 30-second limit;
place this declaration near the top of the file after the imports (affects the
tests in tests/unit/hr/promotion/test_config.py and ensures the module-level
marker applies to
ModelMappingConfig/PromotionApprovalConfig/PromotionConfig/PromotionCriteriaConfig
tests).
- Around line 45-65: Add frozen/immutability tests for the nested config classes
by creating instances of PromotionCriteriaConfig, PromotionApprovalConfig, and
ModelMappingConfig and asserting that attempting to mutate a field (e.g.,
setting an attribute like min_criteria_met or any other field) raises a
TypeError; add one test function per class (e.g.,
test_immutable_promotion_criteria_config,
test_immutable_promotion_approval_config, test_immutable_model_mapping_config)
in the same test module so the frozen nature of each Pydantic model is
explicitly verified.
In `@tests/unit/hr/promotion/test_models.py`:
- Around line 1-17: This test module lacks the timeout marker; add a
module-level pytest timeout by defining pytestmark = pytest.mark.timeout(30)
near the top of the file (after the imports where pytest is imported) so all
tests in this module use a 30-second timeout; use the existing pytest import and
the pytestmark symbol to apply the marker.
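The marker in question is a one-liner placed after the imports (it relies on the pytest-timeout plugin being installed):

```python
import pytest

# Module-level marker: every test in this module gets a 30-second limit
# (requires the pytest-timeout plugin).
pytestmark = pytest.mark.timeout(30)
```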
In `@tests/unit/hr/promotion/test_seniority_approval_strategy.py`:
- Around line 1-16: Add the module-level pytest timeout marker by defining
pytestmark = pytest.mark.timeout(30) near the top of the test module (after the
existing imports) so pytest applies a 30s timeout to all tests in this file; you
can place it below the imports that include pytest and above the test helpers
that use make_agent_identity and SeniorityApprovalStrategy.
In `@tests/unit/hr/promotion/test_seniority_model_mapping.py`:
- Around line 1-13: The test module test_seniority_model_mapping.py is missing
the required pytest timeout marker; add a module-level marker by declaring
pytestmark = pytest.mark.timeout(30) (using the pytest symbol) near the top of
the file (after the imports) so the entire module (including tests that exercise
SeniorityModelMapping, ModelMappingConfig, and helpers like make_agent_identity)
runs with a 30s timeout.
In `@tests/unit/hr/promotion/test_threshold_evaluator.py`:
- Around line 1-16: The test module lacks the 30-second timeout marker; add a
module-level pytest mark by inserting "pytestmark = pytest.mark.timeout(30)"
directly after the existing imports (you already import pytest), e.g., in the
tests/unit/hr/promotion/test_threshold_evaluator.py file near the top of the
module so it applies to all tests that exercise ThresholdEvaluator and helpers
like make_performance_snapshot.
In `@tests/unit/security/trust/test_enums.py`:
- Around line 44-72: Multiple single-value tests for TrustChangeReason should be
consolidated into one parameterized test: replace the seven individual test_*
functions (e.g., test_score_threshold, test_milestone_achieved,
test_human_approval, etc.) with a single pytest.mark.parametrize that maps enum
members (TrustChangeReason.SCORE_THRESHOLD,
TrustChangeReason.MILESTONE_ACHIEVED, ...) to their expected string values and
asserts member.value == expected; keep the existing test_member_count as-is and
adjust the members-are-strings test to assert isinstance(member.value, str) when
iterating TrustChangeReason.
- Around line 17-34: The tests duplicate value assertions for TrustStrategyType
(test_disabled_value, test_weighted_value, test_per_category_value,
test_milestone_value); replace them with a single parametric test using
pytest.mark.parametrize that iterates over (member, expected_value) pairs and
asserts member.value == expected_value, and similarly convert
test_members_are_strings to a parametric test over TrustStrategyType members (or
include the isinstance check in the same parametrize) while keeping
test_member_count unchanged; update references to TrustStrategyType and test
function names accordingly.
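The consolidation could look like the sketch below; the enum here is a stand-in mirroring the real `TrustStrategyType` members, not the project's actual class:

```python
from enum import Enum

import pytest


class TrustStrategyType(str, Enum):
    """Stand-in mirroring the real enum's members."""
    DISABLED = "disabled"
    WEIGHTED = "weighted"
    PER_CATEGORY = "per_category"
    MILESTONE = "milestone"


@pytest.mark.parametrize(
    ("member", "expected"),
    [
        (TrustStrategyType.DISABLED, "disabled"),
        (TrustStrategyType.WEIGHTED, "weighted"),
        (TrustStrategyType.PER_CATEGORY, "per_category"),
        (TrustStrategyType.MILESTONE, "milestone"),
    ],
)
def test_value(member: TrustStrategyType, expected: str) -> None:
    assert member.value == expected
    assert isinstance(member.value, str)
```

One parametrized test replaces four near-identical functions, and the parameter list doubles as a readable member-to-value table.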
In `@tests/unit/security/trust/test_milestone_strategy.py`:
- Around line 254-278: The test test_milestone_requires_human_for_elevated
should not conditionally skip assertions; replace the guarded check with
explicit assertions: call MilestoneTrustStrategy.evaluate (as already done) and
assert result.recommended_level == ToolAccessLevel.ELEVATED first, then assert
result.requires_human_approval is True; update references to
result.recommended_level and result.requires_human_approval and fail the test if
promotion didn't occur so the human-approval requirement is always validated.
In `@tests/unit/security/trust/test_service.py`:
- Around line 288-369: Tests in TestElevatedGateEnforcement mark methods async
although they call the synchronous TrustService._enforce_elevated_gate; change
the three test methods (test_elevated_gate_enforces_human_approval,
test_elevated_gate_no_override_when_already_elevated,
test_elevated_gate_no_override_when_approval_set) to regular (non-async)
functions so they reflect synchronous behavior and avoid unnecessary async
pytest handling, keeping assertions and setup unchanged.
---
Outside diff comments:
In `@src/ai_company/config/schema.py`:
- Around line 467-498: The RootConfig docstring is out of date and missing the
new attributes; update the Attributes section of the RootConfig class docstring
to include brief entries for the trust and promotion fields (e.g., "trust: Trust
configuration." and "promotion: Promotion rules/configuration.") so the
docstring accurately documents all top-level config fields and matches the model
schema.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 40cba637-cd6f-4748-ad05-7c0fe0924f0a
📒 Files selected for processing (53)
CLAUDE.md
DESIGN_SPEC.md
src/ai_company/config/defaults.py
src/ai_company/config/schema.py
src/ai_company/hr/enums.py
src/ai_company/hr/errors.py
src/ai_company/hr/promotion/__init__.py
src/ai_company/hr/promotion/approval_protocol.py
src/ai_company/hr/promotion/config.py
src/ai_company/hr/promotion/criteria_protocol.py
src/ai_company/hr/promotion/model_mapping_protocol.py
src/ai_company/hr/promotion/models.py
src/ai_company/hr/promotion/seniority_approval_strategy.py
src/ai_company/hr/promotion/seniority_model_mapping.py
src/ai_company/hr/promotion/service.py
src/ai_company/hr/promotion/threshold_evaluator.py
src/ai_company/hr/registry.py
src/ai_company/observability/events/hr.py
src/ai_company/observability/events/promotion.py
src/ai_company/observability/events/trust.py
src/ai_company/security/trust/__init__.py
src/ai_company/security/trust/config.py
src/ai_company/security/trust/disabled_strategy.py
src/ai_company/security/trust/enums.py
src/ai_company/security/trust/errors.py
src/ai_company/security/trust/milestone_strategy.py
src/ai_company/security/trust/models.py
src/ai_company/security/trust/per_category_strategy.py
src/ai_company/security/trust/protocol.py
src/ai_company/security/trust/service.py
src/ai_company/security/trust/weighted_strategy.py
tests/unit/config/conftest.py
tests/unit/hr/promotion/__init__.py
tests/unit/hr/promotion/conftest.py
tests/unit/hr/promotion/test_config.py
tests/unit/hr/promotion/test_models.py
tests/unit/hr/promotion/test_seniority_approval_strategy.py
tests/unit/hr/promotion/test_seniority_model_mapping.py
tests/unit/hr/promotion/test_service.py
tests/unit/hr/promotion/test_threshold_evaluator.py
tests/unit/hr/test_enums.py
tests/unit/hr/test_registry.py
tests/unit/observability/test_events.py
tests/unit/security/trust/__init__.py
tests/unit/security/trust/conftest.py
tests/unit/security/trust/test_config.py
tests/unit/security/trust/test_disabled_strategy.py
tests/unit/security/trust/test_enums.py
tests/unit/security/trust/test_milestone_strategy.py
tests/unit/security/trust/test_models.py
tests/unit/security/trust/test_per_category_strategy.py
tests/unit/security/trust/test_service.py
tests/unit/security/trust/test_weighted_strategy.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Agent
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use `from ai_company.observability import get_logger` and instantiate the logger as `logger = get_logger(__name__)` in every module with business logic
Files:
src/ai_company/hr/enums.py, tests/unit/hr/promotion/__init__.py, src/ai_company/hr/errors.py, src/ai_company/observability/events/hr.py, tests/unit/hr/promotion/test_threshold_evaluator.py, src/ai_company/hr/promotion/seniority_model_mapping.py, src/ai_company/observability/events/trust.py, src/ai_company/config/defaults.py, tests/unit/hr/promotion/test_config.py, src/ai_company/hr/promotion/__init__.py, src/ai_company/security/trust/errors.py, src/ai_company/security/trust/config.py, src/ai_company/hr/promotion/seniority_approval_strategy.py, tests/unit/security/trust/test_models.py, tests/unit/config/conftest.py, src/ai_company/hr/promotion/model_mapping_protocol.py, src/ai_company/security/trust/protocol.py, tests/unit/security/trust/test_per_category_strategy.py, tests/unit/security/trust/conftest.py, src/ai_company/hr/promotion/criteria_protocol.py, tests/unit/hr/promotion/test_seniority_approval_strategy.py, tests/unit/security/trust/test_service.py, src/ai_company/hr/promotion/approval_protocol.py, tests/unit/security/trust/test_weighted_strategy.py, tests/unit/observability/test_events.py, src/ai_company/hr/registry.py, src/ai_company/hr/promotion/threshold_evaluator.py, src/ai_company/security/trust/disabled_strategy.py, src/ai_company/security/trust/enums.py, src/ai_company/security/trust/service.py, src/ai_company/hr/promotion/models.py, src/ai_company/hr/promotion/service.py, src/ai_company/security/trust/models.py, src/ai_company/hr/promotion/config.py, tests/unit/security/trust/test_config.py, tests/unit/security/trust/test_disabled_strategy.py, tests/unit/hr/promotion/conftest.py, tests/unit/security/trust/test_enums.py, src/ai_company/security/trust/__init__.py, src/ai_company/config/schema.py, tests/unit/security/trust/test_milestone_strategy.py, src/ai_company/security/trust/per_category_strategy.py, tests/unit/hr/test_enums.py, tests/unit/hr/promotion/test_models.py, src/ai_company/observability/events/promotion.py, tests/unit/security/trust/__init__.py, tests/unit/hr/test_registry.py, tests/unit/hr/promotion/test_service.py, tests/unit/hr/promotion/test_seniority_model_mapping.py, src/ai_company/security/trust/milestone_strategy.py, src/ai_company/security/trust/weighted_strategy.py
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py:
- Never use `import logging`, `logging.getLogger()`, or `print()` in application code — only use the centralized logger
- Use event name constants from `ai_company.observability.events.<domain>` modules (e.g., `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`) instead of string literals
- Always use structured logging with `logger.info(EVENT, key=value)` format — never use format strings like `logger.info('msg %s', val)`
- No `from __future__ import annotations` — Python 3.14 has PEP 649 native lazy annotations
- Use `except A, B:` syntax (no parentheses) per PEP 758 — ruff enforces this on Python 3.14
- Add type hints to all public functions and classes; mypy strict mode is enforced
- Add Google-style docstrings to all public classes and functions — ruff D rules enforce this
- Use immutability principles: create new objects instead of mutating existing ones; for non-Pydantic collections use `copy.deepcopy()` at construction and `MappingProxyType` for read-only enforcement
- Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (with `model_copy(update=...)`) for runtime state that evolves
- Use `NotBlankStr` from `core.types` for all identifier and name fields instead of manual whitespace validators, including optional (`NotBlankStr | None`) and tuple variants
- Use `@computed_field` for derived values in Pydantic models instead of storing and validating redundant fields (e.g., `TokenUsage.total_tokens`)
- Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls) over bare `create_task`
- Keep functions under 50 lines and files under 800 lines
- Keep line length at 88 characters (enforced by ruff)
- Handle all errors explicitly; never silently swallow exceptions
- Validate at system boundaries (user input, external APIs, config files)
- Never use real vendor names (Anthropic, OpenAI, Claude, GPT) in project-owned code, docstrings...
Files:
src/ai_company/hr/enums.py, src/ai_company/hr/errors.py, src/ai_company/observability/events/hr.py, src/ai_company/hr/promotion/seniority_model_mapping.py, src/ai_company/observability/events/trust.py, src/ai_company/config/defaults.py, src/ai_company/hr/promotion/__init__.py, src/ai_company/security/trust/errors.py, src/ai_company/security/trust/config.py, src/ai_company/hr/promotion/seniority_approval_strategy.py, src/ai_company/hr/promotion/model_mapping_protocol.py, src/ai_company/security/trust/protocol.py, src/ai_company/hr/promotion/criteria_protocol.py, src/ai_company/hr/promotion/approval_protocol.py, src/ai_company/hr/registry.py, src/ai_company/hr/promotion/threshold_evaluator.py, src/ai_company/security/trust/disabled_strategy.py, src/ai_company/security/trust/enums.py, src/ai_company/security/trust/service.py, src/ai_company/hr/promotion/models.py, src/ai_company/hr/promotion/service.py, src/ai_company/security/trust/models.py, src/ai_company/hr/promotion/config.py, src/ai_company/security/trust/__init__.py, src/ai_company/config/schema.py, src/ai_company/security/trust/per_category_strategy.py, src/ai_company/observability/events/promotion.py, src/ai_company/security/trust/milestone_strategy.py, src/ai_company/security/trust/weighted_strategy.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py:
- Add test markers `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, or `@pytest.mark.slow` to all test files
- Maintain 80% minimum code coverage — enforced in CI with `pytest --cov=ai_company --cov-fail-under=80`
- Use `asyncio_mode = 'auto'` in pytest configuration — no manual `@pytest.mark.asyncio` needed on async tests
- Set test timeout to 30 seconds per test — use `@pytest.mark.timeout(30)` or configure in pytest.ini
- Use `@pytest.mark.parametrize` for testing similar cases instead of duplicating test functions
Files:
tests/unit/hr/promotion/__init__.py, tests/unit/hr/promotion/test_threshold_evaluator.py, tests/unit/hr/promotion/test_config.py, tests/unit/security/trust/test_models.py, tests/unit/config/conftest.py, tests/unit/security/trust/test_per_category_strategy.py, tests/unit/security/trust/conftest.py, tests/unit/hr/promotion/test_seniority_approval_strategy.py, tests/unit/security/trust/test_service.py, tests/unit/security/trust/test_weighted_strategy.py, tests/unit/observability/test_events.py, tests/unit/security/trust/test_config.py, tests/unit/security/trust/test_disabled_strategy.py, tests/unit/hr/promotion/conftest.py, tests/unit/security/trust/test_enums.py, tests/unit/security/trust/test_milestone_strategy.py, tests/unit/hr/test_enums.py, tests/unit/hr/promotion/test_models.py, tests/unit/security/trust/__init__.py, tests/unit/hr/test_registry.py, tests/unit/hr/promotion/test_service.py, tests/unit/hr/promotion/test_seniority_model_mapping.py
🧠 Learnings (11)
📚 Learning: 2026-03-10T09:29:47.580Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.580Z
Learning: Applies to src/ai_company/**/*.py : Add type hints to all public functions and classes; mypy strict mode is enforced
Applied to files:
src/ai_company/hr/errors.py
📚 Learning: 2026-03-10T09:29:47.580Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.580Z
Learning: Applies to src/ai_company/**/*.py : Use event name constants from `ai_company.observability.events.<domain>` modules (e.g., `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`) instead of string literals
Applied to files:
src/ai_company/observability/events/hr.py, src/ai_company/observability/events/trust.py, tests/unit/observability/test_events.py, src/ai_company/observability/events/promotion.py, CLAUDE.md
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: Applies to src/ai_company/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (with `model_copy(update=...)`) for runtime state that evolves
Applied to files:
src/ai_company/hr/promotion/seniority_model_mapping.py, tests/unit/hr/promotion/test_config.py, tests/unit/security/trust/test_models.py, src/ai_company/hr/promotion/model_mapping_protocol.py, src/ai_company/hr/registry.py, src/ai_company/hr/promotion/models.py, src/ai_company/security/trust/models.py, src/ai_company/hr/promotion/config.py, src/ai_company/config/schema.py
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: Applies to src/ai_company/**/*.py : Validate at system boundaries (user input, external APIs, config files)
Applied to files:
src/ai_company/security/trust/config.py, CLAUDE.md
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: Applies to src/ai_company/**/*.py : Use immutability principles: create new objects instead of mutating existing ones; for non-Pydantic collections use `copy.deepcopy()` at construction and `MappingProxyType` for read-only enforcement
Applied to files:
src/ai_company/security/trust/models.py, src/ai_company/hr/promotion/config.py
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: Always read `DESIGN_SPEC.md` before implementing any feature or planning any issue — treat it as the starting point for architecture, data models, and behavior
Applied to files:
DESIGN_SPEC.md
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: If implementation deviates from DESIGN_SPEC.md, alert the user and explain why — user decides whether to proceed or update the spec; never silently diverge
Applied to files:
DESIGN_SPEC.md
📚 Learning: 2026-03-10T09:29:47.580Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.580Z
Learning: Applies to **/*.py : Use `from ai_company.observability import get_logger` and instantiate logger as `logger = get_logger(__name__)` in every module with business logic
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-10T09:29:47.580Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.580Z
Learning: Applies to src/ai_company/**/*.py : Always use structured logging with `logger.info(EVENT, key=value)` format — never use format strings like `logger.info('msg %s', val)`
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-10T09:29:47.580Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.580Z
Learning: Applies to src/ai_company/**/*.py : Never use `import logging`, `logging.getLogger()`, or `print()` in application code — only use the centralized logger
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: Applies to src/ai_company/**/*.py : Handle all errors explicitly; never silently swallow exceptions
Applied to files:
CLAUDE.md
🧬 Code graph analysis (32)
tests/unit/hr/promotion/test_threshold_evaluator.py (3)
- src/ai_company/hr/enums.py (1): PromotionDirection (44-48)
- src/ai_company/hr/promotion/threshold_evaluator.py (3): ThresholdEvaluator (40-192), name (51-53), evaluate (55-126)
- src/ai_company/hr/promotion/config.py (1): PromotionCriteriaConfig (15-33)

tests/unit/hr/promotion/test_config.py (2)
- src/ai_company/core/enums.py (1): SeniorityLevel (6-21)
- src/ai_company/hr/promotion/config.py (4): ModelMappingConfig (63-90), PromotionApprovalConfig (36-60), PromotionConfig (93-126), PromotionCriteriaConfig (15-33)

src/ai_company/security/trust/config.py (3)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/observability/_logger.py (1): get_logger (8-28)
- src/ai_company/security/trust/enums.py (1): TrustStrategyType (6-12)

tests/unit/security/trust/test_models.py (3)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/security/trust/enums.py (1): TrustChangeReason (15-24)
- src/ai_company/security/trust/models.py (4): TrustChangeRecord (71-105), TrustEvaluationResult (108-147), TrustState (22-68), should_change (145-147)

tests/unit/config/conftest.py (2)
- src/ai_company/hr/promotion/config.py (1): PromotionConfig (93-126)
- src/ai_company/security/trust/config.py (1): TrustConfig (214-369)

src/ai_company/hr/promotion/model_mapping_protocol.py (10)
- src/ai_company/core/agent.py (1): AgentIdentity (265-323)
- src/ai_company/core/enums.py (1): SeniorityLevel (6-21)
- src/ai_company/hr/promotion/seniority_model_mapping.py (2): name (35-37), resolve_model (39-87)
- src/ai_company/security/trust/disabled_strategy.py (1): name (39-41)
- src/ai_company/security/trust/milestone_strategy.py (1): name (64-66)
- src/ai_company/security/trust/per_category_strategy.py (1): name (52-54)
- src/ai_company/security/trust/weighted_strategy.py (1): name (64-66)
- src/ai_company/hr/promotion/approval_protocol.py (1): name (26-28)
- src/ai_company/hr/promotion/criteria_protocol.py (1): name (24-26)
- src/ai_company/security/trust/protocol.py (1): name (24-26)

src/ai_company/security/trust/protocol.py (5)
- src/ai_company/security/trust/models.py (2): TrustEvaluationResult (108-147), TrustState (22-68)
- src/ai_company/security/trust/disabled_strategy.py (3): name (39-41), evaluate (43-80), initial_state (82-94)
- src/ai_company/security/trust/milestone_strategy.py (3): name (64-66), evaluate (68-148), initial_state (150-163)
- src/ai_company/security/trust/per_category_strategy.py (3): name (52-54), evaluate (56-144), initial_state (146-159)
- src/ai_company/security/trust/weighted_strategy.py (3): name (64-66), evaluate (68-114), initial_state (116-129)

tests/unit/security/trust/test_per_category_strategy.py (1)
- src/ai_company/security/trust/per_category_strategy.py (3): name (52-54), initial_state (146-159), evaluate (56-144)

tests/unit/security/trust/conftest.py (4)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/hr/performance/models.py (2): AgentPerformanceSnapshot (266-301), WindowMetrics (197-263)
- src/ai_company/security/trust/config.py (5): MilestoneCriteria (127-177), ReVerificationConfig (180-211), TrustConfig (214-369), TrustThreshold (21-35), WeightedTrustWeights (38-96)
- src/ai_company/security/trust/enums.py (1): TrustStrategyType (6-12)

src/ai_company/hr/promotion/criteria_protocol.py (14)
- src/ai_company/core/enums.py (1): SeniorityLevel (6-21)
- src/ai_company/hr/promotion/models.py (1): PromotionEvaluation (49-93)
- src/ai_company/hr/promotion/seniority_approval_strategy.py (1): name (39-41)
- src/ai_company/hr/promotion/seniority_model_mapping.py (1): name (35-37)
- src/ai_company/security/trust/disabled_strategy.py (2): name (39-41), evaluate (43-80)
- src/ai_company/security/trust/milestone_strategy.py (2): name (64-66), evaluate (68-148)
- src/ai_company/security/trust/per_category_strategy.py (2): name (52-54), evaluate (56-144)
- src/ai_company/security/trust/weighted_strategy.py (2): name (64-66), evaluate (68-114)
- src/ai_company/hr/promotion/approval_protocol.py (1): name (26-28)
- src/ai_company/hr/promotion/model_mapping_protocol.py (1): name (23-25)
- src/ai_company/hr/promotion/threshold_evaluator.py (2): name (51-53), evaluate (55-126)
- src/ai_company/security/trust/protocol.py (2): name (24-26), evaluate (28-45)
- src/ai_company/engine/parallel_models.py (1): agent_id (79-81)
- src/ai_company/engine/parallel.py (1): snapshot (69-78)

tests/unit/hr/promotion/test_seniority_approval_strategy.py (5)
- src/ai_company/core/enums.py (1): SeniorityLevel (6-21)
- src/ai_company/hr/enums.py (1): PromotionDirection (44-48)
- src/ai_company/hr/promotion/config.py (1): PromotionApprovalConfig (36-60)
- src/ai_company/hr/promotion/models.py (2): PromotionEvaluation (49-93), requires_human (111-113)
- src/ai_company/hr/promotion/seniority_approval_strategy.py (2): name (39-41), decide (43-70)

tests/unit/security/trust/test_service.py (4)
- src/ai_company/security/trust/enums.py (1): TrustChangeReason (15-24)
- src/ai_company/security/trust/errors.py (1): TrustEvaluationError (8-9)
- src/ai_company/security/trust/models.py (3): TrustEvaluationResult (108-147), TrustState (22-68), should_change (145-147)
- src/ai_company/security/trust/service.py (1): _infer_reason (346-357)

tests/unit/security/trust/test_weighted_strategy.py (1)
- src/ai_company/security/trust/weighted_strategy.py (4): WeightedTrustStrategy (44-216), name (64-66), initial_state (116-129), evaluate (68-114)

src/ai_company/hr/registry.py (2)
- src/ai_company/core/agent.py (1): AgentIdentity (265-323)
- src/ai_company/hr/errors.py (1): AgentNotFoundError (60-61)

src/ai_company/hr/promotion/threshold_evaluator.py (5)
- src/ai_company/core/enums.py (2): SeniorityLevel (6-21), compare_seniority (51-64)
- src/ai_company/hr/enums.py (1): PromotionDirection (44-48)
- src/ai_company/hr/promotion/models.py (2): CriterionResult (24-46), PromotionEvaluation (49-93)
- src/ai_company/hr/performance/models.py (1): AgentPerformanceSnapshot (266-301)
- src/ai_company/hr/promotion/config.py (1): PromotionCriteriaConfig (15-33)

src/ai_company/security/trust/disabled_strategy.py (5)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/security/trust/milestone_strategy.py (3): name (64-66), evaluate (68-148), initial_state (150-163)
- src/ai_company/security/trust/per_category_strategy.py (3): name (52-54), evaluate (56-144), initial_state (146-159)
- src/ai_company/security/trust/weighted_strategy.py (3): name (64-66), evaluate (68-114), initial_state (116-129)
- src/ai_company/security/trust/protocol.py (3): name (24-26), evaluate (28-45), initial_state (47-56)

src/ai_company/security/trust/service.py (6)
- src/ai_company/core/enums.py (2): ApprovalRiskLevel (443-449), ToolAccessLevel (274-291)
- src/ai_company/security/trust/errors.py (1): TrustEvaluationError (8-9)
- src/ai_company/security/trust/models.py (4): TrustChangeRecord (71-105), TrustEvaluationResult (108-147), TrustState (22-68), should_change (145-147)
- tests/unit/api/conftest.py (1): approval_store (338-339)
- src/ai_company/security/trust/protocol.py (4): TrustStrategy (16-56), initial_state (47-56), name (24-26), evaluate (28-45)
- src/ai_company/security/trust/weighted_strategy.py (3): initial_state (116-129), name (64-66), evaluate (68-114)

src/ai_company/hr/promotion/models.py (2)
- src/ai_company/core/enums.py (2): ApprovalStatus (434-440), SeniorityLevel (6-21)
- src/ai_company/hr/enums.py (1): PromotionDirection (44-48)

src/ai_company/security/trust/models.py (4)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/security/trust/enums.py (1): TrustChangeReason (15-24)
- src/ai_company/engine/parallel_models.py (1): agent_id (79-81)
- src/ai_company/tools/base.py (2): description (138-140), category (128-130)

src/ai_company/hr/promotion/config.py (2)
- src/ai_company/core/enums.py (1): SeniorityLevel (6-21)
- src/ai_company/tools/base.py (1): description (138-140)

tests/unit/security/trust/test_config.py (3)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/security/trust/config.py (3): MilestoneCriteria (127-177), TrustThreshold (21-35), WeightedTrustWeights (38-96)
- src/ai_company/security/trust/enums.py (1): TrustStrategyType (6-12)

tests/unit/security/trust/test_disabled_strategy.py (4)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/security/trust/disabled_strategy.py (4): DisabledTrustStrategy (24-94), name (39-41), initial_state (82-94), evaluate (43-80)
- src/ai_company/security/trust/models.py (1): TrustState (22-68)
- tests/unit/security/trust/conftest.py (1): make_performance_snapshot (22-60)

tests/unit/hr/promotion/conftest.py (3)
- src/ai_company/core/enums.py (1): SeniorityLevel (6-21)
- src/ai_company/hr/performance/models.py (2): AgentPerformanceSnapshot (266-301), WindowMetrics (197-263)
- src/ai_company/hr/registry.py (1): AgentRegistryService (32-242)

tests/unit/security/trust/test_enums.py (1)
- src/ai_company/security/trust/enums.py (2): TrustChangeReason (15-24), TrustStrategyType (6-12)

src/ai_company/config/schema.py (2)
- src/ai_company/hr/promotion/config.py (1): PromotionConfig (93-126)
- src/ai_company/security/trust/config.py (1): TrustConfig (214-369)

tests/unit/security/trust/test_milestone_strategy.py (2)
- src/ai_company/security/trust/config.py (2): MilestoneCriteria (127-177), TrustConfig (214-369)
- src/ai_company/security/trust/milestone_strategy.py (4): MilestoneTrustStrategy (50-291), name (64-66), initial_state (150-163), evaluate (68-148)

src/ai_company/security/trust/per_category_strategy.py (4)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/security/trust/models.py (2): TrustEvaluationResult (108-147), TrustState (22-68)
- src/ai_company/hr/performance/models.py (1): AgentPerformanceSnapshot (266-301)
- src/ai_company/security/trust/config.py (2): CategoryTrustCriteria (99-124), TrustConfig (214-369)

tests/unit/hr/test_enums.py (1)
- src/ai_company/hr/enums.py (2): PromotionDirection (44-48), LifecycleEventType (32-41)

tests/unit/hr/test_registry.py (4)
- src/ai_company/core/enums.py (2): AgentStatus (67-73), SeniorityLevel (6-21)
- tests/unit/hr/conftest.py (2): registry (174-176), make_agent_identity (37-55)
- src/ai_company/hr/registry.py (3): register (43-69), update_identity (198-237), get (101-111)
- src/ai_company/hr/errors.py (1): AgentNotFoundError (60-61)

tests/unit/hr/promotion/test_seniority_model_mapping.py (4)
- src/ai_company/core/enums.py (1): SeniorityLevel (6-21)
- src/ai_company/hr/promotion/config.py (1): ModelMappingConfig (63-90)
- src/ai_company/hr/promotion/seniority_model_mapping.py (3): SeniorityModelMapping (22-87), name (35-37), resolve_model (39-87)
- tests/unit/hr/promotion/conftest.py (1): make_agent_identity (62-81)

src/ai_company/security/trust/milestone_strategy.py (3)
- src/ai_company/security/trust/models.py (2): TrustEvaluationResult (108-147), TrustState (22-68)
- src/ai_company/hr/performance/models.py (1): AgentPerformanceSnapshot (266-301)
- src/ai_company/security/trust/config.py (2): MilestoneCriteria (127-177), TrustConfig (214-369)

src/ai_company/security/trust/weighted_strategy.py (4)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/security/trust/models.py (2): TrustEvaluationResult (108-147), TrustState (22-68)
- src/ai_company/hr/performance/models.py (1): AgentPerformanceSnapshot (266-301)
- src/ai_company/security/trust/config.py (1): TrustThreshold (21-35)
🪛 LanguageTool
CLAUDE.md
[style] ~87-~87: A comma is missing here.
Context: ...nder ai_company.observability.events (e.g. PROVIDER_CALL_START from `events.prov...
(EG_NO_COMMA)
```python
def test_score_threshold(self) -> None:
    assert TrustChangeReason.SCORE_THRESHOLD.value == "score_threshold"

def test_milestone_achieved(self) -> None:
    assert TrustChangeReason.MILESTONE_ACHIEVED.value == "milestone_achieved"

def test_human_approval(self) -> None:
    assert TrustChangeReason.HUMAN_APPROVAL.value == "human_approval"

def test_trust_decay(self) -> None:
    assert TrustChangeReason.TRUST_DECAY.value == "trust_decay"

def test_re_verification_failed(self) -> None:
    assert (
        TrustChangeReason.RE_VERIFICATION_FAILED.value == "re_verification_failed"
    )

def test_manual(self) -> None:
    assert TrustChangeReason.MANUAL.value == "manual"

def test_error_rate(self) -> None:
    assert TrustChangeReason.ERROR_RATE.value == "error_rate"

def test_members_are_strings(self) -> None:
    for member in TrustChangeReason:
        assert isinstance(member, str)

def test_member_count(self) -> None:
    assert len(TrustChangeReason) == 7
```
🧹 Nitpick | 🔵 Trivial
Same parametrize suggestion applies to TrustChangeReason tests.
The seven individual value tests could be consolidated similarly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/security/trust/test_enums.py` around lines 44-72: multiple
single-value tests for TrustChangeReason should be consolidated into one
parameterized test: replace the seven individual test_* functions (e.g.,
test_score_threshold, test_milestone_achieved, test_human_approval, etc.) with a
single pytest.mark.parametrize that maps enum members
(TrustChangeReason.SCORE_THRESHOLD, TrustChangeReason.MILESTONE_ACHIEVED, ...)
to their expected string values and asserts member.value == expected; keep the
existing test_member_count as-is and adjust the members-are-strings test to
assert isinstance(member.value, str) when iterating TrustChangeReason.
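Applied to the excerpt above, the consolidation could look like the sketch below. The enum is re-declared locally as a stand-in for `ai_company.security.trust.enums.TrustChangeReason`, since the real module is not shown here; the marker and class names follow the repo's stated test conventions.

```python
from enum import Enum

import pytest


class TrustChangeReason(str, Enum):
    """Local stand-in for ai_company.security.trust.enums.TrustChangeReason."""

    SCORE_THRESHOLD = "score_threshold"
    MILESTONE_ACHIEVED = "milestone_achieved"
    HUMAN_APPROVAL = "human_approval"
    TRUST_DECAY = "trust_decay"
    RE_VERIFICATION_FAILED = "re_verification_failed"
    MANUAL = "manual"
    ERROR_RATE = "error_rate"


@pytest.mark.unit
class TestTrustChangeReason:
    @pytest.mark.parametrize(
        ("member", "expected"),
        [
            (TrustChangeReason.SCORE_THRESHOLD, "score_threshold"),
            (TrustChangeReason.MILESTONE_ACHIEVED, "milestone_achieved"),
            (TrustChangeReason.HUMAN_APPROVAL, "human_approval"),
            (TrustChangeReason.TRUST_DECAY, "trust_decay"),
            (TrustChangeReason.RE_VERIFICATION_FAILED, "re_verification_failed"),
            (TrustChangeReason.MANUAL, "manual"),
            (TrustChangeReason.ERROR_RATE, "error_rate"),
        ],
    )
    def test_value(self, member: TrustChangeReason, expected: str) -> None:
        # One parametrized case per member replaces seven near-identical tests.
        assert member.value == expected
        assert isinstance(member.value, str)

    def test_member_count(self) -> None:
        assert len(TrustChangeReason) == 7
```

Keeping `test_member_count` separate still catches a newly added member that the parametrize table forgot.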
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request significantly expands the system's capabilities by integrating mechanisms for managing agent trust and career progression. It enables dynamic adjustment of agent privileges and seniority based on performance and predefined rules, supporting a more secure and adaptive operational environment. The changes also reflect a thorough internal review process, resulting in a more resilient and well-tested codebase.
Code Review
This pull request introduces important new subsystems for progressive trust and agent promotion/demotion. However, it also contains critical architectural flaws in the core service orchestrators (PromotionService and TrustService): both accept evaluation and request objects from the caller without verification, allowing the human approval gates to be bypassed. Furthermore, the AgentRegistryService contains a generic identity-update method that is vulnerable to mass assignment. Together, these issues enable straightforward privilege escalation. Beyond these security concerns, the review also flags a likely logic bug in the weighted trust score calculation, fragility in how per-category trust transitions are configured, and a potentially incorrect aggregation of metrics across overlapping time windows.
```python
approval_id: NotBlankStr | None = None
status = ApprovalStatus.PENDING

if decision.auto_approve:
    status = ApprovalStatus.APPROVED
elif decision.requires_human:
    if self._approval_store is None:
        msg = (
            f"Promotion for agent {agent_id!r} requires human "
            f"approval but no approval store is configured"
        )
        logger.warning(
            PROMOTION_REQUESTED,
            agent_id=agent_id,
            error=msg,
        )
        raise PromotionError(msg)
    approval_id = await self._create_approval(
        agent_id=agent_id,
        evaluation=evaluation,
        initiated_by=initiated_by,
    )
```
The apply_promotion method trusts the status field of the provided PromotionRequest object without verifying it against a trusted server-side store. Since PromotionRequest is a Pydantic model that can be deserialized from untrusted user input (e.g., in an API endpoint), an attacker can craft a request with status="approved" and an arbitrary target_level (such as "c_suite"). This allows an attacker to bypass the human approval gate and escalate an agent's seniority level and associated authorities.
```python
self,
agent_id: NotBlankStr,
result: TrustEvaluationResult,
) -> TrustChangeRecord | None:
    """Apply a trust level change based on evaluation result.

    If human approval is required, creates an approval item and
    returns None. The change is applied when the approval is granted.

    Args:
        agent_id: Agent whose trust to change.
        result: Evaluation result to apply.

    Returns:
        Change record if applied, None if awaiting approval.

    Raises:
        TrustEvaluationError: If agent not initialized.
    """
    if not result.should_change:
        return None

    key = str(agent_id)
    state = self._trust_states.get(key)
    if state is None:
        msg = f"Agent {agent_id!r} not initialized for trust tracking"
        logger.warning(
            TRUST_EVALUATE_FAILED,
            agent_id=agent_id,
            error=msg,
        )
        raise TrustEvaluationError(msg)
```
The apply_trust_change method trusts the requires_human_approval and recommended_level fields of the TrustEvaluationResult object passed as an argument. An attacker can provide a malicious result object with requires_human_approval=False and recommended_level="elevated" to bypass the progressive trust evaluation logic and the mandatory human approval gate for elevated access. This leads to unauthorized privilege escalation, granting agents access to sensitive tool categories.
```python
**updates: Any,
) -> AgentIdentity:
    """Update agent identity fields via model_copy(update=...).

    Used by TrustService (tools.access_level) and PromotionService
    (level, model) to update agent identity in a single mutation point.

    Args:
        agent_id: The agent identifier.
        **updates: Fields to update on the AgentIdentity.

    Returns:
        Updated agent identity.

    Raises:
        AgentNotFoundError: If the agent is not found.
    """
    key = str(agent_id)
    async with self._lock:
        identity = self._agents.get(key)
        if identity is None:
            msg = f"Agent {agent_id!r} not found in registry"
            logger.warning(
                HR_REGISTRY_IDENTITY_UPDATED,
                agent_id=key,
                error=msg,
            )
            raise AgentNotFoundError(msg)
        updated = identity.model_copy(update=updates)
```
The update_identity method uses model_copy(update=updates) with arbitrary keyword arguments (**updates: Any). This implementation allows a caller to overwrite any field in the AgentIdentity model, including security-sensitive fields like authority, status, or tools.access_level. When this method is called with data derived from untrusted sources (as seen in the PromotionService and TrustService orchestrators), it enables mass assignment vulnerabilities and privilege escalation.
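A common mitigation is to validate `**updates` against an explicit allowlist before calling `model_copy`, so security-sensitive fields such as `authority` or `status` can never ride along. The field set below (`level`, `model`, `tools`) is illustrative, not the real `AgentIdentity` schema; the real list would come from the design spec.

```python
from typing import Any

# Illustrative allowlist; the actual updatable fields are an assumption here.
ALLOWED_UPDATE_FIELDS = frozenset({"level", "model", "tools"})


def validate_identity_updates(updates: dict[str, Any]) -> dict[str, Any]:
    """Reject any field not explicitly allowed for identity updates."""
    unknown = set(updates) - ALLOWED_UPDATE_FIELDS
    if unknown:
        msg = f"fields not updatable via this path: {sorted(unknown)}"
        raise ValueError(msg)
    return updates
```

The registry method would call this at its boundary, keeping `model_copy(update=...)` as the single mutation point while closing the mass-assignment path.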
```python
score = (
    self._weights.task_difficulty * difficulty_factor
    + self._weights.completion_rate * completion_factor
    + self._weights.error_rate * error_factor
    + self._weights.human_feedback * feedback_factor
```
The calculation of `error_factor` appears redundant with `completion_factor`: the expression `1.0 - (1.0 - window.success_rate)` simplifies to `window.success_rate`, the same value used for `completion_factor`. As a result, the `error_rate` weight is effectively just added to the `completion_rate` weight, and there is no distinct penalty for errors. This is likely a bug in the scoring formula design and could lead to incorrect trust scores.
A possible alternative to differentiate the factors would be to give a bonus for a perfect record.
score = (
    self._weights.task_difficulty * difficulty_factor
    + self._weights.completion_rate * completion_factor
    + self._weights.error_rate * error_factor
    + self._weights.human_feedback * feedback_factor

error_factor = 1.0
for window in snapshot.windows:
    if window.data_point_count > 0 and window.success_rate is not None:
        # Give a bonus for a perfect record, otherwise use success rate.
        # This makes the factor different from completion_factor.
        if window.tasks_failed > 0:
            error_factor = window.success_rate
        else:
            error_factor = 1.0
        break
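The collapse described above is pure algebra: for any success rate s, 1.0 - (1.0 - s) is exactly s, so the old error factor duplicated completion_factor term for term. A quick numeric check:

```python
# Confirms the redundancy: the old error_factor formula reduces to the
# success rate itself, which is what completion_factor already uses.
def old_error_factor(success_rate: float) -> float:
    return 1.0 - (1.0 - success_rate)


def completion_factor(success_rate: float) -> float:
    return success_rate


for s in (0.0, 0.25, 0.5, 0.9, 1.0):
    assert abs(old_error_factor(s) - completion_factor(s)) < 1e-12
```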
met=rate_met,
current_value=success_rate,
threshold=rate_threshold,
The logic total_tasks = max(total_tasks, float(window.tasks_completed)) to aggregate tasks completed across multiple performance windows might be incorrect. If windows are overlapping (e.g., 7d and 30d), this could lead to double-counting or misrepresenting the total work done. Depending on the intent, you might want to use the value from the longest window, or ensure the windows are non-overlapping. This same logic appears in src/ai_company/security/trust/milestone_strategy.py.
@staticmethod
def _check_time_and_history(
The logic total_tasks = max(total_tasks, window.tasks_completed) to aggregate tasks completed across multiple performance windows might be incorrect. If windows are overlapping (e.g., 7d and 30d), this could lead to double-counting or misrepresenting the total work done. Depending on the intent, you might want to use the value from the longest window, or ensure the windows are non-overlapping. This same logic appears in src/ai_company/hr/promotion/threshold_evaluator.py.
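A sketch of the "use the longest window" option mentioned above — `Window` here is an illustrative stand-in for the real WindowMetrics model, which has more fields:

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Illustrative stand-in for WindowMetrics (e.g. 7d vs 30d windows)."""

    window_size_days: int
    tasks_completed: int


def tasks_from_longest_window(windows: list[Window]) -> int:
    """Read the count from the single longest window instead of taking
    max() across overlapping windows, which can misrepresent total work."""
    if not windows:
        return 0
    longest = max(windows, key=lambda w: w.window_size_days)
    return longest.tasks_completed
```

Because the 30d window already contains the 7d window's tasks, reading only the longest window avoids mixing overlapping counts.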
):
    try:
        new_level = ToolAccessLevel(to_str)
    except ValueError:
Parsing the transition key using split('_to_') is fragile. If a tool access level's value ever contains _to_, this logic will break. A more robust approach would be to structure the category_criteria configuration differently, for example as a list of objects with from and to keys, instead of using a dictionary where keys are concatenated strings.
For example, the config could be structured like this:
category_criteria:
  file_system:
    - from: sandboxed
      to: restricted
      tasks_completed: 5
      quality_score_min: 6.0

Implement two interrelated subsystems: progressive trust (§11.3) with pluggable strategies (disabled, weighted, per-category, milestone) and promotion/demotion (§8.4) with criteria evaluation, approval gates, model mapping, and trust re-evaluation integration.

- security/trust/: TrustStrategy protocol, 4 strategy implementations, TrustService orchestrator, config with elevated-gate security invariant
- hr/promotion/: PromotionCriteriaStrategy, ApprovalStrategy, and ModelMappingStrategy protocols with initial implementations, PromotionService orchestrator with cooldown and approval store
- hr/registry: add update_identity() for generic field updates
- config/schema: add trust and promotion to RootConfig
- 170+ unit tests across both subsystems, 94.88% coverage
Pre-reviewed by 10 agents, 33 findings addressed:

- Add computed_field for PromotionApprovalDecision.requires_human
- Add cross-field validators (MilestoneCriteria mutual exclusivity, PromotionRecord model_changed consistency, ModelMappingConfig keys)
- Add per-category ELEVATED gate (defense-in-depth)
- Fix weighted strategy factor differentiation
- Add structured logging to all error/state-transition paths
- Replace silent returns with explicit raises in trust/promotion services
- Tighten config validation (non-empty milestones, thresholds)
- Add 23 new tests covering decay, approval, level helpers, edge cases
- Update DESIGN_SPEC.md and CLAUDE.md project structure sections
…t, and Greptile
- Fix error_factor/completion_factor redundancy in weighted strategy
- Prevent multi-level trust jumps (adjacent-level-only transitions)
- Add eligibility check in request_promotion
- Add notification placeholder callback for promotion/demotion events
- Add created_at field to TrustState for tenure tracking
- Fix time_active_days to use agent tenure, not evaluation frequency
- Enforce MappingProxyType on seniority_model_map and threshold dicts
- Raise PromotionError when human approval needed but no store configured
- Best-effort trust re-evaluation after promotion application
- Extract shared trust level constants to levels.py
- Add direction consistency and no-op model change validators
- Fix split("_to_") fragility with maxsplit=1
- Fix _infer_reason to distinguish promotions from demotions
- Move last_decay_check_at update after evaluation
- Update Mapping type annotation for MappingProxyType compatibility
- Add ~20 new tests, fix conditional assertions, add timeout markers
- Update DESIGN_SPEC.md and README.md for implementation status
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
DESIGN_SPEC.md (1)
1655-1661: ⚠️ Potential issue | 🟡 Minor — Update the backlog entry for promotions.
This section now says promotions/demotions ship in hr/promotion/, but §18.1 still lists “Agent promotions” as future work. Please remove that backlog row or narrow it to the still-deferred scope so the spec does not present both states at once. Based on learnings: "If implementation deviates from DESIGN_SPEC.md, alert the user and explain why — user decides whether to proceed or update the spec; never silently diverge."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@DESIGN_SPEC.md` around lines 1655 - 1661, The DESIGN_SPEC backlog entry for “Agent promotions” conflicts with the implemented hr/promotion/ module (ThresholdEvaluator, SeniorityApprovalStrategy, SeniorityModelMapping, PromotionService) — update the backlog by either removing the “Agent promotions” row or narrowing it to any genuinely deferred sub-features not yet implemented; also add a clear note in the spec that implementation currently diverges from the original backlog and instruct the maintainer to alert stakeholders and explain the divergence so they can choose to accept the current implementation or update the spec.
♻️ Duplicate comments (1)
tests/unit/security/trust/test_service.py (1)
328-404: 🧹 Nitpick | 🔵 Trivial — Make the elevated-gate tests synchronous.

TrustService._enforce_elevated_gate is a regular method, so keeping these three tests async adds needless event-loop setup and obscures what is actually being exercised.

♻️ Suggested cleanup
TrustService._enforce_elevated_gateis a regular method, so keeping these three testsasyncadds needless event-loop setup and obscures what is actually being exercised.♻️ Suggested cleanup
- async def test_elevated_gate_enforces_human_approval( + def test_elevated_gate_enforces_human_approval( self, trust_config: TrustConfig, ) -> None: @@ - async def test_elevated_gate_no_override_when_already_elevated( + def test_elevated_gate_no_override_when_already_elevated( self, trust_config: TrustConfig, ) -> None: @@ - async def test_elevated_gate_no_override_when_approval_set( + def test_elevated_gate_no_override_when_approval_set( self, trust_config: TrustConfig, ) -> None:#!/bin/bash # Verify that the production helper is synchronous while these tests are async. rg -n -C2 'def _enforce_elevated_gate|async def test_elevated_gate_|def test_elevated_gate_' \ src/ai_company/security/trust/service.py \ tests/unit/security/trust/test_service.py🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/security/trust/test_service.py` around lines 328 - 404, The three tests (test_elevated_gate_enforces_human_approval, test_elevated_gate_no_override_when_already_elevated, test_elevated_gate_no_override_when_approval_set) are marked async but call the synchronous TrustService._enforce_elevated_gate; change each test from "async def" to a plain "def" and remove any unnecessary event-loop/await scaffolding so they run synchronously and directly call service._enforce_elevated_gate; keep all assertions and setup unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@README.md`:
- Around line 40-45: The README's "Security/Approval System (M7)" paragraph
understates delivered functionality by implying all approval gating is still
planned; update README.md so it explicitly notes that human approval paths for
elevated trust access and promotion/demotion decisions are implemented (e.g.,
mention "human approval paths for elevated trust and promotion/demotion are in
place"), and scope the outstanding work to the missing authentication and
tool-invocation workflow gates (specifically JWT/OAuth and the approval workflow
gates for tool invocation), referencing the existing "Security/Approval System
(M7)" and "Status" lines so the new wording clarifies shipped features vs
remaining gaps.
In `@src/ai_company/security/trust/config.py`:
- Around line 251-274: The dict fields promotion_thresholds,
initial_category_levels, category_criteria, and milestones are still mutable
after validation and should be converted to read-only mappings to enforce
invariants (e.g., the _validate_elevated_gate check); import MappingProxyType
and wrap a deep-copied version of each dict into MappingProxyType after
validation (use a root validator or post-init hook on the same model class that
defines these fields) — mirror the pattern used for
ModelMappingConfig.seniority_model_map in hr/promotion/config.py: deep-copy the
incoming dicts, validate them, then replace each field with
MappingProxyType(copy.deepcopy(...)) so external code cannot mutate them.
In `@src/ai_company/security/trust/models.py`:
- Around line 48-73: The dict fields category_levels and milestone_progress on
the frozen TrustState can still be mutated after construction; add a Pydantic
validator on TrustState (or field validators for category_levels and
milestone_progress) that wraps incoming dicts with types.MappingProxyType (or
another immutable mapping) so instances are truly immutable, mirroring the
approach used by ModelMappingConfig.seniority_model_map; ensure the validator
handles None/default_factory and preserves typing, and keep using
model_copy(update=...) for state updates.
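The deep-copy-then-wrap pattern that prompt describes can be sketched independently of Pydantic (`freeze_mapping` is a hypothetical helper name, not project code):

```python
import copy
from types import MappingProxyType


def freeze_mapping(data: dict) -> MappingProxyType:
    """Deep-copy then wrap: later mutation of the source dict cannot leak
    in, and the returned proxy rejects item assignment with TypeError."""
    return MappingProxyType(copy.deepcopy(data))
```

A field validator would return `freeze_mapping(value)` so that category_levels and milestone_progress stay read-only after construction, while `model_copy(update=...)` still produces fresh state.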
In `@tests/unit/hr/promotion/test_models.py`:
- Around line 241-354: Combine the four similar tests into a single
pytest.mark.parametrize test that builds one PromotionEvaluation once and calls
PromotionRecord with parameters for model_changed, old_model_id, new_model_id
and the expected ValidationError match; specifically replace the separate
test_model_changed_true_missing_old_model_id_raises,
test_model_changed_true_missing_new_model_id_raises,
test_model_changed_false_with_old_model_id_raises, and
test_model_changed_true_same_model_ids_raises with one function (e.g.,
test_model_changed_validation) decorated with `@pytest.mark.parametrize` over
tuples of (model_changed, old_model_id, new_model_id, expected_error_substring),
and in the test assert pytest.raises(ValidationError,
match=expected_error_substring) when constructing PromotionRecord, reusing the
PromotionEvaluation creation inside the test body.
In `@tests/unit/hr/promotion/test_seniority_approval_strategy.py`:
- Around line 57-135: Collapse the four near-duplicate tests
(test_junior_to_mid_auto_approves, test_mid_to_senior_requires_human,
test_senior_to_lead_requires_human, test_custom_threshold_lead) into a single
parametrized test using pytest.mark.parametrize that iterates rows of
(current_level, target_level, config, expected_auto_approve,
expected_requires_human); inside the test instantiate SeniorityApprovalStrategy
(use PromotionApprovalConfig where a non-default threshold is needed), build the
evaluation with _make_evaluation, create identity via make_agent_identity, call
strategy.decide, and assert decision.auto_approve and decision.requires_human
for each row — this removes duplication while still exercising
SeniorityApprovalStrategy.decide and keeping PromotionApprovalConfig,
_make_evaluation, and make_agent_identity references for locating the code.
In `@tests/unit/hr/promotion/test_service.py`:
- Around line 1-40: Add a 30-second per-test timeout marker to the module by
defining pytestmark = pytest.mark.timeout(30) near the top of the file (add
import pytest if missing); this ensures all async tests in the PromotionService
test module use a 30s timeout and prevents hangs in approval/tracker code paths.
In `@tests/unit/security/trust/conftest.py`:
- Around line 43-62: The helper in make_performance_snapshot computes total
using int(tasks_completed / success_rate) which truncates and yields
inconsistent counts vs the stored float success_rate; change it to use ceiling
(e.g., math.ceil(tasks_completed / success_rate)) so total >= tasks_completed
and tasks_failed = total - tasks_completed reflects at least one inferred
failure when the float success_rate implies it; update the success_rate>0 branch
to import/use math.ceil, compute total = max(1, math.ceil(tasks_completed /
success_rate)), then recompute tasks_failed before constructing WindowMetrics
(window_size, data_point_count, tasks_completed, tasks_failed,
avg_quality_score, success_rate).
In `@tests/unit/security/trust/test_config.py`:
- Around line 250-259: The test shows promotion_thresholds is a plain dict so
typos are ignored; update WeightedTrustStrategy to validate keys on
construction: in WeightedTrustStrategy.__init__ (or the config loader) define
the exact allowed transition names (e.g., "sandboxed_to_restricted",
"restricted_to_something" — whatever the class expects) and check
promotion_thresholds.keys() against that set, raising a ValueError (or
ConfigError) listing unknown keys when any extra/misspelled keys are present;
this ensures TrustStrategyType.WEIGHTED / promotion_thresholds / TrustThreshold
mis-keys fail fast.
In `@tests/unit/security/trust/test_weighted_strategy.py`:
- Around line 74-100: Add a non-floor demotion regression test: create a new
async test in tests/unit/security/trust/test_weighted_strategy.py that mirrors
the existing low-score test but starts current_state.global_level at
ToolAccessLevel.RESTRICTED (or STANDARD) with a non-zero trust_score; provide a
snapshot that causes the WeightedTrustStrategy.evaluate to produce a collapsing
score below the demotion threshold and assert that result.score is not None and
below the relevant threshold, result.recommended_level equals the next-lower
ToolAccessLevel (e.g., RESTRICTED -> SANDBOXED or STANDARD -> RESTRICTED) and
result.should_change is True; reference WeightedTrustStrategy.evaluate,
TrustState, TrustConfig, and ToolAccessLevel when locating code to ensure the
strategy supports downward transitions and the test fails if only floor demotion
is implemented.
---
Outside diff comments:
In `@DESIGN_SPEC.md`:
- Around line 1655-1661: The DESIGN_SPEC backlog entry for “Agent promotions”
conflicts with the implemented hr/promotion/ module (ThresholdEvaluator,
SeniorityApprovalStrategy, SeniorityModelMapping, PromotionService) — update the
backlog by either removing the “Agent promotions” row or narrowing it to any
genuinely deferred sub-features not yet implemented; also add a clear note in
the spec that implementation currently diverges from the original backlog and
instruct the maintainer to alert stakeholders and explain the divergence so they
can choose to accept the current implementation or update the spec.
---
Duplicate comments:
In `@tests/unit/security/trust/test_service.py`:
- Around line 328-404: The three tests
(test_elevated_gate_enforces_human_approval,
test_elevated_gate_no_override_when_already_elevated,
test_elevated_gate_no_override_when_approval_set) are marked async but call the
synchronous TrustService._enforce_elevated_gate; change each test from "async
def" to a plain "def" and remove any unnecessary event-loop/await scaffolding so
they run synchronously and directly call service._enforce_elevated_gate; keep
all assertions and setup unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: b19a7bbb-3237-488e-90a0-574b1a694f96
📒 Files selected for processing (29)
DESIGN_SPEC.md, README.md, src/ai_company/api/approval_store.py, src/ai_company/config/schema.py, src/ai_company/hr/promotion/config.py, src/ai_company/hr/promotion/models.py, src/ai_company/hr/promotion/service.py, src/ai_company/hr/promotion/threshold_evaluator.py, src/ai_company/observability/events/hr.py, src/ai_company/observability/events/promotion.py, src/ai_company/security/trust/config.py, src/ai_company/security/trust/levels.py, src/ai_company/security/trust/milestone_strategy.py, src/ai_company/security/trust/models.py, src/ai_company/security/trust/per_category_strategy.py, src/ai_company/security/trust/service.py, src/ai_company/security/trust/weighted_strategy.py, tests/unit/hr/promotion/conftest.py, tests/unit/hr/promotion/test_config.py, tests/unit/hr/promotion/test_models.py, tests/unit/hr/promotion/test_seniority_approval_strategy.py, tests/unit/hr/promotion/test_seniority_model_mapping.py, tests/unit/hr/promotion/test_service.py, tests/unit/hr/promotion/test_threshold_evaluator.py, tests/unit/security/trust/conftest.py, tests/unit/security/trust/test_config.py, tests/unit/security/trust/test_milestone_strategy.py, tests/unit/security/trust/test_service.py, tests/unit/security/trust/test_weighted_strategy.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use `from ai_company.observability import get_logger` and instantiate the logger as `logger = get_logger(__name__)` in every module with business logic
Files:
tests/unit/hr/promotion/test_threshold_evaluator.py, tests/unit/security/trust/test_milestone_strategy.py, tests/unit/security/trust/conftest.py, src/ai_company/security/trust/levels.py, src/ai_company/security/trust/per_category_strategy.py, tests/unit/security/trust/test_service.py, tests/unit/hr/promotion/conftest.py, tests/unit/hr/promotion/test_seniority_model_mapping.py, src/ai_company/observability/events/hr.py, src/ai_company/security/trust/config.py, tests/unit/hr/promotion/test_config.py, src/ai_company/security/trust/weighted_strategy.py, tests/unit/hr/promotion/test_seniority_approval_strategy.py, src/ai_company/hr/promotion/service.py, src/ai_company/config/schema.py, tests/unit/hr/promotion/test_models.py, tests/unit/hr/promotion/test_service.py, tests/unit/security/trust/test_weighted_strategy.py, src/ai_company/hr/promotion/models.py, src/ai_company/security/trust/milestone_strategy.py, src/ai_company/security/trust/models.py, src/ai_company/hr/promotion/threshold_evaluator.py, src/ai_company/hr/promotion/config.py, src/ai_company/observability/events/promotion.py, tests/unit/security/trust/test_config.py, src/ai_company/security/trust/service.py, src/ai_company/api/approval_store.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py:

- Add test markers `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, or `@pytest.mark.slow` to all test files
- Maintain 80% minimum code coverage — enforced in CI with `pytest --cov=ai_company --cov-fail-under=80`
- Use `asyncio_mode = 'auto'` in pytest configuration — no manual `@pytest.mark.asyncio` needed on async tests
- Set test timeout to 30 seconds per test — use `@pytest.mark.timeout(30)` or configure in pytest.ini
- Use `@pytest.mark.parametrize` for testing similar cases instead of duplicating test functions
Files:
tests/unit/hr/promotion/test_threshold_evaluator.py, tests/unit/security/trust/test_milestone_strategy.py, tests/unit/security/trust/conftest.py, tests/unit/security/trust/test_service.py, tests/unit/hr/promotion/conftest.py, tests/unit/hr/promotion/test_seniority_model_mapping.py, tests/unit/hr/promotion/test_config.py, tests/unit/hr/promotion/test_seniority_approval_strategy.py, tests/unit/hr/promotion/test_models.py, tests/unit/hr/promotion/test_service.py, tests/unit/security/trust/test_weighted_strategy.py, tests/unit/security/trust/test_config.py
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py:

- Never use `import logging`, `logging.getLogger()`, or `print()` in application code — only use the centralized logger
- Use event name constants from `ai_company.observability.events.<domain>` modules (e.g., `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`) instead of string literals
- Always use structured logging with `logger.info(EVENT, key=value)` format — never use format strings like `logger.info('msg %s', val)`
- No `from __future__ import annotations` — Python 3.14 has PEP 649 native lazy annotations
- Use `except A, B:` syntax (no parentheses) per PEP 758 — ruff enforces this on Python 3.14
- Add type hints to all public functions and classes; mypy strict mode is enforced
- Add Google-style docstrings to all public classes and functions — ruff D rules enforce this
- Use immutability principles: create new objects instead of mutating existing ones; for non-Pydantic collections use `copy.deepcopy()` at construction and `MappingProxyType` for read-only enforcement
- Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (with `model_copy(update=...)`) for runtime state that evolves
- Use `NotBlankStr` from `core.types` for all identifier and name fields instead of manual whitespace validators, including optional (`NotBlankStr | None`) and tuple variants
- Use `@computed_field` for derived values in Pydantic models instead of storing and validating redundant fields (e.g., `TokenUsage.total_tokens`)
- Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls) over bare `create_task`
- Keep functions under 50 lines and files under 800 lines
- Keep line length at 88 characters (enforced by ruff)
- Handle all errors explicitly; never silently swallow exceptions
- Validate at system boundaries (user input, external APIs, config files)
- Never use real vendor names (Anthropic, OpenAI, Claude, GPT) in project-owned code, docstrings...
Files:
src/ai_company/security/trust/levels.py, src/ai_company/security/trust/per_category_strategy.py, src/ai_company/observability/events/hr.py, src/ai_company/security/trust/config.py, src/ai_company/security/trust/weighted_strategy.py, src/ai_company/hr/promotion/service.py, src/ai_company/config/schema.py, src/ai_company/hr/promotion/models.py, src/ai_company/security/trust/milestone_strategy.py, src/ai_company/security/trust/models.py, src/ai_company/hr/promotion/threshold_evaluator.py, src/ai_company/hr/promotion/config.py, src/ai_company/observability/events/promotion.py, src/ai_company/security/trust/service.py, src/ai_company/api/approval_store.py
🧠 Learnings (11)
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: Applies to tests/**/*.py : Set test timeout to 30 seconds per test — use `pytest.mark.timeout(30)` or configure in pytest.ini
Applied to files:
tests/unit/hr/promotion/test_threshold_evaluator.py, tests/unit/hr/promotion/test_seniority_model_mapping.py, tests/unit/hr/promotion/test_config.py, tests/unit/hr/promotion/test_seniority_approval_strategy.py, tests/unit/hr/promotion/test_models.py
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: Applies to tests/**/*.py : Add test markers `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, or `pytest.mark.slow` to all test files
Applied to files:
tests/unit/hr/promotion/test_threshold_evaluator.py, tests/unit/hr/promotion/test_seniority_model_mapping.py, tests/unit/hr/promotion/test_config.py, tests/unit/hr/promotion/test_seniority_approval_strategy.py, tests/unit/hr/promotion/test_models.py
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: Applies to tests/**/*.py : Maintain 80% minimum code coverage — enforced in CI with `pytest --cov=ai_company --cov-fail-under=80`
Applied to files:
tests/unit/hr/promotion/test_threshold_evaluator.py, tests/unit/hr/promotion/test_config.py
📚 Learning: 2026-03-10T09:29:47.580Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.580Z
Learning: Applies to src/ai_company/**/*.py : Use event name constants from `ai_company.observability.events.<domain>` modules (e.g., `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`) instead of string literals
Applied to files:
src/ai_company/observability/events/hr.py, src/ai_company/hr/promotion/service.py, src/ai_company/observability/events/promotion.py
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: Applies to src/ai_company/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (with `model_copy(update=...)`) for runtime state that evolves
Applied to files:
src/ai_company/security/trust/config.py, tests/unit/hr/promotion/test_config.py, src/ai_company/config/schema.py, src/ai_company/hr/promotion/models.py, src/ai_company/security/trust/models.py, src/ai_company/hr/promotion/config.py
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: Applies to src/ai_company/**/*.py : Validate at system boundaries (user input, external APIs, config files)
Applied to files:
src/ai_company/security/trust/config.py
📚 Learning: 2026-03-10T09:29:47.580Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.580Z
Learning: Applies to src/ai_company/**/*.py : Always use structured logging with `logger.info(EVENT, key=value)` format — never use format strings like `logger.info('msg %s', val)`
Applied to files:
src/ai_company/security/trust/config.py
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: Applies to src/ai_company/**/*.py : Use immutability principles: create new objects instead of mutating existing ones; for non-Pydantic collections use `copy.deepcopy()` at construction and `MappingProxyType` for read-only enforcement
Applied to files:
tests/unit/hr/promotion/test_config.py, src/ai_company/security/trust/models.py, src/ai_company/hr/promotion/config.py
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: If implementation deviates from DESIGN_SPEC.md, alert the user and explain why — user decides whether to proceed or update the spec; never silently diverge
Applied to files:
DESIGN_SPEC.md
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: When a spec section is referenced (e.g. 'Section 10.2'), read that section verbatim before coding and update `DESIGN_SPEC.md` after approved deviations
Applied to files:
DESIGN_SPEC.md
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: Always read `DESIGN_SPEC.md` before implementing any feature or planning any issue — treat it as the starting point for architecture, data models, and behavior
Applied to files:
DESIGN_SPEC.md
🧬 Code graph analysis (20)
tests/unit/hr/promotion/test_threshold_evaluator.py (5)
- src/ai_company/hr/enums.py (1): PromotionDirection (44-48)
- src/ai_company/hr/promotion/threshold_evaluator.py (3): ThresholdEvaluator (47-199), name (58-60), evaluate (62-133)
- tests/unit/hr/promotion/conftest.py (1): make_performance_snapshot (26-69)
- src/ai_company/hr/promotion/config.py (1): PromotionCriteriaConfig (17-36)
- src/ai_company/hr/promotion/models.py (1): criteria_met_count (103-105)

tests/unit/security/trust/test_milestone_strategy.py (11)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/security/trust/config.py (3): MilestoneCriteria (127-184), ReVerificationConfig (187-218), TrustConfig (221-376)
- src/ai_company/security/trust/milestone_strategy.py (4): MilestoneTrustStrategy (37-280), name (51-53), initial_state (137-150), evaluate (55-135)
- src/ai_company/security/trust/models.py (1): TrustState (22-73)
- tests/unit/hr/promotion/conftest.py (1): make_performance_snapshot (26-69)
- tests/unit/security/trust/test_weighted_strategy.py (1): test_name_property (24-26)
- tests/unit/security/trust/test_per_category_strategy.py (1): test_name_property (43-46)
- tests/unit/security/trust/test_disabled_strategy.py (1): test_name_property (18-20)
- src/ai_company/security/trust/per_category_strategy.py (3): name (41-43), initial_state (135-148), evaluate (45-133)
- src/ai_company/security/trust/weighted_strategy.py (3): name (48-50), initial_state (100-113), evaluate (52-98)
- src/ai_company/security/trust/disabled_strategy.py (3): name (39-41), initial_state (82-94), evaluate (43-80)

tests/unit/security/trust/conftest.py (4)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/hr/performance/models.py (2): AgentPerformanceSnapshot (266-301), WindowMetrics (197-263)
- src/ai_company/security/trust/config.py (5): MilestoneCriteria (127-184), ReVerificationConfig (187-218), TrustConfig (221-376), TrustThreshold (21-35), WeightedTrustWeights (38-96)
- src/ai_company/security/trust/enums.py (1): TrustStrategyType (6-12)

src/ai_company/security/trust/levels.py (1)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)

src/ai_company/security/trust/per_category_strategy.py (5)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/security/trust/models.py (1): TrustEvaluationResult (113-152)
- src/ai_company/hr/performance/models.py (1): AgentPerformanceSnapshot (266-301)
- src/ai_company/security/trust/config.py (2): CategoryTrustCriteria (99-124), TrustConfig (221-376)
- src/ai_company/security/trust/protocol.py (2): name (24-26), evaluate (28-45)

tests/unit/hr/promotion/conftest.py (5)
- src/ai_company/core/enums.py (1): SeniorityLevel (6-21)
- src/ai_company/hr/promotion/config.py (4): ModelMappingConfig (66-103), PromotionApprovalConfig (39-63), PromotionConfig (106-139), PromotionCriteriaConfig (17-36)
- src/ai_company/hr/promotion/approval_protocol.py (1): name (26-28)
- src/ai_company/hr/promotion/criteria_protocol.py (1): name (24-26)
- src/ai_company/hr/promotion/model_mapping_protocol.py (1): name (23-25)

tests/unit/hr/promotion/test_seniority_model_mapping.py (3)
- src/ai_company/core/enums.py (1): SeniorityLevel (6-21)
- src/ai_company/hr/promotion/config.py (1): ModelMappingConfig (66-103)
- src/ai_company/hr/promotion/seniority_model_mapping.py (3): SeniorityModelMapping (22-87), name (35-37), resolve_model (39-87)

src/ai_company/security/trust/config.py (3)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/observability/_logger.py (1): get_logger (8-28)
- src/ai_company/security/trust/enums.py (1): TrustStrategyType (6-12)

tests/unit/hr/promotion/test_config.py (5)
- src/ai_company/core/enums.py (1): SeniorityLevel (6-21)
- src/ai_company/hr/promotion/config.py (1): PromotionCriteriaConfig (17-36)
- tests/unit/hr/promotion/test_models.py (5): test_frozen (50-59), test_frozen (127-141), test_frozen (171-178), test_frozen (215-239), test_frozen (457-480)
- tests/unit/security/trust/test_config.py (2): test_frozen (56-59), test_frozen (122-125)
- tests/unit/security/trust/test_models.py (3): test_frozen (55-58), test_frozen (144-153), test_frozen (229-237)

src/ai_company/security/trust/weighted_strategy.py (5)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/security/trust/models.py (2): TrustEvaluationResult (113-152), TrustState (22-73)
- src/ai_company/hr/performance/models.py (1): AgentPerformanceSnapshot (266-301)
- src/ai_company/security/trust/config.py (2): TrustConfig (221-376), TrustThreshold (21-35)
- src/ai_company/security/trust/protocol.py (3): name (24-26), evaluate (28-45), initial_state (47-56)

tests/unit/hr/promotion/test_seniority_approval_strategy.py (6)
- src/ai_company/core/enums.py (1): SeniorityLevel (6-21)
- src/ai_company/hr/enums.py (1): PromotionDirection (44-48)
- src/ai_company/hr/promotion/config.py (1): PromotionApprovalConfig (39-63)
- src/ai_company/hr/promotion/models.py (2): PromotionEvaluation (49-105), requires_human (123-125)
- src/ai_company/hr/promotion/seniority_approval_strategy.py (2): name (39-41), decide (43-70)
- src/ai_company/hr/promotion/criteria_protocol.py (1): name (24-26)

src/ai_company/hr/promotion/service.py (10)
- src/ai_company/hr/enums.py (1): PromotionDirection (44-48)
- src/ai_company/hr/promotion/models.py (3): PromotionEvaluation (49-105), requires_human (123-125), criteria_met_count (103-105)
- src/ai_company/hr/promotion/approval_protocol.py (3): PromotionApprovalStrategy (18-45), decide (30-45), name (26-28)
- src/ai_company/hr/promotion/config.py (1): PromotionConfig (106-139)
- src/ai_company/hr/promotion/model_mapping_protocol.py (3): ModelMappingStrategy (15-42), name (23-25), resolve_model (27-42)
- src/ai_company/hr/registry.py (2): AgentRegistryService (32-242), get (101-111)
- src/ai_company/hr/promotion/threshold_evaluator.py (2): evaluate (62-133), name (58-60)
- src/ai_company/security/trust/disabled_strategy.py (2): evaluate (43-80), name (39-41)
- src/ai_company/hr/promotion/seniority_approval_strategy.py (2): decide (43-70), name (39-41)
- src/ai_company/hr/promotion/seniority_model_mapping.py (2): name (35-37), resolve_model (39-87)

src/ai_company/config/schema.py (2)
- src/ai_company/hr/promotion/config.py (1): PromotionConfig (106-139)
- src/ai_company/security/trust/config.py (1): TrustConfig (221-376)

tests/unit/hr/promotion/test_models.py (1)
- src/ai_company/hr/promotion/models.py (7): CriterionResult (24-46), PromotionApprovalDecision (108-125), PromotionEvaluation (49-105), PromotionRecord (128-208), PromotionRequest (211-251), criteria_met_count (103-105), requires_human (123-125)

tests/unit/security/trust/test_weighted_strategy.py (3)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/security/trust/models.py (1)
TrustState(22-73)src/ai_company/security/trust/weighted_strategy.py (4)
WeightedTrustStrategy(26-200)name(48-50)initial_state(100-113)evaluate(52-98)
src/ai_company/security/trust/milestone_strategy.py (5)
src/ai_company/core/enums.py (1)
ToolAccessLevel(274-291)src/ai_company/security/trust/models.py (2)
TrustEvaluationResult(113-152)TrustState(22-73)src/ai_company/hr/performance/models.py (1)
AgentPerformanceSnapshot(266-301)src/ai_company/security/trust/config.py (2)
MilestoneCriteria(127-184)TrustConfig(221-376)src/ai_company/security/trust/protocol.py (3)
name(24-26)evaluate(28-45)initial_state(47-56)
src/ai_company/security/trust/models.py (2)
src/ai_company/core/enums.py (1)
ToolAccessLevel(274-291)src/ai_company/security/trust/enums.py (1)
TrustChangeReason(15-24)
src/ai_company/hr/promotion/threshold_evaluator.py (5)
src/ai_company/core/enums.py (2)
SeniorityLevel(6-21)compare_seniority(51-64)src/ai_company/hr/enums.py (1)
PromotionDirection(44-48)src/ai_company/hr/promotion/models.py (2)
CriterionResult(24-46)PromotionEvaluation(49-105)src/ai_company/hr/performance/models.py (1)
AgentPerformanceSnapshot(266-301)src/ai_company/hr/promotion/config.py (1)
PromotionCriteriaConfig(17-36)
src/ai_company/hr/promotion/config.py (1)
src/ai_company/core/enums.py (1)
SeniorityLevel(6-21)
tests/unit/security/trust/test_config.py (3)
src/ai_company/core/enums.py (1)
ToolAccessLevel(274-291)src/ai_company/security/trust/config.py (6)
CategoryTrustCriteria(99-124)MilestoneCriteria(127-184)ReVerificationConfig(187-218)TrustConfig(221-376)TrustThreshold(21-35)WeightedTrustWeights(38-96)src/ai_company/security/trust/enums.py (1)
TrustStrategyType(6-12)
🪛 LanguageTool
README.md
[typographical] ~33-~33: To join two clauses or introduce examples, consider using an em dash.
Context: ...aggregation - Progressive Trust (M7) - 4 strategies (disabled/weighted/per-cate...
(DASH_RULE)
[typographical] ~34-~34: To join two clauses or introduce examples, consider using an em dash.
Context: ...evaluation - Promotion/Demotion (M7) - Criteria evaluation (ThresholdEvaluator)...
(DASH_RULE)
[grammar] ~34-~34: Please add a punctuation mark at the end of paragraph.
Context: ...eniorityModelMapping), PromotionService orchestrator ### Not implemented yet (planned miles...
(PUNCTUATION_PARAGRAPH_END)
[typographical] ~38-~38: To join two clauses or introduce examples, consider using an em dash.
Context: ...ones) - Memory Backend Adapter (M5) - Memory protocols, retrieval pipeline, or...
(DASH_RULE)
[typographical] ~39-~39: To join two clauses or introduce examples, consider using an em dash.
Context: ..., Temporal KG) planned - CLI Surface - cli/ package is placeholder-only - **S...
(DASH_RULE)
[typographical] ~40-~40: To join two clauses or introduce examples, consider using an em dash.
Context: ...only - Security/Approval System (M7) - SecOps agent with rule engine (soft-allo...
(DASH_RULE)
[typographical] ~41-~41: To join two clauses or introduce examples, consider using an em dash.
Context: ...e planned - Advanced Product Surface - web dashboard, external integrations ##...
(DASH_RULE)
🔇 Additional comments (38)
src/ai_company/observability/events/hr.py (1)
37-37: LGTM! The new event constant follows the established `hr.<subject>.<action>` naming convention and is correctly placed within the existing Registry section.
src/ai_company/security/trust/levels.py (1)
1-39: LGTM! Clean extraction of shared trust level constants. The canonical ordering tuple and rank lookup are correctly derived from `ToolAccessLevel` enum members, and `CUSTOM` is intentionally excluded since it bypasses the hierarchy. The use of tuples ensures immutability of the ordering and transition definitions.
src/ai_company/security/trust/models.py (1)
76-152: LGTM! `TrustChangeRecord` and `TrustEvaluationResult` are well-structured frozen models with appropriate field types. The `@computed_field` decorator for `should_change` is a clean way to derive whether a trust update is warranted. The `allow_inf_nan=False` config on `TrustEvaluationResult` provides proper score validation.
src/ai_company/security/trust/config.py (3)
21-97: LGTM! `TrustThreshold` and `WeightedTrustWeights` are well-designed with appropriate field constraints. The weights sum validation with `_WEIGHTS_SUM_TOLERANCE` handles floating-point precision correctly, and the structured logging provides good observability for configuration errors.
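The tolerance-based sum check can be sketched as a plain function (hypothetical names and tolerance value; the real validator lives on `WeightedTrustWeights` and its exact constant is not shown here):

```python
import math

# Hypothetical stand-in for the _WEIGHTS_SUM_TOLERANCE constant referenced above.
WEIGHTS_SUM_TOLERANCE = 1e-6


def validate_weights_sum(weights: dict[str, float]) -> None:
    """Reject weight sets whose sum drifts from 1.0 beyond a small tolerance."""
    total = sum(weights.values())
    if not math.isclose(total, 1.0, abs_tol=WEIGHTS_SUM_TOLERANCE):
        raise ValueError(f"weights must sum to 1.0, got {total!r}")


# Floating-point noise within tolerance passes; a genuine drift does not.
validate_weights_sum({"a": 0.3, "b": 0.3, "c": 0.4})
```

Using an absolute tolerance rather than exact equality keeps configs written as decimal fractions (0.3 + 0.3 + 0.4) from failing on binary rounding.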
99-185: LGTM! `CategoryTrustCriteria` and `MilestoneCriteria` have appropriate defaults and constraints. The mutual exclusivity validation on `auto_promote` and `requires_human_approval` with structured logging is correctly implemented.
276-376: LGTM! The `_validate_elevated_gate` and `_validate_strategy_specific_fields` validators provide strong security guarantees. The elevated gate invariant is enforced across all strategy types (weighted, milestone, per-category), and strategy-specific required fields are validated with clear error messages.
src/ai_company/hr/promotion/config.py (3)
17-64: LGTM! `PromotionCriteriaConfig` and `PromotionApprovalConfig` are well-structured frozen models with appropriate defaults and constraints. Using `tuple[NotBlankStr, ...]` for `required_criteria` ensures immutability.
66-104: LGTM! The `seniority_model_map` immutability concern from the previous review has been properly addressed. The validator correctly validates keys against `SeniorityLevel` values, deep-copies the map, and wraps it in `MappingProxyType` for read-only enforcement.
106-139: LGTM! `PromotionConfig` aggregates sub-configs appropriately with sensible defaults. The `allow_inf_nan=False` constraint provides additional safety for numeric fields.
src/ai_company/config/schema.py (1)
23-23: LGTM! Clean integration of `TrustConfig` and `PromotionConfig` into `RootConfig`. The new fields follow the established pattern of using `default_factory` for nested frozen configs, and the docstring is appropriately updated.
Also applies to: 31-31, 498-499, 599-606
src/ai_company/security/trust/weighted_strategy.py (6)
115-161: LGTM - scoring factors are now distinct. The previous concern about `error_rate` double-counting `success_rate` has been addressed. The scoring factors are now:
- `completion_factor`: uses `window.success_rate` directly
- `error_factor`: uses `1.0 - (tasks_failed / data_point_count)`, which is semantically distinct since `data_point_count` includes non-task events
The weighted computation correctly applies configured weights to independent signals.
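A rough sketch of how two such independent factors combine (illustrative function and weights only; the real strategy combines four configured factors from `WeightedTrustWeights`):

```python
def weighted_trust_score(
    success_rate: float,
    tasks_failed: int,
    data_point_count: int,
    completion_weight: float = 0.5,  # illustrative weights, not the real defaults
    error_weight: float = 0.5,
) -> float:
    """Combine independent signals into one weighted score."""
    # completion_factor: the window's success rate, used directly.
    completion_factor = success_rate
    # error_factor: distinct from completion because data_point_count
    # also includes non-task events (so this is not just 1 - success_rate).
    error_factor = (
        1.0 - (tasks_failed / data_point_count) if data_point_count else 0.0
    )
    return completion_weight * completion_factor + error_weight * error_factor
```

Because the two factors are computed from different denominators, a burst of non-task events changes `error_factor` without touching `completion_factor`, which is what makes the signals independent.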
163-184: LGTM - single-step transitions enforced. The previous concern about multi-level jumps has been addressed. The `_score_to_level` method now correctly checks `if from_level != current_level: continue`, ensuring only adjacent-level promotions are considered.
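The adjacency check can be illustrated with stand-in level names (hypothetical; the real ordering and `TRANSITION_KEYS` live in `levels.py`):

```python
# Hypothetical ordered levels; adjacent pairs form the only legal transitions.
LEVELS = ("sandboxed", "limited", "standard", "elevated")
TRANSITION_KEYS = tuple(zip(LEVELS, LEVELS[1:]))


def score_to_level(
    current_level: str,
    score: float,
    thresholds: dict[tuple[str, str], float],
) -> str:
    """Return the next level only when the adjacent upward threshold is met."""
    for from_level, to_level in TRANSITION_KEYS:
        if from_level != current_level:
            continue  # only the transition starting at the current level counts
        # Missing thresholds default above 1.0 so they can never be met.
        if score >= thresholds.get((from_level, to_level), 1.1):
            return to_level
    return current_level
```

Even a perfect score cannot skip a level: only the single pair whose `from_level` matches the current level is ever evaluated.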
26-51: LGTM! The class docstring clearly documents the four weighted factors and their sources. Initialization correctly stores config references for weights and thresholds.
52-114: LGTM! The `evaluate` method follows the protocol interface correctly with structured logging events. The `initial_state` method properly initializes trust state with the configured level and a zero score.
186-200: LGTM! `_check_human_approval` correctly looks up the threshold for the specific transition and returns the `requires_human_approval` flag.
163-184: Weighted strategy only handles promotions; demotion is by design handled elsewhere. `_score_to_level` intentionally evaluates upward transitions only: `TRANSITION_KEYS` contains only upward transitions and the config uses `promotion_thresholds` (no demotion thresholds). Demotion is handled via the service layer by comparing `recommended_level` vs `current_level` (reason: `TRUST_DECAY`), with explicit decay logic in the milestone strategy for idle/error-rate conditions.
Minor note: the module docstring mentions "promotes/demotes", but this strategy only promotes; consider clarifying it as "promotes agents based on weighted score."
src/ai_company/security/trust/milestone_strategy.py (5)
191-216: LGTM - tenure calculation fixed. The previous concern about `time_active_days` measuring evaluation frequency instead of agent tenure has been addressed. The code now correctly uses `state.created_at` (line 203) with an explicit comment (line 200) clarifying this is "agent tenure" rather than time since last evaluation.
37-54: LGTM! Clean initialization storing config, milestones, and re-verification settings. The `name` property correctly identifies the strategy.
55-136: LGTM! The `evaluate` method correctly:
- Checks for milestone-based promotion (single-step via `TRANSITION_KEYS`)
- Checks for decay when re-verification is enabled and agent is above `SANDBOXED`
- Builds a detailed result with joined reason parts
The structured logging with event constants provides good observability.
218-280: LGTM! The `_check_decay` method comprehensively handles three decay conditions:
- Idle decay: demotes if `idle_days >= decay_on_idle_days`
- Error rate decay: demotes if the error rate exceeds the threshold
- Re-verification failure: demotes if the quality score drops below the minimum after the verification interval
Each decay condition is logged with `TRUST_DECAY_DETECTED` and the specific reason, providing excellent observability.
171-190: Single-window max approach is intentional: agents must demonstrate tasks within a recent rolling window period. The code correctly uses the maximum `tasks_completed` value across all rolling time windows (7d, 30d, 90d) rather than summing across windows. This ensures agents demonstrate capability within a recent period rather than relying purely on historical cumulative performance. The design is consistent with the other checks (clean history validation, error rate decay) and aligns with the rolling-window architecture of the performance tracking system.
src/ai_company/security/trust/per_category_strategy.py (1)
1-178: LGTM! The implementation correctly addresses the past review comment by renaming the variable to `max_tasks_completed` (line 166), clarifying that it represents the best single-window count rather than a cumulative total. The per-category evaluation logic, global level derivation (minimum rank across categories), and elevated gate enforcement are all well-structured.
src/ai_company/security/trust/service.py (2)
199-224: LGTM! The past review comment about `last_promoted_at` being incorrectly updated for demotions has been addressed. The code now correctly uses `TRUST_LEVEL_RANK` comparison (lines 204-206) to determine if the change is a promotion before conditionally updating `last_promoted_at` (lines 211-212).
359-385: LGTM! The past review comment about `_infer_reason` not distinguishing promotions from demotions has been addressed. The method now checks for demotion first (lines 372-376) and returns `TRUST_DECAY` appropriately, while promotions get strategy-specific reasons.
src/ai_company/hr/promotion/models.py (2)
192-208: LGTM! The past review comment about rejecting no-op model changes has been addressed. The validator now includes a check (lines 200-202) that raises `ValueError` when `model_changed=True` but `old_model_id == new_model_id`, preventing misleading history records.
1-251: Well-designed frozen models. All models correctly use `ConfigDict(frozen=True)` for immutability, `NotBlankStr` for identifier fields, and `@computed_field` for derived values (`criteria_met_count`, `requires_human`). The `PromotionEvaluation._validate_direction_consistency` validator (lines 89-99) ensures data integrity between direction and level ordering. Based on learnings: "Use frozen Pydantic models for config/identity."
src/ai_company/hr/promotion/threshold_evaluator.py (3)
161-179: LGTM! The past review comment about clarifying aggregation strategies has been addressed. Line 161 documents "use most recent window value" for success_rate, explaining the `break` on the first non-None value.
181-197: LGTM! The past review comment has been addressed. Line 181 documents "use peak value across windows" for tasks_completed, explaining the `max()` aggregation strategy.
1-199: Well-structured threshold evaluator. The implementation correctly uses `MappingProxyType` for read-only threshold defaults (lines 30-44), direction-aware threshold selection (lines 95-99), and proper N-of-M eligibility logic (line 112). Event constants and structured logging follow coding guidelines.
src/ai_company/observability/events/promotion.py (1)
1-21: LGTM! The past review comment about the `DEMOTION_APPLIED` namespace has been addressed. Line 19 now correctly uses `"promotion.demotion.applied"` to maintain consistency with the `promotion.<subject>.<action>` naming convention, ensuring it will be captured by promotion-related observability filters.
src/ai_company/hr/promotion/service.py (4)
156-174: LGTM! The past review comment about using the wrong event for error logging has been addressed. The code now correctly uses `PROMOTION_EVALUATE_FAILED` (lines 159, 169) for error conditions instead of `PROMOTION_EVALUATE_START`.
458-473: LGTM! The past review comment about the discarded trust evaluation result has been addressed. Lines 458-459 now include a comment explaining the intent: "Best-effort trust re-evaluation — promotion is already applied, so failures here must not prevent the record from being returned." The side-effect-only pattern is now clearly documented.
534-551: LGTM! The past review comment about checking approval store availability early has been addressed. The `_approval_store is None` check (lines 544-551) is now at the top of `_create_approval`, before any object creation, avoiding unnecessary work when no store is configured.
1-589: Well-orchestrated promotion service. The service properly coordinates criteria evaluation, approval decisions, model mapping, and trust integration. Error handling follows best-effort patterns where appropriate (trust re-evaluation, notifications) with proper logging. All past review comments have been addressed.
src/ai_company/api/approval_store.py (1)
8-8: LGTM! The `Callable` import is correctly moved to top-level since it's used at runtime in the `__init__` type hint (line 37). The `noqa: TC003` suppression is appropriate.
tests/unit/hr/promotion/conftest.py (3)
43-53: LGTM! The past review comment about the `success_rate=0` edge case has been addressed. The code now correctly handles three scenarios:
- `success_rate > 0`: computes total and tasks_failed normally
- `success_rate == 0` with `tasks_completed > 0`: treats all tasks as failures (lines 46-50)
- Both zero: sets everything to 0 (lines 51-53)
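A guess at the builder's arithmetic, as a stand-alone helper (hypothetical name and rounding; the real conftest code may differ in detail):

```python
def derive_task_counts(success_rate: float, tasks_completed: int) -> tuple[int, int]:
    """Return (total_tasks, tasks_failed) for the three scenarios above."""
    if success_rate > 0:
        # Normal case: back out the total from the completed count and rate.
        total = round(tasks_completed / success_rate)
        return total, total - tasks_completed
    if tasks_completed > 0:
        # success_rate == 0 with completed tasks: treat all tasks as failures.
        return tasks_completed, tasks_completed
    # Both zero: everything stays at 0.
    return 0, 0
```

The point of the branching is that a naive `tasks_completed / success_rate` would divide by zero in the second and third scenarios, which is exactly the edge case the review flagged.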
72-91: LGTM! The `make_agent_identity` builder correctly uses generic test names (`"test-provider"`, `"test-small-001"`) instead of real vendor names, complying with the coding guideline: "Never use real vendor names in tests."
97-130: LGTM! The fixtures provide clean defaults for promotion workflow tests, aligning with the configuration models defined in `src/ai_company/hr/promotion/config.py`.
```python
promotion_thresholds: dict[str, TrustThreshold] = Field(
    default_factory=dict,
    description="Thresholds for trust level transitions",
)

# Per-category strategy config
initial_category_levels: dict[str, ToolAccessLevel] = Field(
    default_factory=dict,
    description="Initial per-category trust levels",
)
category_criteria: dict[str, dict[str, CategoryTrustCriteria]] = Field(
    default_factory=dict,
    description="Per-category promotion criteria",
)

# Milestone strategy config
milestones: dict[str, MilestoneCriteria] = Field(
    default_factory=dict,
    description="Milestone criteria for trust transitions",
)
re_verification: ReVerificationConfig = Field(
    default_factory=ReVerificationConfig,
    description="Re-verification configuration",
)
```
🛠️ Refactor suggestion | 🟠 Major
Dict fields remain mutable despite frozen=True.
The promotion_thresholds, initial_category_levels, category_criteria, and milestones dict fields can be mutated after validation (e.g., config.promotion_thresholds["new_key"] = ...), potentially bypassing the _validate_elevated_gate security invariant. Consider wrapping these in MappingProxyType after validation, similar to how ModelMappingConfig.seniority_model_map is handled in hr/promotion/config.py.
As per coding guidelines: "Use immutability principles: create new objects instead of mutating existing ones; for non-Pydantic collections use copy.deepcopy() at construction and MappingProxyType for read-only enforcement".
🔒 Example approach for promotion_thresholds
```diff
+import copy
+from types import MappingProxyType
+
 class TrustConfig(BaseModel):
     # ... existing fields ...
+    @model_validator(mode="after")
+    def _wrap_dict_fields_readonly(self) -> Self:
+        """Wrap dict fields in MappingProxyType for immutability."""
+        for field_name in (
+            "promotion_thresholds",
+            "initial_category_levels",
+            "milestones",
+        ):
+            raw = getattr(self, field_name)
+            if not isinstance(raw, MappingProxyType):
+                object.__setattr__(
+                    self,
+                    field_name,
+                    MappingProxyType(copy.deepcopy(raw)),
+                )
+        # category_criteria is a nested dict
+        if not isinstance(self.category_criteria, MappingProxyType):
+            wrapped = {
+                k: MappingProxyType(copy.deepcopy(v))
+                for k, v in self.category_criteria.items()
+            }
+            object.__setattr__(
+                self,
+                "category_criteria",
+                MappingProxyType(wrapped),
+            )
+        return self
```
🤖 Prompt for AI Agents
+ return self🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/security/trust/config.py` around lines 251 - 274, The dict
fields promotion_thresholds, initial_category_levels, category_criteria, and
milestones are still mutable after validation and should be converted to
read-only mappings to enforce invariants (e.g., the _validate_elevated_gate
check); import MappingProxyType and wrap a deep-copied version of each dict into
MappingProxyType after validation (use a root validator or post-init hook on the
same model class that defines these fields) — mirror the pattern used for
ModelMappingConfig.seniority_model_map in hr/promotion/config.py: deep-copy the
incoming dicts, validate them, then replace each field with
MappingProxyType(copy.deepcopy(...)) so external code cannot mutate them.
```python
category_levels: dict[str, ToolAccessLevel] = Field(
    default_factory=dict,
    description="Per-category trust levels",
)
trust_score: float | None = Field(
    default=None,
    ge=0.0,
    le=1.0,
    description="Weighted trust score",
)
last_evaluated_at: AwareDatetime | None = Field(
    default=None,
    description="When trust was last evaluated",
)
last_promoted_at: AwareDatetime | None = Field(
    default=None,
    description="When trust level was last promoted",
)
last_decay_check_at: AwareDatetime | None = Field(
    default=None,
    description="When decay was last checked",
)
milestone_progress: dict[str, int | float] = Field(
    default_factory=dict,
    description="Milestone tracking data",
)
```
🧹 Nitpick | 🔵 Trivial
Consider wrapping dict fields for full immutability.
While TrustState is frozen, the category_levels and milestone_progress dict fields can still be mutated after construction (e.g., state.category_levels["key"] = value). Since this model represents state that evolves via model_copy(update=...), consider using a validator to wrap these in MappingProxyType for defense-in-depth, consistent with how ModelMappingConfig.seniority_model_map handles this.
That said, this is a lower priority since the intended usage pattern (immutable snapshots replaced via model_copy) mitigates the risk.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/security/trust/models.py` around lines 48 - 73, The dict
fields category_levels and milestone_progress on the frozen TrustState can still
be mutated after construction; add a Pydantic validator on TrustState (or field
validators for category_levels and milestone_progress) that wraps incoming dicts
with types.MappingProxyType (or another immutable mapping) so instances are
truly immutable, mirroring the approach used by
ModelMappingConfig.seniority_model_map; ensure the validator handles
None/default_factory and preserves typing, and keep using model_copy(update=...)
for state updates.
```python
def test_model_changed_true_missing_old_model_id_raises(self) -> None:
    """model_changed=True with old_model_id=None raises ValidationError."""
    evaluation = PromotionEvaluation(
        agent_id="agent-001",
        current_level=SeniorityLevel.JUNIOR,
        target_level=SeniorityLevel.MID,
        direction=PromotionDirection.PROMOTION,
        criteria_results=(),
        required_criteria_met=True,
        eligible=True,
        evaluated_at=datetime.now(UTC),
        strategy_name="threshold_evaluator",
    )
    with pytest.raises(ValidationError, match="old_model_id and new_model_id"):
        PromotionRecord(
            agent_id="agent-001",
            agent_name="test-agent",
            old_level=SeniorityLevel.JUNIOR,
            new_level=SeniorityLevel.MID,
            direction=PromotionDirection.PROMOTION,
            evaluation=evaluation,
            effective_at=datetime.now(UTC),
            initiated_by="system",
            model_changed=True,
            old_model_id=None,
            new_model_id="test-large-001",
        )

def test_model_changed_true_missing_new_model_id_raises(self) -> None:
    """model_changed=True with new_model_id=None raises ValidationError."""
    evaluation = PromotionEvaluation(
        agent_id="agent-001",
        current_level=SeniorityLevel.JUNIOR,
        target_level=SeniorityLevel.MID,
        direction=PromotionDirection.PROMOTION,
        criteria_results=(),
        required_criteria_met=True,
        eligible=True,
        evaluated_at=datetime.now(UTC),
        strategy_name="threshold_evaluator",
    )
    with pytest.raises(ValidationError, match="old_model_id and new_model_id"):
        PromotionRecord(
            agent_id="agent-001",
            agent_name="test-agent",
            old_level=SeniorityLevel.JUNIOR,
            new_level=SeniorityLevel.MID,
            direction=PromotionDirection.PROMOTION,
            evaluation=evaluation,
            effective_at=datetime.now(UTC),
            initiated_by="system",
            model_changed=True,
            old_model_id="test-small-001",
            new_model_id=None,
        )

def test_model_changed_false_with_old_model_id_raises(self) -> None:
    """model_changed=False with old_model_id set raises ValidationError."""
    evaluation = PromotionEvaluation(
        agent_id="agent-001",
        current_level=SeniorityLevel.JUNIOR,
        target_level=SeniorityLevel.MID,
        direction=PromotionDirection.PROMOTION,
        criteria_results=(),
        required_criteria_met=True,
        eligible=True,
        evaluated_at=datetime.now(UTC),
        strategy_name="threshold_evaluator",
    )
    with pytest.raises(ValidationError, match="model IDs to be None"):
        PromotionRecord(
            agent_id="agent-001",
            agent_name="test-agent",
            old_level=SeniorityLevel.JUNIOR,
            new_level=SeniorityLevel.MID,
            direction=PromotionDirection.PROMOTION,
            evaluation=evaluation,
            effective_at=datetime.now(UTC),
            initiated_by="system",
            model_changed=False,
            old_model_id="test-small-001",
        )

def test_model_changed_true_same_model_ids_raises(self) -> None:
    """model_changed=True with identical model IDs raises ValidationError."""
    evaluation = PromotionEvaluation(
        agent_id="agent-001",
        current_level=SeniorityLevel.JUNIOR,
        target_level=SeniorityLevel.MID,
        direction=PromotionDirection.PROMOTION,
        criteria_results=(),
        required_criteria_met=True,
        eligible=True,
        evaluated_at=datetime.now(UTC),
        strategy_name="threshold_evaluator",
    )
    with pytest.raises(
        ValidationError,
        match="old_model_id and new_model_id to differ",
    ):
        PromotionRecord(
            agent_id="agent-001",
            agent_name="test-agent",
            old_level=SeniorityLevel.JUNIOR,
            new_level=SeniorityLevel.MID,
            direction=PromotionDirection.PROMOTION,
            evaluation=evaluation,
            effective_at=datetime.now(UTC),
            initiated_by="system",
            model_changed=True,
            old_model_id="test-small-001",
            new_model_id="test-small-001",
        )
```
🧹 Nitpick | 🔵 Trivial
Parameterize the model_changed validation matrix.
These cases rebuild the same PromotionEvaluation and PromotionRecord with only the model IDs and expected error changing. Folding them into one parametrized test will make future validator changes much cheaper to update.
As per coding guidelines: "Use @pytest.mark.parametrize for testing similar cases instead of duplicating test functions".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/hr/promotion/test_models.py` around lines 241 - 354, Combine the
four similar tests into a single pytest.mark.parametrize test that builds one
PromotionEvaluation once and calls PromotionRecord with parameters for
model_changed, old_model_id, new_model_id and the expected ValidationError
match; specifically replace the separate
test_model_changed_true_missing_old_model_id_raises,
test_model_changed_true_missing_new_model_id_raises,
test_model_changed_false_with_old_model_id_raises, and
test_model_changed_true_same_model_ids_raises with one function (e.g.,
test_model_changed_validation) decorated with `@pytest.mark.parametrize` over
tuples of (model_changed, old_model_id, new_model_id, expected_error_substring),
and in the test assert pytest.raises(ValidationError,
match=expected_error_substring) when constructing PromotionRecord, reusing the
PromotionEvaluation creation inside the test body.
```python
async def test_junior_to_mid_auto_approves(
    self,
    approval_config: PromotionApprovalConfig,
) -> None:
    """Junior to Mid promotion is auto-approved (below SENIOR threshold)."""
    strategy = SeniorityApprovalStrategy(config=approval_config)
    evaluation = _make_evaluation(
        current_level=SeniorityLevel.JUNIOR,
        target_level=SeniorityLevel.MID,
        direction=PromotionDirection.PROMOTION,
    )
    identity = make_agent_identity(level=SeniorityLevel.JUNIOR)

    decision = await strategy.decide(
        evaluation=evaluation,
        agent_identity=identity,
    )
    assert decision.auto_approve is True
    assert decision.requires_human is False

async def test_mid_to_senior_requires_human(
    self,
    approval_config: PromotionApprovalConfig,
) -> None:
    """Mid to Senior promotion requires human approval (at threshold)."""
    strategy = SeniorityApprovalStrategy(config=approval_config)
    evaluation = _make_evaluation(
        current_level=SeniorityLevel.MID,
        target_level=SeniorityLevel.SENIOR,
        direction=PromotionDirection.PROMOTION,
    )
    identity = make_agent_identity(level=SeniorityLevel.MID)

    decision = await strategy.decide(
        evaluation=evaluation,
        agent_identity=identity,
    )
    assert decision.auto_approve is False
    assert decision.requires_human is True

async def test_senior_to_lead_requires_human(
    self,
    approval_config: PromotionApprovalConfig,
) -> None:
    """Senior to Lead promotion also requires human approval."""
    strategy = SeniorityApprovalStrategy(config=approval_config)
    evaluation = _make_evaluation(
        current_level=SeniorityLevel.SENIOR,
        target_level=SeniorityLevel.LEAD,
        direction=PromotionDirection.PROMOTION,
    )
    identity = make_agent_identity(level=SeniorityLevel.SENIOR)

    decision = await strategy.decide(
        evaluation=evaluation,
        agent_identity=identity,
    )
    assert decision.auto_approve is False
    assert decision.requires_human is True

async def test_custom_threshold_lead(self) -> None:
    """Promotion to Senior auto-approved when threshold is LEAD."""
    config = PromotionApprovalConfig(
        human_approval_from_level=SeniorityLevel.LEAD,
    )
    strategy = SeniorityApprovalStrategy(config=config)
    evaluation = _make_evaluation(
        current_level=SeniorityLevel.MID,
        target_level=SeniorityLevel.SENIOR,
        direction=PromotionDirection.PROMOTION,
    )
    identity = make_agent_identity(level=SeniorityLevel.MID)

    decision = await strategy.decide(
        evaluation=evaluation,
        agent_identity=identity,
    )
    assert decision.auto_approve is True
    assert decision.requires_human is False
```
🧹 Nitpick | 🔵 Trivial
Collapse the approval decision matrix into parametrized cases.
These promotion/demotion tests only vary by levels, config, and expected flags/reason text. Converting them into one or two parametrized tests would keep the scenario table easier to extend without repeating the same arrange/act flow.
As per coding guidelines: "Use @pytest.mark.parametrize for testing similar cases instead of duplicating test functions".
Also applies to: 142-204
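The decision rule these cases exercise reduces to a rank comparison (stand-in level names; the real strategy compares `SeniorityLevel` members against `human_approval_from_level`):

```python
# Hypothetical rank order matching the cases above: auto-approve only when
# the target level sits below the human-approval threshold.
SENIORITY_ORDER = ("junior", "mid", "senior", "lead")
RANK = {level: i for i, level in enumerate(SENIORITY_ORDER)}


def requires_human(target_level: str, human_approval_from_level: str) -> bool:
    """True when the promotion target is at or above the approval threshold."""
    return RANK[target_level] >= RANK[human_approval_from_level]
```

A parametrized test over this rule would need only the rows (target, threshold, expected), which is exactly the table the four test functions currently spell out longhand.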
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/hr/promotion/test_seniority_approval_strategy.py` around lines 57
- 135, Collapse the four near-duplicate tests (test_junior_to_mid_auto_approves,
test_mid_to_senior_requires_human, test_senior_to_lead_requires_human,
test_custom_threshold_lead) into a single parametrized test using
pytest.mark.parametrize that iterates rows of (current_level, target_level,
config, expected_auto_approve, expected_requires_human); inside the test
instantiate SeniorityApprovalStrategy (use PromotionApprovalConfig where a
non-default threshold is needed), build the evaluation with _make_evaluation,
create identity via make_agent_identity, call strategy.decide, and assert
decision.auto_approve and decision.requires_human for each row — this removes
duplication while still exercising SeniorityApprovalStrategy.decide and keeping
PromotionApprovalConfig, _make_evaluation, and make_agent_identity references
for locating the code.
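To make the suggested shape concrete, here is a minimal, self-contained sketch of the table-driven pattern the review asks for. The strategy, config, and helper names above come from the review; this stand-in replaces them with a toy async strategy and integer seniority ranks, so the rows and behavior are illustrative only. With pytest, the same `CASES` rows would feed `@pytest.mark.parametrize` directly.

```python
# Hypothetical sketch: collapse the four near-duplicate approval tests
# into one table of (target_level, approval_threshold, expected_auto).
# SeniorityApprovalStrategy here is a stand-in, not the project's class.
import asyncio
from dataclasses import dataclass


@dataclass
class Decision:
    auto_approve: bool
    requires_human: bool


class SeniorityApprovalStrategy:
    """Stand-in: require a human at or above the configured level."""

    def __init__(self, human_approval_from_level: int) -> None:
        self._threshold = human_approval_from_level

    async def decide(self, target_level: int) -> Decision:
        needs_human = target_level >= self._threshold
        return Decision(auto_approve=not needs_human, requires_human=needs_human)


# Rows: (target_level, human_approval_from_level, expected_auto_approve)
CASES = [
    (1, 2, True),   # junior -> mid, threshold senior: auto-approved
    (2, 2, False),  # mid -> senior, threshold senior: needs human
    (3, 3, False),  # senior -> lead, threshold lead: needs human
    (2, 3, True),   # mid -> senior, threshold lead: auto-approved
]


def run_cases() -> list[tuple[bool, bool]]:
    results = []
    for target, threshold, expected_auto in CASES:
        decision = asyncio.run(
            SeniorityApprovalStrategy(
                human_approval_from_level=threshold
            ).decide(target)
        )
        assert decision.auto_approve is expected_auto
        results.append((decision.auto_approve, decision.requires_human))
    return results
```

Extending the matrix then means adding a row, not a new test function.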
```python
"""Unit tests for PromotionService orchestrator."""

from datetime import UTC, datetime
from typing import TYPE_CHECKING
from unittest.mock import AsyncMock

import pytest

from ai_company.api.approval_store import ApprovalStore
from ai_company.core.enums import (
    ApprovalStatus,
    Complexity,
    SeniorityLevel,
    TaskType,
)
from ai_company.core.types import NotBlankStr
from ai_company.hr.errors import (
    PromotionApprovalRequiredError,
    PromotionCooldownError,
    PromotionError,
)
from ai_company.hr.performance.models import TaskMetricRecord
from ai_company.hr.promotion.config import (
    PromotionConfig,
)
from ai_company.hr.promotion.seniority_approval_strategy import (
    SeniorityApprovalStrategy,
)
from ai_company.hr.promotion.seniority_model_mapping import (
    SeniorityModelMapping,
)
from ai_company.hr.promotion.service import PromotionService
from ai_company.hr.promotion.threshold_evaluator import ThresholdEvaluator

from .conftest import make_agent_identity

if TYPE_CHECKING:
    from ai_company.hr.performance.tracker import PerformanceTracker
    from ai_company.hr.registry import AgentRegistryService
```
Add the module timeout marker.
This file has many async tests but no 30-second timeout marker, so a hung approval or tracker path can stall the suite.
⏱️ Minimal fix
```diff
 if TYPE_CHECKING:
     from ai_company.hr.performance.tracker import PerformanceTracker
     from ai_company.hr.registry import AgentRegistryService

+pytestmark = pytest.mark.timeout(30)
+
 # ── Helpers ──────────────────────────────────────────────────────
```

Based on learnings: Applies to `tests/**/*.py`: Set test timeout to 30 seconds per test — use `pytest.mark.timeout(30)` or configure in `pytest.ini`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/hr/promotion/test_service.py` around lines 1 - 40, Add a 30-second
per-test timeout marker to the module by defining pytestmark =
pytest.mark.timeout(30) near the top of the file (add import pytest if missing);
this ensures all async tests in the PromotionService test module use a 30s
timeout and prevents hangs in approval/tracker code paths.
```python
    if success_rate > 0:
        total = max(1, int(tasks_completed / success_rate))
        tasks_failed = total - tasks_completed
    elif tasks_completed > 0:
        # 0% success: every task is a failure
        total = tasks_completed
        tasks_failed = total
        tasks_completed = 0
    else:
        total = 0
        tasks_failed = 0

    window = WindowMetrics(
        window_size="30d",
        data_point_count=total,
        tasks_completed=tasks_completed,
        tasks_failed=tasks_failed,
        avg_quality_score=quality,
        success_rate=success_rate,
    )
```
Stop emitting inconsistent failure counts in make_performance_snapshot.
Line 44 truncates the inferred total. For example, tasks_completed=10 and success_rate=0.95 produces tasks_failed=0 but still stores success_rate=0.95. Any trust path that derives error rate or “clean history” from counts instead of the float field will miss the failure case these tests intend to simulate.
🛠️ Safer direction

```diff
 def make_performance_snapshot(
     agent_id: str = "agent-001",
     *,
     quality: float = 8.0,
     success_rate: float = 0.9,
     tasks_completed: int = 15,
 ) -> AgentPerformanceSnapshot:
@@
-    if success_rate > 0:
-        total = max(1, int(tasks_completed / success_rate))
-        tasks_failed = total - tasks_completed
+    if success_rate >= 1:
+        total = tasks_completed
+        tasks_failed = 0
+    elif success_rate > 0:
+        tasks_failed = max(
+            1,
+            round(tasks_completed * (1 - success_rate) / success_rate),
+        )
+        total = tasks_completed + tasks_failed
+        success_rate = tasks_completed / total
     elif tasks_completed > 0:
         # 0% success: every task is a failure
         total = tasks_completed
         tasks_failed = total
         tasks_completed = 0
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/security/trust/conftest.py` around lines 43 - 62, The helper in
make_performance_snapshot computes total using int(tasks_completed /
success_rate) which truncates and yields inconsistent counts vs the stored float
success_rate; change it to use ceiling (e.g., math.ceil(tasks_completed /
success_rate)) so total >= tasks_completed and tasks_failed = total -
tasks_completed reflects at least one inferred failure when the float
success_rate implies it; update the success_rate>0 branch to import/use
math.ceil, compute total = max(1, math.ceil(tasks_completed / success_rate)),
then recompute tasks_failed before constructing WindowMetrics (window_size,
data_point_count, tasks_completed, tasks_failed, avg_quality_score,
success_rate).
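The truncation the review flags is easy to check numerically. This snippet uses only the arithmetic from the helper itself (no project code): with `tasks_completed=10` and `success_rate=0.95`, `int()` truncation infers zero failures even though the stored rate implies at least one, while the ceiling variant from the AI prompt does not.

```python
# Reproduce the inconsistency: int() truncation vs. math.ceil().
import math

tasks_completed = 10
success_rate = 0.95

# Current helper logic: int(10 / 0.95) == 10, so no failures are inferred,
# yet success_rate=0.95 is still stored on the window.
total_trunc = max(1, int(tasks_completed / success_rate))
tasks_failed_trunc = total_trunc - tasks_completed

# Ceiling-based alternative: total >= tasks_completed, and at least one
# failure is inferred whenever the float rate implies it.
total_ceil = max(1, math.ceil(tasks_completed / success_rate))
tasks_failed_ceil = total_ceil - tasks_completed
```

Any trust check that derives an error rate from the counts sees `tasks_failed_trunc == 0` in the first case, contradicting the float field.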
```python
        (
            TrustStrategyType.WEIGHTED,
            {
                "promotion_thresholds": {
                    "sandboxed_to_restricted": TrustThreshold(
                        score=0.5,
                    ),
                },
            },
        ),
```
Fail fast on misspelled threshold keys.
promotion_thresholds is currently a plain dict[str, TrustThreshold], and WeightedTrustStrategy only looks up the fixed transition names. A typo here is therefore silently ignored instead of failing config load, which makes the trust policy look valid while disabling it.
Suggested regression

```python
    def test_unknown_threshold_key_raises(self) -> None:
        with pytest.raises(ValueError, match="Unknown trust transition"):
            TrustConfig(
                strategy=TrustStrategyType.WEIGHTED,
                promotion_thresholds={
                    "sandbxed_to_restricted": TrustThreshold(score=0.5),
                },
            )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/security/trust/test_config.py` around lines 250 - 259, The test
shows promotion_thresholds is a plain dict so typos are ignored; update
WeightedTrustStrategy to validate keys on construction: in
WeightedTrustStrategy.__init__ (or the config loader) define the exact allowed
transition names (e.g., "sandboxed_to_restricted", "restricted_to_something" —
whatever the class expects) and check promotion_thresholds.keys() against that
set, raising a ValueError (or ConfigError) listing unknown keys when any
extra/misspelled keys are present; this ensures TrustStrategyType.WEIGHTED /
promotion_thresholds / TrustThreshold mis-keys fail fast.
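A minimal sketch of the fail-fast key check this asks for, written as a plain function rather than the project's actual Pydantic validator. The three transition names are taken from the review's description of `TRANSITION_KEYS`; everything else is illustrative.

```python
# Allowed upward transitions, as named in the review; any other key in
# promotion_thresholds is rejected at load time instead of being
# silently ignored during evaluation.
ALLOWED_TRANSITIONS = frozenset(
    {
        "sandboxed_to_restricted",
        "restricted_to_standard",
        "standard_to_elevated",
    }
)


def validate_threshold_keys(
    promotion_thresholds: dict[str, float],
) -> dict[str, float]:
    unknown = sorted(set(promotion_thresholds) - ALLOWED_TRANSITIONS)
    if unknown:
        msg = f"Unknown trust transition keys: {unknown}"
        raise ValueError(msg)
    return promotion_thresholds
```

Hooked into a Pydantic `field_validator`, this turns the typo in the suggested regression into a load-time `ValueError` instead of a silently disabled policy.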
```python
    async def test_evaluate_low_scores_stays_at_current(
        self,
        weighted_config: TrustConfig,
    ) -> None:
        strategy = WeightedTrustStrategy(config=weighted_config)
        state = TrustState(
            agent_id=NotBlankStr("agent-001"),
            global_level=ToolAccessLevel.SANDBOXED,
            trust_score=0.0,
        )
        snapshot = make_performance_snapshot(
            "agent-001",
            quality=2.0,
            success_rate=0.3,
            tasks_completed=3,
        )

        result = await strategy.evaluate(
            agent_id=NotBlankStr("agent-001"),
            current_state=state,
            snapshot=snapshot,
        )

        assert result.score is not None
        assert result.score < 0.5
        assert result.recommended_level == ToolAccessLevel.SANDBOXED
        assert result.should_change is False
```
Add a non-floor demotion regression.
This only exercises the floor case (SANDBOXED). The current WeightedTrustStrategy implementation walks forward transition thresholds only, so a RESTRICTED or STANDARD agent with a collapsing score can stay over-trusted without this suite noticing, even though demotion support is part of the trust contract.
Suggested regression

```python
    async def test_evaluate_low_scores_demotes_one_level(
        self,
        weighted_config: TrustConfig,
    ) -> None:
        strategy = WeightedTrustStrategy(config=weighted_config)
        state = TrustState(
            agent_id=NotBlankStr("agent-001"),
            global_level=ToolAccessLevel.RESTRICTED,
            trust_score=0.7,
        )
        snapshot = make_performance_snapshot(
            "agent-001",
            quality=1.0,
            success_rate=0.1,
            tasks_completed=1,
        )

        result = await strategy.evaluate(
            agent_id=NotBlankStr("agent-001"),
            current_state=state,
            snapshot=snapshot,
        )

        assert result.recommended_level == ToolAccessLevel.SANDBOXED
        assert result.should_change is True
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/security/trust/test_weighted_strategy.py` around lines 74 - 100,
Add a non-floor demotion regression test: create a new async test in
tests/unit/security/trust/test_weighted_strategy.py that mirrors the existing
low-score test but starts current_state.global_level at
ToolAccessLevel.RESTRICTED (or STANDARD) with a non-zero trust_score; provide a
snapshot that causes the WeightedTrustStrategy.evaluate to produce a collapsing
score below the demotion threshold and assert that result.score is not None and
below the relevant threshold, result.recommended_level equals the next-lower
ToolAccessLevel (e.g., RESTRICTED -> SANDBOXED or STANDARD -> RESTRICTED) and
result.should_change is True; reference WeightedTrustStrategy.evaluate,
TrustState, TrustConfig, and ToolAccessLevel when locating code to ensure the
strategy supports downward transitions and the test fails if only floor demotion
is implemented.
Security fixes:
- Add approval store verification in apply_promotion (defense-in-depth)
- Re-enforce elevated gate in apply_trust_change (defense-in-depth)
- Add field allowlist to update_identity (prevent mass assignment)

Bug fixes:
- Fix clean_history_days to filter by window period (was checking all windows)
- Fix demotion tasks_completed threshold from 0.0 to 3.0 (was unsatisfiable)
- Add category_criteria coverage validator on TrustConfig
- Remove unused PROMOTION_APPROVED event constant
- Rename total_tasks to max_tasks_completed in threshold_evaluator

Test improvements:
- Add frozen/immutability tests for nested config classes
- Parametrize enum value tests
- Convert elevated gate tests from async to sync (method is sync)
- Update registry test for allowlist enforcement
Force-pushed cfeba92 to 6c649a6 (Compare)
```python
            event = (
                PROMOTION_REJECTED
                if request.status == ApprovalStatus.REJECTED
                else PROMOTION_REQUESTED
            )
            logger.warning(
                event,
                agent_id=request.agent_id,
                status=request.status.value,
            )
            msg = f"Cannot apply promotion: request status is {request.status.value}"
```
Wrong log event for non-approved, non-rejected status
When request.status is PENDING (or any future non-APPROVED/non-REJECTED state), PROMOTION_REQUESTED is used as the structured log event. PROMOTION_REQUESTED semantically means "a new promotion request was created," so log aggregation tools would misclassify this warning as a successful request creation rather than a rejected call. A more appropriate fallback would be PROMOTION_REJECTED (since the promotion was refused regardless of the reason), or a dedicated PROMOTION_APPROVAL_REQUIRED constant.
Suggested change:

```python
            event = (
                PROMOTION_REJECTED
                if request.status == ApprovalStatus.REJECTED
                else PROMOTION_REJECTED
            )
```
Or, more explicitly, remove the conditional entirely and always use PROMOTION_REJECTED for this error path.
Prompt To Fix With AI

This is a comment left during a code review.
Path: src/ai_company/hr/promotion/service.py
Line: 377-387
How can I resolve this? If you propose a fix, please make it concise.

```python
        except Exception:
            logger.warning(
                PROMOTION_APPLIED,
                agent_id=request.agent_id,
                error="Trust re-evaluation failed after promotion; "
                "promotion still applied",
            )
```
Success event constant used on failure/warning path
PROMOTION_APPLIED is a success-signal event, but here it is emitted at WARNING level to describe a trust re-evaluation failure. Any log-aggregation query filtering for PROMOTION_APPLIED will ingest this warning as a successful promotion event — misreporting failure cases.
PROMOTION_EVALUATE_FAILED is already imported and semantically correct here:
Suggested change:

```python
            logger.warning(
                PROMOTION_EVALUATE_FAILED,
                agent_id=request.agent_id,
                error="Trust re-evaluation failed after promotion; "
                "promotion still applied",
            )
```
Prompt To Fix With AI

This is a comment left during a code review.
Path: src/ai_company/hr/promotion/service.py
Line: 469-475
How can I resolve this? If you propose a fix, please make it concise.

```python
class WeightedTrustStrategy:
    """Trust strategy using a single weighted score.

    Computes a trust score from four weighted factors derived from
    the agent's performance snapshot:
    - task_difficulty: quality score normalized to [0, 1]
    - completion_rate: task success rate from the latest window
    - error_rate: failure penalty (tasks_failed / data_point_count)
    - human_feedback: task volume proxy (tasks / 100, capped at 1.0)
      — placeholder until actual human feedback signals are available

    The score is compared against configurable thresholds to
    determine the recommended trust level. Trust changes are
    restricted to one adjacent level per evaluation.
    """
```
WeightedTrustStrategy only promotes; demotion is never recommended
The class-level docstring says "promotes/demotes based on configurable thresholds," but _score_to_level exclusively iterates TRANSITION_KEYS, which only contains upward transitions (sandboxed_to_restricted, restricted_to_standard, standard_to_elevated). There is no corresponding downward path.
Concretely: an agent promoted to STANDARD whose weighted score subsequently drops to 0.0 will never be recommended for demotion by this strategy — _score_to_level returns current_level unchanged. This also means TrustService.check_decay is a silent no-op for any agent using the weighted strategy, even though it appears in the public service API.
If downward movement is not intended for this strategy, the class docstring should be corrected:
Suggested docstring:

```python
class WeightedTrustStrategy:
    """Trust strategy using a single weighted score.

    Computes a trust score from four weighted factors derived from
    the agent's performance snapshot:
    - task_difficulty: quality score normalized to [0, 1]
    - completion_rate: task success rate from the latest window
    - error_rate: failure penalty (tasks_failed / data_point_count)
    - human_feedback: task volume proxy (tasks / 100, capped at 1.0)
      — placeholder until actual human feedback signals are available

    The score is compared against configurable thresholds to
    determine the recommended trust level. Trust changes are
    restricted to one adjacent level per evaluation (promotion only;
    use the milestone strategy for decay/demotion detection).
    """
```
If downward movement is intended, _score_to_level needs a complementary demotion path.
Prompt To Fix With AI

This is a comment left during a code review.
Path: src/ai_company/security/trust/weighted_strategy.py
Line: 26-40
How can I resolve this? If you propose a fix, please make it concise.
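If downward movement is intended, the complementary demotion walk can be small. The sketch below is illustrative only: `TRUST_LEVEL_ORDER` and the threshold table stand in for the real config objects, and the rule assumed here is that an agent is demoted one level when its score no longer clears the threshold that gated entry into its current level.

```python
# Illustrative sketch of a score-to-level walk with a demotion pass.
# Names and threshold values are stand-ins, not the project's API.
TRUST_LEVEL_ORDER = ["sandboxed", "restricted", "standard", "elevated"]
PROMOTION_THRESHOLDS = {
    "sandboxed_to_restricted": 0.4,
    "restricted_to_standard": 0.6,
    "standard_to_elevated": 0.8,
}


def score_to_level(score: float, current_level: str) -> str:
    rank = TRUST_LEVEL_ORDER.index(current_level)
    # Promote one level if the next upward threshold is met.
    if rank + 1 < len(TRUST_LEVEL_ORDER):
        key = f"{current_level}_to_{TRUST_LEVEL_ORDER[rank + 1]}"
        if score >= PROMOTION_THRESHOLDS[key]:
            return TRUST_LEVEL_ORDER[rank + 1]
    # Demote one level if the score no longer clears the threshold
    # that gated entry into the current level.
    if rank > 0:
        key = f"{TRUST_LEVEL_ORDER[rank - 1]}_to_{current_level}"
        if score < PROMOTION_THRESHOLDS[key]:
            return TRUST_LEVEL_ORDER[rank - 1]
    return current_level
```

This keeps the one-adjacent-level-per-evaluation invariant while making a collapsing score actionable.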
Pull request overview
Copilot reviewed 56 out of 56 changed files in this pull request and generated 11 comments.
```python
        for category, current_level in current_state.category_levels.items():
            cat_criteria = self._criteria.get(category, {})
            new_level = current_level

            for transition_key, criteria in cat_criteria.items():
                from_to = transition_key.split("_to_", maxsplit=1)
                if len(from_to) != 2:  # noqa: PLR2004
                    continue
                from_str, to_str = from_to

                if current_level.value != from_str:
                    continue

                if self._check_category_criteria(
                    criteria_config=criteria,
                    snapshot=snapshot,
                ):
                    try:
                        new_level = ToolAccessLevel(to_str)
                    except ValueError:
                        logger.warning(
                            TRUST_EVALUATE_FAILED,
                            agent_id=agent_id,
                            category=category,
                            transition_key=transition_key,
                            invalid_level=to_str,
                        )
                        continue

                    if criteria.requires_human_approval:
                        requires_human = True

            # Per-category ELEVATED always requires human approval
            if new_level == ToolAccessLevel.ELEVATED:
                requires_human = True

            category_updates[category] = new_level

        if category_updates:
            min_rank = min(
                TRUST_LEVEL_RANK.get(lv, 0) for lv in category_updates.values()
            )
            recommended = TRUST_LEVEL_ORDER[min_rank]
        else:
            recommended = current_state.global_level
```
In per_category_strategy.py, the evaluate method (lines 71-107) iterates over current_state.category_levels.items() to determine which categories to evaluate. However, it uses the current state's category levels rather than the config's initial_category_levels to drive evaluation. If a new category is added to config's initial_category_levels after an agent's state was created, that category won't be evaluated for the existing agent. More critically, the variable category_updates is populated from iterations over current_state.category_levels, not from self._initial_levels. This means newly configured categories (in self._initial_levels) that are not yet in current_state.category_levels are silently ignored during evaluation. Since initial_state is only called once at initialization, any categories added to config after initialization are permanently excluded.
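One fix direction for the gap described above is to drive the loop from the union of the stored state's categories and the config's initial levels, so a category added to config after the agent's state was created is still evaluated. The sketch below is illustrative; the function and parameter names are not the project's actual API.

```python
# Sketch: merge config-declared categories with stored state so newly
# configured categories are not silently skipped during evaluation.
def categories_to_evaluate(
    state_levels: dict[str, str],
    initial_levels: dict[str, str],
) -> dict[str, str]:
    merged = dict(initial_levels)  # config defaults cover new categories
    merged.update(state_levels)    # stored state wins where present
    return merged
```

Iterating `categories_to_evaluate(current_state.category_levels, self._initial_levels)` instead of `current_state.category_levels` alone would close the gap without a state migration.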
```python
    seniority_model_map: Any = Field(
        default_factory=dict,
        description="Explicit seniority level to model ID overrides "
        "(wrapped as MappingProxyType after validation)",
    )
```
In config.py (promotion), the seniority_model_map field on ModelMappingConfig is typed as Any (line 80), which loses type safety. The _validate_model_map_keys validator converts it to a MappingProxyType[str, str], but the field type annotation remains Any. This means consumers of this config field have no type guidance, and static analysis tools won't catch misuse. The field should be typed as dict[str, str] in the annotation (Pydantic will accept a plain dict at construction time), and the validator should document that it wraps the result in MappingProxyType. Alternatively, accept and store it as MappingProxyType[str, str] with a custom field type.
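The validate-then-wrap idea can be shown without Pydantic. This stdlib-only stand-in accepts a `dict[str, str]` (the annotation the comment recommends over `Any`) and returns a read-only `MappingProxyType`; the function name and error wording are illustrative.

```python
# Stand-in for the suggested typed validator: check entries are
# str -> str, then expose the map read-only via MappingProxyType.
from types import MappingProxyType


def validate_model_map(raw: dict[str, str]) -> MappingProxyType:
    for key, value in raw.items():
        if not isinstance(key, str) or not isinstance(value, str):
            msg = (
                "seniority_model_map entries must be str -> str, "
                f"got {key!r}: {value!r}"
            )
            raise TypeError(msg)
    # Copy before wrapping so later mutation of `raw` cannot leak in.
    return MappingProxyType(dict(raw))
```

In the real config this would live in the `_validate_model_map_keys` validator, with the field annotated `dict[str, str]` so static analysis catches misuse at construction time.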
```python
        disallowed = set(updates.keys()) - self._UPDATABLE_FIELDS
        if disallowed:
            msg = (
                f"Fields not allowed for update_identity: "
                f"{sorted(disallowed)}; allowed: {sorted(self._UPDATABLE_FIELDS)}"
            )
            logger.warning(
                HR_REGISTRY_IDENTITY_UPDATED,
                agent_id=str(agent_id),
                error=msg,
            )
            raise ValueError(msg)

        key = str(agent_id)
        async with self._lock:
            identity = self._agents.get(key)
            if identity is None:
                msg = f"Agent {agent_id!r} not found in registry"
                logger.warning(
                    HR_REGISTRY_IDENTITY_UPDATED,
                    agent_id=key,
                    error=msg,
                )
                raise AgentNotFoundError(msg)
            updated = identity.model_copy(update=updates)
            self._agents[key] = updated

        logger.info(
            HR_REGISTRY_IDENTITY_UPDATED,
            agent_id=key,
            updated_fields=sorted(updates.keys()),
        )
        return updated
```
In registry.py, the update_identity method logs a HR_REGISTRY_IDENTITY_UPDATED event even on the error path (disallowed fields, line 232), which is confusing since the event name implies success. An error path should use a different event constant (e.g., a dedicated HR_REGISTRY_UPDATE_FAILED event or at least pass an error= kwarg to distinguish failure from success). Additionally, looking at line 252 (success path), the variable key (not agent_id) is used in the log call, which is fine for consistency, but the validation check on line 225-236 logs with agent_id=str(agent_id) directly (before key is defined), making the key usage inconsistent across the two log calls within this function.
```python
        # Success rate from latest window
        completion_factor = 0.0
        for window in snapshot.windows:
            if window.success_rate is not None:
                completion_factor = window.success_rate
                break

        # Error penalty: 1 - (tasks_failed / data_point_count)
        error_factor = 1.0
        for window in snapshot.windows:
            if window.data_point_count > 0:
                error_factor = 1.0 - (window.tasks_failed / window.data_point_count)
                break

        # Task volume ratio (tasks completed / 100, capped at 1.0)
        # — placeholder for human feedback until that signal is available
        feedback_factor = 0.0
        for window in snapshot.windows:
            if window.tasks_completed > 0:
                feedback_factor = min(window.tasks_completed / 100.0, 1.0)
                break
```
In weighted_strategy.py, the _compute_score method (lines 133-138) uses only the first window's success_rate via a break after the first non-None value. This means if windows are ordered from newest to oldest, it uses the most recent window, which is correct. However, the code comment says "use latest window value" but the ordering of snapshot.windows is not documented or guaranteed in this context. In contrast, the error_factor calculation (lines 141-145) and feedback_factor (lines 149-153) also take the first valid window. If windows are not guaranteed to be in chronological order, the "latest" assumption could produce incorrect results. This should be documented or an explicit sort/selection should be added.
```python
        current_rank = TRUST_LEVEL_RANK.get(state.global_level, 0)
        if current_rank <= 0:
            return None

        # Idle decay
        last_eval = state.last_evaluated_at
        if last_eval is not None:
            idle_days = (now - last_eval).days
            if idle_days >= self._re_verification.decay_on_idle_days:
                demoted = TRUST_LEVEL_ORDER[current_rank - 1]
                logger.info(
                    TRUST_DECAY_DETECTED,
                    agent_id=state.agent_id,
                    reason="idle",
                    idle_days=idle_days,
                    threshold=self._re_verification.decay_on_idle_days,
                )
                return demoted

        # Error rate decay
        for window in snapshot.windows:
            if window.data_point_count > 0 and window.success_rate is not None:
                error_rate = 1.0 - window.success_rate
                if error_rate > self._re_verification.decay_on_error_rate:
                    demoted = TRUST_LEVEL_ORDER[current_rank - 1]
                    logger.info(
                        TRUST_DECAY_DETECTED,
                        agent_id=state.agent_id,
                        reason="error_rate",
                        error_rate=error_rate,
                        threshold=self._re_verification.decay_on_error_rate,
                    )
                    return demoted
                break
```
In milestone_strategy.py, the _check_decay method at lines 256-285 checks idle decay first, and if idle decay is triggered, it returns early without checking error rate. However, if both idle decay AND high error rate are present, the method still returns only one demotion level (not two). This is likely intentional (one level at a time), but the ordering of checks has an implicit priority: idle decay takes precedence over error rate. This precedence is not documented. Additionally, the method returns after the first window's error rate check due to break on line 285, but this break means subsequent windows are ignored for error rate calculation. If the first window has no data (data_point_count == 0 or success_rate is None), the entire error rate check is skipped. A comment should document this intent.
```python
    hiring_date=date(2026, 1, 15),
)
```
In promotion/conftest.py, the make_agent_identity factory sets hiring_date=date(2026, 1, 15). Since the current date is March 2026, this is a past date which is fine for test purposes. However, it's a hardcoded date rather than a relative date, which means tests that check agent tenure (e.g., time_active_days milestone checks) that use this factory may produce unexpected results if the date is ever relied upon in age-based calculations. This is a minor concern as the current tests don't use hiring_date for duration calculations, but it is worth noting for future maintainability.
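A relative-date variant of the factory would keep any future tenure-based assertions stable regardless of when the suite runs; this sketch returns a plain dict as a hypothetical stand-in for the real identity model:

```python
from datetime import date, timedelta


def make_agent_identity(days_since_hiring: int = 60, **overrides) -> dict:
    """Hypothetical factory variant: hiring_date is computed relative to
    today, so age-based calculations always see the same tenure."""
    identity = {
        "agent_id": "agent-001",
        "hiring_date": date.today() - timedelta(days=days_since_hiring),
    }
    identity.update(overrides)
    return identity
```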
```python
min_criteria_met: int = Field(
    default=2,
    ge=1,
    le=3,
    description="Minimum number of criteria that must be met (max 3)",
)
```
The PromotionCriteriaConfig.min_criteria_met field has a hardcoded upper bound of le=3 (line 30 of config.py). This is tied to the current ThresholdEvaluator which has exactly 3 criteria (quality_score, success_rate, tasks_completed). However, the PromotionCriteriaConfig is meant to be used with pluggable PromotionCriteriaStrategy implementations. Future strategies might have more or fewer criteria, making the hardcoded le=3 limit a hidden coupling. The field description says "max 3" explicitly, which codifies this assumption, but this could be a breaking constraint for alternative strategies. Either remove the upper bound from the config or document clearly that this config is specific to ThresholdEvaluator.
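One way to decouple the bound from the hardcoded le=3: drop the upper bound from the shared config and validate against the strategy's own criteria list at wiring time. Names here are illustrative, not the PR's actual API:

```python
def validate_min_criteria_met(
    min_criteria_met: int,
    criteria_names: tuple[str, ...],
) -> int:
    # The upper bound follows the strategy's criteria count instead of a
    # constant baked into the shared config model.
    if not 1 <= min_criteria_met <= len(criteria_names):
        msg = (
            f"min_criteria_met={min_criteria_met} out of range for a "
            f"strategy with {len(criteria_names)} criteria"
        )
        raise ValueError(msg)
    return min_criteria_met


# ThresholdEvaluator's three criteria, per the review comment.
THRESHOLD_CRITERIA = ("quality_score", "success_rate", "tasks_completed")
```

A five-criteria strategy would then accept min_criteria_met=5 without a config change, while ThresholdEvaluator still rejects anything above 3.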
```python
async def test_time_active_days_not_met_blocks_promotion(self) -> None:
    """Milestone requires time_active_days, agent not active long enough."""
    config = TrustConfig(
        strategy=TrustStrategyType.MILESTONE,
        initial_level=ToolAccessLevel.SANDBOXED,
        milestones={
            "sandboxed_to_restricted": MilestoneCriteria(
                tasks_completed=5,
                quality_score_min=6.0,
                time_active_days=30,
            ),
        },
    )
    strategy = MilestoneTrustStrategy(config=config)

    state = TrustState(
        agent_id=NotBlankStr("agent-001"),
        global_level=ToolAccessLevel.SANDBOXED,
        last_evaluated_at=_NOW - timedelta(days=5),
    )
    snapshot = make_performance_snapshot(
        "agent-001",
        quality=9.0,
        success_rate=0.99,
        tasks_completed=100,
    )

    result = await strategy.evaluate(
        agent_id=NotBlankStr("agent-001"),
        current_state=state,
        snapshot=snapshot,
    )

    assert result.recommended_level == ToolAccessLevel.SANDBOXED
    assert result.should_change is False
```
In milestone_strategy.py, the _check_time_and_history method correctly uses state.created_at (agent tenure) for the days-active calculation, as its docstring says: line 215 checks if state.created_at is None and line 217 computes (now - state.created_at).days. However, the test test_time_active_days_not_met_blocks_promotion in test_milestone_strategy.py (line 340) sets last_evaluated_at=_NOW - timedelta(days=5) but never sets created_at. Since created_at defaults to None and the milestone requires time_active_days=30, the promotion is blocked by the early return for created_at is None (lines 215-216), not because the agent has only been active 5 days. The test name and docstring say "agent not active long enough," which is misleading: the test passes for the wrong reason and gives a false sense of security around the time-based check. A variant with created_at=_NOW - timedelta(days=5) and time_active_days=30 would properly test that 5 days of tenure is insufficient.
```python
if self._trust_service is not None:
    try:
        snapshot = await self._tracker.get_snapshot(request.agent_id)
        await self._trust_service.evaluate_agent(
            request.agent_id,
            snapshot,
        )
    except Exception:
        logger.warning(
            PROMOTION_APPLIED,
            agent_id=request.agent_id,
            error="Trust re-evaluation failed after promotion; "
            "promotion still applied",
        )

event = (
    PROMOTION_APPLIED
    if request.direction == PromotionDirection.PROMOTION
    else DEMOTION_APPLIED
)
logger.info(
    event,
    agent_id=request.agent_id,
    old_level=record.old_level.value,
    new_level=record.new_level.value,
    model_changed=record.model_changed,
)

# Notify agent and team — best-effort, must not block the record.
if self._on_notification is not None:
    try:
        await self._on_notification(record)
        logger.debug(
            PROMOTION_NOTIFICATION_SENT,
            agent_id=request.agent_id,
            direction=request.direction.value,
        )
    except Exception:
        logger.warning(
            PROMOTION_NOTIFICATION_SENT,
            agent_id=request.agent_id,
            error="Notification callback failed; promotion still applied",
        )
```
In service.py (promotion), the apply_promotion method at line 469 catches a bare Exception in the trust re-evaluation block. The comment says this is intentional (best-effort), but the bare except Exception: swallows every exception type without recording any exception information, which makes debugging difficult. At minimum, log the caught exception's details (e.g., with exc_info=True or by passing the exception message as a structured field). The same applies to the notification callback catch on line 499.
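A minimal sketch of the fix, shown with stdlib logging (the PR's structlog-style logger would take error=str(exc) as a keyword rather than extra=); the helper name and fake services are hypothetical:

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def reevaluate_trust(trust_service, tracker, agent_id: str) -> bool:
    """Best-effort trust re-evaluation; never raises, returns success flag."""
    try:
        snapshot = await tracker.get_snapshot(agent_id)
        await trust_service.evaluate_agent(agent_id, snapshot)
    except Exception as exc:
        # Keep the swallow, but record what actually went wrong,
        # including the traceback via exc_info=True.
        logger.warning(
            "Trust re-evaluation failed after promotion; promotion still applied",
            extra={"agent_id": agent_id, "error": str(exc)},
            exc_info=True,
        )
        return False
    return True
```

The promotion stays applied either way; the only change is that failures leave a debuggable trail.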
```python
from ai_company.security.trust.levels import (  # noqa: PLC0415
    TRUST_LEVEL_RANK,
)

is_promotion = TRUST_LEVEL_RANK.get(
    result.recommended_level, 0
) > TRUST_LEVEL_RANK.get(state.global_level, 0)
state_update: dict[str, object] = {
    "global_level": result.recommended_level,
    "trust_score": result.score,
}
if is_promotion:
    state_update["last_promoted_at"] = now
updated = state.model_copy(update=state_update)
self._trust_states[key] = updated
self._change_history.setdefault(key, []).append(record)

logger.info(
    TRUST_LEVEL_CHANGED,
    agent_id=agent_id,
    old_level=record.old_level.value,
    new_level=record.new_level.value,
    reason=reason.value,
)
return record


def get_trust_state(
    self,
    agent_id: NotBlankStr,
) -> TrustState | None:
    """Get current trust state for an agent.

    Args:
        agent_id: Agent identifier.

    Returns:
        Trust state, or None if not initialized.
    """
    return self._trust_states.get(str(agent_id))


def get_change_history(
    self,
    agent_id: NotBlankStr,
) -> tuple[TrustChangeRecord, ...]:
    """Get trust change history for an agent.

    Args:
        agent_id: Agent identifier.

    Returns:
        Tuple of change records.
    """
    return tuple(self._change_history.get(str(agent_id), []))


async def check_decay(
    self,
    agent_id: NotBlankStr,
    snapshot: AgentPerformanceSnapshot,
) -> TrustEvaluationResult:
    """Check for trust decay conditions.

    Delegates to evaluate_agent first, then updates the decay
    check timestamp. The ordering ensures that the strategy's
    decay logic sees the *previous* last_decay_check_at value,
    not a freshly-updated one.

    Args:
        agent_id: Agent to check.
        snapshot: Agent performance snapshot.

    Returns:
        Evaluation result (may recommend demotion on decay).
    """
    result = await self.evaluate_agent(agent_id, snapshot)

    # Update decay check timestamp *after* evaluation
    key = str(agent_id)
    state = self._trust_states.get(key)
    if state is not None:
        now = datetime.now(UTC)
        updated = state.model_copy(
            update={"last_decay_check_at": now},
        )
        self._trust_states[key] = updated

    return result


def _enforce_elevated_gate(
    self,
    result: TrustEvaluationResult,
) -> TrustEvaluationResult:
    """Defense-in-depth: force human approval for elevated promotion.

    If a strategy recommends ELEVATED but doesn't flag human
    approval, override it.
    """
    if (
        result.recommended_level == ToolAccessLevel.ELEVATED
        and result.current_level != ToolAccessLevel.ELEVATED
        and not result.requires_human_approval
    ):
        logger.warning(
            TRUST_ELEVATED_GATE_ENFORCED,
            agent_id=result.agent_id,
            strategy=result.strategy_name,
        )
        return result.model_copy(
            update={"requires_human_approval": True},
        )
    return result


async def _create_approval(
    self,
    agent_id: NotBlankStr,
    result: TrustEvaluationResult,
) -> None:
    """Create an approval item for trust level promotion."""
    if self._approval_store is None:
        msg = (
            f"Cannot create trust approval for agent {agent_id!r}: "
            f"no approval store configured"
        )
        logger.warning(
            TRUST_APPROVAL_STORE_MISSING,
            agent_id=agent_id,
            recommended=result.recommended_level.value,
            error=msg,
        )
        raise TrustEvaluationError(msg)

    from ai_company.core.approval import ApprovalItem  # noqa: PLC0415

    now = datetime.now(UTC)
    approval = ApprovalItem(
        id=NotBlankStr(str(uuid4())),
        action_type="trust:promote",
        title=(
            f"Trust promotion: {result.current_level.value} "
            f"-> {result.recommended_level.value}"
        ),
        description=result.details or "Trust level change requested",
        requested_by=agent_id,
        risk_level=ApprovalRiskLevel.HIGH,
        created_at=now,
        metadata={
            "agent_id": str(agent_id),
            "current_level": result.current_level.value,
            "recommended_level": result.recommended_level.value,
        },
    )
    await self._approval_store.add(approval)

    logger.info(
        TRUST_APPROVAL_REQUIRED,
        agent_id=agent_id,
        approval_id=approval.id,
        recommended=result.recommended_level.value,
    )


@staticmethod
def _infer_reason(
    result: TrustEvaluationResult,
) -> TrustChangeReason:
    """Infer the change reason from the evaluation result.

    Distinguishes promotions from demotions: demotions use
    TRUST_DECAY, promotions use strategy-specific reasons.
    """
    from ai_company.security.trust.levels import (  # noqa: PLC0415
        TRUST_LEVEL_RANK,
    )
```
In service.py (trust), TRUST_LEVEL_RANK is imported inside a method body at line 205 with a # noqa: PLC0415 comment, and imported again in _infer_reason at line 373; ApprovalItem is similarly imported inside _create_approval at line 335. Deferred imports inside method bodies are an unusual Python pattern. They appear to be used here to avoid circular imports, but the module-level if TYPE_CHECKING block at the top already handles several type-only imports. If these imports genuinely cannot be moved to module level due to a cycle, a brief explanatory comment would help; if they can (levels.py and core.approval do not appear to be circular with service.py), they belong at the top of the file.
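If a cycle genuinely forces the deferral, a sketch of the suggested comment convention, using a stdlib import as a stand-in for the project one, so each deferred import carries its own justification:

```python
def rank_delta(old_level: str, new_level: str) -> int:
    """Illustrative helper: positive means promotion, negative demotion."""
    # Deferred import: the imported module would import this one at load
    # time, creating a cycle, so it must stay function-local.  # noqa: PLC0415
    from operator import sub  # stand-in for the real project import

    ranks = {"sandboxed": 0, "restricted": 1, "standard": 2, "elevated": 3}
    return sub(ranks.get(new_level, 0), ranks.get(old_level, 0))
```

With the reason stated inline, a reviewer can verify or retire the deferral without re-deriving the import graph.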
Summary
- Progressive trust (`security/trust/`): 4 pluggable strategies (disabled, weighted, per-category, milestone) behind the `TrustStrategy` protocol, with trust state management, decay detection, re-verification, and defense-in-depth elevated gate enforcement
- Promotion/demotion (`hr/promotion/`): criteria evaluation, approval decisions (seniority-based), model mapping, cooldown enforcement, and trust re-evaluation integration behind pluggable protocols (`PromotionCriteriaStrategy`, `PromotionApprovalStrategy`, `ModelMappingStrategy`)

Closes #43, closes #49
Changes
Source (17 files)
- `security/trust/`: config models, 4 strategy implementations, service orchestrator, protocol, enums, errors, domain models
- `hr/promotion/`: config models, threshold evaluator, seniority approval strategy, model mapping, service orchestrator, protocols, domain models
- `observability/events/`: trust and promotion event constants

Tests (10 files, 199 tests in trust+promotion)
Docs
- `DESIGN_SPEC.md`: §15.3 project structure updated with `trust/` and `promotion/` subsystems
- `CLAUDE.md`: package structure and event constants updated

Test plan
- `uv run ruff check src/ tests/` — all passed
- `uv run ruff format --check src/ tests/` — all formatted
- `uv run mypy src/ tests/` — 0 errors (806 files)
- `uv run pytest tests/ -n auto --cov=ai_company --cov-fail-under=80` — 6399 passed, 94.98% coverage

Review coverage
Pre-reviewed by 10 agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, security-reviewer, docs-consistency). 33 findings addressed, 0 skipped.