feat: add progressive trust and promotion/demotion subsystems#196
Dependency Review: ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found. Scanned files: none.
Caution: Review failed — the pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID:
📒 Files selected for processing (56)
📝 Walkthrough (Summary by CodeRabbit)

Introduces two new subsystems: a promotion/demotion engine (configs, protocols, evaluators, approval/model-mapping strategies, orchestrator) and a progressive trust framework (configs, enums, protocols, four strategies, orchestrator). Integrates both into the root config, registry updates, and observability event constants, with extensive unit tests.
Sequence Diagram(s)

sequenceDiagram
rect rgba(200,200,255,0.5)
participant Client
participant PromotionService
participant Registry as AgentRegistry
participant Criteria as PromotionCriteria
participant Approval as PromotionApproval
participant ModelMap as ModelMapping
end
Client->>PromotionService: evaluate_promotion(agent_id)
PromotionService->>Registry: get_identity(agent_id)
Registry-->>PromotionService: AgentIdentity
PromotionService->>Criteria: evaluate(agent_id,current→target,snapshot)
Criteria-->>PromotionService: PromotionEvaluation
PromotionService-->>Client: PromotionEvaluation
Client->>PromotionService: request_promotion(agent_id,evaluation)
PromotionService->>Approval: decide(evaluation,agent_identity)
Approval-->>PromotionService: PromotionApprovalDecision
alt auto-approved
PromotionService-->>Client: PromotionRequest(status=APPROVED)
else needs-human
PromotionService->>Approval: create approval item (approval_store)
PromotionService-->>Client: PromotionRequest(status=PENDING,approval_id)
end
Client->>PromotionService: apply_promotion(request)
PromotionService->>ModelMap: resolve_model(agent_identity,new_level)
ModelMap-->>PromotionService: new_model_id | None
PromotionService->>Registry: update_identity(agent_id, level, model_id?)
Registry-->>PromotionService: updated AgentIdentity
PromotionService-->>Client: PromotionRecord
sequenceDiagram
rect rgba(200,255,200,0.5)
participant Client
participant TrustService
participant Strategy as TrustStrategy
participant ApprovalStore
participant Registry as AgentRegistry
end
Client->>TrustService: initialize_agent(agent_id)
TrustService->>Strategy: initial_state(agent_id)
Strategy-->>TrustService: TrustState
TrustService-->>Client: TrustState
Client->>TrustService: evaluate_agent(agent_id,snapshot)
TrustService->>Strategy: evaluate(agent_id,current_state,snapshot)
Strategy-->>TrustService: TrustEvaluationResult
TrustService-->>Client: TrustEvaluationResult
Client->>TrustService: apply_trust_change(agent_id,result)
TrustService->>TrustService: _enforce_elevated_gate(result)
alt requires approval
TrustService->>ApprovalStore: create approval item
end
TrustService->>Registry: (optional) update related agent info
TrustService-->>Client: TrustChangeRecord | None
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs
Greptile Summary: This PR introduces two substantial new subsystems — a four-strategy progressive trust engine (…). The pre-review addressed all 33 previously flagged findings. Three new issues remain:
Confidence Score: 4/5
Important Files Changed
Sequence Diagram

sequenceDiagram
participant Caller
participant PromotionService
participant TrustService
participant ApprovalStore
participant Registry
Note over Caller,Registry: Promotion Flow
Caller->>PromotionService: evaluate_promotion(agent_id)
PromotionService->>Registry: get(agent_id)
PromotionService->>PromotionService: _next_level(identity.level)
PromotionService->>PromotionService: criteria.evaluate(snapshot)
PromotionService-->>Caller: PromotionEvaluation
Caller->>PromotionService: request_promotion(agent_id, evaluation)
PromotionService->>PromotionService: check eligibility + cooldown
PromotionService->>PromotionService: approval.decide(evaluation)
alt auto_approve
PromotionService-->>Caller: PromotionRequest(APPROVED)
else requires_human
PromotionService->>ApprovalStore: add(ApprovalItem)
PromotionService-->>Caller: PromotionRequest(PENDING)
end
Caller->>PromotionService: apply_promotion(request)
PromotionService->>PromotionService: _verify_approval(request)
PromotionService->>Registry: update_identity(level)
PromotionService->>TrustService: evaluate_agent(snapshot) [best-effort]
PromotionService-->>Caller: PromotionRecord
Note over Caller,TrustService: Trust Evaluation Flow
Caller->>TrustService: evaluate_agent(agent_id, snapshot)
TrustService->>TrustService: strategy.evaluate(state, snapshot)
TrustService->>TrustService: _enforce_elevated_gate(result)
TrustService-->>Caller: TrustEvaluationResult
Caller->>TrustService: apply_trust_change(agent_id, result)
TrustService->>TrustService: _enforce_elevated_gate(result) [defense-in-depth]
alt requires_human_approval
TrustService->>ApprovalStore: add(ApprovalItem)
TrustService-->>Caller: None (awaiting approval)
else auto
TrustService->>TrustService: update trust state
TrustService-->>Caller: TrustChangeRecord
end
Last reviewed commit: 6c649a6
Pull request overview
This PR implements the progressive trust system (security/trust/) and promotion/demotion subsystem (hr/promotion/) as defined in DESIGN_SPEC §11.3 and §8.4. Both subsystems use pluggable protocol patterns with multiple strategy implementations, plus a comprehensive test suite of ~199 tests covering all strategies, service methods, config validation, and edge cases.
Changes:
- Progressive Trust (security/trust/): 4 trust strategies (disabled, weighted, per-category, milestone), trust service orchestrator, config models, domain models, event constants
- Promotion/Demotion (hr/promotion/): threshold evaluator, seniority approval strategy, model mapping, service orchestrator, config models, domain models, event constants
- Infrastructure: AgentRegistryService.update_identity(), new HR/trust/promotion enums and errors, RootConfig extended with trust and promotion fields
Reviewed changes
Copilot reviewed 53 out of 53 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/ai_company/security/trust/weighted_strategy.py | Contains a critical bug in error_factor computation |
| src/ai_company/security/trust/service.py | apply_trust_change sets last_promoted_at even on demotions |
| src/ai_company/observability/events/hr.py | Duplicate `# ── Registry` section header |
| src/ai_company/security/trust/config.py | Trust config models with validation |
| src/ai_company/security/trust/milestone_strategy.py | Milestone + decay logic |
| src/ai_company/security/trust/per_category_strategy.py | Per-category trust tracks |
| src/ai_company/security/trust/disabled_strategy.py | Static trust strategy |
| src/ai_company/hr/promotion/service.py | Promotion orchestrator |
| src/ai_company/hr/promotion/threshold_evaluator.py | N-of-M criteria evaluation |
| src/ai_company/hr/promotion/seniority_approval_strategy.py | Approval decisions |
| src/ai_company/hr/promotion/seniority_model_mapping.py | Model tier resolution |
| src/ai_company/hr/registry.py | New update_identity method |
| src/ai_company/config/schema.py | RootConfig extended with trust/promotion |
| Test files | Comprehensive coverage for all new subsystems |
# Error penalty: 1 - failure_rate
error_factor = 1.0
for window in snapshot.windows:
    if window.data_point_count > 0 and window.success_rate is not None:
        error_factor = 1.0 - (1.0 - window.success_rate)
        break
The error_factor computation in _compute_score is a no-op: 1.0 - (1.0 - window.success_rate) simplifies to just window.success_rate, making error_factor identical to completion_factor. Both factors end up using window.success_rate, so the weighted trust score counts success rate twice (once under completion_rate weight and once under error_rate weight) and the intended error penalty is never applied. The formula should likely be computing an error rate factor differently — for example using window.tasks_failed / window.data_point_count — so that it actually penalizes failures distinctly from the success-rate criterion.
# ── Registry ─────────────────────────────────────────────────────

HR_REGISTRY_IDENTITY_UPDATED: Final[str] = "hr.registry.identity_updated"
The duplicate # ── Registry ───────────────────────────────────────────────────── section header was added at line 38 when HR_REGISTRY_IDENTITY_UPDATED was introduced, but there is already an identical section header at line 32 for the same "Registry" group. The new constant should either be placed under the existing # ── Registry block (lines 32–36) or the duplicate heading should be renamed to something more descriptive (e.g., # ── Registry: identity updates ─────────────────────────────────).
updated = state.model_copy(
    update={
        "global_level": result.recommended_level,
        "last_promoted_at": now,
        "trust_score": result.score,
    },
)
In apply_trust_change, the state update always sets last_promoted_at to now regardless of whether the change is a promotion or a demotion. This means a demotion will incorrectly update last_promoted_at, which could affect demotion-tracking logic or reporting (e.g., if last_promoted_at is used to compute how long since the last promotion). A separate last_demoted_at field or a conditional update based on the direction of change (promotion vs demotion) should be used.
Actionable comments posted: 28
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/ai_company/config/schema.py (1)
467-498: 🧹 Nitpick | 🔵 Trivial — Update the RootConfig docstring to include new attributes.

The docstring's Attributes section lists all configuration fields but is missing the newly added `trust` and `promotion` attributes.

Proposed docstring update:

        mcp: MCP bridge configuration.
        security: Security subsystem configuration.
    +   trust: Progressive trust configuration.
    +   promotion: Promotion/demotion configuration.
    """

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ai_company/config/schema.py` around lines 467 - 498, The RootConfig docstring is out of date and missing the new attributes; update the Attributes section of the RootConfig class docstring to include brief entries for the trust and promotion fields (e.g., "trust: Trust configuration." and "promotion: Promotion rules/configuration.") so the docstring accurately documents all top-level config fields and matches the model schema.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@DESIGN_SPEC.md`:
- Around line 2860-2869: The DESIGN_SPEC.md implementation-status callouts are
inconsistent: you added concrete packages for hr/promotion/ and security/trust/
but left earlier status notes claiming promotions/demotions and M7
security/progressive trust are unimplemented; update those earlier status
callouts to reflect the new implementations and revise the implementation-status
text around the promotion subsystem and M7 security/trust to mark them
implemented (or document the exact deviations), including the same fixes in the
other referenced section(s) noted (around the later block for 3102-3112); also
add the required user-alert language that any deviation from DESIGN_SPEC.md must
be surfaced to the user with an explanation so they can choose to proceed or
update the spec, rather than allowing silent divergence.
In `@src/ai_company/hr/promotion/config.py`:
- Around line 77-90: The seniority_model_map field is currently a plain dict (in
the config model with frozen=True), allowing post-validation mutation that
circumvents _validate_model_map_keys; change the construction to deep-copy the
incoming dict and wrap it in a read-only MappingProxyType before assignment so
callers cannot mutate it afterwards, e.g., perform copy.deepcopy on the provided
mapping during model validation/initialization (within _validate_model_map_keys
or a pre/post validator) and replace self.seniority_model_map with
MappingProxyType(deepcopied_map) after validating keys against SeniorityLevel
values; ensure the model remains frozen and that external callers only see an
immutable mapping.
In `@src/ai_company/hr/promotion/models.py`:
- Around line 180-193: The validator _validate_model_fields should also reject
no-op changes where model_changed is True but old_model_id == new_model_id;
update the logic in _validate_model_fields to (1) keep the existing checks that
model_changed=True requires both old_model_id and new_model_id not None and
model_changed=False requires both IDs be None, and (2) add a new check when
model_changed is True that raises ValueError if self.old_model_id ==
self.new_model_id (with a clear message like "model_changed=True requires
old_model_id and new_model_id to differ") so identical IDs are not accepted.
In `@src/ai_company/hr/promotion/service.py`:
- Around line 142-147: The code is logging errors using the
PROMOTION_EVALUATE_START event which is misleading; replace those usages with a
dedicated error/failure event constant (e.g. PROMOTION_EVALUATE_ERROR or
PROMOTION_EVALUATE_FAILURE) and update the logger calls that currently pass
PROMOTION_EVALUATE_START (locations around the PromotionError raises and the
logger.warning lines) to use the new constant while keeping the same metadata
(agent_id, error=msg); ensure any other occurrences noted (around lines where
PromotionError is raised) are updated consistently and add the new constant to
the module where event constants are defined.
- Around line 463-516: Move the self._approval_store None check to the top of
_create_approval (before creating approval_id, now, or instantiating
ApprovalItem) so you bail early when no store is configured; specifically, in
_create_approval check self._approval_store is not None and raise
PromotionError/log PROMOTION_APPROVAL_SUBMITTED if missing, then proceed to
generate approval_id, now and construct the ApprovalItem (class ApprovalItem)
and call self._approval_store.add.
- Around line 414-419: The call to
self._trust_service.evaluate_agent(request.agent_id, snapshot) currently
discards the returned TrustEvaluationResult; either explicitly document this
side-effect-only use or consume the result—capture the return value (e.g.,
result = await self._trust_service.evaluate_agent(...)) and then log it via the
existing logger or attach fields from result to the promotion record created in
this service so the evaluation outcome is visible; if you prefer to keep it
side-effect-only, add a concise comment next to the call stating that
evaluate_agent updates internal state and its return value is intentionally
ignored.
In `@src/ai_company/hr/promotion/threshold_evaluator.py`:
- Around line 128-192: The _evaluate_criteria method currently hardcodes three
criteria (quality_score, success_rate, tasks_completed) inline, making extension
difficult; refactor by extracting each criterion into reusable evaluators and a
registry: create small functions or callable objects (e.g.,
evaluate_quality_score, evaluate_success_rate, evaluate_tasks_completed) that
accept (snapshot, thresholds, direction) and return CriterionResult, then
replace the inline logic in _evaluate_criteria with a loop over a registered
list/dict of these evaluator callables (e.g., CRITERION_REGISTRY) so
adding/removing criteria only requires registering a new evaluator rather than
modifying _evaluate_criteria itself. Ensure names like AgentPerformanceSnapshot,
PromotionDirection, CriterionResult and thresholds keys
("quality_score","success_rate","tasks_completed") are used by the evaluators
for compatibility.
- Around line 154-177: The aggregation strategy for windowed metrics in
threshold_evaluator.py is inconsistent and undocumented: the success_rate loop
in the snapshot handling (searching snapshot.windows for the first
window.success_rate) implicitly picks the most recent non-None window, while the
tasks_completed loop uses max(...) to pick peak performance; add concise inline
comments above the success_rate aggregation (near the loop that picks the first
non-None value) stating "use most recent window value for success_rate" and
above the tasks_completed aggregation stating "use peak value across windows for
tasks_completed" so readers can understand the intended rationale; reference the
success_rate aggregation block and the total_tasks / tasks_completed max(...)
computation when adding the comments.
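The registry refactor suggested for `_evaluate_criteria` can be sketched as follows; the dict-based `metrics`/`thresholds` and the `CriterionResult` dataclass are simplified stand-ins for the real `AgentPerformanceSnapshot` and result models:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CriterionResult:
    """Simplified stand-in for the real criterion result model."""
    name: str
    met: bool


# Each evaluator takes (metrics, thresholds); the real ones would take
# (snapshot, thresholds, direction).
def evaluate_quality_score(metrics: dict, thresholds: dict) -> CriterionResult:
    return CriterionResult(
        "quality_score", metrics["quality_score"] >= thresholds["quality_score"])


def evaluate_success_rate(metrics: dict, thresholds: dict) -> CriterionResult:
    return CriterionResult(
        "success_rate", metrics["success_rate"] >= thresholds["success_rate"])


def evaluate_tasks_completed(metrics: dict, thresholds: dict) -> CriterionResult:
    return CriterionResult(
        "tasks_completed", metrics["tasks_completed"] >= thresholds["tasks_completed"])


CRITERION_REGISTRY: list[Callable[[dict, dict], CriterionResult]] = [
    evaluate_quality_score,
    evaluate_success_rate,
    evaluate_tasks_completed,
]


def evaluate_criteria(metrics: dict, thresholds: dict) -> list[CriterionResult]:
    # Adding a criterion now means registering an evaluator, not editing this loop.
    return [evaluate(metrics, thresholds) for evaluate in CRITERION_REGISTRY]
```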
In `@src/ai_company/observability/events/hr.py`:
- Around line 38-40: The file defines HR_REGISTRY_IDENTITY_UPDATED under a
duplicated "Registry" section header; remove the second header and move/keep the
constant HR_REGISTRY_IDENTITY_UPDATED within the existing Registry section (the
one around the earlier header) so there is only one "Registry" section
containing the HR_REGISTRY_IDENTITY_UPDATED constant.
In `@src/ai_company/observability/events/promotion.py`:
- Line 18: DEMOTION_APPLIED currently emits "demotion.applied" which breaks the
module's event contract (promotion.<subject>.<action>); update the constant
DEMOTION_APPLIED in promotion.py so its value follows the promotion namespace
(e.g., "promotion.demotion.applied") and adjust any consumers/tests that rely on
the old string to use the new value.
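The rename amounts to a one-line constant change (the exact new string is the reviewer's suggestion, not a confirmed choice):

```python
from typing import Final

# Renamed to follow the promotion.<subject>.<action> event contract.
DEMOTION_APPLIED: Final[str] = "promotion.demotion.applied"
```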
In `@src/ai_company/security/trust/config.py`:
- Around line 127-178: In MilestoneCriteria._validate_approval_flags add
structured logging of the validation failure before raising the ValueError:
import or use the module's logger (e.g., logger) and log a clear message
including the class name, the conflicting field values (self.auto_promote and
self.requires_human_approval) and any relevant context, then raise the same
ValueError; ensure logging uses structured fields (or key/value pairs) so
observability tools can index the values.
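A sketch of log-then-raise with structured fields, using stdlib `logging` as a stand-in for the project's `get_logger`; which flag combination actually conflicts is assumed here (both flags set):

```python
import logging

logger = logging.getLogger(__name__)  # stand-in for the project's get_logger


def validate_approval_flags(auto_promote: bool,
                            requires_human_approval: bool) -> None:
    """Log the conflicting flag values as structured fields, then raise."""
    if auto_promote and requires_human_approval:
        logger.error(
            "MilestoneCriteria approval flags conflict",
            extra={
                "auto_promote": auto_promote,
                "requires_human_approval": requires_human_approval,
            },
        )
        raise ValueError(
            "auto_promote and requires_human_approval cannot both be True"
        )
```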
In `@src/ai_company/security/trust/milestone_strategy.py`:
- Around line 211-216: The check in milestone_strategy.py uses (now -
state.last_evaluated_at).days to enforce milestone.time_active_days, but that
measures time since last evaluation rather than agent tenure; update the logic
in the function/method containing this block (referenced by
milestone.time_active_days, state.last_evaluated_at, and days_active) to compute
days_active from the agent's hire/start date (e.g., agent.hiring_date or
agent.started_at) instead of state.last_evaluated_at; ensure you handle a
missing hiring/start date (return False or treat as not old enough), compute
(now - agent.hiring_date).days, and then compare that value to
milestone.time_active_days.
In `@src/ai_company/security/trust/per_category_strategy.py`:
- Around line 161-188: In _check_category_criteria, rename the misleading local
variable total_tasks to max_tasks_completed to reflect that it uses
max(window.tasks_completed) across snapshot.windows (best single-window
performance), update all references (the for-loop accumulation and the
subsequent comparison against criteria_config.tasks_completed) to use
max_tasks_completed, and ensure any related comments/docstrings are adjusted to
match the new name.
In `@src/ai_company/security/trust/service.py`:
- Around line 197-204: The current update always sets last_promoted_at even on
demotions; change the update logic around state.model_copy so that
last_promoted_at is only set when result.recommended_level represents a higher
rank than state.global_level (i.e., an actual promotion). Implement this by
comparing levels using the same ranking map used elsewhere (e.g., _LEVEL_RANK)
or by a numeric/ordinal comparison between state.global_level and
result.recommended_level, and only include "last_promoted_at": now in the update
dict when the comparison indicates a promotion; otherwise leave last_promoted_at
unchanged while still updating global_level and trust_score.
- Around line 345-357: _infer_reason currently ignores whether an evaluation is
a promotion or demotion; update it to compare result.recommended_level against
result.current_level and return promotion reasons for increases and demotion
reasons for decreases. Concretely: in _infer_reason, after reading
strategy/result.score, compute a delta = result.recommended_level -
result.current_level; if delta > 0 keep existing promotion mappings (e.g.,
milestone -> MILESTONE_ACHIEVED, weighted/per_category -> SCORE_THRESHOLD); if
delta < 0 pick an appropriate demotion reason from TrustChangeReason (prefer
RE_VERIFICATION_FAILED or ERROR_RATE when the result provides failure/error
signals, otherwise default to TRUST_DECAY) and return that; if delta == 0 fall
back to existing MANUAL or strategy-based logic. Use the TrustEvaluationResult
fields recommended_level, current_level, strategy_name, score and any
error/failure indicators to choose the correct TrustChangeReason.
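A direction-aware `_infer_reason` could look like the sketch below; the rank map and the string stand-ins for `TrustChangeReason` members are illustrative assumptions:

```python
# Hypothetical level ordering; stand-in strings for TrustChangeReason members.
_LEVEL_RANK = {"untrusted": 0, "standard": 1, "elevated": 2}


def infer_reason(current_level: str, recommended_level: str,
                 strategy_name: str, had_failures: bool) -> str:
    """Pick a change reason that reflects the direction of the level change."""
    delta = _LEVEL_RANK[recommended_level] - _LEVEL_RANK[current_level]
    if delta > 0:  # promotion
        return ("milestone_achieved" if strategy_name == "milestone"
                else "score_threshold")
    if delta < 0:  # demotion
        return "error_rate" if had_failures else "trust_decay"
    return "manual"  # no level change
```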
In `@src/ai_company/security/trust/weighted_strategy.py`:
- Around line 147-159: The error_factor calculation is currently redundant and
equals success_rate; update the branch in weighted_strategy to use a true
failure signal: when iterating snapshot.windows (same loop guard using
window.data_point_count > 0 and window.success_rate is not None), set
error_factor = 1.0 - window.success_rate (or use window.failure_rate if that
field exists) instead of the current 1.0 - (1.0 - window.success_rate) so
completion_factor and error_factor are distinct inputs to the scoring logic.
- Around line 188-216: The loop that computes the recommended level currently
allows multi-step jumps and never evaluates downward transitions; restrict
transitions to adjacent ranks only and evaluate both promotion and demotion so
_check_human_approval remains consistent. Specifically, in the recommendation
logic (where _TRANSITION_KEYS, _TRUST_LEVEL_RANK, self._thresholds and the local
variable recommended are used) compute current_rank/from_rank/to_rank and only
consider a transition if abs(to_rank - current_rank) == 1; for upward moves
require score >= threshold.score, for downward moves require score <
threshold.score (or the appropriate comparison used by TrustThreshold), and set
recommended accordingly. This ensures _check_human_approval (which looks up
thresholds for exact from->to pairs) will see only adjacent transitions and that
demotions are evaluated.
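The adjacent-only, bidirectional recommendation loop can be sketched as below; the level names, float thresholds, and `(from, to)` keying are simplified assumptions in place of `_TRUST_LEVEL_RANK` and `TrustThreshold`:

```python
# Hypothetical levels; the real code uses _TRUST_LEVEL_RANK and
# TrustThreshold objects keyed by (from, to) transitions.
_LEVELS = ["untrusted", "standard", "elevated"]
_RANK = {level: i for i, level in enumerate(_LEVELS)}


def recommend_level(current: str, score: float,
                    thresholds: dict[tuple[str, str], float]) -> str:
    """Consider only adjacent transitions, in both directions."""
    rank = _RANK[current]
    # One step up when the upward threshold is met.
    if rank + 1 < len(_LEVELS):
        up = (current, _LEVELS[rank + 1])
        if up in thresholds and score >= thresholds[up]:
            return _LEVELS[rank + 1]
    # One step down when the score falls below the threshold that
    # guarded the promotion into the current level.
    if rank > 0:
        down = (_LEVELS[rank - 1], current)
        if down in thresholds and score < thresholds[down]:
            return _LEVELS[rank - 1]
    return current
```

Because every returned transition is an exact adjacent `(from, to)` pair, `_check_human_approval` can look up the matching threshold without ever seeing a multi-step jump.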
In `@tests/unit/hr/promotion/conftest.py`:
- Around line 39-43: The current logic makes success_rate=0 produce
tasks_failed=0 which is inconsistent; change the branch so that success_rate==0
is treated explicitly: if success_rate == 0 then if tasks_completed > 0 raise a
ValueError (invalid input), otherwise set total = max(1, 0) and compute
tasks_failed = total (so all tasks are failures); keep the existing calculation
for success_rate > 0. Update the block around variables total, success_rate,
tasks_completed, tasks_failed in tests/unit/hr/promotion/conftest.py
accordingly.
In `@tests/unit/hr/promotion/test_config.py`:
- Around line 1-13: Add a module-level 30s timeout for all tests in this file by
declaring the pytest timeout marker at module scope (e.g., set pytestmark to
pytest.mark.timeout(30)) so every test in this module uses the 30-second limit;
place this declaration near the top of the file after the imports (affects the
tests in tests/unit/hr/promotion/test_config.py and ensures the module-level
marker applies to
ModelMappingConfig/PromotionApprovalConfig/PromotionConfig/PromotionCriteriaConfig
tests).
- Around line 45-65: Add frozen/immutability tests for the nested config classes
by creating instances of PromotionCriteriaConfig, PromotionApprovalConfig, and
ModelMappingConfig and asserting that attempting to mutate a field (e.g.,
setting an attribute like min_criteria_met or any other field) raises a
TypeError; add one test function per class (e.g.,
test_immutable_promotion_criteria_config,
test_immutable_promotion_approval_config, test_immutable_model_mapping_config)
in the same test module so the frozen nature of each Pydantic model is
explicitly verified.
In `@tests/unit/hr/promotion/test_models.py`:
- Around line 1-17: This test module lacks the timeout marker; add a
module-level pytest timeout by defining pytestmark = pytest.mark.timeout(30)
near the top of the file (after the imports where pytest is imported) so all
tests in this module use a 30-second timeout; use the existing pytest import and
the pytestmark symbol to apply the marker.
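The marker in question is a one-liner placed after the imports (it relies on the pytest-timeout plugin being installed):

```python
import pytest

# Module-level marker: every test in this module gets a 30-second limit
# (requires the pytest-timeout plugin).
pytestmark = pytest.mark.timeout(30)
```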
In `@tests/unit/hr/promotion/test_seniority_approval_strategy.py`:
- Around line 1-16: Add the module-level pytest timeout marker by defining
pytestmark = pytest.mark.timeout(30) near the top of the test module (after the
existing imports) so pytest applies a 30s timeout to all tests in this file; you
can place it below the imports that include pytest and above the test helpers
that use make_agent_identity and SeniorityApprovalStrategy.
In `@tests/unit/hr/promotion/test_seniority_model_mapping.py`:
- Around line 1-13: The test module test_seniority_model_mapping.py is missing
the required pytest timeout marker; add a module-level marker by declaring
pytestmark = pytest.mark.timeout(30) (using the pytest symbol) near the top of
the file (after the imports) so the entire module (including tests that exercise
SeniorityModelMapping, ModelMappingConfig, and helpers like make_agent_identity)
runs with a 30s timeout.
In `@tests/unit/hr/promotion/test_threshold_evaluator.py`:
- Around line 1-16: The test module lacks the 30-second timeout marker; add a
module-level pytest mark by inserting "pytestmark = pytest.mark.timeout(30)"
directly after the existing imports (you already import pytest), e.g., in the
tests/unit/hr/promotion/test_threshold_evaluator.py file near the top of the
module so it applies to all tests that exercise ThresholdEvaluator and helpers
like make_performance_snapshot.
In `@tests/unit/security/trust/test_enums.py`:
- Around line 44-72: Multiple single-value tests for TrustChangeReason should be
consolidated into one parameterized test: replace the seven individual test_*
functions (e.g., test_score_threshold, test_milestone_achieved,
test_human_approval, etc.) with a single pytest.mark.parametrize that maps enum
members (TrustChangeReason.SCORE_THRESHOLD,
TrustChangeReason.MILESTONE_ACHIEVED, ...) to their expected string values and
asserts member.value == expected; keep the existing test_member_count as-is and
adjust the members-are-strings test to assert isinstance(member.value, str) when
iterating TrustChangeReason.
- Around line 17-34: The tests duplicate value assertions for TrustStrategyType
(test_disabled_value, test_weighted_value, test_per_category_value,
test_milestone_value); replace them with a single parametric test using
pytest.mark.parametrize that iterates over (member, expected_value) pairs and
asserts member.value == expected_value, and similarly convert
test_members_are_strings to a parametric test over TrustStrategyType members (or
include the isinstance check in the same parametrize) while keeping
test_member_count unchanged; update references to TrustStrategyType and test
function names accordingly.
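The consolidation could look like the sketch below; the enum here is a stand-in mirroring the real `TrustStrategyType` members, not the project's actual class:

```python
from enum import Enum

import pytest


class TrustStrategyType(str, Enum):
    """Stand-in mirroring the real enum's members."""
    DISABLED = "disabled"
    WEIGHTED = "weighted"
    PER_CATEGORY = "per_category"
    MILESTONE = "milestone"


@pytest.mark.parametrize(
    ("member", "expected"),
    [
        (TrustStrategyType.DISABLED, "disabled"),
        (TrustStrategyType.WEIGHTED, "weighted"),
        (TrustStrategyType.PER_CATEGORY, "per_category"),
        (TrustStrategyType.MILESTONE, "milestone"),
    ],
)
def test_value(member: TrustStrategyType, expected: str) -> None:
    assert member.value == expected
    assert isinstance(member.value, str)
```

One parametrized test replaces four near-identical functions, and the parameter list doubles as a readable member-to-value table.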
In `@tests/unit/security/trust/test_milestone_strategy.py`:
- Around line 254-278: The test test_milestone_requires_human_for_elevated
should not conditionally skip assertions; replace the guarded check with
explicit assertions: call MilestoneTrustStrategy.evaluate (as already done) and
assert result.recommended_level == ToolAccessLevel.ELEVATED first, then assert
result.requires_human_approval is True; update references to
result.recommended_level and result.requires_human_approval and fail the test if
promotion didn't occur so the human-approval requirement is always validated.
In `@tests/unit/security/trust/test_service.py`:
- Around line 288-369: Tests in TestElevatedGateEnforcement mark methods async
although they call the synchronous TrustService._enforce_elevated_gate; change
the three test methods (test_elevated_gate_enforces_human_approval,
test_elevated_gate_no_override_when_already_elevated,
test_elevated_gate_no_override_when_approval_set) to regular (non-async)
functions so they reflect synchronous behavior and avoid unnecessary async
pytest handling, keeping assertions and setup unchanged.
---
Outside diff comments:
In `@src/ai_company/config/schema.py`:
- Around line 467-498: The RootConfig docstring is out of date and missing the
new attributes; update the Attributes section of the RootConfig class docstring
to include brief entries for the trust and promotion fields (e.g., "trust: Trust
configuration." and "promotion: Promotion rules/configuration.") so the
docstring accurately documents all top-level config fields and matches the model
schema.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 40cba637-cd6f-4748-ad05-7c0fe0924f0a
📒 Files selected for processing (53)
CLAUDE.md
DESIGN_SPEC.md
src/ai_company/config/defaults.py
src/ai_company/config/schema.py
src/ai_company/hr/enums.py
src/ai_company/hr/errors.py
src/ai_company/hr/promotion/__init__.py
src/ai_company/hr/promotion/approval_protocol.py
src/ai_company/hr/promotion/config.py
src/ai_company/hr/promotion/criteria_protocol.py
src/ai_company/hr/promotion/model_mapping_protocol.py
src/ai_company/hr/promotion/models.py
src/ai_company/hr/promotion/seniority_approval_strategy.py
src/ai_company/hr/promotion/seniority_model_mapping.py
src/ai_company/hr/promotion/service.py
src/ai_company/hr/promotion/threshold_evaluator.py
src/ai_company/hr/registry.py
src/ai_company/observability/events/hr.py
src/ai_company/observability/events/promotion.py
src/ai_company/observability/events/trust.py
src/ai_company/security/trust/__init__.py
src/ai_company/security/trust/config.py
src/ai_company/security/trust/disabled_strategy.py
src/ai_company/security/trust/enums.py
src/ai_company/security/trust/errors.py
src/ai_company/security/trust/milestone_strategy.py
src/ai_company/security/trust/models.py
src/ai_company/security/trust/per_category_strategy.py
src/ai_company/security/trust/protocol.py
src/ai_company/security/trust/service.py
src/ai_company/security/trust/weighted_strategy.py
tests/unit/config/conftest.py
tests/unit/hr/promotion/__init__.py
tests/unit/hr/promotion/conftest.py
tests/unit/hr/promotion/test_config.py
tests/unit/hr/promotion/test_models.py
tests/unit/hr/promotion/test_seniority_approval_strategy.py
tests/unit/hr/promotion/test_seniority_model_mapping.py
tests/unit/hr/promotion/test_service.py
tests/unit/hr/promotion/test_threshold_evaluator.py
tests/unit/hr/test_enums.py
tests/unit/hr/test_registry.py
tests/unit/observability/test_events.py
tests/unit/security/trust/__init__.py
tests/unit/security/trust/conftest.py
tests/unit/security/trust/test_config.py
tests/unit/security/trust/test_disabled_strategy.py
tests/unit/security/trust/test_enums.py
tests/unit/security/trust/test_milestone_strategy.py
tests/unit/security/trust/test_models.py
tests/unit/security/trust/test_per_category_strategy.py
tests/unit/security/trust/test_service.py
tests/unit/security/trust/test_weighted_strategy.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Agent
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use `from ai_company.observability import get_logger` and instantiate the logger as `logger = get_logger(__name__)` in every module with business logic
Files:
src/ai_company/hr/enums.py, tests/unit/hr/promotion/__init__.py, src/ai_company/hr/errors.py, src/ai_company/observability/events/hr.py, tests/unit/hr/promotion/test_threshold_evaluator.py, src/ai_company/hr/promotion/seniority_model_mapping.py, src/ai_company/observability/events/trust.py, src/ai_company/config/defaults.py, tests/unit/hr/promotion/test_config.py, src/ai_company/hr/promotion/__init__.py, src/ai_company/security/trust/errors.py, src/ai_company/security/trust/config.py, src/ai_company/hr/promotion/seniority_approval_strategy.py, tests/unit/security/trust/test_models.py, tests/unit/config/conftest.py, src/ai_company/hr/promotion/model_mapping_protocol.py, src/ai_company/security/trust/protocol.py, tests/unit/security/trust/test_per_category_strategy.py, tests/unit/security/trust/conftest.py, src/ai_company/hr/promotion/criteria_protocol.py, tests/unit/hr/promotion/test_seniority_approval_strategy.py, tests/unit/security/trust/test_service.py, src/ai_company/hr/promotion/approval_protocol.py, tests/unit/security/trust/test_weighted_strategy.py, tests/unit/observability/test_events.py, src/ai_company/hr/registry.py, src/ai_company/hr/promotion/threshold_evaluator.py, src/ai_company/security/trust/disabled_strategy.py, src/ai_company/security/trust/enums.py, src/ai_company/security/trust/service.py, src/ai_company/hr/promotion/models.py, src/ai_company/hr/promotion/service.py, src/ai_company/security/trust/models.py, src/ai_company/hr/promotion/config.py, tests/unit/security/trust/test_config.py, tests/unit/security/trust/test_disabled_strategy.py, tests/unit/hr/promotion/conftest.py, tests/unit/security/trust/test_enums.py, src/ai_company/security/trust/__init__.py, src/ai_company/config/schema.py, tests/unit/security/trust/test_milestone_strategy.py, src/ai_company/security/trust/per_category_strategy.py, tests/unit/hr/test_enums.py, tests/unit/hr/promotion/test_models.py, src/ai_company/observability/events/promotion.py, tests/unit/security/trust/__init__.py, tests/unit/hr/test_registry.py, tests/unit/hr/promotion/test_service.py, tests/unit/hr/promotion/test_seniority_model_mapping.py, src/ai_company/security/trust/milestone_strategy.py, src/ai_company/security/trust/weighted_strategy.py
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py:
- Never use `import logging`, `logging.getLogger()`, or `print()` in application code — only use the centralized logger
- Use event name constants from `ai_company.observability.events.<domain>` modules (e.g., `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`) instead of string literals
- Always use structured logging with `logger.info(EVENT, key=value)` format — never use format strings like `logger.info('msg %s', val)`
- No `from __future__ import annotations` — Python 3.14 has PEP 649 native lazy annotations
- Use `except A, B:` syntax (no parentheses) per PEP 758 — ruff enforces this on Python 3.14
- Add type hints to all public functions and classes; mypy strict mode is enforced
- Add Google-style docstrings to all public classes and functions — ruff D rules enforce this
- Use immutability principles: create new objects instead of mutating existing ones; for non-Pydantic collections use `copy.deepcopy()` at construction and `MappingProxyType` for read-only enforcement
- Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (with `model_copy(update=...)`) for runtime state that evolves
- Use `NotBlankStr` from `core.types` for all identifier and name fields instead of manual whitespace validators, including optional (`NotBlankStr | None`) and tuple variants
- Use `@computed_field` for derived values in Pydantic models instead of storing and validating redundant fields (e.g., `TokenUsage.total_tokens`)
- Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls) over bare `create_task`
- Keep functions under 50 lines and files under 800 lines
- Keep line length at 88 characters (enforced by ruff)
- Handle all errors explicitly; never silently swallow exceptions
- Validate at system boundaries (user input, external APIs, config files)
- Never use real vendor names (Anthropic, OpenAI, Claude, GPT) in project-owned code, docstrings...
Files:
src/ai_company/hr/enums.py, src/ai_company/hr/errors.py, src/ai_company/observability/events/hr.py, src/ai_company/hr/promotion/seniority_model_mapping.py, src/ai_company/observability/events/trust.py, src/ai_company/config/defaults.py, src/ai_company/hr/promotion/__init__.py, src/ai_company/security/trust/errors.py, src/ai_company/security/trust/config.py, src/ai_company/hr/promotion/seniority_approval_strategy.py, src/ai_company/hr/promotion/model_mapping_protocol.py, src/ai_company/security/trust/protocol.py, src/ai_company/hr/promotion/criteria_protocol.py, src/ai_company/hr/promotion/approval_protocol.py, src/ai_company/hr/registry.py, src/ai_company/hr/promotion/threshold_evaluator.py, src/ai_company/security/trust/disabled_strategy.py, src/ai_company/security/trust/enums.py, src/ai_company/security/trust/service.py, src/ai_company/hr/promotion/models.py, src/ai_company/hr/promotion/service.py, src/ai_company/security/trust/models.py, src/ai_company/hr/promotion/config.py, src/ai_company/security/trust/__init__.py, src/ai_company/config/schema.py, src/ai_company/security/trust/per_category_strategy.py, src/ai_company/observability/events/promotion.py, src/ai_company/security/trust/milestone_strategy.py, src/ai_company/security/trust/weighted_strategy.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py:
- Add test markers `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, or `@pytest.mark.slow` to all test files
- Maintain 80% minimum code coverage — enforced in CI with `pytest --cov=ai_company --cov-fail-under=80`
- Use `asyncio_mode = 'auto'` in pytest configuration — no manual `@pytest.mark.asyncio` needed on async tests
- Set test timeout to 30 seconds per test — use `@pytest.mark.timeout(30)` or configure in pytest.ini
- Use `@pytest.mark.parametrize` for testing similar cases instead of duplicating test functions
Files:
tests/unit/hr/promotion/__init__.py, tests/unit/hr/promotion/test_threshold_evaluator.py, tests/unit/hr/promotion/test_config.py, tests/unit/security/trust/test_models.py, tests/unit/config/conftest.py, tests/unit/security/trust/test_per_category_strategy.py, tests/unit/security/trust/conftest.py, tests/unit/hr/promotion/test_seniority_approval_strategy.py, tests/unit/security/trust/test_service.py, tests/unit/security/trust/test_weighted_strategy.py, tests/unit/observability/test_events.py, tests/unit/security/trust/test_config.py, tests/unit/security/trust/test_disabled_strategy.py, tests/unit/hr/promotion/conftest.py, tests/unit/security/trust/test_enums.py, tests/unit/security/trust/test_milestone_strategy.py, tests/unit/hr/test_enums.py, tests/unit/hr/promotion/test_models.py, tests/unit/security/trust/__init__.py, tests/unit/hr/test_registry.py, tests/unit/hr/promotion/test_service.py, tests/unit/hr/promotion/test_seniority_model_mapping.py
🧠 Learnings (11)
📚 Learning: 2026-03-10T09:29:47.580Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.580Z
Learning: Applies to src/ai_company/**/*.py : Add type hints to all public functions and classes; mypy strict mode is enforced
Applied to files:
src/ai_company/hr/errors.py
📚 Learning: 2026-03-10T09:29:47.580Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.580Z
Learning: Applies to src/ai_company/**/*.py : Use event name constants from `ai_company.observability.events.<domain>` modules (e.g., `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`) instead of string literals
Applied to files:
src/ai_company/observability/events/hr.py, src/ai_company/observability/events/trust.py, tests/unit/observability/test_events.py, src/ai_company/observability/events/promotion.py, CLAUDE.md
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: Applies to src/ai_company/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (with `model_copy(update=...)`) for runtime state that evolves
Applied to files:
src/ai_company/hr/promotion/seniority_model_mapping.py, tests/unit/hr/promotion/test_config.py, tests/unit/security/trust/test_models.py, src/ai_company/hr/promotion/model_mapping_protocol.py, src/ai_company/hr/registry.py, src/ai_company/hr/promotion/models.py, src/ai_company/security/trust/models.py, src/ai_company/hr/promotion/config.py, src/ai_company/config/schema.py
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: Applies to src/ai_company/**/*.py : Validate at system boundaries (user input, external APIs, config files)
Applied to files:
src/ai_company/security/trust/config.py, CLAUDE.md
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: Applies to src/ai_company/**/*.py : Use immutability principles: create new objects instead of mutating existing ones; for non-Pydantic collections use `copy.deepcopy()` at construction and `MappingProxyType` for read-only enforcement
Applied to files:
src/ai_company/security/trust/models.py, src/ai_company/hr/promotion/config.py
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: Always read `DESIGN_SPEC.md` before implementing any feature or planning any issue — treat it as the starting point for architecture, data models, and behavior
Applied to files:
DESIGN_SPEC.md
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: If implementation deviates from DESIGN_SPEC.md, alert the user and explain why — user decides whether to proceed or update the spec; never silently diverge
Applied to files:
DESIGN_SPEC.md
📚 Learning: 2026-03-10T09:29:47.580Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.580Z
Learning: Applies to **/*.py : Use `from ai_company.observability import get_logger` and instantiate logger as `logger = get_logger(__name__)` in every module with business logic
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-10T09:29:47.580Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.580Z
Learning: Applies to src/ai_company/**/*.py : Always use structured logging with `logger.info(EVENT, key=value)` format — never use format strings like `logger.info('msg %s', val)`
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-10T09:29:47.580Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.580Z
Learning: Applies to src/ai_company/**/*.py : Never use `import logging`, `logging.getLogger()`, or `print()` in application code — only use the centralized logger
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: Applies to src/ai_company/**/*.py : Handle all errors explicitly; never silently swallow exceptions
Applied to files:
CLAUDE.md
🧬 Code graph analysis (32)
tests/unit/hr/promotion/test_threshold_evaluator.py (3)
- src/ai_company/hr/enums.py (1): PromotionDirection (44-48)
- src/ai_company/hr/promotion/threshold_evaluator.py (3): ThresholdEvaluator (40-192), name (51-53), evaluate (55-126)
- src/ai_company/hr/promotion/config.py (1): PromotionCriteriaConfig (15-33)

tests/unit/hr/promotion/test_config.py (2)
- src/ai_company/core/enums.py (1): SeniorityLevel (6-21)
- src/ai_company/hr/promotion/config.py (4): ModelMappingConfig (63-90), PromotionApprovalConfig (36-60), PromotionConfig (93-126), PromotionCriteriaConfig (15-33)

src/ai_company/security/trust/config.py (3)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/observability/_logger.py (1): get_logger (8-28)
- src/ai_company/security/trust/enums.py (1): TrustStrategyType (6-12)

tests/unit/security/trust/test_models.py (3)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/security/trust/enums.py (1): TrustChangeReason (15-24)
- src/ai_company/security/trust/models.py (4): TrustChangeRecord (71-105), TrustEvaluationResult (108-147), TrustState (22-68), should_change (145-147)

tests/unit/config/conftest.py (2)
- src/ai_company/hr/promotion/config.py (1): PromotionConfig (93-126)
- src/ai_company/security/trust/config.py (1): TrustConfig (214-369)

src/ai_company/hr/promotion/model_mapping_protocol.py (10)
- src/ai_company/core/agent.py (1): AgentIdentity (265-323)
- src/ai_company/core/enums.py (1): SeniorityLevel (6-21)
- src/ai_company/hr/promotion/seniority_model_mapping.py (2): name (35-37), resolve_model (39-87)
- src/ai_company/security/trust/disabled_strategy.py (1): name (39-41)
- src/ai_company/security/trust/milestone_strategy.py (1): name (64-66)
- src/ai_company/security/trust/per_category_strategy.py (1): name (52-54)
- src/ai_company/security/trust/weighted_strategy.py (1): name (64-66)
- src/ai_company/hr/promotion/approval_protocol.py (1): name (26-28)
- src/ai_company/hr/promotion/criteria_protocol.py (1): name (24-26)
- src/ai_company/security/trust/protocol.py (1): name (24-26)

src/ai_company/security/trust/protocol.py (5)
- src/ai_company/security/trust/models.py (2): TrustEvaluationResult (108-147), TrustState (22-68)
- src/ai_company/security/trust/disabled_strategy.py (3): name (39-41), evaluate (43-80), initial_state (82-94)
- src/ai_company/security/trust/milestone_strategy.py (3): name (64-66), evaluate (68-148), initial_state (150-163)
- src/ai_company/security/trust/per_category_strategy.py (3): name (52-54), evaluate (56-144), initial_state (146-159)
- src/ai_company/security/trust/weighted_strategy.py (3): name (64-66), evaluate (68-114), initial_state (116-129)

tests/unit/security/trust/test_per_category_strategy.py (1)
- src/ai_company/security/trust/per_category_strategy.py (3): name (52-54), initial_state (146-159), evaluate (56-144)

tests/unit/security/trust/conftest.py (4)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/hr/performance/models.py (2): AgentPerformanceSnapshot (266-301), WindowMetrics (197-263)
- src/ai_company/security/trust/config.py (5): MilestoneCriteria (127-177), ReVerificationConfig (180-211), TrustConfig (214-369), TrustThreshold (21-35), WeightedTrustWeights (38-96)
- src/ai_company/security/trust/enums.py (1): TrustStrategyType (6-12)

src/ai_company/hr/promotion/criteria_protocol.py (14)
- src/ai_company/core/enums.py (1): SeniorityLevel (6-21)
- src/ai_company/hr/promotion/models.py (1): PromotionEvaluation (49-93)
- src/ai_company/hr/promotion/seniority_approval_strategy.py (1): name (39-41)
- src/ai_company/hr/promotion/seniority_model_mapping.py (1): name (35-37)
- src/ai_company/security/trust/disabled_strategy.py (2): name (39-41), evaluate (43-80)
- src/ai_company/security/trust/milestone_strategy.py (2): name (64-66), evaluate (68-148)
- src/ai_company/security/trust/per_category_strategy.py (2): name (52-54), evaluate (56-144)
- src/ai_company/security/trust/weighted_strategy.py (2): name (64-66), evaluate (68-114)
- src/ai_company/hr/promotion/approval_protocol.py (1): name (26-28)
- src/ai_company/hr/promotion/model_mapping_protocol.py (1): name (23-25)
- src/ai_company/hr/promotion/threshold_evaluator.py (2): name (51-53), evaluate (55-126)
- src/ai_company/security/trust/protocol.py (2): name (24-26), evaluate (28-45)
- src/ai_company/engine/parallel_models.py (1): agent_id (79-81)
- src/ai_company/engine/parallel.py (1): snapshot (69-78)

tests/unit/hr/promotion/test_seniority_approval_strategy.py (5)
- src/ai_company/core/enums.py (1): SeniorityLevel (6-21)
- src/ai_company/hr/enums.py (1): PromotionDirection (44-48)
- src/ai_company/hr/promotion/config.py (1): PromotionApprovalConfig (36-60)
- src/ai_company/hr/promotion/models.py (2): PromotionEvaluation (49-93), requires_human (111-113)
- src/ai_company/hr/promotion/seniority_approval_strategy.py (2): name (39-41), decide (43-70)

tests/unit/security/trust/test_service.py (4)
- src/ai_company/security/trust/enums.py (1): TrustChangeReason (15-24)
- src/ai_company/security/trust/errors.py (1): TrustEvaluationError (8-9)
- src/ai_company/security/trust/models.py (3): TrustEvaluationResult (108-147), TrustState (22-68), should_change (145-147)
- src/ai_company/security/trust/service.py (1): _infer_reason (346-357)

tests/unit/security/trust/test_weighted_strategy.py (1)
- src/ai_company/security/trust/weighted_strategy.py (4): WeightedTrustStrategy (44-216), name (64-66), initial_state (116-129), evaluate (68-114)

src/ai_company/hr/registry.py (2)
- src/ai_company/core/agent.py (1): AgentIdentity (265-323)
- src/ai_company/hr/errors.py (1): AgentNotFoundError (60-61)

src/ai_company/hr/promotion/threshold_evaluator.py (5)
- src/ai_company/core/enums.py (2): SeniorityLevel (6-21), compare_seniority (51-64)
- src/ai_company/hr/enums.py (1): PromotionDirection (44-48)
- src/ai_company/hr/promotion/models.py (2): CriterionResult (24-46), PromotionEvaluation (49-93)
- src/ai_company/hr/performance/models.py (1): AgentPerformanceSnapshot (266-301)
- src/ai_company/hr/promotion/config.py (1): PromotionCriteriaConfig (15-33)

src/ai_company/security/trust/disabled_strategy.py (5)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/security/trust/milestone_strategy.py (3): name (64-66), evaluate (68-148), initial_state (150-163)
- src/ai_company/security/trust/per_category_strategy.py (3): name (52-54), evaluate (56-144), initial_state (146-159)
- src/ai_company/security/trust/weighted_strategy.py (3): name (64-66), evaluate (68-114), initial_state (116-129)
- src/ai_company/security/trust/protocol.py (3): name (24-26), evaluate (28-45), initial_state (47-56)

src/ai_company/security/trust/service.py (6)
- src/ai_company/core/enums.py (2): ApprovalRiskLevel (443-449), ToolAccessLevel (274-291)
- src/ai_company/security/trust/errors.py (1): TrustEvaluationError (8-9)
- src/ai_company/security/trust/models.py (4): TrustChangeRecord (71-105), TrustEvaluationResult (108-147), TrustState (22-68), should_change (145-147)
- tests/unit/api/conftest.py (1): approval_store (338-339)
- src/ai_company/security/trust/protocol.py (4): TrustStrategy (16-56), initial_state (47-56), name (24-26), evaluate (28-45)
- src/ai_company/security/trust/weighted_strategy.py (3): initial_state (116-129), name (64-66), evaluate (68-114)

src/ai_company/hr/promotion/models.py (2)
- src/ai_company/core/enums.py (2): ApprovalStatus (434-440), SeniorityLevel (6-21)
- src/ai_company/hr/enums.py (1): PromotionDirection (44-48)

src/ai_company/security/trust/models.py (4)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/security/trust/enums.py (1): TrustChangeReason (15-24)
- src/ai_company/engine/parallel_models.py (1): agent_id (79-81)
- src/ai_company/tools/base.py (2): description (138-140), category (128-130)

src/ai_company/hr/promotion/config.py (2)
- src/ai_company/core/enums.py (1): SeniorityLevel (6-21)
- src/ai_company/tools/base.py (1): description (138-140)

tests/unit/security/trust/test_config.py (3)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/security/trust/config.py (3): MilestoneCriteria (127-177), TrustThreshold (21-35), WeightedTrustWeights (38-96)
- src/ai_company/security/trust/enums.py (1): TrustStrategyType (6-12)

tests/unit/security/trust/test_disabled_strategy.py (4)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/security/trust/disabled_strategy.py (4): DisabledTrustStrategy (24-94), name (39-41), initial_state (82-94), evaluate (43-80)
- src/ai_company/security/trust/models.py (1): TrustState (22-68)
- tests/unit/security/trust/conftest.py (1): make_performance_snapshot (22-60)

tests/unit/hr/promotion/conftest.py (3)
- src/ai_company/core/enums.py (1): SeniorityLevel (6-21)
- src/ai_company/hr/performance/models.py (2): AgentPerformanceSnapshot (266-301), WindowMetrics (197-263)
- src/ai_company/hr/registry.py (1): AgentRegistryService (32-242)

tests/unit/security/trust/test_enums.py (1)
- src/ai_company/security/trust/enums.py (2): TrustChangeReason (15-24), TrustStrategyType (6-12)

src/ai_company/config/schema.py (2)
- src/ai_company/hr/promotion/config.py (1): PromotionConfig (93-126)
- src/ai_company/security/trust/config.py (1): TrustConfig (214-369)

tests/unit/security/trust/test_milestone_strategy.py (2)
- src/ai_company/security/trust/config.py (2): MilestoneCriteria (127-177), TrustConfig (214-369)
- src/ai_company/security/trust/milestone_strategy.py (4): MilestoneTrustStrategy (50-291), name (64-66), initial_state (150-163), evaluate (68-148)

src/ai_company/security/trust/per_category_strategy.py (4)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/security/trust/models.py (2): TrustEvaluationResult (108-147), TrustState (22-68)
- src/ai_company/hr/performance/models.py (1): AgentPerformanceSnapshot (266-301)
- src/ai_company/security/trust/config.py (2): CategoryTrustCriteria (99-124), TrustConfig (214-369)

tests/unit/hr/test_enums.py (1)
- src/ai_company/hr/enums.py (2): PromotionDirection (44-48), LifecycleEventType (32-41)

tests/unit/hr/test_registry.py (4)
- src/ai_company/core/enums.py (2): AgentStatus (67-73), SeniorityLevel (6-21)
- tests/unit/hr/conftest.py (2): registry (174-176), make_agent_identity (37-55)
- src/ai_company/hr/registry.py (3): register (43-69), update_identity (198-237), get (101-111)
- src/ai_company/hr/errors.py (1): AgentNotFoundError (60-61)

tests/unit/hr/promotion/test_seniority_model_mapping.py (4)
- src/ai_company/core/enums.py (1): SeniorityLevel (6-21)
- src/ai_company/hr/promotion/config.py (1): ModelMappingConfig (63-90)
- src/ai_company/hr/promotion/seniority_model_mapping.py (3): SeniorityModelMapping (22-87), name (35-37), resolve_model (39-87)
- tests/unit/hr/promotion/conftest.py (1): make_agent_identity (62-81)

src/ai_company/security/trust/milestone_strategy.py (3)
- src/ai_company/security/trust/models.py (2): TrustEvaluationResult (108-147), TrustState (22-68)
- src/ai_company/hr/performance/models.py (1): AgentPerformanceSnapshot (266-301)
- src/ai_company/security/trust/config.py (2): MilestoneCriteria (127-177), TrustConfig (214-369)

src/ai_company/security/trust/weighted_strategy.py (4)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/security/trust/models.py (2): TrustEvaluationResult (108-147), TrustState (22-68)
- src/ai_company/hr/performance/models.py (1): AgentPerformanceSnapshot (266-301)
- src/ai_company/security/trust/config.py (1): TrustThreshold (21-35)
🪛 LanguageTool
CLAUDE.md
[style] ~87-~87: A comma is missing here.
Context: ...nder ai_company.observability.events (e.g. PROVIDER_CALL_START from `events.prov...
(EG_NO_COMMA)
```python
def test_score_threshold(self) -> None:
    assert TrustChangeReason.SCORE_THRESHOLD.value == "score_threshold"

def test_milestone_achieved(self) -> None:
    assert TrustChangeReason.MILESTONE_ACHIEVED.value == "milestone_achieved"

def test_human_approval(self) -> None:
    assert TrustChangeReason.HUMAN_APPROVAL.value == "human_approval"

def test_trust_decay(self) -> None:
    assert TrustChangeReason.TRUST_DECAY.value == "trust_decay"

def test_re_verification_failed(self) -> None:
    assert (
        TrustChangeReason.RE_VERIFICATION_FAILED.value == "re_verification_failed"
    )

def test_manual(self) -> None:
    assert TrustChangeReason.MANUAL.value == "manual"

def test_error_rate(self) -> None:
    assert TrustChangeReason.ERROR_RATE.value == "error_rate"

def test_members_are_strings(self) -> None:
    for member in TrustChangeReason:
        assert isinstance(member, str)

def test_member_count(self) -> None:
    assert len(TrustChangeReason) == 7
```
🧹 Nitpick | 🔵 Trivial
Same parametrize suggestion applies to TrustChangeReason tests.
The seven individual value tests could be consolidated similarly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/security/trust/test_enums.py` around lines 44-72: multiple
single-value tests for TrustChangeReason should be consolidated into one
parameterized test: replace the seven individual test_* functions (e.g.,
test_score_threshold, test_milestone_achieved, test_human_approval, etc.) with a
single pytest.mark.parametrize that maps enum members
(TrustChangeReason.SCORE_THRESHOLD, TrustChangeReason.MILESTONE_ACHIEVED, ...)
to their expected string values and asserts member.value == expected; keep the
existing test_member_count as-is and adjust the members-are-strings test to
assert isinstance(member.value, str) when iterating TrustChangeReason.
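Applied to the excerpt above, the consolidation could look like the sketch below. The enum is re-declared locally as a stand-in for `ai_company.security.trust.enums.TrustChangeReason`, since the real module is not shown here; the marker and class names follow the repo's stated test conventions.

```python
from enum import Enum

import pytest


class TrustChangeReason(str, Enum):
    """Local stand-in for ai_company.security.trust.enums.TrustChangeReason."""

    SCORE_THRESHOLD = "score_threshold"
    MILESTONE_ACHIEVED = "milestone_achieved"
    HUMAN_APPROVAL = "human_approval"
    TRUST_DECAY = "trust_decay"
    RE_VERIFICATION_FAILED = "re_verification_failed"
    MANUAL = "manual"
    ERROR_RATE = "error_rate"


@pytest.mark.unit
class TestTrustChangeReason:
    @pytest.mark.parametrize(
        ("member", "expected"),
        [
            (TrustChangeReason.SCORE_THRESHOLD, "score_threshold"),
            (TrustChangeReason.MILESTONE_ACHIEVED, "milestone_achieved"),
            (TrustChangeReason.HUMAN_APPROVAL, "human_approval"),
            (TrustChangeReason.TRUST_DECAY, "trust_decay"),
            (TrustChangeReason.RE_VERIFICATION_FAILED, "re_verification_failed"),
            (TrustChangeReason.MANUAL, "manual"),
            (TrustChangeReason.ERROR_RATE, "error_rate"),
        ],
    )
    def test_value(self, member: TrustChangeReason, expected: str) -> None:
        # One parametrized case per member replaces seven near-identical tests.
        assert member.value == expected
        assert isinstance(member.value, str)

    def test_member_count(self) -> None:
        assert len(TrustChangeReason) == 7
```

Keeping `test_member_count` separate still catches a newly added member that the parametrize table forgot.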
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request significantly expands the system's capabilities by integrating mechanisms for managing agent trust and career progression. It enables dynamic adjustment of agent privileges and seniority based on performance and predefined rules, supporting a more secure and adaptive operational environment. The changes also reflect a thorough internal review process, resulting in a more resilient and well-tested codebase.
Code Review
This pull request introduces important new subsystems for progressive trust and agent promotion/demotion. However, it also contains critical architectural flaws in the core service orchestrators (PromotionService and TrustService): both accept evaluation and request objects from the caller without verification, allowing the human approval gates to be bypassed. Furthermore, the AgentRegistryService contains a generic identity-update method that is vulnerable to mass assignment. Together, these issues enable straightforward privilege escalation. Beyond these security concerns, the review also flags a likely logic bug in the weighted trust score calculation, fragility in how per-category trust transitions are configured, and a potentially incorrect aggregation of metrics across overlapping time windows.
```python
approval_id: NotBlankStr | None = None
status = ApprovalStatus.PENDING

if decision.auto_approve:
    status = ApprovalStatus.APPROVED
elif decision.requires_human:
    if self._approval_store is None:
        msg = (
            f"Promotion for agent {agent_id!r} requires human "
            f"approval but no approval store is configured"
        )
        logger.warning(
            PROMOTION_REQUESTED,
            agent_id=agent_id,
            error=msg,
        )
        raise PromotionError(msg)
    approval_id = await self._create_approval(
        agent_id=agent_id,
        evaluation=evaluation,
        initiated_by=initiated_by,
    )
```
The apply_promotion method trusts the status field of the provided PromotionRequest object without verifying it against a trusted server-side store. Since PromotionRequest is a Pydantic model that can be deserialized from untrusted user input (e.g., in an API endpoint), an attacker can craft a request with status="approved" and an arbitrary target_level (such as "c_suite"). This allows an attacker to bypass the human approval gate and escalate an agent's seniority level and associated authorities.
```python
self,
agent_id: NotBlankStr,
result: TrustEvaluationResult,
) -> TrustChangeRecord | None:
    """Apply a trust level change based on evaluation result.

    If human approval is required, creates an approval item and
    returns None. The change is applied when the approval is granted.

    Args:
        agent_id: Agent whose trust to change.
        result: Evaluation result to apply.

    Returns:
        Change record if applied, None if awaiting approval.

    Raises:
        TrustEvaluationError: If agent not initialized.
    """
    if not result.should_change:
        return None

    key = str(agent_id)
    state = self._trust_states.get(key)
    if state is None:
        msg = f"Agent {agent_id!r} not initialized for trust tracking"
        logger.warning(
            TRUST_EVALUATE_FAILED,
            agent_id=agent_id,
            error=msg,
        )
        raise TrustEvaluationError(msg)
```
The apply_trust_change method trusts the requires_human_approval and recommended_level fields of the TrustEvaluationResult object passed as an argument. An attacker can provide a malicious result object with requires_human_approval=False and recommended_level="elevated" to bypass the progressive trust evaluation logic and the mandatory human approval gate for elevated access. This leads to unauthorized privilege escalation, granting agents access to sensitive tool categories.
```python
**updates: Any,
) -> AgentIdentity:
    """Update agent identity fields via model_copy(update=...).

    Used by TrustService (tools.access_level) and PromotionService
    (level, model) to update agent identity in a single mutation point.

    Args:
        agent_id: The agent identifier.
        **updates: Fields to update on the AgentIdentity.

    Returns:
        Updated agent identity.

    Raises:
        AgentNotFoundError: If the agent is not found.
    """
    key = str(agent_id)
    async with self._lock:
        identity = self._agents.get(key)
        if identity is None:
            msg = f"Agent {agent_id!r} not found in registry"
            logger.warning(
                HR_REGISTRY_IDENTITY_UPDATED,
                agent_id=key,
                error=msg,
            )
            raise AgentNotFoundError(msg)
        updated = identity.model_copy(update=updates)
```
The update_identity method uses model_copy(update=updates) with arbitrary keyword arguments (**updates: Any). This implementation allows a caller to overwrite any field in the AgentIdentity model, including security-sensitive fields like authority, status, or tools.access_level. When this method is called with data derived from untrusted sources (as seen in the PromotionService and TrustService orchestrators), it enables mass assignment vulnerabilities and privilege escalation.
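A common mitigation is to validate `**updates` against an explicit allowlist before calling `model_copy`, so security-sensitive fields such as `authority` or `status` can never ride along. The field set below (`level`, `model`, `tools`) is illustrative, not the real `AgentIdentity` schema; the real list would come from the design spec.

```python
from typing import Any

# Illustrative allowlist; the actual updatable fields are an assumption here.
ALLOWED_UPDATE_FIELDS = frozenset({"level", "model", "tools"})


def validate_identity_updates(updates: dict[str, Any]) -> dict[str, Any]:
    """Reject any field not explicitly allowed for identity updates."""
    unknown = set(updates) - ALLOWED_UPDATE_FIELDS
    if unknown:
        msg = f"fields not updatable via this path: {sorted(unknown)}"
        raise ValueError(msg)
    return updates
```

The registry method would call this at its boundary, keeping `model_copy(update=...)` as the single mutation point while closing the mass-assignment path.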
```python
score = (
    self._weights.task_difficulty * difficulty_factor
    + self._weights.completion_rate * completion_factor
    + self._weights.error_rate * error_factor
    + self._weights.human_feedback * feedback_factor
```
The calculation of `error_factor` appears redundant with `completion_factor`: the expression `1.0 - (1.0 - window.success_rate)` simplifies to `window.success_rate`, the same value used for `completion_factor`. As a result, the `error_rate` weight is effectively just added to the `completion_rate` weight, and there is no distinct penalty for errors. This is likely a bug in the scoring formula design and could lead to incorrect trust scores.
A possible alternative to differentiate the factors would be to give a bonus for a perfect record.
score = (
    self._weights.task_difficulty * difficulty_factor
    + self._weights.completion_rate * completion_factor
    + self._weights.error_rate * error_factor
    + self._weights.human_feedback * feedback_factor

error_factor = 1.0
for window in snapshot.windows:
    if window.data_point_count > 0 and window.success_rate is not None:
        # Give a bonus for a perfect record, otherwise use success rate.
        # This makes the factor different from completion_factor.
        if window.tasks_failed > 0:
            error_factor = window.success_rate
        else:
            error_factor = 1.0
        break
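The collapse described above is pure algebra: for any success rate s, 1.0 - (1.0 - s) is exactly s, so the old error factor duplicated completion_factor term for term. A quick numeric check:

```python
# Confirms the redundancy: the old error_factor formula reduces to the
# success rate itself, which is what completion_factor already uses.
def old_error_factor(success_rate: float) -> float:
    return 1.0 - (1.0 - success_rate)


def completion_factor(success_rate: float) -> float:
    return success_rate


for s in (0.0, 0.25, 0.5, 0.9, 1.0):
    assert abs(old_error_factor(s) - completion_factor(s)) < 1e-12
```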
met=rate_met,
current_value=success_rate,
threshold=rate_threshold,
The logic total_tasks = max(total_tasks, float(window.tasks_completed)) to aggregate tasks completed across multiple performance windows might be incorrect. If windows are overlapping (e.g., 7d and 30d), this could lead to double-counting or misrepresenting the total work done. Depending on the intent, you might want to use the value from the longest window, or ensure the windows are non-overlapping. This same logic appears in src/ai_company/security/trust/milestone_strategy.py.
@staticmethod
def _check_time_and_history(
The logic total_tasks = max(total_tasks, window.tasks_completed) to aggregate tasks completed across multiple performance windows might be incorrect. If windows are overlapping (e.g., 7d and 30d), this could lead to double-counting or misrepresenting the total work done. Depending on the intent, you might want to use the value from the longest window, or ensure the windows are non-overlapping. This same logic appears in src/ai_company/hr/promotion/threshold_evaluator.py.
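A sketch of the "use the longest window" option mentioned above — `Window` here is an illustrative stand-in for the real WindowMetrics model, which has more fields:

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Illustrative stand-in for WindowMetrics (e.g. 7d vs 30d windows)."""

    window_size_days: int
    tasks_completed: int


def tasks_from_longest_window(windows: list[Window]) -> int:
    """Read the count from the single longest window instead of taking
    max() across overlapping windows, which can misrepresent total work."""
    if not windows:
        return 0
    longest = max(windows, key=lambda w: w.window_size_days)
    return longest.tasks_completed
```

Because the 30d window already contains the 7d window's tasks, reading only the longest window avoids mixing overlapping counts.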
):
    try:
        new_level = ToolAccessLevel(to_str)
    except ValueError:
Parsing the transition key using split('_to_') is fragile. If a tool access level's value ever contains _to_, this logic will break. A more robust approach would be to structure the category_criteria configuration differently, for example as a list of objects with from and to keys, instead of using a dictionary where keys are concatenated strings.
For example, the config could be structured like this:
category_criteria:
  file_system:
    - from: sandboxed
      to: restricted
      tasks_completed: 5
      quality_score_min: 6.0

Implement two interrelated subsystems: progressive trust (§11.3) with pluggable strategies (disabled, weighted, per-category, milestone) and promotion/demotion (§8.4) with criteria evaluation, approval gates, model mapping, and trust re-evaluation integration.

- security/trust/: TrustStrategy protocol, 4 strategy implementations, TrustService orchestrator, config with elevated-gate security invariant
- hr/promotion/: PromotionCriteriaStrategy, ApprovalStrategy, and ModelMappingStrategy protocols with initial implementations, PromotionService orchestrator with cooldown and approval store
- hr/registry: add update_identity() for generic field updates
- config/schema: add trust and promotion to RootConfig
- 170+ unit tests across both subsystems, 94.88% coverage
Pre-reviewed by 10 agents, 33 findings addressed:

- Add computed_field for PromotionApprovalDecision.requires_human
- Add cross-field validators (MilestoneCriteria mutual exclusivity, PromotionRecord model_changed consistency, ModelMappingConfig keys)
- Add per-category ELEVATED gate (defense-in-depth)
- Fix weighted strategy factor differentiation
- Add structured logging to all error/state-transition paths
- Replace silent returns with explicit raises in trust/promotion services
- Tighten config validation (non-empty milestones, thresholds)
- Add 23 new tests covering decay, approval, level helpers, edge cases
- Update DESIGN_SPEC.md and CLAUDE.md project structure sections
…t, and Greptile
- Fix error_factor/completion_factor redundancy in weighted strategy
- Prevent multi-level trust jumps (adjacent-level-only transitions)
- Add eligibility check in request_promotion
- Add notification placeholder callback for promotion/demotion events
- Add created_at field to TrustState for tenure tracking
- Fix time_active_days to use agent tenure, not evaluation frequency
- Enforce MappingProxyType on seniority_model_map and threshold dicts
- Raise PromotionError when human approval needed but no store configured
- Best-effort trust re-evaluation after promotion application
- Extract shared trust level constants to levels.py
- Add direction consistency and no-op model change validators
- Fix split("_to_") fragility with maxsplit=1
- Fix _infer_reason to distinguish promotions from demotions
- Move last_decay_check_at update after evaluation
- Update Mapping type annotation for MappingProxyType compatibility
- Add ~20 new tests, fix conditional assertions, add timeout markers
- Update DESIGN_SPEC.md and README.md for implementation status
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
DESIGN_SPEC.md (1)
1655-1661: ⚠️ Potential issue | 🟡 Minor — Update the backlog entry for promotions.
This section now says promotions/demotions ship in hr/promotion/, but §18.1 still lists “Agent promotions” as future work. Please remove that backlog row or narrow it to the still-deferred scope so the spec does not present both states at once. Based on learnings: "If implementation deviates from DESIGN_SPEC.md, alert the user and explain why — user decides whether to proceed or update the spec; never silently diverge."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@DESIGN_SPEC.md` around lines 1655 - 1661, The DESIGN_SPEC backlog entry for “Agent promotions” conflicts with the implemented hr/promotion/ module (ThresholdEvaluator, SeniorityApprovalStrategy, SeniorityModelMapping, PromotionService) — update the backlog by either removing the “Agent promotions” row or narrowing it to any genuinely deferred sub-features not yet implemented; also add a clear note in the spec that implementation currently diverges from the original backlog and instruct the maintainer to alert stakeholders and explain the divergence so they can choose to accept the current implementation or update the spec.
♻️ Duplicate comments (1)
tests/unit/security/trust/test_service.py (1)
328-404: 🧹 Nitpick | 🔵 Trivial — Make the elevated-gate tests synchronous.

TrustService._enforce_elevated_gate is a regular method, so keeping these three tests async adds needless event-loop setup and obscures what is actually being exercised.

♻️ Suggested cleanup
TrustService._enforce_elevated_gateis a regular method, so keeping these three testsasyncadds needless event-loop setup and obscures what is actually being exercised.♻️ Suggested cleanup
- async def test_elevated_gate_enforces_human_approval( + def test_elevated_gate_enforces_human_approval( self, trust_config: TrustConfig, ) -> None: @@ - async def test_elevated_gate_no_override_when_already_elevated( + def test_elevated_gate_no_override_when_already_elevated( self, trust_config: TrustConfig, ) -> None: @@ - async def test_elevated_gate_no_override_when_approval_set( + def test_elevated_gate_no_override_when_approval_set( self, trust_config: TrustConfig, ) -> None:#!/bin/bash # Verify that the production helper is synchronous while these tests are async. rg -n -C2 'def _enforce_elevated_gate|async def test_elevated_gate_|def test_elevated_gate_' \ src/ai_company/security/trust/service.py \ tests/unit/security/trust/test_service.py🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/security/trust/test_service.py` around lines 328 - 404, The three tests (test_elevated_gate_enforces_human_approval, test_elevated_gate_no_override_when_already_elevated, test_elevated_gate_no_override_when_approval_set) are marked async but call the synchronous TrustService._enforce_elevated_gate; change each test from "async def" to a plain "def" and remove any unnecessary event-loop/await scaffolding so they run synchronously and directly call service._enforce_elevated_gate; keep all assertions and setup unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@README.md`:
- Around line 40-45: The README's "Security/Approval System (M7)" paragraph
understates delivered functionality by implying all approval gating is still
planned; update README.md so it explicitly notes that human approval paths for
elevated trust access and promotion/demotion decisions are implemented (e.g.,
mention "human approval paths for elevated trust and promotion/demotion are in
place"), and scope the outstanding work to the missing authentication and
tool-invocation workflow gates (specifically JWT/OAuth and the approval workflow
gates for tool invocation), referencing the existing "Security/Approval System
(M7)" and "Status" lines so the new wording clarifies shipped features vs
remaining gaps.
In `@src/ai_company/security/trust/config.py`:
- Around line 251-274: The dict fields promotion_thresholds,
initial_category_levels, category_criteria, and milestones are still mutable
after validation and should be converted to read-only mappings to enforce
invariants (e.g., the _validate_elevated_gate check); import MappingProxyType
and wrap a deep-copied version of each dict into MappingProxyType after
validation (use a root validator or post-init hook on the same model class that
defines these fields) — mirror the pattern used for
ModelMappingConfig.seniority_model_map in hr/promotion/config.py: deep-copy the
incoming dicts, validate them, then replace each field with
MappingProxyType(copy.deepcopy(...)) so external code cannot mutate them.
In `@src/ai_company/security/trust/models.py`:
- Around line 48-73: The dict fields category_levels and milestone_progress on
the frozen TrustState can still be mutated after construction; add a Pydantic
validator on TrustState (or field validators for category_levels and
milestone_progress) that wraps incoming dicts with types.MappingProxyType (or
another immutable mapping) so instances are truly immutable, mirroring the
approach used by ModelMappingConfig.seniority_model_map; ensure the validator
handles None/default_factory and preserves typing, and keep using
model_copy(update=...) for state updates.
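The deep-copy-then-wrap pattern that prompt describes can be sketched independently of Pydantic (`freeze_mapping` is a hypothetical helper name, not project code):

```python
import copy
from types import MappingProxyType


def freeze_mapping(data: dict) -> MappingProxyType:
    """Deep-copy then wrap: later mutation of the source dict cannot leak
    in, and the returned proxy rejects item assignment with TypeError."""
    return MappingProxyType(copy.deepcopy(data))
```

A field validator would return `freeze_mapping(value)` so that category_levels and milestone_progress stay read-only after construction, while `model_copy(update=...)` still produces fresh state.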
In `@tests/unit/hr/promotion/test_models.py`:
- Around line 241-354: Combine the four similar tests into a single
pytest.mark.parametrize test that builds one PromotionEvaluation once and calls
PromotionRecord with parameters for model_changed, old_model_id, new_model_id
and the expected ValidationError match; specifically replace the separate
test_model_changed_true_missing_old_model_id_raises,
test_model_changed_true_missing_new_model_id_raises,
test_model_changed_false_with_old_model_id_raises, and
test_model_changed_true_same_model_ids_raises with one function (e.g.,
test_model_changed_validation) decorated with `@pytest.mark.parametrize` over
tuples of (model_changed, old_model_id, new_model_id, expected_error_substring),
and in the test assert pytest.raises(ValidationError,
match=expected_error_substring) when constructing PromotionRecord, reusing the
PromotionEvaluation creation inside the test body.
In `@tests/unit/hr/promotion/test_seniority_approval_strategy.py`:
- Around line 57-135: Collapse the four near-duplicate tests
(test_junior_to_mid_auto_approves, test_mid_to_senior_requires_human,
test_senior_to_lead_requires_human, test_custom_threshold_lead) into a single
parametrized test using pytest.mark.parametrize that iterates rows of
(current_level, target_level, config, expected_auto_approve,
expected_requires_human); inside the test instantiate SeniorityApprovalStrategy
(use PromotionApprovalConfig where a non-default threshold is needed), build the
evaluation with _make_evaluation, create identity via make_agent_identity, call
strategy.decide, and assert decision.auto_approve and decision.requires_human
for each row — this removes duplication while still exercising
SeniorityApprovalStrategy.decide and keeping PromotionApprovalConfig,
_make_evaluation, and make_agent_identity references for locating the code.
In `@tests/unit/hr/promotion/test_service.py`:
- Around line 1-40: Add a 30-second per-test timeout marker to the module by
defining pytestmark = pytest.mark.timeout(30) near the top of the file (add
import pytest if missing); this ensures all async tests in the PromotionService
test module use a 30s timeout and prevents hangs in approval/tracker code paths.
In `@tests/unit/security/trust/conftest.py`:
- Around line 43-62: The helper in make_performance_snapshot computes total
using int(tasks_completed / success_rate) which truncates and yields
inconsistent counts vs the stored float success_rate; change it to use ceiling
(e.g., math.ceil(tasks_completed / success_rate)) so total >= tasks_completed
and tasks_failed = total - tasks_completed reflects at least one inferred
failure when the float success_rate implies it; update the success_rate>0 branch
to import/use math.ceil, compute total = max(1, math.ceil(tasks_completed /
success_rate)), then recompute tasks_failed before constructing WindowMetrics
(window_size, data_point_count, tasks_completed, tasks_failed,
avg_quality_score, success_rate).
In `@tests/unit/security/trust/test_config.py`:
- Around line 250-259: The test shows promotion_thresholds is a plain dict so
typos are ignored; update WeightedTrustStrategy to validate keys on
construction: in WeightedTrustStrategy.__init__ (or the config loader) define
the exact allowed transition names (e.g., "sandboxed_to_restricted",
"restricted_to_something" — whatever the class expects) and check
promotion_thresholds.keys() against that set, raising a ValueError (or
ConfigError) listing unknown keys when any extra/misspelled keys are present;
this ensures TrustStrategyType.WEIGHTED / promotion_thresholds / TrustThreshold
mis-keys fail fast.
In `@tests/unit/security/trust/test_weighted_strategy.py`:
- Around line 74-100: Add a non-floor demotion regression test: create a new
async test in tests/unit/security/trust/test_weighted_strategy.py that mirrors
the existing low-score test but starts current_state.global_level at
ToolAccessLevel.RESTRICTED (or STANDARD) with a non-zero trust_score; provide a
snapshot that causes the WeightedTrustStrategy.evaluate to produce a collapsing
score below the demotion threshold and assert that result.score is not None and
below the relevant threshold, result.recommended_level equals the next-lower
ToolAccessLevel (e.g., RESTRICTED -> SANDBOXED or STANDARD -> RESTRICTED) and
result.should_change is True; reference WeightedTrustStrategy.evaluate,
TrustState, TrustConfig, and ToolAccessLevel when locating code to ensure the
strategy supports downward transitions and the test fails if only floor demotion
is implemented.
---
Outside diff comments:
In `@DESIGN_SPEC.md`:
- Around line 1655-1661: The DESIGN_SPEC backlog entry for “Agent promotions”
conflicts with the implemented hr/promotion/ module (ThresholdEvaluator,
SeniorityApprovalStrategy, SeniorityModelMapping, PromotionService) — update the
backlog by either removing the “Agent promotions” row or narrowing it to any
genuinely deferred sub-features not yet implemented; also add a clear note in
the spec that implementation currently diverges from the original backlog and
instruct the maintainer to alert stakeholders and explain the divergence so they
can choose to accept the current implementation or update the spec.
---
Duplicate comments:
In `@tests/unit/security/trust/test_service.py`:
- Around line 328-404: The three tests
(test_elevated_gate_enforces_human_approval,
test_elevated_gate_no_override_when_already_elevated,
test_elevated_gate_no_override_when_approval_set) are marked async but call the
synchronous TrustService._enforce_elevated_gate; change each test from "async
def" to a plain "def" and remove any unnecessary event-loop/await scaffolding so
they run synchronously and directly call service._enforce_elevated_gate; keep
all assertions and setup unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: b19a7bbb-3237-488e-90a0-574b1a694f96
📒 Files selected for processing (29)
DESIGN_SPEC.md, README.md, src/ai_company/api/approval_store.py, src/ai_company/config/schema.py, src/ai_company/hr/promotion/config.py, src/ai_company/hr/promotion/models.py, src/ai_company/hr/promotion/service.py, src/ai_company/hr/promotion/threshold_evaluator.py, src/ai_company/observability/events/hr.py, src/ai_company/observability/events/promotion.py, src/ai_company/security/trust/config.py, src/ai_company/security/trust/levels.py, src/ai_company/security/trust/milestone_strategy.py, src/ai_company/security/trust/models.py, src/ai_company/security/trust/per_category_strategy.py, src/ai_company/security/trust/service.py, src/ai_company/security/trust/weighted_strategy.py, tests/unit/hr/promotion/conftest.py, tests/unit/hr/promotion/test_config.py, tests/unit/hr/promotion/test_models.py, tests/unit/hr/promotion/test_seniority_approval_strategy.py, tests/unit/hr/promotion/test_seniority_model_mapping.py, tests/unit/hr/promotion/test_service.py, tests/unit/hr/promotion/test_threshold_evaluator.py, tests/unit/security/trust/conftest.py, tests/unit/security/trust/test_config.py, tests/unit/security/trust/test_milestone_strategy.py, tests/unit/security/trust/test_service.py, tests/unit/security/trust/test_weighted_strategy.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use `from ai_company.observability import get_logger` and instantiate the logger as `logger = get_logger(__name__)` in every module with business logic
Files:
tests/unit/hr/promotion/test_threshold_evaluator.py, tests/unit/security/trust/test_milestone_strategy.py, tests/unit/security/trust/conftest.py, src/ai_company/security/trust/levels.py, src/ai_company/security/trust/per_category_strategy.py, tests/unit/security/trust/test_service.py, tests/unit/hr/promotion/conftest.py, tests/unit/hr/promotion/test_seniority_model_mapping.py, src/ai_company/observability/events/hr.py, src/ai_company/security/trust/config.py, tests/unit/hr/promotion/test_config.py, src/ai_company/security/trust/weighted_strategy.py, tests/unit/hr/promotion/test_seniority_approval_strategy.py, src/ai_company/hr/promotion/service.py, src/ai_company/config/schema.py, tests/unit/hr/promotion/test_models.py, tests/unit/hr/promotion/test_service.py, tests/unit/security/trust/test_weighted_strategy.py, src/ai_company/hr/promotion/models.py, src/ai_company/security/trust/milestone_strategy.py, src/ai_company/security/trust/models.py, src/ai_company/hr/promotion/threshold_evaluator.py, src/ai_company/hr/promotion/config.py, src/ai_company/observability/events/promotion.py, tests/unit/security/trust/test_config.py, src/ai_company/security/trust/service.py, src/ai_company/api/approval_store.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py:

- Add test markers `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, or `@pytest.mark.slow` to all test files
- Maintain 80% minimum code coverage — enforced in CI with `pytest --cov=ai_company --cov-fail-under=80`
- Use `asyncio_mode = 'auto'` in pytest configuration — no manual `@pytest.mark.asyncio` needed on async tests
- Set test timeout to 30 seconds per test — use `@pytest.mark.timeout(30)` or configure in pytest.ini
- Use `@pytest.mark.parametrize` for testing similar cases instead of duplicating test functions
Files:
tests/unit/hr/promotion/test_threshold_evaluator.py, tests/unit/security/trust/test_milestone_strategy.py, tests/unit/security/trust/conftest.py, tests/unit/security/trust/test_service.py, tests/unit/hr/promotion/conftest.py, tests/unit/hr/promotion/test_seniority_model_mapping.py, tests/unit/hr/promotion/test_config.py, tests/unit/hr/promotion/test_seniority_approval_strategy.py, tests/unit/hr/promotion/test_models.py, tests/unit/hr/promotion/test_service.py, tests/unit/security/trust/test_weighted_strategy.py, tests/unit/security/trust/test_config.py
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py:

- Never use `import logging`, `logging.getLogger()`, or `print()` in application code — only use the centralized logger
- Use event name constants from `ai_company.observability.events.<domain>` modules (e.g., `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`) instead of string literals
- Always use structured logging with `logger.info(EVENT, key=value)` format — never use format strings like `logger.info('msg %s', val)`
- No `from __future__ import annotations` — Python 3.14 has PEP 649 native lazy annotations
- Use `except A, B:` syntax (no parentheses) per PEP 758 — ruff enforces this on Python 3.14
- Add type hints to all public functions and classes; mypy strict mode is enforced
- Add Google-style docstrings to all public classes and functions — ruff D rules enforce this
- Use immutability principles: create new objects instead of mutating existing ones; for non-Pydantic collections use `copy.deepcopy()` at construction and `MappingProxyType` for read-only enforcement
- Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (with `model_copy(update=...)`) for runtime state that evolves
- Use `NotBlankStr` from `core.types` for all identifier and name fields instead of manual whitespace validators, including optional (`NotBlankStr | None`) and tuple variants
- Use `@computed_field` for derived values in Pydantic models instead of storing and validating redundant fields (e.g., `TokenUsage.total_tokens`)
- Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls) over bare `create_task`
- Keep functions under 50 lines and files under 800 lines
- Keep line length at 88 characters (enforced by ruff)
- Handle all errors explicitly; never silently swallow exceptions
- Validate at system boundaries (user input, external APIs, config files)
- Never use real vendor names (Anthropic, OpenAI, Claude, GPT) in project-owned code, docstrings...
Files:
src/ai_company/security/trust/levels.py, src/ai_company/security/trust/per_category_strategy.py, src/ai_company/observability/events/hr.py, src/ai_company/security/trust/config.py, src/ai_company/security/trust/weighted_strategy.py, src/ai_company/hr/promotion/service.py, src/ai_company/config/schema.py, src/ai_company/hr/promotion/models.py, src/ai_company/security/trust/milestone_strategy.py, src/ai_company/security/trust/models.py, src/ai_company/hr/promotion/threshold_evaluator.py, src/ai_company/hr/promotion/config.py, src/ai_company/observability/events/promotion.py, src/ai_company/security/trust/service.py, src/ai_company/api/approval_store.py
🧠 Learnings (11)
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: Applies to tests/**/*.py : Set test timeout to 30 seconds per test — use `pytest.mark.timeout(30)` or configure in pytest.ini
Applied to files:
tests/unit/hr/promotion/test_threshold_evaluator.py, tests/unit/hr/promotion/test_seniority_model_mapping.py, tests/unit/hr/promotion/test_config.py, tests/unit/hr/promotion/test_seniority_approval_strategy.py, tests/unit/hr/promotion/test_models.py
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: Applies to tests/**/*.py : Add test markers `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, or `pytest.mark.slow` to all test files
Applied to files:
tests/unit/hr/promotion/test_threshold_evaluator.py, tests/unit/hr/promotion/test_seniority_model_mapping.py, tests/unit/hr/promotion/test_config.py, tests/unit/hr/promotion/test_seniority_approval_strategy.py, tests/unit/hr/promotion/test_models.py
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: Applies to tests/**/*.py : Maintain 80% minimum code coverage — enforced in CI with `pytest --cov=ai_company --cov-fail-under=80`
Applied to files:
tests/unit/hr/promotion/test_threshold_evaluator.py, tests/unit/hr/promotion/test_config.py
📚 Learning: 2026-03-10T09:29:47.580Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.580Z
Learning: Applies to src/ai_company/**/*.py : Use event name constants from `ai_company.observability.events.<domain>` modules (e.g., `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`) instead of string literals
Applied to files:
src/ai_company/observability/events/hr.py, src/ai_company/hr/promotion/service.py, src/ai_company/observability/events/promotion.py
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: Applies to src/ai_company/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (with `model_copy(update=...)`) for runtime state that evolves
Applied to files:
src/ai_company/security/trust/config.py, tests/unit/hr/promotion/test_config.py, src/ai_company/config/schema.py, src/ai_company/hr/promotion/models.py, src/ai_company/security/trust/models.py, src/ai_company/hr/promotion/config.py
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: Applies to src/ai_company/**/*.py : Validate at system boundaries (user input, external APIs, config files)
Applied to files:
src/ai_company/security/trust/config.py
📚 Learning: 2026-03-10T09:29:47.580Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.580Z
Learning: Applies to src/ai_company/**/*.py : Always use structured logging with `logger.info(EVENT, key=value)` format — never use format strings like `logger.info('msg %s', val)`
Applied to files:
src/ai_company/security/trust/config.py
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: Applies to src/ai_company/**/*.py : Use immutability principles: create new objects instead of mutating existing ones; for non-Pydantic collections use `copy.deepcopy()` at construction and `MappingProxyType` for read-only enforcement
Applied to files:
tests/unit/hr/promotion/test_config.py, src/ai_company/security/trust/models.py, src/ai_company/hr/promotion/config.py
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: If implementation deviates from DESIGN_SPEC.md, alert the user and explain why — user decides whether to proceed or update the spec; never silently diverge
Applied to files:
DESIGN_SPEC.md
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: When a spec section is referenced (e.g. 'Section 10.2'), read that section verbatim before coding and update `DESIGN_SPEC.md` after approved deviations
Applied to files:
DESIGN_SPEC.md
📚 Learning: 2026-03-10T09:29:47.581Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-10T09:29:47.581Z
Learning: Always read `DESIGN_SPEC.md` before implementing any feature or planning any issue — treat it as the starting point for architecture, data models, and behavior
Applied to files:
DESIGN_SPEC.md
🧬 Code graph analysis (20)
tests/unit/hr/promotion/test_threshold_evaluator.py (5)
- src/ai_company/hr/enums.py (1): PromotionDirection (44-48)
- src/ai_company/hr/promotion/threshold_evaluator.py (3): ThresholdEvaluator (47-199), name (58-60), evaluate (62-133)
- tests/unit/hr/promotion/conftest.py (1): make_performance_snapshot (26-69)
- src/ai_company/hr/promotion/config.py (1): PromotionCriteriaConfig (17-36)
- src/ai_company/hr/promotion/models.py (1): criteria_met_count (103-105)

tests/unit/security/trust/test_milestone_strategy.py (11)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/security/trust/config.py (3): MilestoneCriteria (127-184), ReVerificationConfig (187-218), TrustConfig (221-376)
- src/ai_company/security/trust/milestone_strategy.py (4): MilestoneTrustStrategy (37-280), name (51-53), initial_state (137-150), evaluate (55-135)
- src/ai_company/security/trust/models.py (1): TrustState (22-73)
- tests/unit/hr/promotion/conftest.py (1): make_performance_snapshot (26-69)
- tests/unit/security/trust/test_weighted_strategy.py (1): test_name_property (24-26)
- tests/unit/security/trust/test_per_category_strategy.py (1): test_name_property (43-46)
- tests/unit/security/trust/test_disabled_strategy.py (1): test_name_property (18-20)
- src/ai_company/security/trust/per_category_strategy.py (3): name (41-43), initial_state (135-148), evaluate (45-133)
- src/ai_company/security/trust/weighted_strategy.py (3): name (48-50), initial_state (100-113), evaluate (52-98)
- src/ai_company/security/trust/disabled_strategy.py (3): name (39-41), initial_state (82-94), evaluate (43-80)

tests/unit/security/trust/conftest.py (4)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/hr/performance/models.py (2): AgentPerformanceSnapshot (266-301), WindowMetrics (197-263)
- src/ai_company/security/trust/config.py (5): MilestoneCriteria (127-184), ReVerificationConfig (187-218), TrustConfig (221-376), TrustThreshold (21-35), WeightedTrustWeights (38-96)
- src/ai_company/security/trust/enums.py (1): TrustStrategyType (6-12)

src/ai_company/security/trust/levels.py (1)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)

src/ai_company/security/trust/per_category_strategy.py (5)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/security/trust/models.py (1): TrustEvaluationResult (113-152)
- src/ai_company/hr/performance/models.py (1): AgentPerformanceSnapshot (266-301)
- src/ai_company/security/trust/config.py (2): CategoryTrustCriteria (99-124), TrustConfig (221-376)
- src/ai_company/security/trust/protocol.py (2): name (24-26), evaluate (28-45)

tests/unit/hr/promotion/conftest.py (5)
- src/ai_company/core/enums.py (1): SeniorityLevel (6-21)
- src/ai_company/hr/promotion/config.py (4): ModelMappingConfig (66-103), PromotionApprovalConfig (39-63), PromotionConfig (106-139), PromotionCriteriaConfig (17-36)
- src/ai_company/hr/promotion/approval_protocol.py (1): name (26-28)
- src/ai_company/hr/promotion/criteria_protocol.py (1): name (24-26)
- src/ai_company/hr/promotion/model_mapping_protocol.py (1): name (23-25)

tests/unit/hr/promotion/test_seniority_model_mapping.py (3)
- src/ai_company/core/enums.py (1): SeniorityLevel (6-21)
- src/ai_company/hr/promotion/config.py (1): ModelMappingConfig (66-103)
- src/ai_company/hr/promotion/seniority_model_mapping.py (3): SeniorityModelMapping (22-87), name (35-37), resolve_model (39-87)

src/ai_company/security/trust/config.py (3)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/observability/_logger.py (1): get_logger (8-28)
- src/ai_company/security/trust/enums.py (1): TrustStrategyType (6-12)

tests/unit/hr/promotion/test_config.py (5)
- src/ai_company/core/enums.py (1): SeniorityLevel (6-21)
- src/ai_company/hr/promotion/config.py (1): PromotionCriteriaConfig (17-36)
- tests/unit/hr/promotion/test_models.py (5): test_frozen (50-59), test_frozen (127-141), test_frozen (171-178), test_frozen (215-239), test_frozen (457-480)
- tests/unit/security/trust/test_config.py (2): test_frozen (56-59), test_frozen (122-125)
- tests/unit/security/trust/test_models.py (3): test_frozen (55-58), test_frozen (144-153), test_frozen (229-237)

src/ai_company/security/trust/weighted_strategy.py (5)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/security/trust/models.py (2): TrustEvaluationResult (113-152), TrustState (22-73)
- src/ai_company/hr/performance/models.py (1): AgentPerformanceSnapshot (266-301)
- src/ai_company/security/trust/config.py (2): TrustConfig (221-376), TrustThreshold (21-35)
- src/ai_company/security/trust/protocol.py (3): name (24-26), evaluate (28-45), initial_state (47-56)

tests/unit/hr/promotion/test_seniority_approval_strategy.py (6)
- src/ai_company/core/enums.py (1): SeniorityLevel (6-21)
- src/ai_company/hr/enums.py (1): PromotionDirection (44-48)
- src/ai_company/hr/promotion/config.py (1): PromotionApprovalConfig (39-63)
- src/ai_company/hr/promotion/models.py (2): PromotionEvaluation (49-105), requires_human (123-125)
- src/ai_company/hr/promotion/seniority_approval_strategy.py (2): name (39-41), decide (43-70)
- src/ai_company/hr/promotion/criteria_protocol.py (1): name (24-26)

src/ai_company/hr/promotion/service.py (10)
- src/ai_company/hr/enums.py (1): PromotionDirection (44-48)
- src/ai_company/hr/promotion/models.py (3): PromotionEvaluation (49-105), requires_human (123-125), criteria_met_count (103-105)
- src/ai_company/hr/promotion/approval_protocol.py (3): PromotionApprovalStrategy (18-45), decide (30-45), name (26-28)
- src/ai_company/hr/promotion/config.py (1): PromotionConfig (106-139)
- src/ai_company/hr/promotion/model_mapping_protocol.py (3): ModelMappingStrategy (15-42), name (23-25), resolve_model (27-42)
- src/ai_company/hr/registry.py (2): AgentRegistryService (32-242), get (101-111)
- src/ai_company/hr/promotion/threshold_evaluator.py (2): evaluate (62-133), name (58-60)
- src/ai_company/security/trust/disabled_strategy.py (2): evaluate (43-80), name (39-41)
- src/ai_company/hr/promotion/seniority_approval_strategy.py (2): decide (43-70), name (39-41)
- src/ai_company/hr/promotion/seniority_model_mapping.py (2): name (35-37), resolve_model (39-87)

src/ai_company/config/schema.py (2)
- src/ai_company/hr/promotion/config.py (1): PromotionConfig (106-139)
- src/ai_company/security/trust/config.py (1): TrustConfig (221-376)

tests/unit/hr/promotion/test_models.py (1)
- src/ai_company/hr/promotion/models.py (7): CriterionResult (24-46), PromotionApprovalDecision (108-125), PromotionEvaluation (49-105), PromotionRecord (128-208), PromotionRequest (211-251), criteria_met_count (103-105), requires_human (123-125)

tests/unit/security/trust/test_weighted_strategy.py (3)
- src/ai_company/core/enums.py (1): ToolAccessLevel (274-291)
- src/ai_company/security/trust/models.py (1)
TrustState(22-73)src/ai_company/security/trust/weighted_strategy.py (4)
WeightedTrustStrategy(26-200)name(48-50)initial_state(100-113)evaluate(52-98)
src/ai_company/security/trust/milestone_strategy.py (5)
src/ai_company/core/enums.py (1)
ToolAccessLevel(274-291)src/ai_company/security/trust/models.py (2)
TrustEvaluationResult(113-152)TrustState(22-73)src/ai_company/hr/performance/models.py (1)
AgentPerformanceSnapshot(266-301)src/ai_company/security/trust/config.py (2)
MilestoneCriteria(127-184)TrustConfig(221-376)src/ai_company/security/trust/protocol.py (3)
name(24-26)evaluate(28-45)initial_state(47-56)
src/ai_company/security/trust/models.py (2)
src/ai_company/core/enums.py (1)
ToolAccessLevel(274-291)src/ai_company/security/trust/enums.py (1)
TrustChangeReason(15-24)
src/ai_company/hr/promotion/threshold_evaluator.py (5)
src/ai_company/core/enums.py (2)
SeniorityLevel(6-21)compare_seniority(51-64)src/ai_company/hr/enums.py (1)
PromotionDirection(44-48)src/ai_company/hr/promotion/models.py (2)
CriterionResult(24-46)PromotionEvaluation(49-105)src/ai_company/hr/performance/models.py (1)
AgentPerformanceSnapshot(266-301)src/ai_company/hr/promotion/config.py (1)
PromotionCriteriaConfig(17-36)
src/ai_company/hr/promotion/config.py (1)
src/ai_company/core/enums.py (1)
SeniorityLevel(6-21)
tests/unit/security/trust/test_config.py (3)
src/ai_company/core/enums.py (1)
ToolAccessLevel(274-291)src/ai_company/security/trust/config.py (6)
CategoryTrustCriteria(99-124)MilestoneCriteria(127-184)ReVerificationConfig(187-218)TrustConfig(221-376)TrustThreshold(21-35)WeightedTrustWeights(38-96)src/ai_company/security/trust/enums.py (1)
TrustStrategyType(6-12)
🪛 LanguageTool
README.md
[typographical] ~33-~33: To join two clauses or introduce examples, consider using an em dash.
Context: ...aggregation - Progressive Trust (M7) - 4 strategies (disabled/weighted/per-cate...
(DASH_RULE)
[typographical] ~34-~34: To join two clauses or introduce examples, consider using an em dash.
Context: ...evaluation - Promotion/Demotion (M7) - Criteria evaluation (ThresholdEvaluator)...
(DASH_RULE)
[grammar] ~34-~34: Please add a punctuation mark at the end of paragraph.
Context: ...eniorityModelMapping), PromotionService orchestrator ### Not implemented yet (planned miles...
(PUNCTUATION_PARAGRAPH_END)
[typographical] ~38-~38: To join two clauses or introduce examples, consider using an em dash.
Context: ...ones) - Memory Backend Adapter (M5) - Memory protocols, retrieval pipeline, or...
(DASH_RULE)
[typographical] ~39-~39: To join two clauses or introduce examples, consider using an em dash.
Context: ..., Temporal KG) planned - CLI Surface - cli/ package is placeholder-only - **S...
(DASH_RULE)
[typographical] ~40-~40: To join two clauses or introduce examples, consider using an em dash.
Context: ...only - Security/Approval System (M7) - SecOps agent with rule engine (soft-allo...
(DASH_RULE)
[typographical] ~41-~41: To join two clauses or introduce examples, consider using an em dash.
Context: ...e planned - Advanced Product Surface - web dashboard, external integrations ##...
(DASH_RULE)
🔇 Additional comments (38)
src/ai_company/observability/events/hr.py (1)
37-37: LGTM! The new event constant follows the established `hr.<subject>.<action>` naming convention and is correctly placed within the existing Registry section.
src/ai_company/security/trust/levels.py (1)
1-39: LGTM! Clean extraction of shared trust level constants. The canonical ordering tuple and rank lookup are correctly derived from `ToolAccessLevel` enum members, and `CUSTOM` is intentionally excluded since it bypasses the hierarchy. The use of tuples ensures immutability of the ordering and transition definitions.
src/ai_company/security/trust/models.py (1)
76-152: LGTM! `TrustChangeRecord` and `TrustEvaluationResult` are well-structured frozen models with appropriate field types. The `@computed_field` decorator for `should_change` is a clean way to derive whether a trust update is warranted. The `allow_inf_nan=False` config on `TrustEvaluationResult` provides proper score validation.
src/ai_company/security/trust/config.py (3)
21-97: LGTM! `TrustThreshold` and `WeightedTrustWeights` are well-designed with appropriate field constraints. The weights sum validation with `_WEIGHTS_SUM_TOLERANCE` handles floating-point precision correctly, and the structured logging provides good observability for configuration errors.
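The tolerance-based sum check can be sketched as a plain function (hypothetical names and tolerance value; the real validator lives on `WeightedTrustWeights` and its exact constant is not shown here):

```python
import math

# Hypothetical stand-in for the _WEIGHTS_SUM_TOLERANCE constant referenced above.
WEIGHTS_SUM_TOLERANCE = 1e-6


def validate_weights_sum(weights: dict[str, float]) -> None:
    """Reject weight sets whose sum drifts from 1.0 beyond a small tolerance."""
    total = sum(weights.values())
    if not math.isclose(total, 1.0, abs_tol=WEIGHTS_SUM_TOLERANCE):
        raise ValueError(f"weights must sum to 1.0, got {total!r}")


# Floating-point noise within tolerance passes; a genuine drift does not.
validate_weights_sum({"a": 0.3, "b": 0.3, "c": 0.4})
```

Using an absolute tolerance rather than exact equality keeps configs written as decimal fractions (0.3 + 0.3 + 0.4) from failing on binary rounding.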
99-185: LGTM! `CategoryTrustCriteria` and `MilestoneCriteria` have appropriate defaults and constraints. The mutual exclusivity validation on `auto_promote` and `requires_human_approval` with structured logging is correctly implemented.
276-376: LGTM! The `_validate_elevated_gate` and `_validate_strategy_specific_fields` validators provide strong security guarantees. The elevated gate invariant is enforced across all strategy types (weighted, milestone, per-category), and strategy-specific required fields are validated with clear error messages.
src/ai_company/hr/promotion/config.py (3)
17-64: LGTM! `PromotionCriteriaConfig` and `PromotionApprovalConfig` are well-structured frozen models with appropriate defaults and constraints. Using `tuple[NotBlankStr, ...]` for `required_criteria` ensures immutability.
66-104: LGTM! The `seniority_model_map` immutability concern from the previous review has been properly addressed. The validator correctly validates keys against `SeniorityLevel` values, deep-copies the map, and wraps it in `MappingProxyType` for read-only enforcement.
106-139: LGTM! `PromotionConfig` aggregates sub-configs appropriately with sensible defaults. The `allow_inf_nan=False` constraint provides additional safety for numeric fields.
src/ai_company/config/schema.py (1)
23-23: LGTM! Clean integration of `TrustConfig` and `PromotionConfig` into `RootConfig`. The new fields follow the established pattern of using `default_factory` for nested frozen configs, and the docstring is appropriately updated.
Also applies to: 31-31, 498-499, 599-606
src/ai_company/security/trust/weighted_strategy.py (6)
115-161: LGTM - scoring factors are now distinct. The previous concern about `error_rate` double-counting `success_rate` has been addressed. The scoring factors are now:
- `completion_factor`: uses `window.success_rate` directly
- `error_factor`: uses `1.0 - (tasks_failed / data_point_count)`, which is semantically distinct since `data_point_count` includes non-task events
The weighted computation correctly applies configured weights to independent signals.
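A rough sketch of how two such independent factors combine (illustrative function and weights only; the real strategy combines four configured factors from `WeightedTrustWeights`):

```python
def weighted_trust_score(
    success_rate: float,
    tasks_failed: int,
    data_point_count: int,
    completion_weight: float = 0.5,  # illustrative weights, not the real defaults
    error_weight: float = 0.5,
) -> float:
    """Combine independent signals into one weighted score."""
    # completion_factor: the window's success rate, used directly.
    completion_factor = success_rate
    # error_factor: distinct from completion because data_point_count
    # also includes non-task events (so this is not just 1 - success_rate).
    error_factor = (
        1.0 - (tasks_failed / data_point_count) if data_point_count else 0.0
    )
    return completion_weight * completion_factor + error_weight * error_factor
```

Because the two factors are computed from different denominators, a burst of non-task events changes `error_factor` without touching `completion_factor`, which is what makes the signals independent.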
163-184: LGTM - single-step transitions enforced. The previous concern about multi-level jumps has been addressed. The `_score_to_level` method now correctly checks `if from_level != current_level: continue`, ensuring only adjacent-level promotions are considered.
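The adjacency check can be illustrated with stand-in level names (hypothetical; the real ordering and `TRANSITION_KEYS` live in `levels.py`):

```python
# Hypothetical ordered levels; adjacent pairs form the only legal transitions.
LEVELS = ("sandboxed", "limited", "standard", "elevated")
TRANSITION_KEYS = tuple(zip(LEVELS, LEVELS[1:]))


def score_to_level(
    current_level: str,
    score: float,
    thresholds: dict[tuple[str, str], float],
) -> str:
    """Return the next level only when the adjacent upward threshold is met."""
    for from_level, to_level in TRANSITION_KEYS:
        if from_level != current_level:
            continue  # only the transition starting at the current level counts
        # Missing thresholds default above 1.0 so they can never be met.
        if score >= thresholds.get((from_level, to_level), 1.1):
            return to_level
    return current_level
```

Even a perfect score cannot skip a level: only the single pair whose `from_level` matches the current level is ever evaluated.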
26-51: LGTM! The class docstring clearly documents the four weighted factors and their sources. Initialization correctly stores config references for weights and thresholds.
52-114: LGTM! The `evaluate` method follows the protocol interface correctly with structured logging events. The `initial_state` method properly initializes trust state with the configured level and a zero score.
186-200: LGTM! `_check_human_approval` correctly looks up the threshold for the specific transition and returns the `requires_human_approval` flag.
163-184: Weighted strategy only handles promotions; demotion is by design handled elsewhere. `_score_to_level` intentionally evaluates upward transitions only: `TRANSITION_KEYS` contains only upward transitions and the config uses `promotion_thresholds` (no demotion thresholds). Demotion is handled via the service layer by comparing `recommended_level` vs `current_level` (reason: `TRUST_DECAY`), with explicit decay logic in the milestone strategy for idle/error-rate conditions.
Minor note: the module docstring mentions "promotes/demotes", but this strategy only promotes; consider clarifying it as "promotes agents based on weighted score."
src/ai_company/security/trust/milestone_strategy.py (5)
191-216: LGTM - tenure calculation fixed. The previous concern about `time_active_days` measuring evaluation frequency instead of agent tenure has been addressed. The code now correctly uses `state.created_at` (line 203) with an explicit comment (line 200) clarifying this is "agent tenure" rather than time since last evaluation.
37-54: LGTM! Clean initialization storing config, milestones, and re-verification settings. The `name` property correctly identifies the strategy.
55-136: LGTM! The `evaluate` method correctly:
- Checks for milestone-based promotion (single-step via `TRANSITION_KEYS`)
- Checks for decay when re-verification is enabled and agent is above `SANDBOXED`
- Builds a detailed result with joined reason parts
The structured logging with event constants provides good observability.
218-280: LGTM! The `_check_decay` method comprehensively handles three decay conditions:
- Idle decay: demotes if `idle_days >= decay_on_idle_days`
- Error rate decay: demotes if the error rate exceeds the threshold
- Re-verification failure: demotes if the quality score drops below the minimum after the verification interval
Each decay condition is logged with `TRUST_DECAY_DETECTED` and the specific reason, providing excellent observability.
171-190: Single-window max approach is intentional: agents must demonstrate tasks within a recent rolling window period. The code correctly uses the maximum `tasks_completed` value across all rolling time windows (7d, 30d, 90d) rather than summing across windows. This ensures agents demonstrate capability within a recent period rather than relying purely on historical cumulative performance. The design is consistent with the other checks (clean history validation, error rate decay) and aligns with the rolling-window architecture of the performance tracking system.
src/ai_company/security/trust/per_category_strategy.py (1)
1-178: LGTM! The implementation correctly addresses the past review comment by renaming the variable to `max_tasks_completed` (line 166), clarifying that it represents the best single-window count rather than a cumulative total. The per-category evaluation logic, global level derivation (minimum rank across categories), and elevated gate enforcement are all well-structured.
src/ai_company/security/trust/service.py (2)
199-224: LGTM! The past review comment about `last_promoted_at` being incorrectly updated for demotions has been addressed. The code now correctly uses `TRUST_LEVEL_RANK` comparison (lines 204-206) to determine if the change is a promotion before conditionally updating `last_promoted_at` (lines 211-212).
359-385: LGTM! The past review comment about `_infer_reason` not distinguishing promotions from demotions has been addressed. The method now checks for demotion first (lines 372-376) and returns `TRUST_DECAY` appropriately, while promotions get strategy-specific reasons.
src/ai_company/hr/promotion/models.py (2)
192-208: LGTM! The past review comment about rejecting no-op model changes has been addressed. The validator now includes a check (lines 200-202) that raises `ValueError` when `model_changed=True` but `old_model_id == new_model_id`, preventing misleading history records.
1-251: Well-designed frozen models. All models correctly use `ConfigDict(frozen=True)` for immutability, `NotBlankStr` for identifier fields, and `@computed_field` for derived values (`criteria_met_count`, `requires_human`). The `PromotionEvaluation._validate_direction_consistency` validator (lines 89-99) ensures data integrity between direction and level ordering. Based on learnings: "Use frozen Pydantic models for config/identity."
src/ai_company/hr/promotion/threshold_evaluator.py (3)
161-179: LGTM! The past review comment about clarifying aggregation strategies has been addressed. Line 161 documents "use most recent window value" for success_rate, explaining the `break` on the first non-None value.
181-197: LGTM! The past review comment has been addressed. Line 181 documents "use peak value across windows" for tasks_completed, explaining the `max()` aggregation strategy.
1-199: Well-structured threshold evaluator. The implementation correctly uses `MappingProxyType` for read-only threshold defaults (lines 30-44), direction-aware threshold selection (lines 95-99), and proper N-of-M eligibility logic (line 112). Event constants and structured logging follow coding guidelines.
src/ai_company/observability/events/promotion.py (1)
1-21: LGTM! The past review comment about the `DEMOTION_APPLIED` namespace has been addressed. Line 19 now correctly uses `"promotion.demotion.applied"` to maintain consistency with the `promotion.<subject>.<action>` naming convention, ensuring it will be captured by promotion-related observability filters.
src/ai_company/hr/promotion/service.py (4)
156-174: LGTM! The past review comment about using the wrong event for error logging has been addressed. The code now correctly uses `PROMOTION_EVALUATE_FAILED` (lines 159, 169) for error conditions instead of `PROMOTION_EVALUATE_START`.
458-473: LGTM! The past review comment about the discarded trust evaluation result has been addressed. Lines 458-459 now include a comment explaining the intent: "Best-effort trust re-evaluation — promotion is already applied, so failures here must not prevent the record from being returned." The side-effect-only pattern is now clearly documented.
534-551: LGTM! The past review comment about checking approval store availability early has been addressed. The `_approval_store is None` check (lines 544-551) is now at the top of `_create_approval`, before any object creation, avoiding unnecessary work when no store is configured.
1-589: Well-orchestrated promotion service. The service properly coordinates criteria evaluation, approval decisions, model mapping, and trust integration. Error handling follows best-effort patterns where appropriate (trust re-evaluation, notifications) with proper logging. All past review comments have been addressed.
src/ai_company/api/approval_store.py (1)
8-8: LGTM! The `Callable` import is correctly moved to top-level since it's used at runtime in the `__init__` type hint (line 37). The `noqa: TC003` suppression is appropriate.
tests/unit/hr/promotion/conftest.py (3)
43-53: LGTM! The past review comment about the `success_rate=0` edge case has been addressed. The code now correctly handles three scenarios:
- `success_rate > 0`: computes total and tasks_failed normally
- `success_rate == 0` with `tasks_completed > 0`: treats all tasks as failures (lines 46-50)
- Both zero: sets everything to 0 (lines 51-53)
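A guess at the builder's arithmetic, as a stand-alone helper (hypothetical name and rounding; the real conftest code may differ in detail):

```python
def derive_task_counts(success_rate: float, tasks_completed: int) -> tuple[int, int]:
    """Return (total_tasks, tasks_failed) for the three scenarios above."""
    if success_rate > 0:
        # Normal case: back out the total from the completed count and rate.
        total = round(tasks_completed / success_rate)
        return total, total - tasks_completed
    if tasks_completed > 0:
        # success_rate == 0 with completed tasks: treat all tasks as failures.
        return tasks_completed, tasks_completed
    # Both zero: everything stays at 0.
    return 0, 0
```

The point of the branching is that a naive `tasks_completed / success_rate` would divide by zero in the second and third scenarios, which is exactly the edge case the review flagged.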
72-91: LGTM! The `make_agent_identity` builder correctly uses generic test names (`"test-provider"`, `"test-small-001"`) instead of real vendor names, complying with the coding guideline: "Never use real vendor names in tests."
97-130: LGTM! The fixtures provide clean defaults for promotion workflow tests, aligning with the configuration models defined in `src/ai_company/hr/promotion/config.py`.
```python
promotion_thresholds: dict[str, TrustThreshold] = Field(
    default_factory=dict,
    description="Thresholds for trust level transitions",
)

# Per-category strategy config
initial_category_levels: dict[str, ToolAccessLevel] = Field(
    default_factory=dict,
    description="Initial per-category trust levels",
)
category_criteria: dict[str, dict[str, CategoryTrustCriteria]] = Field(
    default_factory=dict,
    description="Per-category promotion criteria",
)

# Milestone strategy config
milestones: dict[str, MilestoneCriteria] = Field(
    default_factory=dict,
    description="Milestone criteria for trust transitions",
)
re_verification: ReVerificationConfig = Field(
    default_factory=ReVerificationConfig,
    description="Re-verification configuration",
)
```
🛠️ Refactor suggestion | 🟠 Major
Dict fields remain mutable despite frozen=True.
The promotion_thresholds, initial_category_levels, category_criteria, and milestones dict fields can be mutated after validation (e.g., config.promotion_thresholds["new_key"] = ...), potentially bypassing the _validate_elevated_gate security invariant. Consider wrapping these in MappingProxyType after validation, similar to how ModelMappingConfig.seniority_model_map is handled in hr/promotion/config.py.
As per coding guidelines: "Use immutability principles: create new objects instead of mutating existing ones; for non-Pydantic collections use copy.deepcopy() at construction and MappingProxyType for read-only enforcement".
🔒 Example approach for promotion_thresholds
```diff
+import copy
+from types import MappingProxyType
+
 class TrustConfig(BaseModel):
     # ... existing fields ...
+    @model_validator(mode="after")
+    def _wrap_dict_fields_readonly(self) -> Self:
+        """Wrap dict fields in MappingProxyType for immutability."""
+        for field_name in (
+            "promotion_thresholds",
+            "initial_category_levels",
+            "milestones",
+        ):
+            raw = getattr(self, field_name)
+            if not isinstance(raw, MappingProxyType):
+                object.__setattr__(
+                    self,
+                    field_name,
+                    MappingProxyType(copy.deepcopy(raw)),
+                )
+        # category_criteria is a nested dict
+        if not isinstance(self.category_criteria, MappingProxyType):
+            wrapped = {
+                k: MappingProxyType(copy.deepcopy(v))
+                for k, v in self.category_criteria.items()
+            }
+            object.__setattr__(
+                self,
+                "category_criteria",
+                MappingProxyType(wrapped),
+            )
+        return self
```
🤖 Prompt for AI Agents
+ return self🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/security/trust/config.py` around lines 251 - 274, The dict
fields promotion_thresholds, initial_category_levels, category_criteria, and
milestones are still mutable after validation and should be converted to
read-only mappings to enforce invariants (e.g., the _validate_elevated_gate
check); import MappingProxyType and wrap a deep-copied version of each dict into
MappingProxyType after validation (use a root validator or post-init hook on the
same model class that defines these fields) — mirror the pattern used for
ModelMappingConfig.seniority_model_map in hr/promotion/config.py: deep-copy the
incoming dicts, validate them, then replace each field with
MappingProxyType(copy.deepcopy(...)) so external code cannot mutate them.
```python
category_levels: dict[str, ToolAccessLevel] = Field(
    default_factory=dict,
    description="Per-category trust levels",
)
trust_score: float | None = Field(
    default=None,
    ge=0.0,
    le=1.0,
    description="Weighted trust score",
)
last_evaluated_at: AwareDatetime | None = Field(
    default=None,
    description="When trust was last evaluated",
)
last_promoted_at: AwareDatetime | None = Field(
    default=None,
    description="When trust level was last promoted",
)
last_decay_check_at: AwareDatetime | None = Field(
    default=None,
    description="When decay was last checked",
)
milestone_progress: dict[str, int | float] = Field(
    default_factory=dict,
    description="Milestone tracking data",
)
```
🧹 Nitpick | 🔵 Trivial
Consider wrapping dict fields for full immutability.
While TrustState is frozen, the category_levels and milestone_progress dict fields can still be mutated after construction (e.g., state.category_levels["key"] = value). Since this model represents state that evolves via model_copy(update=...), consider using a validator to wrap these in MappingProxyType for defense-in-depth, consistent with how ModelMappingConfig.seniority_model_map handles this.
That said, this is a lower priority since the intended usage pattern (immutable snapshots replaced via model_copy) mitigates the risk.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/security/trust/models.py` around lines 48 - 73, The dict
fields category_levels and milestone_progress on the frozen TrustState can still
be mutated after construction; add a Pydantic validator on TrustState (or field
validators for category_levels and milestone_progress) that wraps incoming dicts
with types.MappingProxyType (or another immutable mapping) so instances are
truly immutable, mirroring the approach used by
ModelMappingConfig.seniority_model_map; ensure the validator handles
None/default_factory and preserves typing, and keep using model_copy(update=...)
for state updates.
```python
def test_model_changed_true_missing_old_model_id_raises(self) -> None:
    """model_changed=True with old_model_id=None raises ValidationError."""
    evaluation = PromotionEvaluation(
        agent_id="agent-001",
        current_level=SeniorityLevel.JUNIOR,
        target_level=SeniorityLevel.MID,
        direction=PromotionDirection.PROMOTION,
        criteria_results=(),
        required_criteria_met=True,
        eligible=True,
        evaluated_at=datetime.now(UTC),
        strategy_name="threshold_evaluator",
    )
    with pytest.raises(ValidationError, match="old_model_id and new_model_id"):
        PromotionRecord(
            agent_id="agent-001",
            agent_name="test-agent",
            old_level=SeniorityLevel.JUNIOR,
            new_level=SeniorityLevel.MID,
            direction=PromotionDirection.PROMOTION,
            evaluation=evaluation,
            effective_at=datetime.now(UTC),
            initiated_by="system",
            model_changed=True,
            old_model_id=None,
            new_model_id="test-large-001",
        )

def test_model_changed_true_missing_new_model_id_raises(self) -> None:
    """model_changed=True with new_model_id=None raises ValidationError."""
    evaluation = PromotionEvaluation(
        agent_id="agent-001",
        current_level=SeniorityLevel.JUNIOR,
        target_level=SeniorityLevel.MID,
        direction=PromotionDirection.PROMOTION,
        criteria_results=(),
        required_criteria_met=True,
        eligible=True,
        evaluated_at=datetime.now(UTC),
        strategy_name="threshold_evaluator",
    )
    with pytest.raises(ValidationError, match="old_model_id and new_model_id"):
        PromotionRecord(
            agent_id="agent-001",
            agent_name="test-agent",
            old_level=SeniorityLevel.JUNIOR,
            new_level=SeniorityLevel.MID,
            direction=PromotionDirection.PROMOTION,
            evaluation=evaluation,
            effective_at=datetime.now(UTC),
            initiated_by="system",
            model_changed=True,
            old_model_id="test-small-001",
            new_model_id=None,
        )

def test_model_changed_false_with_old_model_id_raises(self) -> None:
    """model_changed=False with old_model_id set raises ValidationError."""
    evaluation = PromotionEvaluation(
        agent_id="agent-001",
        current_level=SeniorityLevel.JUNIOR,
        target_level=SeniorityLevel.MID,
        direction=PromotionDirection.PROMOTION,
        criteria_results=(),
        required_criteria_met=True,
        eligible=True,
        evaluated_at=datetime.now(UTC),
        strategy_name="threshold_evaluator",
    )
    with pytest.raises(ValidationError, match="model IDs to be None"):
        PromotionRecord(
            agent_id="agent-001",
            agent_name="test-agent",
            old_level=SeniorityLevel.JUNIOR,
            new_level=SeniorityLevel.MID,
            direction=PromotionDirection.PROMOTION,
            evaluation=evaluation,
            effective_at=datetime.now(UTC),
            initiated_by="system",
            model_changed=False,
            old_model_id="test-small-001",
        )

def test_model_changed_true_same_model_ids_raises(self) -> None:
    """model_changed=True with identical model IDs raises ValidationError."""
    evaluation = PromotionEvaluation(
        agent_id="agent-001",
        current_level=SeniorityLevel.JUNIOR,
        target_level=SeniorityLevel.MID,
        direction=PromotionDirection.PROMOTION,
        criteria_results=(),
        required_criteria_met=True,
        eligible=True,
        evaluated_at=datetime.now(UTC),
        strategy_name="threshold_evaluator",
    )
    with pytest.raises(
        ValidationError,
        match="old_model_id and new_model_id to differ",
    ):
        PromotionRecord(
            agent_id="agent-001",
            agent_name="test-agent",
            old_level=SeniorityLevel.JUNIOR,
            new_level=SeniorityLevel.MID,
            direction=PromotionDirection.PROMOTION,
            evaluation=evaluation,
            effective_at=datetime.now(UTC),
            initiated_by="system",
            model_changed=True,
            old_model_id="test-small-001",
            new_model_id="test-small-001",
        )
```
🧹 Nitpick | 🔵 Trivial
Parameterize the model_changed validation matrix.
These cases rebuild the same PromotionEvaluation and PromotionRecord with only the model IDs and expected error changing. Folding them into one parametrized test will make future validator changes much cheaper to update.
As per coding guidelines: "Use @pytest.mark.parametrize for testing similar cases instead of duplicating test functions".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/hr/promotion/test_models.py` around lines 241 - 354, Combine the
four similar tests into a single pytest.mark.parametrize test that builds one
PromotionEvaluation once and calls PromotionRecord with parameters for
model_changed, old_model_id, new_model_id and the expected ValidationError
match; specifically replace the separate
test_model_changed_true_missing_old_model_id_raises,
test_model_changed_true_missing_new_model_id_raises,
test_model_changed_false_with_old_model_id_raises, and
test_model_changed_true_same_model_ids_raises with one function (e.g.,
test_model_changed_validation) decorated with `@pytest.mark.parametrize` over
tuples of (model_changed, old_model_id, new_model_id, expected_error_substring),
and in the test assert pytest.raises(ValidationError,
match=expected_error_substring) when constructing PromotionRecord, reusing the
PromotionEvaluation creation inside the test body.
```python
async def test_junior_to_mid_auto_approves(
    self,
    approval_config: PromotionApprovalConfig,
) -> None:
    """Junior to Mid promotion is auto-approved (below SENIOR threshold)."""
    strategy = SeniorityApprovalStrategy(config=approval_config)
    evaluation = _make_evaluation(
        current_level=SeniorityLevel.JUNIOR,
        target_level=SeniorityLevel.MID,
        direction=PromotionDirection.PROMOTION,
    )
    identity = make_agent_identity(level=SeniorityLevel.JUNIOR)

    decision = await strategy.decide(
        evaluation=evaluation,
        agent_identity=identity,
    )
    assert decision.auto_approve is True
    assert decision.requires_human is False

async def test_mid_to_senior_requires_human(
    self,
    approval_config: PromotionApprovalConfig,
) -> None:
    """Mid to Senior promotion requires human approval (at threshold)."""
    strategy = SeniorityApprovalStrategy(config=approval_config)
    evaluation = _make_evaluation(
        current_level=SeniorityLevel.MID,
        target_level=SeniorityLevel.SENIOR,
        direction=PromotionDirection.PROMOTION,
    )
    identity = make_agent_identity(level=SeniorityLevel.MID)

    decision = await strategy.decide(
        evaluation=evaluation,
        agent_identity=identity,
    )
    assert decision.auto_approve is False
    assert decision.requires_human is True

async def test_senior_to_lead_requires_human(
    self,
    approval_config: PromotionApprovalConfig,
) -> None:
    """Senior to Lead promotion also requires human approval."""
    strategy = SeniorityApprovalStrategy(config=approval_config)
    evaluation = _make_evaluation(
        current_level=SeniorityLevel.SENIOR,
        target_level=SeniorityLevel.LEAD,
        direction=PromotionDirection.PROMOTION,
    )
    identity = make_agent_identity(level=SeniorityLevel.SENIOR)

    decision = await strategy.decide(
        evaluation=evaluation,
        agent_identity=identity,
    )
    assert decision.auto_approve is False
    assert decision.requires_human is True

async def test_custom_threshold_lead(self) -> None:
    """Promotion to Senior auto-approved when threshold is LEAD."""
    config = PromotionApprovalConfig(
        human_approval_from_level=SeniorityLevel.LEAD,
    )
    strategy = SeniorityApprovalStrategy(config=config)
    evaluation = _make_evaluation(
        current_level=SeniorityLevel.MID,
        target_level=SeniorityLevel.SENIOR,
        direction=PromotionDirection.PROMOTION,
    )
    identity = make_agent_identity(level=SeniorityLevel.MID)

    decision = await strategy.decide(
        evaluation=evaluation,
        agent_identity=identity,
    )
    assert decision.auto_approve is True
    assert decision.requires_human is False
```
🧹 Nitpick | 🔵 Trivial
Collapse the approval decision matrix into parametrized cases.
These promotion/demotion tests only vary by levels, config, and expected flags/reason text. Converting them into one or two parametrized tests would keep the scenario table easier to extend without repeating the same arrange/act flow.
As per coding guidelines: "Use @pytest.mark.parametrize for testing similar cases instead of duplicating test functions".
Also applies to: 142-204
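The decision rule these cases exercise reduces to a rank comparison (stand-in level names; the real strategy compares `SeniorityLevel` members against `human_approval_from_level`):

```python
# Hypothetical rank order matching the cases above: auto-approve only when
# the target level sits below the human-approval threshold.
SENIORITY_ORDER = ("junior", "mid", "senior", "lead")
RANK = {level: i for i, level in enumerate(SENIORITY_ORDER)}


def requires_human(target_level: str, human_approval_from_level: str) -> bool:
    """True when the promotion target is at or above the approval threshold."""
    return RANK[target_level] >= RANK[human_approval_from_level]
```

A parametrized test over this rule would need only the rows (target, threshold, expected), which is exactly the table the four test functions currently spell out longhand.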
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/hr/promotion/test_seniority_approval_strategy.py` around lines 57
- 135, Collapse the four near-duplicate tests (test_junior_to_mid_auto_approves,
test_mid_to_senior_requires_human, test_senior_to_lead_requires_human,
test_custom_threshold_lead) into a single parametrized test using
pytest.mark.parametrize that iterates rows of (current_level, target_level,
config, expected_auto_approve, expected_requires_human); inside the test
instantiate SeniorityApprovalStrategy (use PromotionApprovalConfig where a
non-default threshold is needed), build the evaluation with _make_evaluation,
create identity via make_agent_identity, call strategy.decide, and assert
decision.auto_approve and decision.requires_human for each row — this removes
duplication while still exercising SeniorityApprovalStrategy.decide and keeping
PromotionApprovalConfig, _make_evaluation, and make_agent_identity references
for locating the code.
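To make the suggested shape concrete, here is a minimal, self-contained sketch of the table-driven pattern the review asks for. The strategy, config, and helper names above come from the review; this stand-in replaces them with a toy async strategy and integer seniority ranks, so the rows and behavior are illustrative only. With pytest, the same `CASES` rows would feed `@pytest.mark.parametrize` directly.

```python
# Hypothetical sketch: collapse the four near-duplicate approval tests
# into one table of (target_level, approval_threshold, expected_auto).
# SeniorityApprovalStrategy here is a stand-in, not the project's class.
import asyncio
from dataclasses import dataclass


@dataclass
class Decision:
    auto_approve: bool
    requires_human: bool


class SeniorityApprovalStrategy:
    """Stand-in: require a human at or above the configured level."""

    def __init__(self, human_approval_from_level: int) -> None:
        self._threshold = human_approval_from_level

    async def decide(self, target_level: int) -> Decision:
        needs_human = target_level >= self._threshold
        return Decision(auto_approve=not needs_human, requires_human=needs_human)


# Rows: (target_level, human_approval_from_level, expected_auto_approve)
CASES = [
    (1, 2, True),   # junior -> mid, threshold senior: auto-approved
    (2, 2, False),  # mid -> senior, threshold senior: needs human
    (3, 3, False),  # senior -> lead, threshold lead: needs human
    (2, 3, True),   # mid -> senior, threshold lead: auto-approved
]


def run_cases() -> list[tuple[bool, bool]]:
    results = []
    for target, threshold, expected_auto in CASES:
        decision = asyncio.run(
            SeniorityApprovalStrategy(
                human_approval_from_level=threshold
            ).decide(target)
        )
        assert decision.auto_approve is expected_auto
        results.append((decision.auto_approve, decision.requires_human))
    return results
```

Extending the matrix then means adding a row, not a new test function.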
```python
"""Unit tests for PromotionService orchestrator."""

from datetime import UTC, datetime
from typing import TYPE_CHECKING
from unittest.mock import AsyncMock

import pytest

from ai_company.api.approval_store import ApprovalStore
from ai_company.core.enums import (
    ApprovalStatus,
    Complexity,
    SeniorityLevel,
    TaskType,
)
from ai_company.core.types import NotBlankStr
from ai_company.hr.errors import (
    PromotionApprovalRequiredError,
    PromotionCooldownError,
    PromotionError,
)
from ai_company.hr.performance.models import TaskMetricRecord
from ai_company.hr.promotion.config import (
    PromotionConfig,
)
from ai_company.hr.promotion.seniority_approval_strategy import (
    SeniorityApprovalStrategy,
)
from ai_company.hr.promotion.seniority_model_mapping import (
    SeniorityModelMapping,
)
from ai_company.hr.promotion.service import PromotionService
from ai_company.hr.promotion.threshold_evaluator import ThresholdEvaluator

from .conftest import make_agent_identity

if TYPE_CHECKING:
    from ai_company.hr.performance.tracker import PerformanceTracker
    from ai_company.hr.registry import AgentRegistryService
```
Add the module timeout marker.
This file has many async tests but no 30-second timeout marker, so a hung approval or tracker path can stall the suite.
⏱️ Minimal fix
```diff
 if TYPE_CHECKING:
     from ai_company.hr.performance.tracker import PerformanceTracker
     from ai_company.hr.registry import AgentRegistryService

+pytestmark = pytest.mark.timeout(30)
+
 # ── Helpers ──────────────────────────────────────────────────────
```

Based on learnings: Applies to `tests/**/*.py`: Set test timeout to 30 seconds per test — use `pytest.mark.timeout(30)` or configure in `pytest.ini`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/hr/promotion/test_service.py` around lines 1 - 40, Add a 30-second
per-test timeout marker to the module by defining pytestmark =
pytest.mark.timeout(30) near the top of the file (add import pytest if missing);
this ensures all async tests in the PromotionService test module use a 30s
timeout and prevents hangs in approval/tracker code paths.
```python
    if success_rate > 0:
        total = max(1, int(tasks_completed / success_rate))
        tasks_failed = total - tasks_completed
    elif tasks_completed > 0:
        # 0% success: every task is a failure
        total = tasks_completed
        tasks_failed = total
        tasks_completed = 0
    else:
        total = 0
        tasks_failed = 0

    window = WindowMetrics(
        window_size="30d",
        data_point_count=total,
        tasks_completed=tasks_completed,
        tasks_failed=tasks_failed,
        avg_quality_score=quality,
        success_rate=success_rate,
    )
```
Stop emitting inconsistent failure counts in make_performance_snapshot.
Line 44 truncates the inferred total. For example, tasks_completed=10 and success_rate=0.95 produces tasks_failed=0 but still stores success_rate=0.95. Any trust path that derives error rate or “clean history” from counts instead of the float field will miss the failure case these tests intend to simulate.
🛠️ Safer direction

```diff
 def make_performance_snapshot(
     agent_id: str = "agent-001",
     *,
     quality: float = 8.0,
     success_rate: float = 0.9,
     tasks_completed: int = 15,
 ) -> AgentPerformanceSnapshot:
@@
-    if success_rate > 0:
-        total = max(1, int(tasks_completed / success_rate))
-        tasks_failed = total - tasks_completed
+    if success_rate >= 1:
+        total = tasks_completed
+        tasks_failed = 0
+    elif success_rate > 0:
+        tasks_failed = max(
+            1,
+            round(tasks_completed * (1 - success_rate) / success_rate),
+        )
+        total = tasks_completed + tasks_failed
+        success_rate = tasks_completed / total
     elif tasks_completed > 0:
         # 0% success: every task is a failure
         total = tasks_completed
         tasks_failed = total
         tasks_completed = 0
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/security/trust/conftest.py` around lines 43 - 62, The helper in
make_performance_snapshot computes total using int(tasks_completed /
success_rate) which truncates and yields inconsistent counts vs the stored float
success_rate; change it to use ceiling (e.g., math.ceil(tasks_completed /
success_rate)) so total >= tasks_completed and tasks_failed = total -
tasks_completed reflects at least one inferred failure when the float
success_rate implies it; update the success_rate>0 branch to import/use
math.ceil, compute total = max(1, math.ceil(tasks_completed / success_rate)),
then recompute tasks_failed before constructing WindowMetrics (window_size,
data_point_count, tasks_completed, tasks_failed, avg_quality_score,
success_rate).
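The truncation the review flags is easy to check numerically. This snippet uses only the arithmetic from the helper itself (no project code): with `tasks_completed=10` and `success_rate=0.95`, `int()` truncation infers zero failures even though the stored rate implies at least one, while the ceiling variant from the AI prompt does not.

```python
# Reproduce the inconsistency: int() truncation vs. math.ceil().
import math

tasks_completed = 10
success_rate = 0.95

# Current helper logic: int(10 / 0.95) == 10, so no failures are inferred,
# yet success_rate=0.95 is still stored on the window.
total_trunc = max(1, int(tasks_completed / success_rate))
tasks_failed_trunc = total_trunc - tasks_completed

# Ceiling-based alternative: total >= tasks_completed, and at least one
# failure is inferred whenever the float rate implies it.
total_ceil = max(1, math.ceil(tasks_completed / success_rate))
tasks_failed_ceil = total_ceil - tasks_completed
```

Any trust check that derives an error rate from the counts sees `tasks_failed_trunc == 0` in the first case, contradicting the float field.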
```python
        (
            TrustStrategyType.WEIGHTED,
            {
                "promotion_thresholds": {
                    "sandboxed_to_restricted": TrustThreshold(
                        score=0.5,
                    ),
                },
            },
        ),
```
Fail fast on misspelled threshold keys.
promotion_thresholds is currently a plain dict[str, TrustThreshold], and WeightedTrustStrategy only looks up the fixed transition names. A typo here is therefore silently ignored instead of failing config load, which makes the trust policy look valid while disabling it.
Suggested regression

```python
    def test_unknown_threshold_key_raises(self) -> None:
        with pytest.raises(ValueError, match="Unknown trust transition"):
            TrustConfig(
                strategy=TrustStrategyType.WEIGHTED,
                promotion_thresholds={
                    "sandbxed_to_restricted": TrustThreshold(score=0.5),
                },
            )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/security/trust/test_config.py` around lines 250 - 259, The test
shows promotion_thresholds is a plain dict so typos are ignored; update
WeightedTrustStrategy to validate keys on construction: in
WeightedTrustStrategy.__init__ (or the config loader) define the exact allowed
transition names (e.g., "sandboxed_to_restricted", "restricted_to_something" —
whatever the class expects) and check promotion_thresholds.keys() against that
set, raising a ValueError (or ConfigError) listing unknown keys when any
extra/misspelled keys are present; this ensures TrustStrategyType.WEIGHTED /
promotion_thresholds / TrustThreshold mis-keys fail fast.
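A minimal sketch of the fail-fast key check this asks for, written as a plain function rather than the project's actual Pydantic validator. The three transition names are taken from the review's description of `TRANSITION_KEYS`; everything else is illustrative.

```python
# Allowed upward transitions, as named in the review; any other key in
# promotion_thresholds is rejected at load time instead of being
# silently ignored during evaluation.
ALLOWED_TRANSITIONS = frozenset(
    {
        "sandboxed_to_restricted",
        "restricted_to_standard",
        "standard_to_elevated",
    }
)


def validate_threshold_keys(
    promotion_thresholds: dict[str, float],
) -> dict[str, float]:
    unknown = sorted(set(promotion_thresholds) - ALLOWED_TRANSITIONS)
    if unknown:
        msg = f"Unknown trust transition keys: {unknown}"
        raise ValueError(msg)
    return promotion_thresholds
```

Hooked into a Pydantic `field_validator`, this turns the typo in the suggested regression into a load-time `ValueError` instead of a silently disabled policy.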
```python
    async def test_evaluate_low_scores_stays_at_current(
        self,
        weighted_config: TrustConfig,
    ) -> None:
        strategy = WeightedTrustStrategy(config=weighted_config)
        state = TrustState(
            agent_id=NotBlankStr("agent-001"),
            global_level=ToolAccessLevel.SANDBOXED,
            trust_score=0.0,
        )
        snapshot = make_performance_snapshot(
            "agent-001",
            quality=2.0,
            success_rate=0.3,
            tasks_completed=3,
        )

        result = await strategy.evaluate(
            agent_id=NotBlankStr("agent-001"),
            current_state=state,
            snapshot=snapshot,
        )

        assert result.score is not None
        assert result.score < 0.5
        assert result.recommended_level == ToolAccessLevel.SANDBOXED
        assert result.should_change is False
```
Add a non-floor demotion regression.
This only exercises the floor case (SANDBOXED). The current WeightedTrustStrategy implementation walks forward transition thresholds only, so a RESTRICTED or STANDARD agent with a collapsing score can stay over-trusted without this suite noticing, even though demotion support is part of the trust contract.
Suggested regression

```python
    async def test_evaluate_low_scores_demotes_one_level(
        self,
        weighted_config: TrustConfig,
    ) -> None:
        strategy = WeightedTrustStrategy(config=weighted_config)
        state = TrustState(
            agent_id=NotBlankStr("agent-001"),
            global_level=ToolAccessLevel.RESTRICTED,
            trust_score=0.7,
        )
        snapshot = make_performance_snapshot(
            "agent-001",
            quality=1.0,
            success_rate=0.1,
            tasks_completed=1,
        )

        result = await strategy.evaluate(
            agent_id=NotBlankStr("agent-001"),
            current_state=state,
            snapshot=snapshot,
        )

        assert result.recommended_level == ToolAccessLevel.SANDBOXED
        assert result.should_change is True
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/security/trust/test_weighted_strategy.py` around lines 74 - 100,
Add a non-floor demotion regression test: create a new async test in
tests/unit/security/trust/test_weighted_strategy.py that mirrors the existing
low-score test but starts current_state.global_level at
ToolAccessLevel.RESTRICTED (or STANDARD) with a non-zero trust_score; provide a
snapshot that causes the WeightedTrustStrategy.evaluate to produce a collapsing
score below the demotion threshold and assert that result.score is not None and
below the relevant threshold, result.recommended_level equals the next-lower
ToolAccessLevel (e.g., RESTRICTED -> SANDBOXED or STANDARD -> RESTRICTED) and
result.should_change is True; reference WeightedTrustStrategy.evaluate,
TrustState, TrustConfig, and ToolAccessLevel when locating code to ensure the
strategy supports downward transitions and the test fails if only floor demotion
is implemented.
Security fixes:
- Add approval store verification in apply_promotion (defense-in-depth)
- Re-enforce elevated gate in apply_trust_change (defense-in-depth)
- Add field allowlist to update_identity (prevent mass assignment)

Bug fixes:
- Fix clean_history_days to filter by window period (was checking all windows)
- Fix demotion tasks_completed threshold from 0.0 to 3.0 (was unsatisfiable)
- Add category_criteria coverage validator on TrustConfig
- Remove unused PROMOTION_APPROVED event constant
- Rename total_tasks to max_tasks_completed in threshold_evaluator

Test improvements:
- Add frozen/immutability tests for nested config classes
- Parametrize enum value tests
- Convert elevated gate tests from async to sync (method is sync)
- Update registry test for allowlist enforcement
Force-pushed cfeba92 to 6c649a6 (Compare)
```python
            event = (
                PROMOTION_REJECTED
                if request.status == ApprovalStatus.REJECTED
                else PROMOTION_REQUESTED
            )
            logger.warning(
                event,
                agent_id=request.agent_id,
                status=request.status.value,
            )
            msg = f"Cannot apply promotion: request status is {request.status.value}"
```
Wrong log event for non-approved, non-rejected status
When request.status is PENDING (or any future non-APPROVED/non-REJECTED state), PROMOTION_REQUESTED is used as the structured log event. PROMOTION_REQUESTED semantically means "a new promotion request was created," so log aggregation tools would misclassify this warning as a successful request creation rather than a rejected call. A more appropriate fallback would be PROMOTION_REJECTED (since the promotion was refused regardless of the reason), or a dedicated PROMOTION_APPROVAL_REQUIRED constant.
Suggested change:

```python
            event = (
                PROMOTION_REJECTED
                if request.status == ApprovalStatus.REJECTED
                else PROMOTION_REJECTED
            )
```
Or, more explicitly, remove the conditional entirely and always use PROMOTION_REJECTED for this error path.
Prompt To Fix With AI

This is a comment left during a code review.
Path: src/ai_company/hr/promotion/service.py
Line: 377-387
How can I resolve this? If you propose a fix, please make it concise.

```python
        except Exception:
            logger.warning(
                PROMOTION_APPLIED,
                agent_id=request.agent_id,
                error="Trust re-evaluation failed after promotion; "
                "promotion still applied",
            )
```
Success event constant used on failure/warning path
PROMOTION_APPLIED is a success-signal event, but here it is emitted at WARNING level to describe a trust re-evaluation failure. Any log-aggregation query filtering for PROMOTION_APPLIED will ingest this warning as a successful promotion event — misreporting failure cases.
PROMOTION_EVALUATE_FAILED is already imported and semantically correct here:
Suggested change:

```python
            logger.warning(
                PROMOTION_EVALUATE_FAILED,
                agent_id=request.agent_id,
                error="Trust re-evaluation failed after promotion; "
                "promotion still applied",
            )
```
Prompt To Fix With AI

This is a comment left during a code review.
Path: src/ai_company/hr/promotion/service.py
Line: 469-475
How can I resolve this? If you propose a fix, please make it concise.

```python
class WeightedTrustStrategy:
    """Trust strategy using a single weighted score.

    Computes a trust score from four weighted factors derived from
    the agent's performance snapshot:
    - task_difficulty: quality score normalized to [0, 1]
    - completion_rate: task success rate from the latest window
    - error_rate: failure penalty (tasks_failed / data_point_count)
    - human_feedback: task volume proxy (tasks / 100, capped at 1.0)
      — placeholder until actual human feedback signals are available

    The score is compared against configurable thresholds to
    determine the recommended trust level. Trust changes are
    restricted to one adjacent level per evaluation.
    """
```
WeightedTrustStrategy only promotes; demotion is never recommended
The class-level docstring says "promotes/demotes based on configurable thresholds," but _score_to_level exclusively iterates TRANSITION_KEYS, which only contains upward transitions (sandboxed_to_restricted, restricted_to_standard, standard_to_elevated). There is no corresponding downward path.
Concretely: an agent promoted to STANDARD whose weighted score subsequently drops to 0.0 will never be recommended for demotion by this strategy — _score_to_level returns current_level unchanged. This also means TrustService.check_decay is a silent no-op for any agent using the weighted strategy, even though it appears in the public service API.
If downward movement is not intended for this strategy, the class docstring should be corrected:
Suggested docstring:

```python
class WeightedTrustStrategy:
    """Trust strategy using a single weighted score.

    Computes a trust score from four weighted factors derived from
    the agent's performance snapshot:
    - task_difficulty: quality score normalized to [0, 1]
    - completion_rate: task success rate from the latest window
    - error_rate: failure penalty (tasks_failed / data_point_count)
    - human_feedback: task volume proxy (tasks / 100, capped at 1.0)
      — placeholder until actual human feedback signals are available

    The score is compared against configurable thresholds to
    determine the recommended trust level. Trust changes are
    restricted to one adjacent level per evaluation (promotion only;
    use the milestone strategy for decay/demotion detection).
    """
```
If downward movement is intended, _score_to_level needs a complementary demotion path.
Prompt To Fix With AI

This is a comment left during a code review.
Path: src/ai_company/security/trust/weighted_strategy.py
Line: 26-40
How can I resolve this? If you propose a fix, please make it concise.
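If downward movement is intended, the complementary demotion walk can be small. The sketch below is illustrative only: `TRUST_LEVEL_ORDER` and the threshold table stand in for the real config objects, and the rule assumed here is that an agent is demoted one level when its score no longer clears the threshold that gated entry into its current level.

```python
# Illustrative sketch of a score-to-level walk with a demotion pass.
# Names and threshold values are stand-ins, not the project's API.
TRUST_LEVEL_ORDER = ["sandboxed", "restricted", "standard", "elevated"]
PROMOTION_THRESHOLDS = {
    "sandboxed_to_restricted": 0.4,
    "restricted_to_standard": 0.6,
    "standard_to_elevated": 0.8,
}


def score_to_level(score: float, current_level: str) -> str:
    rank = TRUST_LEVEL_ORDER.index(current_level)
    # Promote one level if the next upward threshold is met.
    if rank + 1 < len(TRUST_LEVEL_ORDER):
        key = f"{current_level}_to_{TRUST_LEVEL_ORDER[rank + 1]}"
        if score >= PROMOTION_THRESHOLDS[key]:
            return TRUST_LEVEL_ORDER[rank + 1]
    # Demote one level if the score no longer clears the threshold
    # that gated entry into the current level.
    if rank > 0:
        key = f"{TRUST_LEVEL_ORDER[rank - 1]}_to_{current_level}"
        if score < PROMOTION_THRESHOLDS[key]:
            return TRUST_LEVEL_ORDER[rank - 1]
    return current_level
```

This keeps the one-adjacent-level-per-evaluation invariant while making a collapsing score actionable.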
Pull request overview
Copilot reviewed 56 out of 56 changed files in this pull request and generated 11 comments.
```python
        for category, current_level in current_state.category_levels.items():
            cat_criteria = self._criteria.get(category, {})
            new_level = current_level

            for transition_key, criteria in cat_criteria.items():
                from_to = transition_key.split("_to_", maxsplit=1)
                if len(from_to) != 2:  # noqa: PLR2004
                    continue
                from_str, to_str = from_to

                if current_level.value != from_str:
                    continue

                if self._check_category_criteria(
                    criteria_config=criteria,
                    snapshot=snapshot,
                ):
                    try:
                        new_level = ToolAccessLevel(to_str)
                    except ValueError:
                        logger.warning(
                            TRUST_EVALUATE_FAILED,
                            agent_id=agent_id,
                            category=category,
                            transition_key=transition_key,
                            invalid_level=to_str,
                        )
                        continue

                    if criteria.requires_human_approval:
                        requires_human = True

            # Per-category ELEVATED always requires human approval
            if new_level == ToolAccessLevel.ELEVATED:
                requires_human = True

            category_updates[category] = new_level

        if category_updates:
            min_rank = min(
                TRUST_LEVEL_RANK.get(lv, 0) for lv in category_updates.values()
            )
            recommended = TRUST_LEVEL_ORDER[min_rank]
        else:
            recommended = current_state.global_level
```
In per_category_strategy.py, the evaluate method (lines 71-107) iterates over current_state.category_levels.items() to determine which categories to evaluate. However, it uses the current state's category levels rather than the config's initial_category_levels to drive evaluation. If a new category is added to config's initial_category_levels after an agent's state was created, that category won't be evaluated for the existing agent. More critically, the variable category_updates is populated from iterations over current_state.category_levels, not from self._initial_levels. This means newly configured categories (in self._initial_levels) that are not yet in current_state.category_levels are silently ignored during evaluation. Since initial_state is only called once at initialization, any categories added to config after initialization are permanently excluded.
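One fix direction for the gap described above is to drive the loop from the union of the stored state's categories and the config's initial levels, so a category added to config after the agent's state was created is still evaluated. The sketch below is illustrative; the function and parameter names are not the project's actual API.

```python
# Sketch: merge config-declared categories with stored state so newly
# configured categories are not silently skipped during evaluation.
def categories_to_evaluate(
    state_levels: dict[str, str],
    initial_levels: dict[str, str],
) -> dict[str, str]:
    merged = dict(initial_levels)  # config defaults cover new categories
    merged.update(state_levels)    # stored state wins where present
    return merged
```

Iterating `categories_to_evaluate(current_state.category_levels, self._initial_levels)` instead of `current_state.category_levels` alone would close the gap without a state migration.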
```python
    seniority_model_map: Any = Field(
        default_factory=dict,
        description="Explicit seniority level to model ID overrides "
        "(wrapped as MappingProxyType after validation)",
    )
```
In config.py (promotion), the seniority_model_map field on ModelMappingConfig is typed as Any (line 80), which loses type safety. The _validate_model_map_keys validator converts it to a MappingProxyType[str, str], but the field type annotation remains Any. This means consumers of this config field have no type guidance, and static analysis tools won't catch misuse. The field should be typed as dict[str, str] in the annotation (Pydantic will accept a plain dict at construction time), and the validator should document that it wraps the result in MappingProxyType. Alternatively, accept and store it as MappingProxyType[str, str] with a custom field type.
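The validate-then-wrap idea can be shown without Pydantic. This stdlib-only stand-in accepts a `dict[str, str]` (the annotation the comment recommends over `Any`) and returns a read-only `MappingProxyType`; the function name and error wording are illustrative.

```python
# Stand-in for the suggested typed validator: check entries are
# str -> str, then expose the map read-only via MappingProxyType.
from types import MappingProxyType


def validate_model_map(raw: dict[str, str]) -> MappingProxyType:
    for key, value in raw.items():
        if not isinstance(key, str) or not isinstance(value, str):
            msg = (
                "seniority_model_map entries must be str -> str, "
                f"got {key!r}: {value!r}"
            )
            raise TypeError(msg)
    # Copy before wrapping so later mutation of `raw` cannot leak in.
    return MappingProxyType(dict(raw))
```

In the real config this would live in the `_validate_model_map_keys` validator, with the field annotated `dict[str, str]` so static analysis catches misuse at construction time.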
```python
        disallowed = set(updates.keys()) - self._UPDATABLE_FIELDS
        if disallowed:
            msg = (
                f"Fields not allowed for update_identity: "
                f"{sorted(disallowed)}; allowed: {sorted(self._UPDATABLE_FIELDS)}"
            )
            logger.warning(
                HR_REGISTRY_IDENTITY_UPDATED,
                agent_id=str(agent_id),
                error=msg,
            )
            raise ValueError(msg)

        key = str(agent_id)
        async with self._lock:
            identity = self._agents.get(key)
            if identity is None:
                msg = f"Agent {agent_id!r} not found in registry"
                logger.warning(
                    HR_REGISTRY_IDENTITY_UPDATED,
                    agent_id=key,
                    error=msg,
                )
                raise AgentNotFoundError(msg)
            updated = identity.model_copy(update=updates)
            self._agents[key] = updated

        logger.info(
            HR_REGISTRY_IDENTITY_UPDATED,
            agent_id=key,
            updated_fields=sorted(updates.keys()),
        )
        return updated
```
In registry.py, the update_identity method logs a HR_REGISTRY_IDENTITY_UPDATED event even on the error path (disallowed fields, line 232), which is confusing since the event name implies success. An error path should use a different event constant (e.g., a dedicated HR_REGISTRY_UPDATE_FAILED event or at least pass an error= kwarg to distinguish failure from success). Additionally, looking at line 252 (success path), the variable key (not agent_id) is used in the log call, which is fine for consistency, but the validation check on line 225-236 logs with agent_id=str(agent_id) directly (before key is defined), making the key usage inconsistent across the two log calls within this function.
```python
        # Success rate from latest window
        completion_factor = 0.0
        for window in snapshot.windows:
            if window.success_rate is not None:
                completion_factor = window.success_rate
                break

        # Error penalty: 1 - (tasks_failed / data_point_count)
        error_factor = 1.0
        for window in snapshot.windows:
            if window.data_point_count > 0:
                error_factor = 1.0 - (window.tasks_failed / window.data_point_count)
                break

        # Task volume ratio (tasks completed / 100, capped at 1.0)
        # — placeholder for human feedback until that signal is available
        feedback_factor = 0.0
        for window in snapshot.windows:
            if window.tasks_completed > 0:
                feedback_factor = min(window.tasks_completed / 100.0, 1.0)
                break
```
In weighted_strategy.py, the _compute_score method (lines 133-138) uses only the first window's success_rate via a break after the first non-None value. This means if windows are ordered from newest to oldest, it uses the most recent window, which is correct. However, the code comment says "use latest window value" but the ordering of snapshot.windows is not documented or guaranteed in this context. In contrast, the error_factor calculation (lines 141-145) and feedback_factor (lines 149-153) also take the first valid window. If windows are not guaranteed to be in chronological order, the "latest" assumption could produce incorrect results. This should be documented or an explicit sort/selection should be added.
```python
        current_rank = TRUST_LEVEL_RANK.get(state.global_level, 0)
        if current_rank <= 0:
            return None

        # Idle decay
        last_eval = state.last_evaluated_at
        if last_eval is not None:
            idle_days = (now - last_eval).days
            if idle_days >= self._re_verification.decay_on_idle_days:
                demoted = TRUST_LEVEL_ORDER[current_rank - 1]
                logger.info(
                    TRUST_DECAY_DETECTED,
                    agent_id=state.agent_id,
                    reason="idle",
                    idle_days=idle_days,
                    threshold=self._re_verification.decay_on_idle_days,
                )
                return demoted

        # Error rate decay
        for window in snapshot.windows:
            if window.data_point_count > 0 and window.success_rate is not None:
                error_rate = 1.0 - window.success_rate
                if error_rate > self._re_verification.decay_on_error_rate:
                    demoted = TRUST_LEVEL_ORDER[current_rank - 1]
                    logger.info(
                        TRUST_DECAY_DETECTED,
                        agent_id=state.agent_id,
                        reason="error_rate",
                        error_rate=error_rate,
                        threshold=self._re_verification.decay_on_error_rate,
                    )
                    return demoted
                break
```
In milestone_strategy.py, the _check_decay method at lines 256-285 checks idle decay first, and if idle decay is triggered, it returns early without checking error rate. However, if both idle decay AND high error rate are present, the method still returns only one demotion level (not two). This is likely intentional (one level at a time), but the ordering of checks has an implicit priority: idle decay takes precedence over error rate. This precedence is not documented. Additionally, the method returns after the first window's error rate check due to break on line 285, but this break means subsequent windows are ignored for error rate calculation. If the first window has no data (data_point_count == 0 or success_rate is None), the entire error rate check is skipped. A comment should document this intent.
```python
    hiring_date=date(2026, 1, 15),
)
```
In promotion/conftest.py, the make_agent_identity factory sets hiring_date=date(2026, 1, 15). Since the current date is March 2026, this is a past date which is fine for test purposes. However, it's a hardcoded date rather than a relative date, which means tests that check agent tenure (e.g., time_active_days milestone checks) that use this factory may produce unexpected results if the date is ever relied upon in age-based calculations. This is a minor concern as the current tests don't use hiring_date for duration calculations, but it is worth noting for future maintainability.
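A relative-date variant of the factory would keep any future tenure-based assertions stable regardless of when the suite runs; this sketch returns a plain dict as a hypothetical stand-in for the real identity model:

```python
from datetime import date, timedelta


def make_agent_identity(days_since_hiring: int = 60, **overrides) -> dict:
    """Hypothetical factory variant: hiring_date is computed relative to
    today, so age-based calculations always see the same tenure."""
    identity = {
        "agent_id": "agent-001",
        "hiring_date": date.today() - timedelta(days=days_since_hiring),
    }
    identity.update(overrides)
    return identity
```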
```python
min_criteria_met: int = Field(
    default=2,
    ge=1,
    le=3,
    description="Minimum number of criteria that must be met (max 3)",
)
```
The PromotionCriteriaConfig.min_criteria_met field has a hardcoded upper bound of le=3 (line 30 of config.py). This is tied to the current ThresholdEvaluator which has exactly 3 criteria (quality_score, success_rate, tasks_completed). However, the PromotionCriteriaConfig is meant to be used with pluggable PromotionCriteriaStrategy implementations. Future strategies might have more or fewer criteria, making the hardcoded le=3 limit a hidden coupling. The field description says "max 3" explicitly, which codifies this assumption, but this could be a breaking constraint for alternative strategies. Either remove the upper bound from the config or document clearly that this config is specific to ThresholdEvaluator.
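One way to decouple the bound from the hardcoded le=3: drop the upper bound from the shared config and validate against the strategy's own criteria list at wiring time. Names here are illustrative, not the PR's actual API:

```python
def validate_min_criteria_met(
    min_criteria_met: int,
    criteria_names: tuple[str, ...],
) -> int:
    # The upper bound follows the strategy's criteria count instead of a
    # constant baked into the shared config model.
    if not 1 <= min_criteria_met <= len(criteria_names):
        msg = (
            f"min_criteria_met={min_criteria_met} out of range for a "
            f"strategy with {len(criteria_names)} criteria"
        )
        raise ValueError(msg)
    return min_criteria_met


# ThresholdEvaluator's three criteria, per the review comment.
THRESHOLD_CRITERIA = ("quality_score", "success_rate", "tasks_completed")
```

A five-criteria strategy would then accept min_criteria_met=5 without a config change, while ThresholdEvaluator still rejects anything above 3.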
```python
async def test_time_active_days_not_met_blocks_promotion(self) -> None:
    """Milestone requires time_active_days, agent not active long enough."""
    config = TrustConfig(
        strategy=TrustStrategyType.MILESTONE,
        initial_level=ToolAccessLevel.SANDBOXED,
        milestones={
            "sandboxed_to_restricted": MilestoneCriteria(
                tasks_completed=5,
                quality_score_min=6.0,
                time_active_days=30,
            ),
        },
    )
    strategy = MilestoneTrustStrategy(config=config)

    state = TrustState(
        agent_id=NotBlankStr("agent-001"),
        global_level=ToolAccessLevel.SANDBOXED,
        last_evaluated_at=_NOW - timedelta(days=5),
    )
    snapshot = make_performance_snapshot(
        "agent-001",
        quality=9.0,
        success_rate=0.99,
        tasks_completed=100,
    )

    result = await strategy.evaluate(
        agent_id=NotBlankStr("agent-001"),
        current_state=state,
        snapshot=snapshot,
    )

    assert result.recommended_level == ToolAccessLevel.SANDBOXED
    assert result.should_change is False
```
In milestone_strategy.py, the _check_time_and_history method correctly uses state.created_at (agent tenure) for the days-active calculation, as its docstring says: line 215 checks if state.created_at is None and line 217 computes (now - state.created_at).days. However, the test test_time_active_days_not_met_blocks_promotion in test_milestone_strategy.py (line 340) sets last_evaluated_at=_NOW - timedelta(days=5) but never sets created_at. Since created_at defaults to None and the milestone requires time_active_days=30, the promotion is blocked by the early return for created_at is None (lines 215-216), not because the agent has only been active 5 days. The test name and docstring say "agent not active long enough," which is misleading: the test passes for the wrong reason and gives a false sense of security around the time-based check. A variant with created_at=_NOW - timedelta(days=5) and time_active_days=30 would properly test that 5 days of tenure is insufficient.
```python
if self._trust_service is not None:
    try:
        snapshot = await self._tracker.get_snapshot(request.agent_id)
        await self._trust_service.evaluate_agent(
            request.agent_id,
            snapshot,
        )
    except Exception:
        logger.warning(
            PROMOTION_APPLIED,
            agent_id=request.agent_id,
            error="Trust re-evaluation failed after promotion; "
            "promotion still applied",
        )

event = (
    PROMOTION_APPLIED
    if request.direction == PromotionDirection.PROMOTION
    else DEMOTION_APPLIED
)
logger.info(
    event,
    agent_id=request.agent_id,
    old_level=record.old_level.value,
    new_level=record.new_level.value,
    model_changed=record.model_changed,
)

# Notify agent and team — best-effort, must not block the record.
if self._on_notification is not None:
    try:
        await self._on_notification(record)
        logger.debug(
            PROMOTION_NOTIFICATION_SENT,
            agent_id=request.agent_id,
            direction=request.direction.value,
        )
    except Exception:
        logger.warning(
            PROMOTION_NOTIFICATION_SENT,
            agent_id=request.agent_id,
            error="Notification callback failed; promotion still applied",
        )
```
In service.py (promotion), the apply_promotion method at line 469 catches a bare Exception in the trust re-evaluation block. The comment says this is intentional (best-effort), but the bare except Exception: swallows every exception type without recording any exception information, which makes debugging difficult. At minimum, log the caught exception's details (e.g., with exc_info=True or by passing the exception message as a structured field). The same applies to the notification callback catch on line 499.
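A minimal sketch of the fix, shown with stdlib logging (the PR's structlog-style logger would take error=str(exc) as a keyword rather than extra=); the helper name and fake services are hypothetical:

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def reevaluate_trust(trust_service, tracker, agent_id: str) -> bool:
    """Best-effort trust re-evaluation; never raises, returns success flag."""
    try:
        snapshot = await tracker.get_snapshot(agent_id)
        await trust_service.evaluate_agent(agent_id, snapshot)
    except Exception as exc:
        # Keep the swallow, but record what actually went wrong,
        # including the traceback via exc_info=True.
        logger.warning(
            "Trust re-evaluation failed after promotion; promotion still applied",
            extra={"agent_id": agent_id, "error": str(exc)},
            exc_info=True,
        )
        return False
    return True
```

The promotion stays applied either way; the only change is that failures leave a debuggable trail.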
```python
from ai_company.security.trust.levels import (  # noqa: PLC0415
    TRUST_LEVEL_RANK,
)

is_promotion = TRUST_LEVEL_RANK.get(
    result.recommended_level, 0
) > TRUST_LEVEL_RANK.get(state.global_level, 0)
state_update: dict[str, object] = {
    "global_level": result.recommended_level,
    "trust_score": result.score,
}
if is_promotion:
    state_update["last_promoted_at"] = now
updated = state.model_copy(update=state_update)
self._trust_states[key] = updated
self._change_history.setdefault(key, []).append(record)

logger.info(
    TRUST_LEVEL_CHANGED,
    agent_id=agent_id,
    old_level=record.old_level.value,
    new_level=record.new_level.value,
    reason=reason.value,
)
return record


def get_trust_state(
    self,
    agent_id: NotBlankStr,
) -> TrustState | None:
    """Get current trust state for an agent.

    Args:
        agent_id: Agent identifier.

    Returns:
        Trust state, or None if not initialized.
    """
    return self._trust_states.get(str(agent_id))


def get_change_history(
    self,
    agent_id: NotBlankStr,
) -> tuple[TrustChangeRecord, ...]:
    """Get trust change history for an agent.

    Args:
        agent_id: Agent identifier.

    Returns:
        Tuple of change records.
    """
    return tuple(self._change_history.get(str(agent_id), []))


async def check_decay(
    self,
    agent_id: NotBlankStr,
    snapshot: AgentPerformanceSnapshot,
) -> TrustEvaluationResult:
    """Check for trust decay conditions.

    Delegates to evaluate_agent first, then updates the decay
    check timestamp. The ordering ensures that the strategy's
    decay logic sees the *previous* last_decay_check_at value,
    not a freshly-updated one.

    Args:
        agent_id: Agent to check.
        snapshot: Agent performance snapshot.

    Returns:
        Evaluation result (may recommend demotion on decay).
    """
    result = await self.evaluate_agent(agent_id, snapshot)

    # Update decay check timestamp *after* evaluation
    key = str(agent_id)
    state = self._trust_states.get(key)
    if state is not None:
        now = datetime.now(UTC)
        updated = state.model_copy(
            update={"last_decay_check_at": now},
        )
        self._trust_states[key] = updated

    return result


def _enforce_elevated_gate(
    self,
    result: TrustEvaluationResult,
) -> TrustEvaluationResult:
    """Defense-in-depth: force human approval for elevated promotion.

    If a strategy recommends ELEVATED but doesn't flag human
    approval, override it.
    """
    if (
        result.recommended_level == ToolAccessLevel.ELEVATED
        and result.current_level != ToolAccessLevel.ELEVATED
        and not result.requires_human_approval
    ):
        logger.warning(
            TRUST_ELEVATED_GATE_ENFORCED,
            agent_id=result.agent_id,
            strategy=result.strategy_name,
        )
        return result.model_copy(
            update={"requires_human_approval": True},
        )
    return result


async def _create_approval(
    self,
    agent_id: NotBlankStr,
    result: TrustEvaluationResult,
) -> None:
    """Create an approval item for trust level promotion."""
    if self._approval_store is None:
        msg = (
            f"Cannot create trust approval for agent {agent_id!r}: "
            f"no approval store configured"
        )
        logger.warning(
            TRUST_APPROVAL_STORE_MISSING,
            agent_id=agent_id,
            recommended=result.recommended_level.value,
            error=msg,
        )
        raise TrustEvaluationError(msg)

    from ai_company.core.approval import ApprovalItem  # noqa: PLC0415

    now = datetime.now(UTC)
    approval = ApprovalItem(
        id=NotBlankStr(str(uuid4())),
        action_type="trust:promote",
        title=(
            f"Trust promotion: {result.current_level.value} "
            f"-> {result.recommended_level.value}"
        ),
        description=result.details or "Trust level change requested",
        requested_by=agent_id,
        risk_level=ApprovalRiskLevel.HIGH,
        created_at=now,
        metadata={
            "agent_id": str(agent_id),
            "current_level": result.current_level.value,
            "recommended_level": result.recommended_level.value,
        },
    )
    await self._approval_store.add(approval)

    logger.info(
        TRUST_APPROVAL_REQUIRED,
        agent_id=agent_id,
        approval_id=approval.id,
        recommended=result.recommended_level.value,
    )


@staticmethod
def _infer_reason(
    result: TrustEvaluationResult,
) -> TrustChangeReason:
    """Infer the change reason from the evaluation result.

    Distinguishes promotions from demotions: demotions use
    TRUST_DECAY, promotions use strategy-specific reasons.
    """
    from ai_company.security.trust.levels import (  # noqa: PLC0415
        TRUST_LEVEL_RANK,
    )
```
In service.py (trust), TRUST_LEVEL_RANK is imported inside a method body at line 205 with a # noqa: PLC0415 comment, and imported again in _infer_reason at line 373; ApprovalItem is similarly imported inside _create_approval at line 335. Deferred imports inside method bodies are an unusual Python pattern. They appear to be used here to avoid circular imports, but the module-level if TYPE_CHECKING block at the top already handles several type-only imports. If these imports genuinely cannot be moved to module level due to a cycle, a brief explanatory comment would help; if they can (levels.py and core.approval do not appear to be circular with service.py), they belong at the top of the file.
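If a cycle genuinely forces the deferral, a sketch of the suggested comment convention, using a stdlib import as a stand-in for the project one, so each deferred import carries its own justification:

```python
def rank_delta(old_level: str, new_level: str) -> int:
    """Illustrative helper: positive means promotion, negative demotion."""
    # Deferred import: the imported module would import this one at load
    # time, creating a cycle, so it must stay function-local.  # noqa: PLC0415
    from operator import sub  # stand-in for the real project import

    ranks = {"sandboxed": 0, "restricted": 1, "standard": 2, "elevated": 3}
    return sub(ranks.get(new_level, 0), ranks.get(old_level, 0))
```

With the reason stated inline, a reviewer can verify or retire the deferral without re-deriving the import graph.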
Summary
- Progressive trust (`security/trust/`): 4 pluggable strategies (disabled, weighted, per-category, milestone) behind the `TrustStrategy` protocol, with trust state management, decay detection, re-verification, and defense-in-depth elevated gate enforcement
- Promotion/demotion (`hr/promotion/`): criteria evaluation, approval decisions (seniority-based), model mapping, cooldown enforcement, and trust re-evaluation integration behind pluggable protocols (`PromotionCriteriaStrategy`, `PromotionApprovalStrategy`, `ModelMappingStrategy`)

Closes #43, closes #49
Changes
Source (17 files)
- `security/trust/`: config models, 4 strategy implementations, service orchestrator, protocol, enums, errors, domain models
- `hr/promotion/`: config models, threshold evaluator, seniority approval strategy, model mapping, service orchestrator, protocols, domain models
- `observability/events/`: trust and promotion event constants

Tests (10 files, 199 tests in trust+promotion)
Docs
- `DESIGN_SPEC.md`: §15.3 project structure updated with `trust/` and `promotion/` subsystems
- `CLAUDE.md`: package structure and event constants updated

Test plan
- `uv run ruff check src/ tests/` — all passed
- `uv run ruff format --check src/ tests/` — all formatted
- `uv run mypy src/ tests/` — 0 errors (806 files)
- `uv run pytest tests/ -n auto --cov=ai_company --cov-fail-under=80` — 6399 passed, 94.98% coverage

Review coverage
Pre-reviewed by 10 agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, security-reviewer, docs-consistency). 33 findings addressed, 0 skipped.