Implement progressive trust system with pluggable TrustStrategy protocol (DESIGN_SPEC §11.3) #43
Description
Context
Implement the progressive trust system defined in DESIGN_SPEC §11.3. Agents can earn higher tool access over time through configurable trust strategies. Every strategy sits behind a common TrustStrategy protocol, so new strategies can be added without modifying existing ones.
Trust Strategies (§11.3)
Strategy 1: Disabled (Static Access)
Trust is disabled. Agents receive their configured access level at hire time and it never changes. Useful when the human manages permissions manually.
Strategy 2: Weighted Score (Single Track)
A single trust score computed from weighted factors: task difficulty completed, error rate, time active, and human feedback. One global trust level per agent. Human approval required for promotion to elevated regardless of score.
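A minimal sketch of how the weighted score could be computed, assuming all four factors are normalized to [0, 1]. The weights, the 0.8 threshold, and the names `TrustFactors`, `weighted_score`, and `may_promote_to_elevated` are illustrative assumptions, not values from DESIGN_SPEC §11.3.

```python
from dataclasses import dataclass

# Hypothetical weights and threshold; the spec may define different values.
WEIGHTS = {
    "task_difficulty": 0.3,
    "error_rate": 0.3,
    "time_active": 0.2,
    "human_feedback": 0.2,
}
ELEVATED_THRESHOLD = 0.8


@dataclass(frozen=True)
class TrustFactors:
    """Inputs to the score, each assumed to be normalized to [0, 1]."""
    task_difficulty: float
    error_rate: float  # fraction of failed tasks; lower is better
    time_active: float
    human_feedback: float


def weighted_score(f: TrustFactors) -> float:
    """Combine the factors into one global score in [0, 1]."""
    return (
        WEIGHTS["task_difficulty"] * f.task_difficulty
        + WEIGHTS["error_rate"] * (1.0 - f.error_rate)  # invert: fewer errors score higher
        + WEIGHTS["time_active"] * f.time_active
        + WEIGHTS["human_feedback"] * f.human_feedback
    )


def may_promote_to_elevated(f: TrustFactors, human_approved: bool) -> bool:
    """A high score alone is never enough; human approval is always required."""
    return human_approved and weighted_score(f) >= ELEVATED_THRESHOLD
```

With this shape, an agent with a perfect score is still blocked from elevated access until a human signs off.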
Strategy 3: Per-Category Trust Tracks
Separate trust tracks per tool category (filesystem, git, deployment, database, network). An agent can be "standard" for files but "sandboxed" for deployment. Human approval gate required for any production-touching category.
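The per-category idea can be sketched as one trust level per tool category, promoted independently. This is an assumption-laden illustration: the `CategoryTrust` class, the one-step promotion rule, and the choice of `deployment` and `database` as production-touching categories are hypothetical and not taken from the spec.

```python
from dataclasses import dataclass, field
from typing import Dict

CATEGORIES = ("filesystem", "git", "deployment", "database", "network")
LEVELS = ("sandboxed", "restricted", "standard", "elevated")
# Assumption: the issue does not say which categories touch production.
PRODUCTION_TOUCHING = frozenset({"deployment", "database"})


@dataclass
class CategoryTrust:
    """One independent trust track per tool category for a single agent."""
    levels: Dict[str, str] = field(
        default_factory=lambda: {c: "sandboxed" for c in CATEGORIES}
    )

    def promote(self, category: str, *, human_approved: bool = False) -> bool:
        """Move one step up in a single category; return True if the level changed."""
        idx = LEVELS.index(self.levels[category])
        if idx == len(LEVELS) - 1:
            return False  # already at the top level
        target = LEVELS[idx + 1]
        # Human gate: any production-touching category, and any move to elevated.
        needs_approval = category in PRODUCTION_TOUCHING or target == "elevated"
        if needs_approval and not human_approved:
            return False
        self.levels[category] = target
        return True
```

This lets an agent reach "standard" for filesystem tools automatically while staying "sandboxed" for deployment until a human approves.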
Strategy 4: Milestone Gates (ATF-Inspired)
Explicit capability milestones aligned with the Cloud Security Alliance Agentic Trust Framework. Automated promotion for low-risk levels. Human approval gates for elevated access. Trust is time-bound and subject to periodic re-verification — trust decays if the agent is idle for extended periods or error rate increases.
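The decay and re-verification rules above could look roughly like the following. The thresholds (30 days idle, 10% error rate, 90-day re-verification) and the function names are placeholders chosen for illustration; neither the issue nor the ATF reference is quoted here for concrete values.

```python
from datetime import datetime, timedelta

LEVELS = ("sandboxed", "restricted", "standard", "elevated")

# Hypothetical policy values; the real ones would come from configuration.
MAX_IDLE = timedelta(days=30)
MAX_ERROR_RATE = 0.10
REVERIFY_EVERY = timedelta(days=90)


def decayed_level(level: str, last_active: datetime, error_rate: float,
                  now: datetime) -> str:
    """Drop one level if the agent has been idle too long or errs too often."""
    idx = LEVELS.index(level)
    if now - last_active > MAX_IDLE or error_rate > MAX_ERROR_RATE:
        idx = max(idx - 1, 0)  # never drop below sandboxed
    return LEVELS[idx]


def needs_reverification(last_verified: datetime, now: datetime) -> bool:
    """Trust is time-bound: it must be re-verified once the window has elapsed."""
    return now - last_verified >= REVERIFY_EVERY
```

Passing `now` explicitly keeps both functions deterministic, which makes the decay paths easy to cover in unit tests.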
Acceptance Criteria
Protocol Interface
- TrustStrategy protocol defined with standard operations (evaluate, promote, demote, check_access)
- Strategy selection configurable via YAML (trust.strategy: "disabled" | "weighted" | "per_category" | "milestone")
- New strategies addable without modifying existing ones
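One possible shape for the protocol and for config-driven selection, using `typing.Protocol` and a small registry so that adding a strategy means registering a new class rather than editing existing ones. The method signatures, the `register` decorator, `load_strategy`, and `TOOLS_BY_LEVEL` are assumptions for illustration; the config is shown as an already-parsed dict rather than tied to a specific YAML loader.

```python
from typing import Callable, Dict, Optional, Protocol, runtime_checkable


@runtime_checkable
class TrustStrategy(Protocol):
    """The four operations named in the acceptance criteria (signatures assumed)."""
    def evaluate(self, agent_id: str) -> str: ...
    def promote(self, agent_id: str, *, human_approved: bool = False) -> bool: ...
    def demote(self, agent_id: str, reason: str) -> bool: ...
    def check_access(self, agent_id: str, tool: str) -> bool: ...


_REGISTRY: Dict[str, Callable[[], TrustStrategy]] = {}


def register(name: str):
    """Class decorator: make a strategy selectable by its trust.strategy name."""
    def decorator(cls):
        _REGISTRY[name] = cls
        return cls
    return decorator


# Hypothetical placeholder matrix; only one level is filled in for brevity.
TOOLS_BY_LEVEL = {"sandboxed": frozenset({"read_file"})}


@register("disabled")
class DisabledStrategy:
    """Static access: levels are fixed at hire time and never change."""
    def __init__(self, levels: Optional[Dict[str, str]] = None) -> None:
        self._levels = dict(levels or {})

    def evaluate(self, agent_id: str) -> str:
        return self._levels.get(agent_id, "sandboxed")

    def promote(self, agent_id: str, *, human_approved: bool = False) -> bool:
        return False  # trust tracking is off, so nothing changes

    def demote(self, agent_id: str, reason: str) -> bool:
        return False

    def check_access(self, agent_id: str, tool: str) -> bool:
        return tool in TOOLS_BY_LEVEL.get(self.evaluate(agent_id), frozenset())


def load_strategy(config: dict) -> TrustStrategy:
    """Pick a strategy from a parsed config such as {'trust': {'strategy': 'disabled'}}."""
    name = config["trust"]["strategy"]
    if name not in _REGISTRY:
        raise ValueError(f"unknown trust strategy: {name!r}")
    return _REGISTRY[name]()
```

The weighted, per-category, and milestone strategies would each be a separate class decorated with `@register("weighted")` and so on, leaving `load_strategy` and `DisabledStrategy` untouched.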
Implementations
- Disabled strategy — static access levels, no trust tracking
- Weighted strategy — single score, weighted factors, promotion thresholds
- Per-category strategy — per-tool-category trust tracks, separate promotion criteria
- Milestone strategy — explicit milestones, auto-promote for low-risk, human gates for elevated, trust decay + re-verification
Common Requirements
- Trust level tracking per agent (sandboxed, restricted, standard, elevated)
- Human approval required for promotion to elevated (all strategies except disabled)
- Trust level determines available tools (tool access matrix)
- Demotion support (trust can be reduced by human or on policy violation)
- Trust level change audit trail
- Unit tests for each strategy and promotion path (>80% coverage)
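The common requirements (four levels, a tool access matrix, the elevated approval gate, demotion, and an audit trail) can be sketched together as below. The tool names, the `TOOL_MIN_LEVEL` mapping, and the `TrustLedger` class are hypothetical; the actual matrix would be defined by the tool execution system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import IntEnum
from typing import Dict, List, Optional


class TrustLevel(IntEnum):
    SANDBOXED = 0
    RESTRICTED = 1
    STANDARD = 2
    ELEVATED = 3


# Hypothetical access matrix: the minimum level required for each tool.
TOOL_MIN_LEVEL = {
    "read_file": TrustLevel.SANDBOXED,
    "write_file": TrustLevel.RESTRICTED,
    "git_push": TrustLevel.STANDARD,
    "deploy": TrustLevel.ELEVATED,
}


def can_use(level: TrustLevel, tool: str) -> bool:
    """Unknown tools are denied by default."""
    required = TOOL_MIN_LEVEL.get(tool)
    return required is not None and level >= required


@dataclass(frozen=True)
class TrustChange:
    """One audit-trail entry for a trust level change."""
    agent_id: str
    old: TrustLevel
    new: TrustLevel
    reason: str
    approved_by: Optional[str]
    at: datetime


class TrustLedger:
    """Tracks the current level per agent and records every change."""
    def __init__(self) -> None:
        self._levels: Dict[str, TrustLevel] = {}
        self.audit: List[TrustChange] = []

    def level_of(self, agent_id: str) -> TrustLevel:
        return self._levels.get(agent_id, TrustLevel.SANDBOXED)

    def set_level(self, agent_id: str, new: TrustLevel, reason: str,
                  approved_by: Optional[str] = None) -> None:
        old = self.level_of(agent_id)
        # Promotion to elevated always needs a named human approver.
        if new == TrustLevel.ELEVATED and new > old and approved_by is None:
            raise PermissionError("promotion to elevated requires human approval")
        self._levels[agent_id] = new
        self.audit.append(TrustChange(agent_id, old, new, reason, approved_by,
                                      datetime.now(timezone.utc)))
```

Demotion goes through the same `set_level` call, so a policy violation and a human-initiated reduction both leave an audit entry with a reason.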
Dependencies
- Implement Security Operations agent (action validation, audit logging) #40 — SecOps agent for policy enforcement
- Tool execution system for access control
Design Spec Reference
- §11.3: Progressive Trust (4 strategies behind the TrustStrategy protocol)
Updated 2026-03-06: Rewritten to reflect DESIGN_SPEC §11.3 expansion from simple 3-level progression to 4 pluggable strategies behind the TrustStrategy protocol.
Design Decisions Finalized
- D2 (Quality Scoring): Pluggable QualityScoringStrategy protocol. Initial: layered combination (Layer 1: CI signals free, Layer 2: LLM judge, Layer 3: human override). Start with Layer 1 only.
- D3 (Collaboration Scoring): Pluggable CollaborationScoringStrategy protocol. Initial: automated behavioral telemetry (delegation_success_rate, response_latency, conflict_resolution_constructiveness, meeting_contribution_rate, loop_prevention_score, handoff_completeness).
Common pattern: All strategies use pluggable protocol interfaces with one initial implementation. Alternative strategies are documented in DESIGN_SPEC.md for future implementation.