research(security): Aethelgard RL-learned dynamic capability governance — minimum viable tool set per task type #3563

@bug-ops

Description

arXiv:2604.11839 (April 12, 2026) — Beyond Static Sandboxing: Learned Capability Governance for Autonomous AI Agents by Sidik & Rokach (Ben-Gurion University).

Aethelgard is a four-layer adaptive governance framework that addresses capability overprovisioning: today a summarization task receives the same shell, subagent spawning, and credential access as a code deployment task — a measured 15× overprovision ratio. Existing sandboxes (NemoClaw, Cisco DefenseClaw) handle containment and detection but do not learn the minimum viable capability set.

The Aethelgard Framework

  • Layer 1 — Capability Governor: dynamically scopes which tools the agent is aware of per session
  • Layer 2 — RL Learning Policy: trains a PPO policy on the accumulated audit log to learn the minimum viable skill set per task type
  • Layer 3 — Safety Router: intercepts tool calls before execution via a hybrid rule-based + fine-tuned classifier
  • Layer 4 — Audit Logger: records all capability grants and denials to feed the RL loop
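How the layers might hand off to one another can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the type names (`Governor`, `SafetyRouter`, `AuditEntry`), the tool names, and the substring deny rule are all assumptions, the fine-tuned classifier half of Layer 3 is omitted, and Layer 2 is not modeled at all.

```rust
use std::collections::HashSet;

/// Layer 4 stand-in (hypothetical): one record per capability decision.
#[derive(Debug, Clone, PartialEq)]
struct AuditEntry {
    tool: String,
    granted: bool,
    reason: &'static str,
}

/// Layer 1 stand-in (hypothetical): tools scoped into the current session.
struct Governor {
    session_scope: HashSet<String>,
}

/// Layer 3 stand-in (hypothetical): rule-based checks only.
struct SafetyRouter {
    deny_patterns: Vec<&'static str>,
}

impl SafetyRouter {
    /// Decide whether a tool call may run and log the decision either way.
    fn intercept(
        &self,
        governor: &Governor,
        tool: &str,
        args: &str,
        audit_log: &mut Vec<AuditEntry>,
    ) -> bool {
        let (granted, reason) = if !governor.session_scope.contains(tool) {
            (false, "tool not in session scope")
        } else if self.deny_patterns.iter().any(|p| args.contains(*p)) {
            (false, "argument matched a deny rule")
        } else {
            (true, "allowed")
        };
        audit_log.push(AuditEntry { tool: tool.to_string(), granted, reason });
        granted
    }
}

/// Runs three calls through the router and returns the resulting audit log.
fn demo() -> Vec<AuditEntry> {
    let governor = Governor {
        session_scope: ["read_file", "shell"].iter().map(|s| s.to_string()).collect(),
    };
    let router = SafetyRouter { deny_patterns: vec!["rm -rf"] };
    let mut log = Vec::new();
    router.intercept(&governor, "read_file", "notes.md", &mut log);
    router.intercept(&governor, "shell", "rm -rf /", &mut log);
    router.intercept(&governor, "spawn_subagent", "{}", &mut log);
    log
}

fn main() {
    for entry in demo() {
        println!("{:?}", entry);
    }
}
```

Denials are logged alongside grants, so the log holds both kinds of decision that Layer 4 is described as feeding into the RL loop.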

Relevance to Zeph

Zeph exposes all registered tools to the LLM on every turn. The Aethelgard approach would:

  1. Narrow the tool namespace visible to the LLM per turn based on task classification (reduce context, reduce attack surface)
  2. Learn the minimum viable tool set over time from the audit log
  3. Reduce prompt injection risk by hiding unneeded tools entirely
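A rough sketch of point 1, filtering the tool list before it reaches the prompt. `ToolDescriptor`, `visible_tools`, `prompt_chars`, and the task type labels are hypothetical and do not come from the Zeph codebase. Falling back to the full registry for an unclassified turn is one possible policy, chosen here only because it matches the current behaviour.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical descriptor for a tool as it would be rendered into the prompt.
#[derive(Debug, Clone)]
struct ToolDescriptor {
    name: &'static str,
    description: &'static str,
}

/// Return only the descriptors allowed for `task_type`.
/// Unknown task types fall back to the full registry (an assumption).
fn visible_tools<'a>(
    registry: &'a [ToolDescriptor],
    scopes: &HashMap<&str, HashSet<&str>>,
    task_type: &str,
) -> Vec<&'a ToolDescriptor> {
    match scopes.get(task_type) {
        Some(allowed) => registry.iter().filter(|t| allowed.contains(t.name)).collect(),
        None => registry.iter().collect(),
    }
}

/// Crude proxy for prompt cost: characters needed to list the tools.
fn prompt_chars(tools: &[&ToolDescriptor]) -> usize {
    tools.iter().map(|t| t.name.len() + t.description.len()).sum()
}

/// Returns (names visible for a summarization turn, full cost, scoped cost).
fn demo() -> (Vec<&'static str>, usize, usize) {
    let registry = vec![
        ToolDescriptor { name: "read_file", description: "Read a file from the workspace" },
        ToolDescriptor { name: "shell", description: "Run a shell command" },
        ToolDescriptor { name: "spawn_subagent", description: "Start a child agent" },
    ];
    let mut scopes = HashMap::new();
    scopes.insert("summarization", HashSet::from(["read_file"]));

    let full = visible_tools(&registry, &scopes, "unclassified");
    let scoped = visible_tools(&registry, &scopes, "summarization");
    let names = scoped.iter().map(|t| t.name).collect();
    (names, prompt_chars(&full), prompt_chars(&scoped))
}

fn main() {
    let (names, full_cost, scoped_cost) = demo();
    println!("visible: {:?}, chars: {} -> {}", names, full_cost, scoped_cost);
}
```

Character count stands in for token count here; a real measurement would use the tokenizer of the target model.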

Zeph already has zeph-tools audit logging and a ContentSanitizer — these are natural inputs for an RL policy. The tool registry in zeph-core already has a concept of per-tool trust levels that could be extended to dynamic scoping.

Proposed Design Direction

  • Extend ToolExecutor with a scope_for_task(task_type: &str) -> Vec<ToolId> method
  • Wire a lightweight classifier (could be the routing model) to label each turn's task type
  • Accumulate (task_type, tool_id, granted, outcome) tuples in a new SQLite table
  • Train a PPO policy, offline or online, to predict the minimum tool set per task type; apply its output at context build time
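The data flow in this list could look roughly like the sketch below. It deliberately swaps the PPO policy for a plain frequency threshold so that the example stays self-contained; it is a baseline for illustration, not the learned policy the paper describes. `ToolId` here is a stand-in newtype, `ScopedToolExecutor` is a hypothetical extension trait (the real `ToolExecutor` API may differ), and the audit rows live in memory instead of the proposed SQLite table.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-in for Zeph's tool identifier type.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
struct ToolId(String);

/// One row of the proposed audit table: (task_type, tool_id, granted, outcome).
#[derive(Debug, Clone)]
struct AuditTuple {
    task_type: String,
    tool_id: ToolId,
    granted: bool,
    outcome_ok: bool,
}

/// Hypothetical extension trait carrying the proposed method.
trait ScopedToolExecutor {
    fn scope_for_task(&self, task_type: &str) -> Vec<ToolId>;
}

/// Frequency baseline (NOT PPO): keep tools that were granted and
/// succeeded at least `min_uses` times for a given task type.
struct FrequencyScope {
    counts: HashMap<String, HashMap<ToolId, u32>>,
    min_uses: u32,
}

impl FrequencyScope {
    fn from_audit(rows: &[AuditTuple], min_uses: u32) -> Self {
        let mut counts: HashMap<String, HashMap<ToolId, u32>> = HashMap::new();
        for row in rows.iter().filter(|r| r.granted && r.outcome_ok) {
            *counts
                .entry(row.task_type.clone())
                .or_default()
                .entry(row.tool_id.clone())
                .or_insert(0) += 1;
        }
        Self { counts, min_uses }
    }
}

impl ScopedToolExecutor for FrequencyScope {
    fn scope_for_task(&self, task_type: &str) -> Vec<ToolId> {
        match self.counts.get(task_type) {
            // Unseen task type: empty scope here; a real system needs a fallback.
            None => Vec::new(),
            Some(per_tool) => {
                // BTreeSet yields a deterministic, sorted result.
                let kept: BTreeSet<ToolId> = per_tool
                    .iter()
                    .filter(|(_, n)| **n >= self.min_uses)
                    .map(|(id, _)| id.clone())
                    .collect();
                kept.into_iter().collect()
            }
        }
    }
}

fn row(task: &str, tool: &str, granted: bool, outcome_ok: bool) -> AuditTuple {
    AuditTuple {
        task_type: task.to_string(),
        tool_id: ToolId(tool.to_string()),
        granted,
        outcome_ok,
    }
}

/// Builds a small audit log and derives the scope for summarization turns.
fn demo() -> Vec<ToolId> {
    let rows = vec![
        row("summarization", "read_file", true, true),
        row("summarization", "read_file", true, true),
        row("summarization", "shell", true, false),
        row("summarization", "web_fetch", true, true),
        row("deploy", "shell", true, true),
    ];
    FrequencyScope::from_audit(&rows, 2).scope_for_task("summarization")
}

fn main() {
    println!("{:?}", demo());
}
```

With a threshold of two, only `read_file` survives for summarization: `shell` was granted but failed, and `web_fetch` was used only once. A learned policy would replace `FrequencyScope` behind the same trait.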

Expected Benefit

  • Smaller tool lists → fewer context tokens consumed per turn
  • Reduced prompt injection surface (an injected instruction cannot hijack a tool that is neither listed nor callable in the current scope)
  • Audit-grounded capability decisions visible in TUI / /status

Labels

P3 (Research, medium-high complexity); research (Research-driven improvement)
