research(security): Aethelgard RL-learned dynamic capability governance — minimum viable tool set per task type #3563

## Description

arXiv:2604.11839 (April 12, 2026) — *Beyond Static Sandboxing: Learned Capability Governance for Autonomous AI Agents* by Sidik & Rokach (Ben-Gurion University).

Aethelgard is a four-layer adaptive governance framework that addresses capability overprovisioning: today a summarization task receives the same shell, subagent spawning, and credential access as a code deployment task — a measured 15× overprovision ratio. Existing sandboxes (NemoClaw, Cisco DefenseClaw) handle containment and detection but do not learn the minimum viable capability set.

## The Aethelgard Framework

- **Layer 1 — Capability Governor**: dynamically scopes which tools the agent is aware of per session
- **Layer 2 — RL Learning Policy**: trains a PPO policy on the accumulated audit log to learn the minimum viable skill set per task type
- **Layer 3 — Safety Router**: intercepts tool calls before execution via a hybrid rule-based + fine-tuned classifier
- **Layer 4 — Audit Logger**: records all capability grants and denials to feed the RL loop

## Relevance to Zeph

Zeph exposes all registered tools to the LLM on every turn. The Aethelgard approach would:
1. Narrow the tool namespace visible to the LLM per turn based on task classification (reduce context, reduce attack surface)
2. Learn the minimum viable tool set over time from the audit log
3. Reduce prompt injection risk by hiding unneeded tools entirely
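
A minimal sketch of point 1, assuming a static task-to-tool map as a stand-in for the learned policy. `ToolDef`, `ScopeMap`, and `visible_tools` are hypothetical names for illustration and are not existing Zeph types; the fail-closed behavior for unknown task types is likewise an assumption, not something the paper prescribes.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical tool descriptor; Zeph's real registry type will differ.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolDef {
    pub id: String,
    pub description: String,
}

/// Hypothetical scope map: task type -> ids of tools allowed for that task.
#[derive(Debug, Default)]
pub struct ScopeMap {
    allowed: HashMap<String, HashSet<String>>,
}

impl ScopeMap {
    /// Marks `tool_id` as in scope for `task_type`.
    pub fn allow(&mut self, task_type: &str, tool_id: &str) {
        self.allowed
            .entry(task_type.to_string())
            .or_default()
            .insert(tool_id.to_string());
    }

    /// Returns only the tools in scope for `task_type`, in registry order.
    /// Assumption: unknown task types get an empty list (fail closed).
    pub fn visible_tools<'a>(
        &self,
        task_type: &str,
        registry: &'a [ToolDef],
    ) -> Vec<&'a ToolDef> {
        match self.allowed.get(task_type) {
            Some(ids) => registry.iter().filter(|t| ids.contains(&t.id)).collect(),
            None => Vec::new(),
        }
    }
}
```

Only the filtered list would be serialized into the prompt, so a tool that is out of scope never appears in the model's context for that turn.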

Zeph already has `zeph-tools` audit logging and a `ContentSanitizer` — these are natural inputs for an RL policy. The tool registry in `zeph-core` already has a concept of per-tool trust levels that could be extended to dynamic scoping.

## Proposed Design Direction

- Extend `ToolExecutor` with a `scope_for_task(task_type: &str) -> Vec<ToolId>` method
- Wire a lightweight classifier (could be the routing model) to label each turn's task type
- Accumulate (task_type, tool_id, granted, outcome) tuples in a new SQLite table
- Train a PPO policy, offline or online, to predict the minimum tool set; apply it at context build time
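
The data flow in the bullets above could be prototyped before any RL is involved. The sketch below is a frequency-count baseline, not PPO: it keeps a tool in scope for a task type once it has been granted and has succeeded at least `min_uses` times in the audit log. `AuditRecord`, `LearnedScopes`, and `from_audit` are hypothetical names, the `outcome` field is assumed to reduce to success or failure, and the records are held in memory rather than read from the proposed SQLite table.

```rust
use std::collections::BTreeMap;

/// Assumption: tool ids are plain strings; Zeph's `ToolId` may differ.
pub type ToolId = String;

/// One audit row, mirroring the (task_type, tool_id, granted, outcome) tuple.
#[derive(Debug, Clone)]
pub struct AuditRecord {
    pub task_type: String,
    pub tool_id: ToolId,
    pub granted: bool,
    /// Assumption: `outcome` is reduced to a success flag.
    pub succeeded: bool,
}

/// Baseline stand-in for the learned policy: per-task usage counts.
#[derive(Debug, Default)]
pub struct LearnedScopes {
    /// task_type -> tool_id -> number of granted, successful uses.
    counts: BTreeMap<String, BTreeMap<ToolId, u32>>,
    min_uses: u32,
}

impl LearnedScopes {
    /// Tallies granted, successful tool uses per task type.
    pub fn from_audit(records: &[AuditRecord], min_uses: u32) -> Self {
        let mut counts: BTreeMap<String, BTreeMap<ToolId, u32>> = BTreeMap::new();
        for r in records.iter().filter(|r| r.granted && r.succeeded) {
            *counts
                .entry(r.task_type.clone())
                .or_default()
                .entry(r.tool_id.clone())
                .or_default() += 1;
        }
        Self { counts, min_uses }
    }

    /// Tools used successfully at least `min_uses` times for `task_type`,
    /// sorted by id. Unknown task types yield an empty scope.
    pub fn scope_for_task(&self, task_type: &str) -> Vec<ToolId> {
        self.counts
            .get(task_type)
            .map(|tools| {
                tools
                    .iter()
                    .filter(|(_, n)| **n >= self.min_uses)
                    .map(|(id, _)| id.clone())
                    .collect()
            })
            .unwrap_or_default()
    }
}
```

A baseline like this can only shrink the scope toward tools that have already been observed. A tool that is never granted can never accumulate counts, so some form of exploration, which an RL policy would provide, is still needed to discover useful tools outside the current scope.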

## Expected Benefit

- Smaller tool lists → fewer context tokens consumed per turn
- Reduced prompt injection surface (tools not listed cannot be hijacked)
- Audit-grounded capability decisions visible in TUI / `/status`

## References

- Paper: https://arxiv.org/abs/2604.11839
- Related: #2417 (formal security model), #2420 (MCP tool trust metadata)
