Proposal: Semantic Intent Classification Safety Extension for AutoGen
Problem
AutoGen enables powerful multi-agent conversations with flexible orchestration. However, as agents gain autonomy to execute code, call tools, and delegate tasks, there's a growing need for fine-grained action-level safety classification - understanding what an agent is trying to do before allowing it.
Current approaches (token limits, blocklists) are too coarse. Teams need:
- Semantic intent classification - Classify each action into threat categories before execution
- Trust-scored agent interactions - Track and decay trust across multi-turn conversations
- Policy enforcement - Declarative governance policies with event hooks for violations
- Tamper-evident audit trails - Cryptographic proof of what happened during execution
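As a rough illustration of the first need, action-level semantic classification does not require an LLM: a deterministic pass of pattern matching over the proposed action can assign a threat category before execution. The category names below mirror this proposal's taxonomy; the patterns themselves are purely illustrative, not Agent-OS's actual rule set.

```python
import re

# Illustrative rules only - category names follow the proposal's taxonomy,
# but these patterns are examples, not the real classifier's rules.
RULES = {
    "destructive": [r"\brm\s+-rf\b", r"\bDROP\s+TABLE\b"],
    "exfiltration": [r"\bcurl\b.*\bhttp", r"\bscp\b"],
    "privilege_escalation": [r"\bsudo\b", r"\bchmod\s+777\b"],
}

def classify(action_text: str) -> str:
    """Return the first matching threat category, or 'benign' if none match."""
    for category, patterns in RULES.items():
        if any(re.search(p, action_text, re.IGNORECASE) for p in patterns):
            return category
    return "benign"
```

Because the check is a fixed set of regexes, it is fast and fully deterministic: the same action always gets the same category.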
What we've built (Apache-2.0)
Agent-OS includes a production-grade semantic intent classifier:
- 9 threat categories - `destructive`, `exfiltration`, `privilege_escalation`, `resource_abuse`, `persistence`, `lateral_movement`, `reconnaissance`, `social_engineering`, `benign`
- No LLM dependency - Fast, deterministic classification using pattern matching and heuristics
- GovernancePolicy - YAML-based policies with blocked patterns (regex/glob), token limits, tool call limits
- Event hooks - `on(POLICY_VIOLATION, callback)` for real-time alerting
- Trust scoring - 5-dimension trust model with decay (via AgentMesh)
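A policy of the shape described above might look like the following sketch. The field names here are assumptions for illustration, not Agent-OS's published schema:

```yaml
# Hypothetical policy.yaml - field names are illustrative, not a published schema
blocked_patterns:
  - "rm -rf *"            # glob
  - "(?i)drop\\s+table"   # regex
limits:
  max_tokens_per_turn: 4096
  max_tool_calls_per_turn: 5
on_violation: block_and_log   # block the action and record it in the audit trail
```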
Proposed integration
An autogen-safety extension that hooks into AutoGen's message handling:
```python
from autogen import ConversableAgent
from autogen_safety import SafetyGuard, GovernancePolicy

policy = GovernancePolicy.load("policy.yaml")
guard = SafetyGuard(policy=policy)

agent = ConversableAgent(
    name="coder",
    system_message="You write Python code.",
)

guard.protect(agent)  # Wraps message handling with safety checks

# Now all agent actions are classified and policy-checked
# Dangerous actions (exfiltration, privilege escalation) are blocked
# All actions are logged to a tamper-evident audit chain
```

Why this matters for AutoGen
- Enterprise readiness - Organizations need safety guarantees before deploying autonomous agents
- Code execution risk - AutoGen's code executor is powerful but needs guardrails against malicious patterns
- Composable - Works with existing AutoGen patterns (group chat, nested chat, tool use)
- Deterministic - No LLM-in-the-loop for safety checks; fast and predictable
- Standards-aligned - Implements CSA's Agentic Trust Framework zero-trust governance model
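The `on(POLICY_VIOLATION, callback)` hook mentioned above can be sketched with a minimal synchronous event bus. Everything here is a self-contained illustration - the class, the event name, and the payload shape are assumptions, not Agent-OS's API:

```python
from collections import defaultdict
from typing import Any, Callable

POLICY_VIOLATION = "policy_violation"  # event name is an assumption

class EventBus:
    """Minimal synchronous event hooks, sketching on(POLICY_VIOLATION, cb)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable]] = defaultdict(list)

    def on(self, event: str, callback: Callable) -> None:
        """Register a callback for an event."""
        self._handlers[event].append(callback)

    def emit(self, event: str, payload: dict[str, Any]) -> None:
        """Invoke every callback registered for the event, in order."""
        for cb in self._handlers[event]:
            cb(payload)

# Example: collect violation categories for real-time alerting
bus = EventBus()
alerts: list[str] = []
bus.on(POLICY_VIOLATION, lambda p: alerts.append(p["category"]))
bus.emit(POLICY_VIOLATION, {"category": "exfiltration", "agent": "coder"})
```

Keeping the hook synchronous and in-process keeps the safety path deterministic, consistent with the no-LLM-in-the-loop point above.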
Ask
Is there interest in this kind of contribution? Options:
- Standalone `autogen-safety` package using AutoGen's extensibility hooks
- PR to AutoGen core adding optional safety middleware
- Example/cookbook showing the integration pattern
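For the cookbook option, the integration pattern reduces to wrapping an agent's reply callable with a pre-execution policy check. The sketch below is framework-independent: `classify`, the blocked set, and the wrapper are all hypothetical stand-ins, not AutoGen or Agent-OS APIs.

```python
from functools import wraps

# Hypothetical blocked categories, matching the proposal's taxonomy
BLOCKED = {"destructive", "exfiltration", "privilege_escalation"}

def classify(text: str) -> str:
    # Stand-in for the deterministic intent classifier; illustrative rule only.
    return "exfiltration" if "curl http" in text else "benign"

def protect(reply_fn):
    """Wrap an agent's reply function so every message is policy-checked first."""
    @wraps(reply_fn)
    def guarded(message: str) -> str:
        category = classify(message)
        if category in BLOCKED:
            return f"[blocked: {category}]"
        return reply_fn(message)
    return guarded

# A trivial "agent" whose reply function is now guarded
echo_agent = protect(lambda m: f"running: {m}")
```

In a real integration the wrapper would attach via AutoGen's reply/hook registration rather than plain function wrapping, but the control flow - classify, check policy, then allow or block - is the same.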
Happy to discuss the best approach with maintainers.