feat(security): harden against agent sandbox escape vectors#297
feat(security): harden against agent sandbox escape vectors#297imran-siddique merged 3 commits intomicrosoft:mainfrom
Conversation
🤖 AI Agent: docs-sync-checker📝 Documentation Sync ReportIssues Found
Suggestions
Additional Notes
SummaryThe PR introduces several new public APIs and features that require documentation updates and additional examples. Once the suggested changes are made, the documentation will be in sync. Let me know if you need further assistance! |
There was a problem hiding this comment.
🤖 AI Agent: code-reviewer
Review Summary
This pull request introduces significant security enhancements to the microsoft/agent-governance-toolkit repository, addressing critical sandbox escape vectors and improving the robustness of the Policy Engine and Tool Registry. The changes are well-documented, include extensive testing, and appear to be backward-compatible. However, there are a few areas that require attention, particularly around cryptographic operations, thread safety, and potential edge cases.
🔴 CRITICAL
-
Content Hashing for Tool Integrity Verification
- Issue: The
_compute_handler_hashmethod inToolRegistryfalls back to hashing bytecode if the source code is unavailable. However, this fallback mechanism is not secure because bytecode can be manipulated by an attacker to bypass integrity checks. - Recommendation: If the source code is unavailable, the tool registration should fail with an error instead of falling back to bytecode hashing. This ensures that only tools with verifiable source code are registered.
- Issue: The
-
Mutation Audit Log Integrity
- Issue: The
_mutation_loginPolicyEngineis a mutable list, which could be tampered with at runtime, potentially allowing an attacker to erase evidence of unauthorized mutations. - Recommendation: Use an immutable data structure (e.g.,
tupleorNamedTuple) for the mutation log entries and store the log in an immutable container (e.g.,tupleorcollections.dequewith a fixed maximum length). Additionally, consider cryptographically signing each log entry to ensure integrity.
- Issue: The
-
Adversarial Attack Scenarios
- Issue: The adversarial attack scenarios in
maf_governance_demo.pyrely on middleware to block malicious actions. However, theMiddlewareTerminationexception is used to signal blocked actions, which could be caught and suppressed by an attacker. - Recommendation: Ensure that
MiddlewareTerminationcannot be suppressed by malicious code. Consider using a more robust mechanism, such as terminating the process or using a non-overridable exception.
- Issue: The adversarial attack scenarios in
🟡 WARNING
-
Backward Compatibility
- Observation: The
freeze()method inPolicyEngineintroduces a new behavior that could potentially break existing code if it relies on mutating the policy engine after initialization. - Recommendation: Clearly document this change in the release notes and provide guidance on how to adapt existing code to this new behavior.
- Observation: The
-
Logging
- Observation: The new logging statements in
PolicyEngineandToolRegistryuse theloggingmodule but do not include any configuration for log levels or handlers. - Recommendation: Ensure that logging is properly configured in the application to avoid unintentional information leakage in production environments.
- Observation: The new logging statements in
💡 SUGGESTIONS
-
Testing Coverage
- The new tests in
test_security_hardening.pyare a great addition. However, consider adding tests for edge cases, such as:- Attempting to register a tool with a handler that has no source code (e.g., a built-in function).
- Attempting to mutate a frozen
PolicyEnginein a multithreaded environment to ensure thread safety. - Verifying that the
mutation_logis immutable and cannot be tampered with.
- The new tests in
-
Documentation
- The new
freeze()method and its implications should be prominently documented in theREADME.mdand other relevant documentation files. This will help users understand how to use this feature effectively and avoid potential pitfalls. - The
Security Model & Limitationssection in theREADME.mdis a valuable addition. Consider linking to specific examples or tests that demonstrate how to mitigate the limitations mentioned.
- The new
-
Thread Safety
- While the
PolicyEnginenow includes a_frozenflag and mutation logging, there is no explicit locking mechanism to ensure thread safety when accessing or modifying shared state (e.g.,self._frozen,self._mutation_log,self.state_permissions, etc.). - Recommendation: Use a threading lock (e.g.,
threading.Lock) to synchronize access to shared state inPolicyEngine.
- While the
-
Error Handling in
ToolRegistry- The
_compute_handler_hashmethod currently has a broadexcept Exceptionclause, which could mask unexpected errors. - Recommendation: Narrow the exception handling to specific exceptions (e.g.,
OSError,TypeError) and log any unexpected exceptions for debugging.
- The
-
QuorumConfig and Fatigue Detection
- The
QuorumConfigand fatigue detection mechanisms inEscalationHandlerare not fully detailed in the PR description or code diff. Ensure that these features are thoroughly tested and documented, as they are critical for preventing approval fatigue attacks.
- The
Conclusion
The proposed changes significantly improve the security posture of the microsoft/agent-governance-toolkit repository. However, the identified critical issues must be addressed to ensure the robustness of the new features. Additionally, the warnings and suggestions should be considered to further enhance the quality and maintainability of the code.
🤖 AI Agent: breaking-change-detector🔍 API Compatibility ReportSummaryThis pull request introduces several new features and security enhancements to the Findings
Migration GuideFor
|
🤖 AI Agent: test-generator🧪 Test Coverage Analysis
|
🤖 AI Agent: security-scannerSecurity Review of PR: feat(security): harden against agent sandbox escape vectorsThis PR introduces several security enhancements to the 1. Tool Content HashingRisk Addressed: Tool aliasing/wrapping attacks Potential Issues:
Recommendation:
Rating: 🔵 LOW 2. PolicyEngine FreezeRisk Addressed: Runtime self-modification of policies Potential Issues:
Recommendation:
Rating: 🟠 HIGH 3. Approval Quorum & Fatigue DetectionRisk Addressed: Approval fatigue and unauthorized escalation Potential Issues:
Recommendation:
Rating: 🟡 MEDIUM 4. Adversarial Scenario TestingRisk Addressed: Prompt injection, tool alias bypass, trust score manipulation, SQL injection Potential Issues:
Recommendation:
Rating: 🔵 LOW 5. Dependency UpdateRisk Addressed: Supply chain vulnerabilities Potential Issues:
Recommendation:
Rating: 🔵 LOW 6. Logging and Audit EnhancementsRisk Addressed: Lack of visibility into security-relevant events Potential Issues:
Recommendation:
Rating: 🟡 MEDIUM 7. General Observations
Summary of Findings
Final RecommendationThis PR introduces critical security improvements to the |
🤖 AI Agent: docs-sync-checker📝 Documentation Sync ReportIssues Found
Suggestions
Additional Notes
Final VerdictDocumentation is not in sync. Please address the issues and suggestions above. |
🤖 AI Agent: breaking-change-detector🔍 API Compatibility ReportSummaryThis pull request introduces several new features and security enhancements to the Findings
Migration GuideFor
|
🤖 AI Agent: test-generator🧪 Test Coverage Analysis
|
There was a problem hiding this comment.
🤖 AI Agent: code-reviewer
Review Summary
This pull request introduces significant security enhancements to the microsoft/agent-governance-toolkit repository, addressing critical sandbox escape vectors. The changes include:
- Tool Content Hashing: Ensures tool integrity by hashing the source code at registration and verifying it before execution.
- PolicyEngine Freeze: Prevents runtime self-modification by making the policy engine immutable after initialization.
- Approval Quorum & Fatigue Detection: Adds quorum-based approval and fatigue detection to mitigate approval fatigue attacks.
The PR also includes comprehensive testing, documentation updates, and backward compatibility considerations.
🔴 CRITICAL Issues
-
Content Hashing Bypass via Dynamic Code Execution
- Issue: The
Toolclass computes a SHA-256 hash of the tool handler's source code at registration time. However, this does not account for dynamic code execution or runtime modifications (e.g.,exec,eval, or dynamic imports). An attacker could potentially modify the behavior of the tool at runtime without altering the source code. - Recommendation: Consider implementing runtime integrity checks, such as monitoring for dynamic code execution or using a restricted execution environment (e.g.,
RestrictedPythonorPyPysandboxing). Additionally, ensure that the hash includes all dependencies and external files that the tool relies on.
- Issue: The
-
Insufficient Logging for Critical Events
- Issue: While the
PolicyEnginelogs mutations and integrity violations, the logs may not be sufficient for forensic analysis in case of a security incident. - Recommendation: Enhance logging to include more contextual information, such as the user/agent ID, IP address, and a unique request identifier. Ensure logs are immutable and stored securely.
- Issue: While the
-
Potential Race Condition in
PolicyEngine- Issue: The
PolicyEngine's mutation log (_mutation_log) is a mutable list, and access to it is not thread-safe. Concurrent mutations could lead to race conditions. - Recommendation: Use a thread-safe data structure (e.g.,
queue.Queueorcollections.dequewith a lock) for_mutation_log. Ensure all read/write operations are synchronized.
- Issue: The
🟡 WARNING: Potential Breaking Changes
-
PolicyEngine Freeze
- Issue: The
freeze()method makes thePolicyEngineimmutable, which could break existing workflows that rely on dynamic policy updates. - Recommendation: Clearly document this change in the release notes and provide guidance on how users can adapt their workflows. Consider adding a warning log when
freeze()is called to alert users during development.
- Issue: The
-
Tool Content Hashing
- Issue: The addition of the
content_hashattribute to theToolclass changes its schema. While this is backward-compatible in terms of runtime behavior, it may break integrations or tests that rely on the previous schema. - Recommendation: Highlight this change in the release notes and provide migration guidance for users who serialize or deserialize
Toolobjects.
- Issue: The addition of the
💡 Suggestions for Improvement
-
Enhanced Adversarial Testing
- The adversarial scenarios are a great addition. Consider expanding the test coverage to include:
- Memory corruption attacks (e.g., buffer overflows).
- File system attacks (e.g., symlink traversal, unauthorized file access).
- Network-based attacks (e.g., DNS rebinding, SSRF).
- Privilege escalation attempts.
- The adversarial scenarios are a great addition. Consider expanding the test coverage to include:
-
Documentation
- The new
freeze()method andcontent_hashfeature are well-documented in the code, but the README and CHANGELOG could provide more detailed examples and use cases. - The "Security Model & Limitations" section is a valuable addition. Consider linking to external resources for implementing OS-level isolation (e.g., Docker, Kubernetes).
- The new
-
Testing
- The new tests in
test_security_hardening.pyare comprehensive. However, consider adding stress tests to evaluate the performance impact of the new features (e.g., hashing, quorum checks, fatigue detection) under high load.
- The new tests in
-
Error Handling
- The
RuntimeErrorraised by_assert_mutablecould include more specific error messages to help developers understand the issue and how to resolve it.
- The
-
Logging Levels
- The logging for
PolicyEnginemutations and integrity violations is currently set toinfoandwarning. Consider using a dedicated logging level or adding more granular log levels to differentiate between normal operations and potential security incidents.
- The logging for
Final Assessment
This PR addresses critical security concerns and introduces robust mechanisms to prevent sandbox escape vectors. However, the identified issues related to content hashing, logging, and thread safety must be addressed to ensure the effectiveness and reliability of the proposed changes. Additionally, the potential breaking changes should be clearly communicated to users to facilitate a smooth transition.
🤖 AI Agent: security-scannerSecurity Review of Pull Request: feat(security): harden against agent sandbox escape vectorsThis pull request introduces several security enhancements aimed at hardening the agent governance toolkit against sandbox escape vectors. Below is a detailed security analysis of the changes, categorized by potential vulnerabilities and their mitigations. 1. Prompt Injection Defense BypassAnalysis:The Findings:
Severity: 🟡 MEDIUMRecommendation:
2. Policy Engine CircumventionAnalysis:The Findings:
Severity: 🟠 HIGHRecommendation:
3. Trust Chain WeaknessesAnalysis:No changes in this PR directly affect the trust chain (e.g., SPIFFE/SVID validation or certificate pinning). However, the Findings:
Severity: 🟡 MEDIUMRecommendation:
4. Credential ExposureAnalysis:No sensitive credentials or secrets are exposed in the changes. The logging of mutation attempts in the Findings:
Severity: 🔵 LOWRecommendation:
5. Sandbox EscapeAnalysis:The PR addresses sandbox escape vectors through the following mechanisms:
Findings:
Severity: 🟠 HIGHRecommendation:
6. Deserialization AttacksAnalysis:No changes in this PR involve deserialization of untrusted data. The use of Findings:
Severity: 🔵 LOWRecommendation:
7. Race ConditionsAnalysis:The Findings:
Severity: 🟠 HIGHRecommendation:
8. Supply ChainAnalysis:The PR updates the Findings:
Severity: 🔵 LOWRecommendation:
Summary of Findings:
Final Recommendations:
These changes significantly improve the security posture of the agent governance toolkit, but the identified issues should be addressed to ensure comprehensive protection. |
Document the 3 sandbox escape defenses with usage examples: - Tool content hashing with ToolRegistry and ContentHashInterceptor - PolicyEngine.freeze() with MappingProxyType immutability - Approval quorum (QuorumConfig) and fatigue detection Addresses docs-sync-checker feedback on PR microsoft#297. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🤖 AI Agent: docs-sync-checker📝 Documentation Sync ReportIssues Found
Suggestions
Action Items
Once these issues are addressed, the documentation will be in sync with the code changes. Let me know if you need help drafting the docstrings or updating the documentation! |
🤖 AI Agent: test-generator🧪 Test Coverage Analysis
|
🤖 AI Agent: security-scannerSecurity Review of PR:
|
There was a problem hiding this comment.
🤖 AI Agent: code-reviewer
Code Review for PR: feat(security): harden against agent sandbox escape vectors
This PR introduces several critical security enhancements to the microsoft/agent-governance-toolkit repository, focusing on sandbox escape prevention, policy engine immutability, and approval fatigue defenses. Below is a detailed review of the changes, with actionable feedback.
🔴 CRITICAL: Security Issues
1. Potential Bypass of PolicyEngine.freeze()
- Issue: While
freeze()makes thePolicyEngineimmutable by replacing mutable data structures withMappingProxyType, it does not protect against direct manipulation of thePolicyEngineobject itself. For example, a malicious actor could directly modify the_frozenattribute or replace theMappingProxyTypeobjects with mutable ones. - Impact: This could allow an attacker to bypass the immutability guarantees of the
PolicyEngine, re-enable mutation, and modify policies at runtime. - Recommendation:
- Make
_frozena private attribute and enforce its immutability by using a property with no setter. - Use
@propertydecorators forstate_permissions,agent_contexts, andconditional_permissionsto ensure they are always accessed asMappingProxyTypeobjects. - Consider using a metaclass or
__setattr__override to prevent attribute reassignment.
- Make
@property
def state_permissions(self):
return self._state_permissions
@state_permissions.setter
def state_permissions(self, value):
raise AttributeError("state_permissions is immutable once frozen.")2. Audit Log Integrity
- Issue: The
_mutation_logis a mutable list, which could be tampered with by an attacker to erase evidence of unauthorized operations. - Impact: This undermines the integrity of the audit log, which is critical for security and compliance.
- Recommendation:
- Use an immutable data structure (e.g.,
tuple) for the_mutation_logor store the log in an append-only external system. - If the log must remain mutable, ensure that access is protected by a lock to prevent race conditions in concurrent environments.
- Use an immutable data structure (e.g.,
3. Content Hashing: Lack of Replay Attack Protection
- Issue: The
ContentHashInterceptorverifies the integrity of tools using a SHA-256 hash. However, there is no mechanism to prevent an attacker from replacing a malicious tool with a valid one temporarily during verification and then swapping it back. - Impact: This could allow an attacker to bypass the content hashing mechanism.
- Recommendation:
- Implement a mechanism to periodically re-verify the integrity of tools at runtime.
- Consider using a secure storage mechanism (e.g., HSM or TPM) to store the hashes and prevent tampering.
4. Concurrency Issues in PolicyEngine
- Issue: The
PolicyEngineis not thread-safe. Concurrent calls to methods likeadd_constraintoradd_conditional_permissioncould lead to race conditions, especially in multi-threaded agent execution environments. - Impact: This could result in inconsistent policy states, potentially leading to security vulnerabilities.
- Recommendation:
- Use threading locks (e.g.,
threading.Lock) to ensure thread safety for all methods that modify shared state. - Alternatively, consider using a thread-safe data structure like
collections.defaultdictwiththreading.Lock.
- Use threading locks (e.g.,
🟡 WARNING: Potential Breaking Changes
1. Behavior of PolicyEngine.freeze()
- Issue: The
freeze()method introduces a new behavior that may break existing code if developers are not aware of its implications. For example, existing code that modifies thePolicyEngineafter initialization will now raise aRuntimeError. - Impact: This could cause unexpected runtime errors in existing deployments.
- Recommendation:
- Clearly document the behavior of
freeze()in the release notes and provide migration guidance. - Consider adding a warning log when
freeze()is called, to alert developers during the transition period.
- Clearly document the behavior of
💡 Suggestions for Improvement
1. Enhanced Testing for Adversarial Scenarios
- Observation: The
scenario_adversarial_attacksfunction tests several attack vectors, but the coverage could be expanded. - Suggestion:
- Add tests for additional attack vectors, such as:
- Memory corruption or buffer overflow attempts.
- Exploitation of race conditions in concurrent environments.
- Attempts to bypass
MappingProxyTypeimmutability.
- Use fuzz testing to identify edge cases and unexpected behaviors.
- Add tests for additional attack vectors, such as:
2. Logging Improvements
- Observation: The logging in
PolicyEngineis minimal and does not provide sufficient context for debugging. - Suggestion:
- Include more detailed information in log messages, such as the specific agent or tool involved in a mutation attempt.
- Use structured logging (e.g., JSON format) to make logs easier to parse and analyze.
3. Documentation Enhancements
- Observation: The documentation provides a good overview of the new features but could benefit from more detailed examples and usage scenarios.
- Suggestion:
- Add a dedicated section in the documentation for "Security Hardening" with detailed examples for each new feature.
- Include a migration guide for existing users to adopt the new features.
4. Backward Compatibility
- Observation: While the PR claims to be backward-compatible, the introduction of
freeze()and other changes could have unintended side effects. - Suggestion:
- Add a compatibility test suite to verify that existing functionality is not broken.
- Consider releasing this as a minor version update (e.g., 2.3.0) to indicate new features and potential changes in behavior.
5. Performance Considerations
- Observation: The
ContentHashInterceptoradds an additional layer of verification, which could impact performance in high-throughput environments. - Suggestion:
- Benchmark the performance impact of
ContentHashInterceptorand provide guidance on when to enable/disable it. - Consider making the hash verification frequency configurable.
- Benchmark the performance impact of
✅ Positive Highlights
-
Comprehensive Security Enhancements:
- The PR addresses three critical sandbox escape vectors: tool aliasing, runtime self-modification, and approval fatigue.
- The use of SHA-256 for content hashing and
MappingProxyTypefor immutability is a strong design choice.
-
Thorough Testing:
- 33 new tests were added, covering both functional and adversarial scenarios.
- The
scenario_adversarial_attacksfunction is a great addition for testing the resilience of the governance stack.
-
Backward Compatibility:
- The PR ensures that all new features are optional, minimizing the risk of breaking existing functionality.
-
Improved Documentation:
- The README and SECURITY.md files have been updated with detailed explanations of the new features and their security implications.
Summary of Recommendations
- Critical: Address potential bypass of
PolicyEngine.freeze()by hardening the implementation against direct attribute manipulation. - Critical: Protect the
_mutation_logagainst tampering and ensure its integrity. - Critical: Add replay attack protection for the
ContentHashInterceptor. - Critical: Ensure thread safety in
PolicyEngineby using locks or thread-safe data structures. - Warning: Document the potential breaking changes introduced by
freeze()and provide migration guidance. - Suggestion: Expand adversarial scenario testing to cover additional attack vectors.
- Suggestion: Enhance logging for better debugging and auditability.
- Suggestion: Add a compatibility test suite to verify backward compatibility.
- Suggestion: Benchmark the performance impact of
ContentHashInterceptorand make its frequency configurable.
This PR is a significant step forward in hardening the agent governance toolkit against critical security threats. Addressing the identified issues and incorporating the suggested improvements will further enhance the robustness and reliability of the system.
🤖 AI Agent: breaking-change-detector🔍 API Compatibility ReportSummaryThe pull request introduces several new features aimed at enhancing security, including immutability for the Findings
Migration GuideFor
|
Address 3 critical gaps identified in Ona/Veto agent security research:
1. Tool content hashing (defeats tool aliasing/wrapping attacks):
- ToolRegistry now computes SHA-256 hash of handler source at registration
- execute_tool() verifies integrity before execution, blocks on mismatch
- New ContentHashInterceptor in base.py for intercept-level hash verification
- Integrity violation audit log with get_integrity_violations()
2. PolicyEngine freeze (prevents runtime self-modification):
- New freeze() method makes engine immutable after initialization
- add_constraint, set/update_agent_context, add_conditional_permission
all raise RuntimeError when frozen
- Full mutation audit log records all operations (allowed and blocked)
- is_frozen property for inspection
3. Approval quorum and fatigue detection (defeats approval fatigue):
- New QuorumConfig dataclass for M-of-N approval requirements
- EscalationHandler supports quorum-based vote counting
- Fatigue detection: auto-DENY when agent exceeds escalation rate threshold
- Per-agent rate tracking with configurable window and threshold
- EscalationRequest.votes field tracks individual approver votes
All changes are backward-compatible: new parameters are optional with
defaults that preserve existing behavior. 33 new tests, 53 total pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- PolicyEngine.freeze() now converts dicts to MappingProxyType/frozenset for true immutability (not just boolean guard) — addresses HIGH finding - Removed insecure bytecode fallback from _compute_handler_hash; returns empty string with warning for unverifiable handlers — addresses CRITICAL - Added CHANGELOG entries for all new security features - Added 2 new tests: frozen dicts are immutable proxies, permissions are frozensets 55 tests pass (20 existing + 35 new). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Document the 3 sandbox escape defenses with usage examples: - Tool content hashing with ToolRegistry and ContentHashInterceptor - PolicyEngine.freeze() with MappingProxyType immutability - Approval quorum (QuorumConfig) and fatigue detection Addresses docs-sync-checker feedback on PR microsoft#297. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🤖 AI Agent: breaking-change-detector🔍 API Compatibility ReportSummaryThis pull request introduces several new features aimed at hardening security against sandbox escape vectors. It adds new classes, methods, and parameters while maintaining backward compatibility. No breaking changes were identified in the diff. Findings
Migration GuideNo migration steps are necessary as no breaking changes were detected. Downstream users can adopt new features without modifying existing code. Conclusion✅ No breaking changes detected. All changes are additive and backward-compatible. |
🤖 AI Agent: docs-sync-checker📝 Documentation Sync ReportIssues Found
Suggestions
README
CHANGELOG
Example Code
Type Hints
Final AssessmentThe PR introduces significant new functionality for security hardening, and the documentation has been updated in the README and CHANGELOG. However, several new public APIs are missing detailed docstrings, and the Please address the missing docstrings and consider adding an example for |
There was a problem hiding this comment.
🤖 AI Agent: code-reviewer
Review Summary
This PR introduces significant security hardening measures to mitigate sandbox escape vectors in the microsoft/agent-governance-toolkit. The changes address three critical areas: tool content hashing, policy engine immutability, and approval fatigue defenses. These updates are well-documented, include extensive test coverage, and appear to be backward-compatible. However, there are a few areas that require attention to ensure robustness, security, and maintainability.
🔴 CRITICAL: Security Issues
-
Content Hashing for Built-in or C-extension Functions
- The
_compute_handler_hashmethod inToolRegistryreturns an empty string for handlers whose source code is unavailable (e.g., built-in or C-extension functions). This creates a potential security gap, as such tools will be treated as unverifiable but still allowed to execute. - Recommendation: Explicitly disallow tools with unverifiable handlers unless explicitly marked as "trusted" during registration. This will prevent attackers from exploiting this gap to bypass integrity checks.
def _compute_handler_hash(handler: Callable) -> str: ... if not source: raise ValueError( f"Cannot compute source hash for handler {handler} — " "source unavailable (built-in or C-extension). " "Explicitly mark this tool as trusted if necessary." )
- The
-
Content Hash Interceptor Metadata Dependency
- The
ContentHashInterceptorrelies on thecontent_hashfield in theToolCallRequest.metadatato verify integrity. If this metadata is tampered with or omitted, the interceptor will fail to verify the tool's integrity. - Recommendation: Instead of relying on metadata, the interceptor should fetch the tool's handler directly from the
ToolRegistryand compute the hash dynamically. This ensures that the integrity check cannot be bypassed by manipulating metadata.
- The
-
Policy Engine Freeze Bypass via Direct Attribute Access
- While the
freeze()method replaces mutable dictionaries withMappingProxyTypeto enforce immutability, this protection can be bypassed if an attacker directly modifies the underlying_toolsor_tools_by_typeattributes. - Recommendation: Make these attributes private (e.g.,
_tools→__tools) and enforce immutability checks in all methods that modify them. Alternatively, use a more robust immutability library likeimmutables.
- While the
🟡 WARNING: Potential Breaking Changes
-
Policy Engine Freeze Behavior
- The
freeze()method introduces a new state to thePolicyEnginethat prevents further mutations. While this is a valuable security feature, it changes the behavior of the public API. Any existing code that relies on mutating thePolicyEngineafter initialization will now raise aRuntimeError. - Recommendation: Clearly document this behavior in the release notes and consider adding a deprecation warning in a prior release to give users time to adapt.
- The
-
Tool Content Hashing
- The addition of content hashing to
ToolRegistrymay cause issues for existing tools that cannot have their handlers hashed (e.g., built-in or C-extension functions). This could break workflows that rely on such tools. - Recommendation: Provide a migration guide for users to handle these cases, such as allowing them to mark certain tools as "trusted" during registration.
- The addition of content hashing to
💡 Suggestions for Improvement
-
Logging Enhancements
- The logging messages in the
PolicyEngineandToolRegistryare helpful but could benefit from additional context, such as the agent ID or request ID, to aid in debugging and auditing. - Recommendation: Include contextual information in log messages to make them more actionable. For example:
logger.warning( "Blocked mutation '%s' on frozen PolicyEngine by agent '%s'", operation, agent_id )
- The logging messages in the
-
Test Coverage
- While the PR includes 33 new tests, it is unclear if all edge cases are covered, especially for the new
freeze()andContentHashInterceptorfeatures. - Recommendation: Add tests for the following scenarios:
- Attempting to modify a frozen
PolicyEnginevia direct attribute access. - Registering tools with handlers that cannot be hashed (e.g., built-in functions).
- Verifying tool integrity when the
content_hashmetadata is missing or tampered with.
- Attempting to modify a frozen
- While the PR includes 33 new tests, it is unclear if all edge cases are covered, especially for the new
-
Approval Fatigue Detection
- The fatigue detection mechanism in
EscalationHandleris a valuable addition, but it currently uses a fixed time window (fatigue_window_seconds) for rate tracking. This could lead to edge cases where agents exploit the reset of the window to bypass the fatigue detection. - Recommendation: Use a sliding window or token bucket algorithm for more robust rate limiting.
- The fatigue detection mechanism in
-
Documentation
- The new features are well-documented in the
README.md, but thefreeze()method's behavior and implications are not fully explained. - Recommendation: Add a dedicated section to the documentation explaining the
freeze()method, its use cases, and its impact on existing workflows.
- The new features are well-documented in the
Final Assessment
This PR addresses critical security concerns and introduces robust mechanisms to mitigate sandbox escape vectors. However, there are a few critical issues that need to be addressed before merging, particularly around content hashing and policy engine immutability. Additionally, the potential breaking changes should be clearly communicated to users to ensure a smooth transition.
Action Items:
- Address the critical security issues related to content hashing and policy engine immutability.
- Add tests for edge cases and potential bypass scenarios.
- Enhance logging and documentation for better usability and debugging.
- Consider the impact of breaking changes and provide a migration guide if necessary.
Once these issues are resolved, this PR will be a strong step forward in improving the security posture of the microsoft/agent-governance-toolkit.
🤖 AI Agent: test-generator🧪 Test Coverage Analysis
|
🤖 AI Agent: security-scannerSecurity Review of PR: feat(security): harden against agent sandbox escape vectorsThis pull request introduces several security hardening measures to address sandbox escape vectors, including tool content hashing, policy engine immutability, and quorum-based approval mechanisms. Below is a detailed security review of the changes. 1. Tool Content HashingSummary:
Findings:
2. PolicyEngine FreezeSummary:
Findings:
3. Approval Quorum & Fatigue DetectionSummary:
Findings:
4. General ObservationsLogging:
Dependency Management:
Summary of Findings
Final Recommendation:While this PR introduces critical security improvements, the identified issues must be addressed before merging. The most pressing concerns are the hash bypass for unverifiable handlers and the potential race condition in the |
* feat(security): harden against agent sandbox escape vectors
Address 3 critical gaps identified in Ona/Veto agent security research:
1. Tool content hashing (defeats tool aliasing/wrapping attacks):
- ToolRegistry now computes SHA-256 hash of handler source at registration
- execute_tool() verifies integrity before execution, blocks on mismatch
- New ContentHashInterceptor in base.py for intercept-level hash verification
- Integrity violation audit log with get_integrity_violations()
2. PolicyEngine freeze (prevents runtime self-modification):
- New freeze() method makes engine immutable after initialization
- add_constraint, set/update_agent_context, add_conditional_permission
all raise RuntimeError when frozen
- Full mutation audit log records all operations (allowed and blocked)
- is_frozen property for inspection
3. Approval quorum and fatigue detection (defeats approval fatigue):
- New QuorumConfig dataclass for M-of-N approval requirements
- EscalationHandler supports quorum-based vote counting
- Fatigue detection: auto-DENY when agent exceeds escalation rate threshold
- Per-agent rate tracking with configurable window and threshold
- EscalationRequest.votes field tracks individual approver votes
All changes are backward-compatible: new parameters are optional with
defaults that preserve existing behavior. 33 new tests, 53 total pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(security): address PR review feedback on sandbox hardening
- PolicyEngine.freeze() now converts dicts to MappingProxyType/frozenset
for true immutability (not just boolean guard) — addresses HIGH finding
- Removed insecure bytecode fallback from _compute_handler_hash; returns
empty string with warning for unverifiable handlers — addresses CRITICAL
- Added CHANGELOG entries for all new security features
- Added 2 new tests: frozen dicts are immutable proxies, permissions are
frozensets
55 tests pass (20 existing + 35 new).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* docs: add security hardening section to README
Document the 3 sandbox escape defenses with usage examples:
- Tool content hashing with ToolRegistry and ContentHashInterceptor
- PolicyEngine.freeze() with MappingProxyType immutability
- Approval quorum (QuorumConfig) and fatigue detection
Addresses docs-sync-checker feedback on PR #297.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* feat(adk): add Google ADK governance adapter with PolicyEvaluator
Implements the PolicyEvaluator protocol from google/adk-python#4897:
- ADKPolicyEvaluator: YAML-configurable policy engine for ADK agents
- GovernanceCallbacks: wires into before/after tool/agent hooks
- DelegationScope: monotonic scope narrowing for sub-agents
- Structured audit events with pluggable handlers
- Sample policy config (examples/policies/adk-governance.yaml)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(security): address all 24 security findings across codebase
Critical (9 fixed):
- CWE-502: Replace pickle.loads with JSON in process_isolation.py and agent_hibernation.py
- CWE-78: Convert shell=True to list-form subprocess in prepare_release.py, prepare_pypi.py
- CWE-94: Replace eval() with safe AST walker in calculator.py
- CWE-77: Sanitize issue title injection in ai-spec-drafter.yml
- CWE-829: Pin setup-node action to SHA in ai-agent-runner/action.yml
- CWE-494: Add SHA-256 verification for NuGet download in publish.yml
- CWE-1395: Tighten cryptography>=44.0.0, django>=4.2 across 7 pyproject.toml files
High (6 fixed):
- CWE-798: Replace hardcoded API key placeholder in VS Code extension
- CWE-502: yaml.safe_load + json.load in github-reviewer example
- CWE-94: Replace eval() docstring example in langchain tools
- CWE-22: Add path traversal validation in .NET FileTrustStore
- CWE-295: Remove non-hash pip install fallback in ci.yml and publish.yml
- GHSA-rf6f-7fwh-wjgh: Fix flatted prototype pollution in 3 npm packages
Medium (6 fixed):
- CWE-79: Replace innerHTML with safe DOM APIs in Chrome extension
- CWE-328: Replace MD5 with SHA-256 in github-reviewer
- CWE-330: Replace random.randint with secrets module in defi-sentinel
- CWE-327: Add deprecation warnings on HMAC-SHA256 fallback in .NET
- CWE-250: Narrow scorecard.yml permissions
- Audit all 10 pull_request_target workflows for HEAD checkout safety
Low (3 fixed):
- Replace weak default passwords in examples
- Add security justification comments to safe workflows
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(ci): restore working pip install syntax for test jobs
The --require-hashes with inline --hash flags breaks when mixed
with editable installs. Restore the working pattern for test deps
while keeping hash verification for the lint requirements file.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Addresses 3 critical security gaps identified through analysis of agent sandbox escape research (Ona/Veto):
1. Tool Content Hashing (defeats tool aliasing/wrapping attacks)
2. PolicyEngine Freeze (prevents runtime self-modification)
3. Approval Quorum & Fatigue Detection (defeats approval fatigue)
Testing
Files Changed