feat(ci): add PR review orchestrator — collapse agents, post unified summary by imran-siddique · Pull Request #345 · microsoft/agent-governance-toolkit

imran-siddique · 2026-03-22T20:41:47Z

Problem

Contributors opening a 3-line fix get bombarded with 5-7 separate bot comments (code-reviewer, security-scanner, breaking-change-detector, docs-sync, test-generator, contributor-guide). This is overwhelming — especially for first-time contributors.

Solution

1. Collapsed agent comments

Individual agent comments are now wrapped in `<details>` tags, collapsed by default with a one-line summary visible:

```
▶ 🤖 AI Agent: security-scanner — No vulnerabilities detected
```

Click to expand for full details.
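A minimal sketch of the collapse step, written in JavaScript because `actions/github-script` runs JavaScript. The helper name `wrapInDetails` and its arguments are hypothetical, not taken from `action.yml`. The sketch only shows the `<details>`/`<summary>` shape that GitHub renders as a collapsed section, with the summary text HTML-escaped.

```javascript
// Escape the characters that would let plain text turn into HTML markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: wrap a full agent report in a collapsed <details> block
// whose <summary> is the one-line verdict shown in the PR conversation.
function wrapInDetails(agentName, oneLiner, fullReport) {
  const summary = `🤖 AI Agent: ${escapeHtml(agentName)} — ${escapeHtml(oneLiner)}`;
  // The blank lines around the report let GitHub render Markdown inside <details>.
  return [
    '<details>',
    `<summary>${summary}</summary>`,
    '',
    fullReport,
    '',
    '</details>',
  ].join('\n');
}
```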

2. Unified summary table

A new `ai-pr-summary.yml` workflow runs after all agents complete and posts ONE clean verdict:

| Check | Status | Details |
| --- | --- | --- |
| 🔍 Code Review | ✅ Passed | No issues found |
| 🛡️ Security | ✅ Passed | No vulnerabilities |
| 🔄 Breaking Changes | ⚠️ Warning | API signature change |
| 📝 Docs Sync | ✅ Passed | Up to date |
| 🧪 Tests | ℹ️ Suggestion | Consider adding tests |

Verdict: Ready for human review
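One way the summary table and verdict could be assembled, as a sketch. The status keys, the `buildSummary`/`verdictFor` names, and the verdict rule (failures block, pending agents defer, warnings and suggestions still go to a human) are assumptions chosen to match the example table above, not the workflow's actual logic.

```javascript
// Hypothetical status vocabulary; the real workflow's labels may differ.
const STATUS_LABELS = {
  passed: '✅ Passed',
  warning: '⚠️ Warning',
  suggestion: 'ℹ️ Suggestion',
  failed: '❌ Failed',
  pending: '⏳ Pending',
};

// Assumed verdict rule: any failure blocks, any pending agent defers,
// otherwise (warnings and suggestions included) hand off to a human.
function verdictFor(results) {
  if (results.some((r) => r.status === 'failed')) return 'Changes requested';
  if (results.some((r) => r.status === 'pending')) return 'Waiting for agents';
  return 'Ready for human review';
}

// Build the Markdown table plus verdict line posted as the single summary comment.
function buildSummary(results) {
  const rows = results.map((r) => {
    const label = STATUS_LABELS[r.status] || r.status;
    // Escape pipes so agent text cannot break the table's column layout.
    const details = String(r.details).replace(/\|/g, '\\|');
    return `| ${r.check} | ${label} | ${details} |`;
  });
  return [
    '| Check | Status | Details |',
    '| --- | --- | --- |',
    ...rows,
    '',
    `**Verdict:** ${verdictFor(results)}`,
  ].join('\n');
}
```

Escaping pipes in the Details column keeps one agent's free text from shifting the other columns.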

3. Idempotent comments

Agent comments now upsert (update existing, don't duplicate) — re-pushes update the same comment instead of posting new ones.
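A sketch of marker-based upsert using the Octokit client that `actions/github-script` exposes as `github` (`paginate`, `rest.issues.listComments`, `rest.issues.updateComment`, `rest.issues.createComment`). The marker format `<!-- ai-agent:NAME -->` and the default `github-actions[bot]` author check are assumptions. Matching on the author as well as the marker means a human comment that copies the marker is not mistaken for the bot's own.

```javascript
// Hypothetical marker format embedded as an invisible HTML comment.
function markerFor(agentName) {
  return `<!-- ai-agent:${agentName} -->`;
}

// Update this agent's existing comment if there is one, otherwise create it.
async function upsertComment(github, {
  owner, repo, issue_number, agentName, body, botLogin = 'github-actions[bot]',
}) {
  const marker = markerFor(agentName);
  const fullBody = `${marker}\n${body}`;
  const comments = await github.paginate(github.rest.issues.listComments, {
    owner, repo, issue_number, per_page: 100,
  });
  // Require both the marker and bot authorship to resist spoofed markers.
  const existing = comments.find(
    (c) => c.user && c.user.login === botLogin && (c.body || '').includes(marker),
  );
  if (existing) {
    await github.rest.issues.updateComment({ owner, repo, comment_id: existing.id, body: fullBody });
    return { action: 'updated', id: existing.id };
  }
  const { data } = await github.rest.issues.createComment({ owner, repo, issue_number, body: fullBody });
  return { action: 'created', id: data.id };
}
```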

Files changed

`.github/actions/ai-agent-runner/action.yml`: collapse + upsert logic
`.github/workflows/ai-pr-summary.yml`: new orchestrator workflow

Before/After

Before: 5-7 separate expanded bot comments cluttering the PR
After: 1 summary table + 5-7 collapsed details (one-liner visible)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add EU AI Act, Colorado AI Act, and GPAI obligations timeline with AGT coverage mapping. Reference Microsoft Purview DSPM for AI as complementary data governance layer. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…toolkit

The Scorecard API rejects workflows with write permissions at the workflow level. `id-token: write` and `security-events: write` must be scoped to the job level only. Restores `permissions: read-all` at the workflow level while keeping job-level write permissions intact. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ft#324) Add Google-style docstrings with Args, Returns, Raises, Attributes, and Example sections to MCPMessageType, MCPAdapter, and MCPServer classes. Also enhances docstrings for key methods including handle_message, _handle_tools_call, _handle_resources_read, and _map_tool_to_action. Fixes microsoft#316

Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>

…icrosoft#326)

…s (dependency confusion) (microsoft#325)

- Replace `!pip install agent-os` with `!pip install -e ../..` in all 6 notebooks; `agent-os` is not on PyPI, and installing it from PyPI is a dependency confusion vector
- Replace `zendesk-sdk`/`freshdesk-sdk` with `zenpy`/`freshdesk` (the real published SDKs) in `customer-service/requirements.txt`
- Remove `hashlib-compat` from `healthcare-hipaa/requirements.txt`; `hashlib` is in the stdlib and `hashlib-compat` is not a real PyPI package

…stall agent-os with agent-os-kernel Replace all remaining instances of `pip install agent-os` (unregistered on PyPI) with `pip install agent-os-kernel` (the actual package) across docs, examples, TypeScript extensions, CLI source, tests, and SVG assets. Also fixes `pip install emk` references to point to `agent-os-kernel[full]` since emk is a submodule, not a standalone PyPI package. Completes the fix started in PR microsoft#325 which only covered notebooks. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Dify 65K→133K, AutoGen 42K→55K, CrewAI 28K→46K, Semantic Kernel 24K→27K, LangGraph 24K→27K, Haystack 22K→24K, Agent Framework 7.6K→8K. Added star counts for OpenAI Agents SDK (20K) and Google ADK (18K). Sorted by stars descending. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…toolkit

…summary

- Wrap individual agent comments in `<details>` tags (collapsed by default)
- Make agent comments idempotent (update on re-push, don't duplicate)
- Add `ai-pr-summary.yml` workflow that posts one clean verdict table
- Summary uses an HTML marker for upsert behavior

Contributors now see ONE summary table instead of 5-7 separate bot comments. Individual agent reports are preserved but collapsed for reference.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions

🤖 AI Agent: code-reviewer

Review Summary

This PR introduces a new GitHub Actions workflow and updates existing logic to improve the user experience of PR reviews by consolidating bot comments into a single unified summary. The changes aim to reduce clutter, improve readability, and ensure idempotency of bot comments.

The changes are well-structured and address the stated problem effectively. However, there are some areas that require attention to ensure security, maintainability, and backward compatibility.

🔴 CRITICAL

Potential for Comment Spoofing
- The findExistingComment function identifies existing comments only by a marker string (an HTML comment embedded in the comment body). This approach is vulnerable to spoofing: a malicious user could post a comment containing the same marker, causing the bot to update the wrong comment or skip posting its own.
- Recommendation: Use a more robust mechanism to identify bot-generated comments, such as including a unique identifier (e.g., a hash or UUID) in the marker that is generated dynamically and stored securely.
Insufficient Validation of PR Number
- The ai-pr-summary.yml workflow retrieves the PR number from the triggering workflow using a combination of branch name and commit SHA. However, there is no validation to ensure the PR number is correct or that it matches the intended PR.
- Recommendation: Add validation to ensure the PR number corresponds to the correct repository and branch. Consider using GitHub's context.payload.pull_request.number directly when available.
Unvalidated User Input in Comment Parsing
- The parseVerdict function processes user-generated content (PR comments) without sanitizing or escaping the input. This could lead to potential injection attacks or malformed output in the summary table.
- Recommendation: Sanitize and escape user-generated content before including it in the summary table to prevent potential injection attacks or rendering issues.
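A sketch of PR-number validation, assuming the summary workflow is triggered by `workflow_run`, so the triggering run's `pull_requests` and `head_sha` are available in `context.payload.workflow_run`. The function name is hypothetical. A candidate PR is accepted only if it targets this repository and its current head commit is the one the agents reviewed. Note that `pull_requests` is typically empty for PRs opened from forks, which would need a separate lookup not shown here.

```javascript
// Resolve the PR for a workflow_run-triggered job and validate it before posting.
async function resolvePrNumber(github, context) {
  const run = context.payload.workflow_run;
  const { owner, repo } = context.repo;
  const candidates = (run.pull_requests || []).map((pr) => pr.number);
  for (const pull_number of candidates) {
    const { data: pr } = await github.rest.pulls.get({ owner, repo, pull_number });
    const sameRepo = pr.base.repo.full_name === `${owner}/${repo}`;
    // Only accept a PR whose current head is the commit the agents actually reviewed.
    if (sameRepo && pr.head.sha === run.head_sha) {
      return pull_number;
    }
  }
  return null; // No validated match: the caller should skip posting rather than guess.
}
```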

🟡 WARNING

Backward Compatibility
- The introduction of the ai-pr-summary.yml workflow and the changes to ai-agent-runner may alter the behavior of existing workflows and comments. For example:
  - Collapsing comments by default may affect users who rely on expanded comments for quick access.
  - The unified summary table may not include all details from individual agent comments, which could impact users who rely on those details.
- Recommendation: Clearly document these changes in the release notes and provide a way to opt-out of the new behavior if needed.

💡 SUGGESTIONS

Error Handling for API Calls
- The ghApi function and other API calls (e.g., github.rest.issues.listComments) lack robust error handling. If an API call fails, the workflow may silently fail or produce incomplete results.
- Recommendation: Add error handling and retries for API calls to ensure the workflow is resilient to transient failures.
Performance Optimization
- The findExistingComment function uses a paginated approach to search for existing comments. While this is functional, it could be optimized by limiting the number of pages fetched or using a more efficient search mechanism.
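- Recommendation: Consider GitHub's GraphQL API, which can fetch only the fields needed (comment ID, author, body) in fewer round trips. Neither the REST nor the GraphQL comments listing filters by body text, so the marker match still happens client-side.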
Improved Logging
- The logging in the ai-pr-summary.yml workflow is minimal and does not provide detailed insights into the workflow's execution.
- Recommendation: Add more granular logging to help debug issues, such as logging the number of comments fetched, the PR number identified, and the status of each agent.
Test Coverage
- There is no evidence of automated tests for the new functionality introduced in this PR.
- Recommendation: Add unit tests for the helper functions (e.g., extractOneLiner, findExistingComment, parseVerdict) and integration tests for the ai-pr-summary.yml workflow.
Documentation
- While the PR description is detailed, there is no accompanying documentation update for the new workflow and changes to the existing behavior.
- Recommendation: Update the repository's documentation to include details about the new workflow, how it works, and how users can customize or opt-out of the unified summary.
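For the error-handling suggestion, a minimal retry wrapper with exponential backoff. `withRetry` is a hypothetical helper. It relies on Octokit request errors carrying an HTTP `status`; which statuses count as transient (5xx, 429, or no status at all, as with network errors) is an assumption to tune.

```javascript
// Hypothetical helper: retry an async API call on transient failures.
async function withRetry(fn, {
  attempts = 3,
  baseDelayMs = 500,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
} = {}) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      const status = err && err.status;
      // Assumed transient: network errors (no status), rate limiting (429), server errors (5xx).
      const transient = status === undefined || status === 429 || status >= 500;
      if (!transient || i === attempts - 1) break;
      await sleep(baseDelayMs * 2 ** i); // 500 ms, 1000 ms, ... by default
    }
  }
  throw lastError;
}
```

In a `github-script` step this would wrap calls such as `withRetry(() => github.rest.issues.listComments({ owner, repo, issue_number }))`, so a single transient failure no longer produces an incomplete summary.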

Final Verdict

The PR introduces valuable improvements to the CI/CD pipeline, but the identified critical issues must be addressed before merging. Additionally, implementing the suggested improvements will enhance the robustness, security, and maintainability of the changes.

github-actions · 2026-03-22T20:42:19Z

🤖 AI Agent: security-scanner

Security Analysis of the Pull Request

This PR introduces a new GitHub Actions workflow and modifies an existing action to improve the presentation of bot-generated comments on pull requests. While the changes are primarily focused on improving the user experience, they involve handling user-generated content (e.g., PR comments) and interacting with the GitHub API, which can introduce potential security risks.

Findings

1. Prompt Injection Defense Bypass

Severity: 🔴 CRITICAL
Issue: The extractOneLiner function attempts to extract a summary from user-generated content (e.g., PR comments) by parsing the text for specific patterns. This approach is vulnerable to prompt injection attacks. A malicious user could craft a comment that includes a fake "###" or "Summary" line to manipulate the one-liner summary displayed in the collapsed comment.
Attack Vector: A malicious actor could craft a comment like:
```
### Malicious Summary
This is a fake summary that will be displayed in the collapsed view.
```
This would result in the malicious summary being displayed in the collapsed view, potentially misleading reviewers.
Recommendation: Sanitize and validate the extracted one-liner to ensure it does not contain malicious or misleading content. For example, limit the length of the one-liner and strip any HTML or Markdown formatting. Additionally, consider using a more robust method to generate the one-liner, such as extracting the first sentence from a predefined section of the comment.
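A sketch of the recommended sanitization: keep only the first non-empty line, strip HTML tags and Markdown control characters, and cap the length. `sanitizeOneLiner` and its 100-character default are hypothetical choices. This limits formatting tricks and layout damage; it cannot tell whether the remaining text is truthful.

```javascript
// Hypothetical sanitizer for the one-liner shown in the collapsed view.
function sanitizeOneLiner(text, maxLength = 100) {
  const firstLine = String(text).split('\n').find((l) => l.trim() !== '') || '';
  let clean = firstLine
    .replace(/<[^>]*>/g, '')          // drop HTML tags
    .replace(/[#*_`<>\[\]|~]/g, '')   // drop Markdown/HTML control characters
    .replace(/\s+/g, ' ')             // collapse whitespace
    .trim();
  if (clean.length > maxLength) {
    clean = clean.slice(0, maxLength - 1).trimEnd() + '…';
  }
  return clean;
}
```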

2. Policy Engine Circumvention

Severity: 🟠 HIGH
Issue: The PR summary workflow determines the overall verdict based on the presence of specific symbols (e.g., ❌, ⚠️, ⏳) in the agent comments. This approach is vulnerable to manipulation if a malicious actor includes these symbols in their comments, potentially altering the overall verdict.
Attack Vector: A malicious user could post a comment containing ❌ or ⚠️ to make it appear as though a critical issue or warning exists, even if all agents have passed.
Recommendation: Instead of parsing symbols from comments, use a more secure mechanism to communicate agent results to the summary workflow. For example, agents could write their results to a shared artifact or use a dedicated API endpoint to report their status.
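A sketch of the structured alternative: each agent emits a small JSON result (for example, uploaded as a workflow artifact) and the summary validates it against a fixed vocabulary instead of scanning comment text for emoji. The schema and `parseAgentResult` are assumptions, not an existing format in this repository.

```javascript
// Fixed status vocabulary: free text (or emoji) in a comment cannot add new verdicts.
const ALLOWED_STATUSES = new Set(['passed', 'warning', 'suggestion', 'failed', 'pending']);

// Hypothetical schema: {"agent": "security-scanner", "status": "passed", "summary": "..."}
function parseAgentResult(jsonText) {
  let data;
  try {
    data = JSON.parse(jsonText);
  } catch {
    return { ok: false, error: 'invalid JSON' };
  }
  if (typeof data !== 'object' || data === null) return { ok: false, error: 'not an object' };
  const { agent, status, summary } = data;
  if (typeof agent !== 'string' || !/^[a-z0-9-]+$/.test(agent)) return { ok: false, error: 'bad agent name' };
  if (!ALLOWED_STATUSES.has(status)) return { ok: false, error: 'unknown status' };
  if (typeof summary !== 'string') return { ok: false, error: 'missing summary' };
  return { ok: true, value: { agent, status, summary } };
}
```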

3. Trust Chain Weaknesses

Severity: 🔵 LOW
Issue: The workflow uses the GITHUB_TOKEN to authenticate API requests. While this is standard practice, it is important to ensure that the token has the minimum required permissions and is not exposed in logs.
Recommendation: Verify that the GITHUB_TOKEN permissions are scoped to only what is necessary (e.g., contents: read, pull-requests: write, issues: write). Additionally, ensure that sensitive data, such as the token, is not logged.

4. Credential Exposure

Severity: 🔵 LOW
Issue: There is no evidence of sensitive credentials being exposed in logs or error messages. However, the use of console.log for debugging (e.g., console.log('Created new comment')) should be carefully reviewed to ensure no sensitive data is inadvertently logged.
Recommendation: Replace console.log with a logging mechanism that can be configured to redact sensitive information. Avoid logging sensitive data, such as API responses or tokens.

5. Sandbox Escape

Severity: 🔵 LOW
Issue: The changes do not introduce any new code execution paths or external dependencies that could lead to a sandbox escape.
Recommendation: No action required.

6. Deserialization Attacks

Severity: 🔵 LOW
Issue: The PR does not introduce any deserialization of untrusted data. The JSON parsing is limited to data fetched from the GitHub API, which is considered trusted in this context.
Recommendation: No action required.

7. Race Conditions

Severity: 🟡 MEDIUM
Issue: The findExistingComment function iterates through all comments on a PR to find an existing comment with a specific marker. If multiple workflows attempt to upsert comments simultaneously, there is a risk of a race condition where multiple comments are created instead of updating the existing one.
Recommendation: Implement a locking mechanism or use a more robust method to ensure atomic updates to comments. For example, use a dedicated API or database to track the state of comments.
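Locking is one option. A lighter mitigation, sketched here as an assumption rather than the PR's approach, is a post-write cleanup: after upserting, each run re-lists the comments, keeps the oldest bot comment carrying the marker, and deletes the rest. Because every run computes the same survivor, concurrent runs converge on one comment. `planDedupe` is hypothetical; the deletion itself would use `github.rest.issues.deleteComment`.

```javascript
// Decide which marked bot comment survives (oldest) and which duplicates to delete.
function planDedupe(comments, marker, botLogin = 'github-actions[bot]') {
  const marked = comments
    .filter((c) => c.user && c.user.login === botLogin && (c.body || '').includes(marker))
    // Oldest first; ties broken by ID so every run picks the same survivor.
    .sort((a, b) => new Date(a.created_at) - new Date(b.created_at) || a.id - b.id);
  if (marked.length === 0) return { keep: null, remove: [] };
  const [keep, ...dupes] = marked;
  return { keep: keep.id, remove: dupes.map((c) => c.id) };
}
```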

8. Supply Chain

Severity: 🟠 HIGH
Issue: The workflow uses the actions/github-script action, which is pinned to a specific commit hash (60a0d83039c74a4aee543508d2ffcb1c3799cdea). While this is a good practice, it is important to verify the integrity of the referenced commit to ensure it has not been tampered with.
Recommendation: Verify the integrity of the actions/github-script action by checking the commit hash against the official repository. Additionally, consider using Dependabot or a similar tool to monitor for updates to the action.
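A small check in the spirit of this recommendation: flag any `uses:` reference that is not pinned to a full 40-character commit SHA. `findUnpinnedActions` is a hypothetical helper that works line by line on workflow YAML text; local (`./`) and `docker://` references are skipped for simplicity. It checks pinning only and does not verify that a SHA belongs to the upstream repository.

```javascript
// Flag `uses:` references whose version is not a full 40-character commit SHA.
function findUnpinnedActions(workflowText) {
  const findings = [];
  workflowText.split('\n').forEach((line, index) => {
    const match = line.match(/^\s*-?\s*uses:\s*['"]?([^'"\s#]+)/);
    if (!match) return;
    const ref = match[1];
    // Local actions and container images are out of scope for this sketch.
    if (ref.startsWith('./') || ref.startsWith('docker://')) return;
    const at = ref.lastIndexOf('@');
    const version = at === -1 ? '' : ref.slice(at + 1);
    if (!/^[0-9a-f]{40}$/.test(version)) findings.push({ line: index + 1, ref });
  });
  return findings;
}
```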

Summary of Findings

| Category | Severity | Issue | Recommendation |
| --- | --- | --- | --- |
| Prompt Injection Defense Bypass | 🔴 CRITICAL | Malicious input can manipulate the one-liner summary in collapsed comments. | Sanitize and validate extracted one-liner. |
| Policy Engine Circumvention | 🟠 HIGH | Malicious comments can manipulate the overall PR summary verdict. | Use a secure mechanism (e.g., shared artifacts) to communicate agent results. |
| Trust Chain Weaknesses | 🔵 LOW | Potential over-permission of `GITHUB_TOKEN`. | Verify token permissions and avoid logging sensitive data. |
| Credential Exposure | 🔵 LOW | No sensitive credentials exposed, but logging practices should be reviewed. | Avoid logging sensitive data. |
| Sandbox Escape | 🔵 LOW | No new code execution paths or external dependencies introduced. | No action required. |
| Deserialization Attacks | 🔵 LOW | No unsafe deserialization detected. | No action required. |
| Race Conditions | 🟡 MEDIUM | Potential race condition when upserting comments. | Implement locking or atomic update mechanisms. |
| Supply Chain | 🟠 HIGH | Dependency on `actions/github-script` should be verified for integrity. | Verify commit hash and use Dependabot for updates. |

Conclusion

This PR introduces useful improvements to the CI/CD pipeline, but it also introduces critical and high-severity security risks that must be addressed before merging. Specifically, the prompt injection vulnerability and policy engine circumvention are the most concerning issues. Addressing these vulnerabilities should be prioritized to ensure the integrity of the repository and its security features.

imran-siddique and others added 12 commits March 20, 2026 10:56

docs: add testing guide for external testers and customers

9bd6672

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Merge branch 'main' of https://github.com/microsoft/agent-governance-…

3ab4a65

…toolkit

ci: add markdown link checker workflow (microsoft#323)

baeed60

Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>

feat: add policy evaluation heatmap to SRE dashboard (microsoft#309) (m…

8ee3c81

…icrosoft#326)

Merge branch 'main' of https://github.com/microsoft/agent-governance-…

96b2046

…toolkit

github-actions bot added the ci/cd (CI/CD and workflows) and size/L (Large PR, < 500 lines) labels Mar 22, 2026

github-actions bot reviewed Mar 22, 2026


imran-siddique merged commit 51a4253 into microsoft:main Mar 22, 2026
54 of 55 checks passed

imran-siddique mentioned this pull request Mar 22, 2026

feat(compliance): add compliance_grade() method to GovernanceAttestation #346

Merged

imran-siddique deleted the feat/pr-review-orchestrator branch March 28, 2026 16:48


imran-siddique merged 12 commits into microsoft:main from imran-siddique:feat/pr-review-orchestrator


5 participants
