feat(ci): add PR review orchestrator — collapse agents, post unified summary #345

Merged
imran-siddique merged 12 commits into microsoft:main from imran-siddique:feat/pr-review-orchestrator
Mar 22, 2026

@imran-siddique
Member

Problem

Contributors opening a 3-line fix get bombarded with 5-7 separate bot comments (code-reviewer, security-scanner, breaking-change-detector, docs-sync, test-generator, contributor-guide). This is overwhelming — especially for first-time contributors.

Solution

1. Collapsed agent comments

Individual agent comments are now wrapped in `<details>` tags, collapsed by default with a one-line summary visible:

    ▶ 🤖 AI Agent: security-scanner — No vulnerabilities detected

Click to expand for full details.
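
The collapsing described above could be sketched roughly as follows. `wrapInDetails` and its parameters are illustrative names, not taken from the PR's `action.yml`; the only thing assumed about GitHub is that it renders `<details>` and `<summary>` HTML inside comment bodies.

```javascript
// Hypothetical helper: wrap an agent report so only a one-line summary shows by default.
function wrapInDetails(agentName, oneLiner, body) {
  // Keep the summary on a single line; newlines would break the <summary> element.
  const summary = `🤖 AI Agent: ${agentName} - ${oneLiner}`.replace(/\s+/g, " ").trim();
  // Blank lines around the body let Markdown inside <details> render normally.
  return ["<details>", `<summary>${summary}</summary>`, "", body.trim(), "", "</details>"].join("\n");
}

const comment = wrapInDetails(
  "security-scanner",
  "No vulnerabilities detected",
  "## Findings\n\nNone."
);
console.log(comment);
```

The full report stays in the comment body, so nothing is lost; the reader only has to expand it.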

2. Unified summary table

A new `ai-pr-summary.yml` workflow runs after all agents complete and posts ONE clean verdict:

| Check | Status | Details |
| --- | --- | --- |
| 🔍 Code Review | ✅ Passed | No issues found |
| 🛡️ Security | ✅ Passed | No vulnerabilities |
| 🔄 Breaking Changes | ⚠️ Warning | API signature change |
| 📝 Docs Sync | ✅ Passed | Up to date |
| 🧪 Tests | ℹ️ Suggestion | Consider adding tests |

Verdict: Ready for human review
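
A minimal sketch of how such a table and verdict might be rendered from per-agent results. The result shape (`check`, `status`, `details`), the `failed` status, and the "Changes needed" wording are assumptions for illustration; the PR description only shows the passed, warning, and suggestion rows.

```javascript
// Hypothetical status labels; "failed" does not appear in the PR description.
const STATUS_LABELS = {
  passed: "✅ Passed",
  warning: "⚠️ Warning",
  suggestion: "ℹ️ Suggestion",
  failed: "❌ Failed",
};

// Render one Markdown table row per agent result, plus an overall verdict line.
function renderSummary(results) {
  const rows = results.map(
    (r) => `| ${r.check} | ${STATUS_LABELS[r.status] ?? "❔ Unknown"} | ${r.details} |`
  );
  // Assumed rule: any failed check blocks; otherwise the PR goes to a human.
  const blocked = results.some((r) => r.status === "failed");
  const verdict = blocked ? "Changes needed" : "Ready for human review";
  return ["| Check | Status | Details |", "| --- | --- | --- |", ...rows, "", `Verdict: ${verdict}`].join("\n");
}

const table = renderSummary([
  { check: "🛡️ Security", status: "passed", details: "No vulnerabilities" },
  { check: "🔄 Breaking Changes", status: "warning", details: "API signature change" },
]);
console.log(table);
```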

3. Idempotent comments

Agent comments now upsert (update existing, don't duplicate) — re-pushes update the same comment instead of posting new ones.
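
The upsert behavior could look something like the sketch below. It assumes `github` is the Octokit client that `actions/github-script` exposes and that each agent embeds a hidden HTML marker in its comment; `upsertComment` and its argument names are hypothetical, not the code in `action.yml`.

```javascript
// Sketch of marker-based upsert: update the comment carrying `marker`, else create one.
async function upsertComment(github, { owner, repo, issue_number, marker, body }) {
  // Fetch every comment on the PR (PR comments are issue comments in the REST API).
  const comments = await github.paginate(github.rest.issues.listComments, {
    owner,
    repo,
    issue_number,
    per_page: 100,
  });
  const existing = comments.find((c) => (c.body ?? "").includes(marker));
  const fullBody = `${marker}\n${body}`;
  if (existing) {
    await github.rest.issues.updateComment({ owner, repo, comment_id: existing.id, body: fullBody });
    return { action: "updated", id: existing.id };
  }
  const { data } = await github.rest.issues.createComment({ owner, repo, issue_number, body: fullBody });
  return { action: "created", id: data.id };
}
```

Because the marker travels with the comment body, a re-push finds the earlier comment and rewrites it in place instead of adding another one.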

Files changed

  • `.github/actions/ai-agent-runner/action.yml` — collapse + upsert logic
  • `.github/workflows/ai-pr-summary.yml` — new orchestrator workflow

Before/After

Before: 5-7 separate expanded bot comments cluttering the PR
After: 1 summary table + 5-7 collapsed details (one-liner visible)

imran-siddique and others added 12 commits March 20, 2026 10:56
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add EU AI Act, Colorado AI Act, and GPAI obligations timeline with
AGT coverage mapping. Reference Microsoft Purview DSPM for AI as
complementary data governance layer.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Scorecard API rejects workflows with write permissions at the
workflow level. id-token: write and security-events: write must be
scoped to the job level only. Restores permissions: read-all at
workflow level while keeping job-level write permissions intact.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ft#324)

Add Google-style docstrings with Args, Returns, Raises, Attributes,
and Example sections to MCPMessageType, MCPAdapter, and MCPServer
classes. Also enhances docstrings for key methods including
handle_message, _handle_tools_call, _handle_resources_read, and
_map_tool_to_action.

Fixes microsoft#316
Co-authored-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
…s (dependency confusion) (microsoft#325)

- Replace !pip install agent-os with !pip install -e ../.. in all 6 notebooks;
  agent-os is not on PyPI and installing it from PyPI is a dependency confusion vector
- Replace zendesk-sdk/freshdesk-sdk with zenpy/freshdesk (the real published SDKs)
  in customer-service/requirements.txt
- Remove hashlib-compat from healthcare-hipaa/requirements.txt; hashlib is stdlib
  and hashlib-compat is not a real PyPI package
…stall agent-os with agent-os-kernel

Replace all remaining instances of `pip install agent-os` (unregistered
on PyPI) with `pip install agent-os-kernel` (the actual package) across
docs, examples, TypeScript extensions, CLI source, tests, and SVG assets.

Also fixes `pip install emk` references to point to `agent-os-kernel[full]`
since emk is a submodule, not a standalone PyPI package.

Completes the fix started in PR microsoft#325 which only covered notebooks.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Dify 65K→133K, AutoGen 42K→55K, CrewAI 28K→46K, Semantic Kernel
24K→27K, LangGraph 24K→27K, Haystack 22K→24K, Agent Framework
7.6K→8K. Added star counts for OpenAI Agents SDK (20K) and
Google ADK (18K). Sorted by stars descending.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…summary

- Wrap individual agent comments in <details> tags (collapsed by default)
- Make agent comments idempotent (update on re-push, don't duplicate)
- Add ai-pr-summary.yml workflow that posts one clean verdict table
- Summary uses HTML marker for upsert behavior

Contributors now see ONE summary table instead of 5-7 separate bot comments.
Individual agent reports are preserved but collapsed for reference.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions github-actions bot added ci/cd CI/CD and workflows size/L Large PR (< 500 lines) labels Mar 22, 2026

@github-actions github-actions bot left a comment

🤖 AI Agent: code-reviewer

Review Summary

This PR introduces a new GitHub Actions workflow and updates existing logic to improve the user experience of PR reviews by consolidating bot comments into a single unified summary. The changes aim to reduce clutter, improve readability, and ensure idempotency of bot comments.

The changes are well-structured and address the stated problem effectively. However, there are some areas that require attention to ensure security, maintainability, and backward compatibility.


🔴 CRITICAL

  1. Potential for Comment Spoofing

    • The findExistingComment function relies on a simple marker string (e.g., <!-- ai-agent:code-reviewer -->) to identify existing comments. This approach is vulnerable to spoofing, where a malicious user could create a comment containing the same marker, causing the bot to overwrite or fail to post its own comment.
    • Recommendation: Use a more robust mechanism to identify bot-generated comments, such as including a unique identifier (e.g., a hash or UUID) in the marker that is generated dynamically and stored securely.
  2. Insufficient Validation of PR Number

    • The ai-pr-summary.yml workflow retrieves the PR number from the triggering workflow using a combination of branch name and commit SHA. However, there is no validation to ensure the PR number is correct or that it matches the intended PR.
    • Recommendation: Add validation to ensure the PR number corresponds to the correct repository and branch. Consider using GitHub's context.payload.pull_request.number directly when available.
  3. Unvalidated User Input in Comment Parsing

    • The parseVerdict function processes user-generated content (PR comments) without sanitizing or escaping the input. This could lead to potential injection attacks or malformed output in the summary table.
    • Recommendation: Sanitize and escape user-generated content before including it in the summary table to prevent potential injection attacks or rendering issues.
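
Two of the critical points above (marker spoofing and unescaped content in the summary table) could be mitigated along the lines of this sketch. Both helper names are hypothetical. `user.login` and `user.type` are fields on GitHub's issue-comment objects, but the expected login `github-actions[bot]` is an assumption about which identity posts the agent comments.

```javascript
// Only accept comments that carry the marker AND were written by the workflow's bot,
// so a human-authored comment containing the marker is ignored.
function isTrustedAgentComment(comment, marker, botLogin = "github-actions[bot]") {
  const author = comment.user ?? {};
  return (
    author.type === "Bot" &&
    author.login === botLogin &&
    (comment.body ?? "").includes(marker)
  );
}

// Neutralize characters that could break out of a Markdown table cell or inject HTML.
function escapeTableCell(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/\|/g, "\\|")
    .replace(/\r?\n/g, " ");
}

const marker = "<!-- ai-agent:code-reviewer -->";
const spoofed = { user: { login: "mallory", type: "User" }, body: marker + " fake report" };
console.log(isTrustedAgentComment(spoofed, marker)); // false: the author is not the bot
```

Filtering on the author is cheaper than a secret or hashed marker, since the marker itself no longer needs to be unguessable.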

🟡 WARNING

  1. Backward Compatibility
    • The introduction of the ai-pr-summary.yml workflow and the changes to ai-agent-runner may alter the behavior of existing workflows and comments. For example:
      • Collapsing comments by default may affect users who rely on expanded comments for quick access.
      • The unified summary table may not include all details from individual agent comments, which could impact users who rely on those details.
    • Recommendation: Clearly document these changes in the release notes and provide a way to opt-out of the new behavior if needed.

💡 SUGGESTIONS

  1. Error Handling for API Calls

    • The ghApi function and other API calls (e.g., github.rest.issues.listComments) lack robust error handling. If an API call fails, the workflow may silently fail or produce incomplete results.
    • Recommendation: Add error handling and retries for API calls to ensure the workflow is resilient to transient failures.
  2. Performance Optimization

    • The findExistingComment function uses a paginated approach to search for existing comments. While this is functional, it could be optimized by limiting the number of pages fetched or using a more efficient search mechanism.
    • Recommendation: Consider using GitHub's GraphQL API for more efficient querying, as it allows filtering and searching comments directly.
  3. Improved Logging

    • The logging in the ai-pr-summary.yml workflow is minimal and does not provide detailed insights into the workflow's execution.
    • Recommendation: Add more granular logging to help debug issues, such as logging the number of comments fetched, the PR number identified, and the status of each agent.
  4. Test Coverage

    • There is no evidence of automated tests for the new functionality introduced in this PR.
    • Recommendation: Add unit tests for the helper functions (e.g., extractOneLiner, findExistingComment, parseVerdict) and integration tests for the ai-pr-summary.yml workflow.
  5. Documentation

    • While the PR description is detailed, there is no accompanying documentation update for the new workflow and changes to the existing behavior.
    • Recommendation: Update the repository's documentation to include details about the new workflow, how it works, and how users can customize or opt-out of the unified summary.
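
For the error-handling suggestion, a retry wrapper might look like the sketch below. `withRetry`, its defaults, and the choice of which statuses count as permanent are all illustrative; it only assumes that Octokit request errors expose an HTTP `status` property.

```javascript
// Hypothetical retry wrapper with exponential backoff for transient API failures.
async function withRetry(fn, { retries = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Treat 4xx responses (except 429) as permanent; errors without a status are retried.
      const status = err.status;
      const permanent = status >= 400 && status < 500 && status !== 429;
      if (permanent || attempt === retries) break;
      const delay = baseDelayMs * 2 ** attempt;
      console.log(`API call failed (attempt ${attempt + 1}), retrying in ${delay} ms`);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

A call site would wrap the request in a closure, for example `withRetry(() => github.rest.issues.listComments(params))`. A real workflow may also want to retry 403 responses caused by rate limiting, which this sketch treats as permanent.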

Final Verdict

The PR introduces valuable improvements to the CI/CD pipeline, but the identified critical issues must be addressed before merging. Additionally, implementing the suggested improvements will enhance the robustness, security, and maintainability of the changes.

@github-actions

🤖 AI Agent: security-scanner

Security Analysis of the Pull Request

This PR introduces a new GitHub Actions workflow and modifies an existing action to improve the presentation of bot-generated comments on pull requests. While the changes are primarily focused on improving the user experience, they involve handling user-generated content (e.g., PR comments) and interacting with the GitHub API, which can introduce potential security risks.


Findings

1. Prompt Injection Defense Bypass

  • Severity: 🔴 CRITICAL
  • Issue: The extractOneLiner function attempts to extract a summary from user-generated content (e.g., PR comments) by parsing the text for specific patterns. This approach is vulnerable to prompt injection attacks. A malicious user could craft a comment that includes a fake "###" or "Summary" line to manipulate the one-liner summary displayed in the collapsed comment.
  • Attack Vector: A malicious actor could craft a comment like:
    ### Malicious Summary
    This is a fake summary that will be displayed in the collapsed view.
    
    This would result in the malicious summary being displayed in the collapsed view, potentially misleading reviewers.
  • Recommendation: Sanitize and validate the extracted one-liner to ensure it does not contain malicious or misleading content. For example, limit the length of the one-liner and strip any HTML or Markdown formatting. Additionally, consider using a more robust method to generate the one-liner, such as extracting the first sentence from a predefined section of the comment.
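
The recommendation above (limit length, strip HTML and Markdown) could be sketched as follows. `sanitizeOneLiner` and the 80-character default are hypothetical; regex stripping of this kind reduces what a crafted line can render as, but it is not a complete HTML sanitizer.

```javascript
// Hypothetical sanitizer for the one-line summary shown in the collapsed view.
function sanitizeOneLiner(raw, maxLength = 80) {
  const cleaned = String(raw)
    .replace(/<[^>]*>/g, "") // drop HTML tags
    .replace(/[`*#<>\[\]|~]/g, "") // drop Markdown control characters and stray angle brackets
    .replace(/\s+/g, " ") // collapse whitespace and newlines
    .trim();
  if (cleaned.length <= maxLength) return cleaned;
  // Truncate and mark the cut so the summary stays on one short line.
  return cleaned.slice(0, maxLength - 1).trimEnd() + "…";
}

console.log(sanitizeOneLiner("### <b>Malicious</b> **Summary**\nsecond line"));
```

Underscores are deliberately left alone so identifiers such as `_handle_tools_call` survive; that is a trade-off, since underscores can still produce emphasis in Markdown.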

2. Policy Engine Circumvention

  • Severity: 🟠 HIGH
  • Issue: The PR summary workflow determines the overall verdict based on the presence of specific status symbols (e.g., ⚠️) in the agent comments. This approach is vulnerable to manipulation if a malicious actor includes these symbols in their comments, potentially altering the overall verdict.
  • Attack Vector: A malicious user could post a comment containing a critical-issue symbol or ⚠️ to make it appear as though a critical issue or warning exists, even if all agents have passed.
  • Recommendation: Instead of parsing symbols from comments, use a more secure mechanism to communicate agent results to the summary workflow. For example, agents could write their results to a shared artifact or use a dedicated API endpoint to report their status.
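
One way to read the artifact-based recommendation is sketched below. The convention is assumed, not taken from the PR: each agent writes `<agent>.json` containing a `status` and `details` into a directory that the summary job downloads, and `loadAgentResults` and the status names are hypothetical.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Assumed set of statuses an agent may report.
const ALLOWED_STATUSES = new Set(["passed", "warning", "suggestion", "failed"]);

// Read every <agent>.json file in `dir` and return validated results.
function loadAgentResults(dir) {
  return fs
    .readdirSync(dir)
    .filter((name) => name.endsWith(".json"))
    .sort()
    .map((name) => {
      const agent = path.basename(name, ".json");
      try {
        const parsed = JSON.parse(fs.readFileSync(path.join(dir, name), "utf8"));
        if (!ALLOWED_STATUSES.has(parsed.status)) throw new Error("unknown status");
        return { agent, status: parsed.status, details: String(parsed.details ?? "") };
      } catch (err) {
        // Malformed or unexpected files count as failures rather than being ignored.
        return { agent, status: "failed", details: `unreadable result: ${err.message}` };
      }
    });
}
```

Since the verdict then comes from files written by the agents' own jobs, text that a user types into a PR comment no longer influences it.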

3. Trust Chain Weaknesses

  • Severity: 🔵 LOW
  • Issue: The workflow uses the GITHUB_TOKEN to authenticate API requests. While this is standard practice, it is important to ensure that the token has the minimum required permissions and is not exposed in logs.
  • Recommendation: Verify that the GITHUB_TOKEN permissions are scoped to only what is necessary (e.g., contents: read, pull-requests: write, issues: write). Additionally, ensure that sensitive data, such as the token, is not logged.

4. Credential Exposure

  • Severity: 🔵 LOW
  • Issue: There is no evidence of sensitive credentials being exposed in logs or error messages. However, the use of console.log for debugging (e.g., console.log('Created new comment')) should be carefully reviewed to ensure no sensitive data is inadvertently logged.
  • Recommendation: Replace console.log with a logging mechanism that can be configured to redact sensitive information. Avoid logging sensitive data, such as API responses or tokens.

5. Sandbox Escape

  • Severity: 🔵 LOW
  • Issue: The changes do not introduce any new code execution paths or external dependencies that could lead to a sandbox escape.
  • Recommendation: No action required.

6. Deserialization Attacks

  • Severity: 🔵 LOW
  • Issue: The PR does not introduce any deserialization of untrusted data. The JSON parsing is limited to data fetched from the GitHub API, which is considered trusted in this context.
  • Recommendation: No action required.

7. Race Conditions

  • Severity: 🟡 MEDIUM
  • Issue: The findExistingComment function iterates through all comments on a PR to find an existing comment with a specific marker. If multiple workflows attempt to upsert comments simultaneously, there is a risk of a race condition where multiple comments are created instead of updating the existing one.
  • Recommendation: Implement a locking mechanism or use a more robust method to ensure atomic updates to comments. For example, use a dedicated API or database to track the state of comments.
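
The REST API offers no lock on comments, so one lightweight mitigation is to clean up after the fact. The sketch below is an assumption-laden illustration: `dedupeMarkerComments` is a hypothetical helper, `github` is assumed to be the Octokit client from `actions/github-script`, and it assumes comment ids increase with creation order.

```javascript
// Hypothetical cleanup step: after upserting, delete duplicate marker comments
// that a concurrent run may have created.
async function dedupeMarkerComments(github, { owner, repo, issue_number, marker }) {
  const comments = await github.paginate(github.rest.issues.listComments, {
    owner,
    repo,
    issue_number,
    per_page: 100,
  });
  const matches = comments
    .filter((c) => (c.body ?? "").includes(marker))
    .sort((a, b) => a.id - b.id);
  // Every concurrent run picks the same survivor: the comment with the lowest id.
  const [keep, ...extras] = matches;
  for (const extra of extras) {
    await github.rest.issues.deleteComment({ owner, repo, comment_id: extra.id });
  }
  return { kept: keep ? keep.id : null, deleted: extras.map((c) => c.id) };
}
```

Two runs may both try to delete the same duplicate, so a caller would need to tolerate a 404 from the second attempt. Serializing the runs with a workflow-level `concurrency` group is another option that avoids the race rather than repairing it.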

8. Supply Chain

  • Severity: 🟠 HIGH
  • Issue: The workflow uses the actions/github-script action, which is pinned to a specific commit hash (60a0d83039c74a4aee543508d2ffcb1c3799cdea). While this is a good practice, it is important to verify the integrity of the referenced commit to ensure it has not been tampered with.
  • Recommendation: Verify the integrity of the actions/github-script action by checking the commit hash against the official repository. Additionally, consider using Dependabot or a similar tool to monitor for updates to the action.
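
Pinning can at least be checked mechanically. The sketch below is a hypothetical lint that lists `uses:` references not pinned to a full 40-character commit SHA; it works on the raw workflow text with a regular expression rather than a YAML parser, and it does not verify that a SHA actually belongs to the upstream repository.

```javascript
// Hypothetical lint: return `uses:` references that are not pinned to a full commit SHA.
function findUnpinnedActions(workflowText) {
  const unpinned = [];
  for (const line of workflowText.split(/\r?\n/)) {
    const match = line.match(/^\s*(?:-\s*)?uses:\s*["']?([^\s"'#]+)/);
    if (!match) continue;
    const ref = match[1];
    // Local actions (./path) and docker:// images are out of scope for this check.
    if (ref.startsWith("./") || ref.startsWith("docker://")) continue;
    const version = ref.split("@")[1] ?? "";
    if (!/^[0-9a-f]{40}$/.test(version)) unpinned.push(ref);
  }
  return unpinned;
}

const sample = [
  "steps:",
  "  - uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea",
  "  - uses: actions/checkout@v4",
  "  - uses: ./.github/actions/ai-agent-runner",
].join("\n");
console.log(findUnpinnedActions(sample));
```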

Summary of Findings

| Category | Severity | Issue | Recommendation |
| --- | --- | --- | --- |
| Prompt Injection Defense Bypass | 🔴 CRITICAL | Malicious input can manipulate the one-liner summary in collapsed comments. | Sanitize and validate extracted one-liner. |
| Policy Engine Circumvention | 🟠 HIGH | Malicious comments can manipulate the overall PR summary verdict. | Use a secure mechanism (e.g., shared artifacts) to communicate agent results. |
| Trust Chain Weaknesses | 🔵 LOW | Potential over-permission of GITHUB_TOKEN. | Verify token permissions and avoid logging sensitive data. |
| Credential Exposure | 🔵 LOW | No sensitive credentials exposed, but logging practices should be reviewed. | Avoid logging sensitive data. |
| Sandbox Escape | 🔵 LOW | No new code execution paths or external dependencies introduced. | No action required. |
| Deserialization Attacks | 🔵 LOW | No unsafe deserialization detected. | No action required. |
| Race Conditions | 🟡 MEDIUM | Potential race condition when upserting comments. | Implement locking or atomic update mechanisms. |
| Supply Chain | 🟠 HIGH | Dependency on actions/github-script should be verified for integrity. | Verify commit hash and use Dependabot for updates. |

Conclusion

This PR introduces useful improvements to the CI/CD pipeline, but it also introduces critical and high-severity security risks that must be addressed before merging. Specifically, the prompt injection vulnerability and policy engine circumvention are the most concerning issues. Addressing these vulnerabilities should be prioritized to ensure the integrity of the repository and its security features.

@imran-siddique imran-siddique merged commit 51a4253 into microsoft:main Mar 22, 2026
54 of 55 checks passed
@imran-siddique imran-siddique deleted the feat/pr-review-orchestrator branch March 28, 2026 16:48
Labels

ci/cd CI/CD and workflows size/L Large PR (< 500 lines)

5 participants