
feat(bot): enforce evaluation role, multi-iteration feedback loop, and diagnostic rigor#26303

Draft
gundermanc wants to merge 15 commits into main from bot-prompt-improvements

Member

@gundermanc gundermanc commented Apr 30, 2026

Summary

This PR improves the Gemini CLI Bot's system prompts to explicitly identify and resolve architectural conflicts, restricts the critique agent to an evaluation-only role, implements a multi-iteration feedback loop, and improves its diagnostic rigor and context awareness.

Details

  • metrics.md (The Brain): Added a new rule instructing the Brain to actively search the repository for overlapping systems before optimizing and to verify their intent (contradictory vs. complementary).
  • critique.md (The Reviewer):
    • Redefined the critique agent as an "evaluator ONLY" that must NOT apply fixes or modify the code itself.
    • Added an Architectural Conflict check to the Logical & Workflow Integrity checklist.
    • Added a Systemic Simulation requirement, forcing the agent to explicitly write out a timeline simulation for time-based logic.
    • Added a validation step instructing the critique agent to ensure changes pass the build, tests, and linter.
  • common.md:
    • Clarified the "Surgical Changes" rule to explicitly state that deleting duplicated, conflicting, or obsolete workflows is considered the ultimate "surgical" fix.
    • Added Empirical Log Verification rule to the Defensive Scripting section, mandating the use of gh run view --log-failed for diagnosing CI/workflow failures instead of relying on static code analysis.
  • interactive.md: Added an exception to the "Ignore Pending Tasks" rule so that if the user's comment is on a PR authored by the bot, the bot can engage the UNBLOCKING PROTOCOL and inspect the PR's true technical state (like CI failures) instead of being stuck in a restricted context loop.
  • gemini-cli-bot-brain.yml: Implemented a new bash loop to replace the static Brain -> Critique pipeline. The pipeline now supports an investigate -> critique -> investigate -> critique loop. If the Critique agent rejects the Brain's changes, it passes the feedback via critique_feedback.md, resets the repository state, and allows the Brain a second attempt to implement a fix.
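
The control flow of the loop described for gemini-cli-bot-brain.yml can be sketched roughly as follows. This is a minimal illustration, not the workflow's actual script: `run_brain`, `run_critique`, and `reset_repo` are hypothetical stand-ins for the real agent invocations and repository reset, the two-iteration cap is an assumption based on the "second attempt" wording, and only the `critique_feedback.md` file name comes from the description above.

```shell
#!/usr/bin/env bash
# Sketch of an investigate -> critique feedback loop.
# All function names below are hypothetical stand-ins.

MAX_ITERATIONS=2   # assumption: the description mentions a second attempt
FEEDBACK_FILE="${FEEDBACK_FILE:-critique_feedback.md}"

run_brain() {
  # $1 = iteration number. Picks up feedback left by a previous rejection.
  if [ -s "$FEEDBACK_FILE" ]; then
    echo "brain: iteration $1, applying feedback: $(cat "$FEEDBACK_FILE")"
  else
    echo "brain: iteration $1, no prior feedback"
  fi
}

run_critique() {
  # Stub evaluator: rejects the first attempt and approves the second.
  # It only writes feedback; it never modifies the change itself.
  if [ "$1" -lt 2 ]; then
    echo "handle overlapping workflow" > "$FEEDBACK_FILE"
    return 1
  fi
  return 0
}

reset_repo() {
  # Stub. A real workflow would discard the rejected edits here
  # while keeping the feedback file for the next attempt.
  echo "reset: discarding rejected changes"
}

feedback_loop() {
  rm -f "$FEEDBACK_FILE"
  local iteration=1
  while [ "$iteration" -le "$MAX_ITERATIONS" ]; do
    run_brain "$iteration"
    if run_critique "$iteration"; then
      echo "approved on iteration $iteration"
      return 0
    fi
    echo "rejected on iteration $iteration"
    reset_repo
    iteration=$((iteration + 1))
  done
  echo "no approved change after $MAX_ITERATIONS iterations"
  return 1
}
```

Calling `feedback_loop` with these stubs shows a rejection on iteration 1 followed by an approval on iteration 2. Keeping the critique step as a pure pass/fail check with a side-channel feedback file mirrors the evaluation-only role described for critique.md.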

Related Issues

Resolves a systemic issue discovered while reviewing PR #26302 and PR #26304, where the bot either ignored conflicting workflows or introduced structural bugs due to generative flaws in the critique phase. Also addresses the root cause of the bot's inability to diagnose CI failures on its own PRs as seen in #26513.

How to Validate

  1. Review the prompt changes to ensure they are clear, nuanced, and enforce the desired behavior for log verification, unblocking protocol engagement, and architectural conflict resolution.
  2. Review the workflow changes to confirm the while loop logic successfully captures and passes feedback between the agents.
  3. Trigger the workflow manually and observe the run logs. If the critique agent rejects the first attempt, you should see Iteration 2 start, receive the feedback, and attempt a new solution.
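
For step 3, one possible way to trigger the run and check the logs from a terminal is sketched below. It assumes the workflow exposes a manual (`workflow_dispatch`) trigger and that the log contains the literal text "Iteration 2"; both are assumptions drawn from the wording above, and the helper function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: trigger the bot workflow and check whether a second iteration ran.
# Function names are hypothetical; the gh subcommands are standard GitHub CLI.

WORKFLOW="gemini-cli-bot-brain.yml"

trigger_brain_workflow() {
  # $1 = branch to run against. Assumes a workflow_dispatch trigger exists.
  gh workflow run "$WORKFLOW" --ref "$1"
}

saw_second_iteration() {
  # Looks up the most recent run of the workflow and searches its log.
  local run_id
  run_id="$(gh run list --workflow "$WORKFLOW" --limit 1 \
    --json databaseId --jq '.[0].databaseId')"
  gh run view "$run_id" --log | grep -q "Iteration 2"
}
```

A typical sequence would be `trigger_brain_workflow bot-prompt-improvements`, waiting for the run to finish (for example with `gh run watch`), and then calling `saw_second_iteration`. Full logs are generally only available once the run has completed.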

Pre-Merge Checklist

  • Updated relevant documentation and README (if needed)
  • Added/updated tests (if needed)
  • Noted breaking changes (if any)
  • Validated on required platforms/methods:
    • MacOS
      • npm run
      • npx
      • Docker
      • Podman
      • Seatbelt
    • Windows
      • npm run
      • npx
      • Docker
    • Linux
      • npm run
      • npx
      • Docker


github-actions Bot commented Apr 30, 2026

Size Change: -4 B (0%)

Total Size: 34 MB

| Filename | Size | Change |
| --- | --- | --- |
| ./bundle/chunk-2KV5MEJA.js | 0 B | -3.43 kB (removed) 🏆 |
| ./bundle/chunk-57KM6YND.js | 0 B | -2.78 MB (removed) 🏆 |
| ./bundle/chunk-ASRU6KWB.js | 0 B | -658 kB (removed) 🏆 |
| ./bundle/chunk-H2G2F75F.js | 0 B | -14.7 MB (removed) 🏆 |
| ./bundle/chunk-KIXZ2FCH.js | 0 B | -3.8 kB (removed) 🏆 |
| ./bundle/chunk-NKU7TUNE.js | 0 B | -19.5 kB (removed) 🏆 |
| ./bundle/chunk-VF2E5OO3.js | 0 B | -49.2 kB (removed) 🏆 |
| ./bundle/chunk-VYNU6FEB.js | 0 B | -12.5 kB (removed) 🏆 |
| ./bundle/core-SM75YMGM.js | 0 B | -48.8 kB (removed) 🏆 |
| ./bundle/devtoolsService-3FA3XYXG.js | 0 B | -28 kB (removed) 🏆 |
| ./bundle/gemini-N5PF7HCL.js | 0 B | -583 kB (removed) 🏆 |
| ./bundle/interactiveCli-AKPXIBTN.js | 0 B | -1.29 MB (removed) 🏆 |
| ./bundle/liteRtServerManager-3USJIXPG.js | 0 B | -2.11 kB (removed) 🏆 |
| ./bundle/oauth2-provider-WTI6KIQJ.js | 0 B | -9.16 kB (removed) 🏆 |
| ./bundle/chunk-4TCIBMU6.js | 19.5 kB | +19.5 kB (new file) 🆕 |
| ./bundle/chunk-73DCHPYR.js | 12.5 kB | +12.5 kB (new file) 🆕 |
| ./bundle/chunk-C42BOO2A.js | 3.8 kB | +3.8 kB (new file) 🆕 |
| ./bundle/chunk-FXK3LICV.js | 49.2 kB | +49.2 kB (new file) 🆕 |
| ./bundle/chunk-JLQRYP72.js | 658 kB | +658 kB (new file) 🆕 |
| ./bundle/chunk-O7FCEU4Z.js | 2.78 MB | +2.78 MB (new file) 🆕 |
| ./bundle/chunk-VZ6CZHM7.js | 14.7 MB | +14.7 MB (new file) 🆕 |
| ./bundle/chunk-XWGESCQP.js | 3.43 kB | +3.43 kB (new file) 🆕 |
| ./bundle/core-FPNOC3JH.js | 48.8 kB | +48.8 kB (new file) 🆕 |
| ./bundle/devtoolsService-G2UYLTUH.js | 28 kB | +28 kB (new file) 🆕 |
| ./bundle/gemini-D5A7NLRF.js | 583 kB | +583 kB (new file) 🆕 |
| ./bundle/interactiveCli-QPMYW6QD.js | 1.29 MB | +1.29 MB (new file) 🆕 |
| ./bundle/liteRtServerManager-LSMIXHLQ.js | 2.11 kB | +2.11 kB (new file) 🆕 |
| ./bundle/oauth2-provider-RPZNJDAR.js | 9.16 kB | +9.16 kB (new file) 🆕 |
ℹ️ View Unchanged
| Filename | Size | Change |
| --- | --- | --- |
| ./bundle/bundled/third_party/index.js | 8 MB | 0 B |
| ./bundle/chunk-34MYV7JD.js | 2.45 kB | 0 B |
| ./bundle/chunk-5AUYMPVF.js | 858 B | 0 B |
| ./bundle/chunk-5PS3AYFU.js | 1.18 kB | 0 B |
| ./bundle/chunk-664ZODQF.js | 124 kB | 0 B |
| ./bundle/chunk-DAHVX5MI.js | 206 kB | 0 B |
| ./bundle/chunk-IUUIT4SU.js | 56.5 kB | 0 B |
| ./bundle/chunk-RJTRUG2J.js | 39.8 kB | 0 B |
| ./bundle/chunk-VJSUVOZ4.js | 1.97 MB | 0 B |
| ./bundle/cleanup-O7FPPK6O.js | 0 B | -932 B (removed) 🏆 |
| ./bundle/devtools-36NN55EP.js | 696 kB | 0 B |
| ./bundle/dist-T73EYRDX.js | 356 B | 0 B |
| ./bundle/events-XB7DADIJ.js | 418 B | 0 B |
| ./bundle/examples/hooks/scripts/on-start.js | 188 B | 0 B |
| ./bundle/examples/mcp-server/example.js | 1.43 kB | 0 B |
| ./bundle/gemini.js | 5.1 kB | 0 B |
| ./bundle/getMachineId-bsd-TXG52NKR.js | 1.55 kB | 0 B |
| ./bundle/getMachineId-darwin-7OE4DDZ6.js | 1.55 kB | 0 B |
| ./bundle/getMachineId-linux-SHIFKOOX.js | 1.34 kB | 0 B |
| ./bundle/getMachineId-unsupported-5U5DOEYY.js | 1.06 kB | 0 B |
| ./bundle/getMachineId-win-6KLLGOI4.js | 1.72 kB | 0 B |
| ./bundle/memoryDiscovery-NGHTMHWQ.js | 980 B | 0 B |
| ./bundle/multipart-parser-KPBZEGQU.js | 11.7 kB | 0 B |
| ./bundle/node_modules/@google/gemini-cli-devtools/dist/client/main.js | 222 kB | 0 B |
| ./bundle/node_modules/@google/gemini-cli-devtools/dist/src/_client-assets.js | 229 kB | 0 B |
| ./bundle/node_modules/@google/gemini-cli-devtools/dist/src/index.js | 13.4 kB | 0 B |
| ./bundle/node_modules/@google/gemini-cli-devtools/dist/src/types.js | 132 B | 0 B |
| ./bundle/sandbox-macos-permissive-open.sb | 890 B | 0 B |
| ./bundle/sandbox-macos-permissive-proxied.sb | 1.31 kB | 0 B |
| ./bundle/sandbox-macos-restrictive-open.sb | 3.36 kB | 0 B |
| ./bundle/sandbox-macos-restrictive-proxied.sb | 3.56 kB | 0 B |
| ./bundle/sandbox-macos-strict-open.sb | 4.82 kB | 0 B |
| ./bundle/sandbox-macos-strict-proxied.sb | 5.02 kB | 0 B |
| ./bundle/src-QVCVGIUX.js | 47 kB | 0 B |
| ./bundle/start-ZKA2MUE3.js | 0 B | -652 B (removed) 🏆 |
| ./bundle/tree-sitter-7U6MW5PS.js | 274 kB | 0 B |
| ./bundle/tree-sitter-bash-34ZGLXVX.js | 1.84 MB | 0 B |
| ./bundle/cleanup-LKHVZU2K.js | 932 B | +932 B (new file) 🆕 |
| ./bundle/start-BNGOVM37.js | 652 B | +652 B (new file) 🆕 |

compressed-size-action

@gundermanc gundermanc force-pushed the bot-prompt-improvements branch from db278fa to 0b87ff1 Compare April 30, 2026 23:59
@gundermanc gundermanc changed the title feat(bot): improve conflict detection in system prompts feat(bot): improve nuanced conflict detection in system prompts Apr 30, 2026
@gundermanc gundermanc marked this pull request as ready for review May 1, 2026 00:00
@gundermanc gundermanc requested a review from a team as a code owner May 1, 2026 00:00
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the Gemini CLI Bot's decision-making capabilities by refining its system prompts. The changes focus on improving how the bot identifies and handles architectural conflicts within the repository, ensuring that it differentiates between redundant workflows that should be consolidated and complementary systems that should be preserved. This update aims to prevent the bot from making superficial optimizations that ignore underlying systemic conflicts.

Highlights

  • Architectural Conflict Detection: Updated system prompts to require the bot to actively search for and evaluate overlapping systems before proposing optimizations.
  • Nuanced Resolution Strategy: Instructed the bot to distinguish between contradictory and complementary workflows, favoring consolidation for conflicts rather than naive adjustments.
  • Reviewer Checklist Update: Added an explicit 'Architectural Conflict' check to the critique process to prevent the bot from treating symptoms of deeper structural issues.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the bot's internal guidelines across common.md, critique.md, and metrics.md to better address architectural conflicts and redundant systems. Key changes include defining the removal of obsolete workflows as a surgical fix, adding a critique rule to reject symptom-only fixes for architectural issues, and requiring the identification of overlapping systems during policy evaluation. Feedback was provided to refine the language in common.md, suggesting that workflows should be deleted only when redundant or in direct conflict with other systems, rather than based on vague 'standard practices', to avoid the risk of deleting valid legacy code.

Note: Security Review has been skipped due to the limited scope of the PR.

Comment on lines +105 to +106
delete files or workflows if your evidence shows they are conflicting with
standard practices.

high

The phrase "conflicting with standard practices" is vague and potentially risky for an automated agent. It could lead to the deletion of valid workflows that simply use non-standard patterns or legacy styles that are still intended. Given the PR's focus on architectural conflicts, it is safer to instruct the bot to delete files only when they are redundant or in direct conflict with other systems, as specified in the other prompt updates in this PR. Maintaining consistency in documentation is crucial to avoid contradictions across the repository.

Suggested change
- delete files or workflows if your evidence shows they are conflicting with
- standard practices.
+ delete files or workflows if your evidence shows they are redundant or
+ conflicting with other systems.

References
  1. Maintain consistency in documentation. When information about a feature is present in multiple documents, ensure all instances are updated or removed together to avoid contradictions.

@gundermanc gundermanc force-pushed the bot-prompt-improvements branch from 0b87ff1 to d6add1d Compare May 1, 2026 00:04
@gundermanc gundermanc changed the title feat(bot): improve nuanced conflict detection in system prompts feat(bot): improve nuanced conflict detection and validation in system prompts May 1, 2026
@gundermanc gundermanc force-pushed the bot-prompt-improvements branch from d6add1d to be65d2f Compare May 1, 2026 03:47
@gundermanc gundermanc requested a review from a team as a code owner May 1, 2026 03:47
@gundermanc gundermanc changed the title feat(bot): improve nuanced conflict detection and validation in system prompts feat(bot): enforce evaluation role and multi-iteration feedback loop May 1, 2026
@gundermanc gundermanc force-pushed the bot-prompt-improvements branch from be65d2f to c6121d5 Compare May 1, 2026 03:51
Bojun-Vvibe added a commit to Bojun-Vvibe/oss-contributions that referenced this pull request May 1, 2026
- BerriAI/litellm#26969: tool-permission guardrail tightening (merge-after-nits)
- BerriAI/litellm#26967: VCR Redis observability (merge-as-is)
- google-gemini/gemini-cli#26303: brain/critique role split + iteration (needs-discussion)
- google-gemini/gemini-cli#26287: voice transcription cursor-position insert (merge-after-nits)
- google-gemini/gemini-cli#26274: ssh:// extension install scheme (merge-as-is)
gundermanc and others added 2 commits May 1, 2026 10:22
Fixes the throughput metrics script and introduces new visibility into backlog bottlenecks and priority distribution.

### Changes
- **Throughput Fixes**: Resolved a `ReferenceError` where `isMaintainer` was not correctly scoped, fixed a malformed license header, and added a new metric for `issue_arrival_rate_per_day` to enable growth-vs-closure analysis.
- **Backlog Bottlenecks**: Introduced `bottlenecks.ts` to identify "Zombie" issues (no activity > 30 days) and "Hot" issues (high activity).
- **Priority Distribution**: Introduced `priority_distribution.ts` to track the count of open issues by priority level (P0-P3).

### Impact
These metrics will provide the necessary data to confirm if the repository is experiencing systemic backlog growth (Arrival Rate > Throughput) and help identify which segments of the backlog require urgent triage.
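
The growth-vs-closure comparison described in this commit message (Arrival Rate > Throughput) comes down to simple arithmetic. The sketch below only illustrates that comparison; it is not the repository's metrics script. The function name and its inputs (issues opened, issues closed, window length in days) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare issue arrival rate with closure throughput over a window.
# backlog_trend is a hypothetical helper, not part of the metrics scripts.

backlog_trend() {
  # $1 = issues opened, $2 = issues closed, $3 = window length in days
  awk -v opened="$1" -v closed="$2" -v days="$3" 'BEGIN {
    arrival = opened / days        # issues arriving per day
    throughput = closed / days     # issues closed per day
    status = (arrival > throughput) ? "backlog growing" : "backlog not growing"
    printf "arrival=%.2f/day throughput=%.2f/day %s\n", arrival, throughput, status
  }'
}

backlog_trend 70 35 7   # prints: arrival=10.00/day throughput=5.00/day backlog growing
```

In practice the opened and closed counts would come from the metrics output rather than being passed by hand.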
@gundermanc gundermanc force-pushed the bot-prompt-improvements branch from b93f573 to 381aae2 Compare May 1, 2026 20:52
gundermanc added 4 commits May 5, 2026 08:34
…edback loops

This update hardens the bot's reasoning and validation layers to stop thrashing and ensure technical quality:
- Mandates local validation (lint, build, test) in Brain and Critique prompts.
- Uncaps bottleneck metrics (zombie issues, priority distribution) to 1000 items.
- Enhances PR awareness to handle multiple bot identities and exclude release PRs.
- Formally defines closed (unmerged) PRs as explicit user rejection signals.
- Strengthens domain rotation and anti-pigeonholing enforcement.
…g CI

This update resolves the bot's persistent focus on already-completed tasks:
- Moves and syncs lessons-learned.md to tools/gemini-cli-bot/ to ensure persistent memory.
- Marks metrics fixes, prompt hardenings, and user rejection signals as DONE in the ledger.
- Implements the CI matrix optimization (Node 20.x for PRs) the bot was re-attempting.
- This forces the bot to rotate to a new domain in the next run by satisfying its current goals.
- Mandate the use of `gh run view` for empirical log verification rather than static code inspection.
- Update interactive mode prompt to allow the agent to retain task context and run the unblocking protocol when following up on its own PRs.
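
The empirical log verification mentioned in the list above could look roughly like this on the command line. It is a sketch under assumptions: the helper name `failed_log_for_branch` is hypothetical, and locating the run via `gh run list` is one possible approach rather than what the bot's prompt prescribes. Only `gh run view --log-failed` is named in this PR.

```shell
#!/usr/bin/env bash
# Sketch: print the failed-step logs of the latest failing run on a branch.
# failed_log_for_branch is a hypothetical helper name.

failed_log_for_branch() {
  local branch="$1" run_id
  # Find the most recent failed workflow run for the branch.
  run_id="$(gh run list --branch "$branch" --status failure --limit 1 \
    --json databaseId --jq '.[0].databaseId')"
  if [ -z "$run_id" ] || [ "$run_id" = "null" ]; then
    echo "no failed runs found for $branch" >&2
    return 1
  fi
  # Show only the output of the steps that failed.
  gh run view "$run_id" --log-failed
}
```

For example, `failed_log_for_branch bot-prompt-improvements` would print the failing step output for that branch, which gives the agent actual error text to reason about instead of a guess from reading the workflow file.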
@gundermanc gundermanc changed the title feat(bot): enforce evaluation role and multi-iteration feedback loop feat(bot): enforce evaluation role, multi-iteration feedback loop, and diagnostic rigor May 5, 2026
@gundermanc gundermanc marked this pull request as draft May 6, 2026 20:57