feat(bot): enforce evaluation role, multi-iteration feedback loop, and diagnostic rigor #26303
gundermanc wants to merge 15 commits into main
Size Change: -4 B (0%) Total Size: 34 MB
db278fa to 0b87ff1
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request enhances the Gemini CLI Bot's decision-making capabilities by refining its system prompts. The changes focus on improving how the bot identifies and handles architectural conflicts within the repository, ensuring that it differentiates between redundant workflows that should be consolidated and complementary systems that should be preserved. This update aims to prevent the bot from making superficial optimizations that ignore underlying systemic conflicts.
Code Review
This pull request updates the bot's internal guidelines across common.md, critique.md, and metrics.md to better address architectural conflicts and redundant systems. Key changes include defining the removal of obsolete workflows as a surgical fix, adding a critique rule to reject symptom-only fixes for architectural issues, and requiring the identification of overlapping systems during policy evaluation. Feedback was provided to refine the language in common.md, suggesting that workflows should be deleted only when redundant or in direct conflict with other systems, rather than based on vague 'standard practices', to avoid the risk of deleting valid legacy code.
Note: Security Review has been skipped due to the limited scope of the PR.
delete files or workflows if your evidence shows they are conflicting with
standard practices.
The phrase "conflicting with standard practices" is vague and potentially risky for an automated agent. It could lead to the deletion of valid workflows that simply use non-standard patterns or legacy styles that are still intended. Given the PR's focus on architectural conflicts, it is safer to instruct the bot to delete files only when they are redundant or in direct conflict with other systems, as specified in the other prompt updates in this PR. Maintaining consistency in documentation is crucial to avoid contradictions across the repository.
Suggested change
- delete files or workflows if your evidence shows they are conflicting with
- standard practices.
+ delete files or workflows if your evidence shows they are redundant or
+ conflicting with other systems.
References
- Maintain consistency in documentation. When information about a feature is present in multiple documents, ensure all instances are updated or removed together to avoid contradictions.
0b87ff1 to d6add1d
d6add1d to be65d2f
be65d2f to c6121d5
- BerriAI/litellm#26969: tool-permission guardrail tightening (merge-after-nits)
- BerriAI/litellm#26967: VCR Redis observability (merge-as-is)
- google-gemini/gemini-cli#26303: brain/critique role split + iteration (needs-discussion)
- google-gemini/gemini-cli#26287: voice transcription cursor-position insert (merge-after-nits)
- google-gemini/gemini-cli#26274: ssh:// extension install scheme (merge-as-is)
Fixes the throughput metrics script and introduces new visibility into backlog bottlenecks and priority distribution.

### Changes
- **Throughput Fixes**: Resolved a `ReferenceError` where `isMaintainer` was not correctly scoped, fixed a malformed license header, and added a new metric for `issue_arrival_rate_per_day` to enable growth-vs-closure analysis.
- **Backlog Bottlenecks**: Introduced `bottlenecks.ts` to identify "Zombie" issues (no activity > 30 days) and "Hot" issues (high activity).
- **Priority Distribution**: Introduced `priority_distribution.ts` to track the count of open issues by priority level (P0-P3).

### Impact
These metrics will provide the necessary data to confirm if the repository is experiencing systemic backlog growth (Arrival Rate > Throughput) and help identify which segments of the backlog require urgent triage.
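As an illustration of the metrics described above, here is a minimal sketch of the zombie-issue filter, the priority count, and the growth check. It is not the code from `bottlenecks.ts` or `priority_distribution.ts`; the `Issue` shape, the `priority/pN` label naming, and all function names are assumptions made for the example.

```typescript
// Hypothetical issue shape; the real scripts may read different fields.
interface Issue {
  number: number;
  updatedAt: string; // ISO 8601 timestamp of the last activity
  labels: string[]; // e.g. ["priority/p1"] (label naming is an assumption)
}

const DAY_MS = 24 * 60 * 60 * 1000;

// "Zombie" issues: no activity for more than `thresholdDays` days.
function findZombies(issues: Issue[], now: Date, thresholdDays = 30): Issue[] {
  const cutoffMs = thresholdDays * DAY_MS;
  return issues.filter(
    (issue) => now.getTime() - new Date(issue.updatedAt).getTime() > cutoffMs,
  );
}

// Count open issues per priority level P0-P3; issues without a
// recognized priority label are counted under "unlabeled".
function priorityDistribution(issues: Issue[]): Record<string, number> {
  const counts: Record<string, number> = { P0: 0, P1: 0, P2: 0, P3: 0, unlabeled: 0 };
  for (const issue of issues) {
    let key = "unlabeled";
    for (const label of issue.labels) {
      const match = /^priority\/p([0-3])$/i.exec(label);
      if (match) {
        key = `P${match[1]}`;
        break;
      }
    }
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}

// The backlog grows when issues arrive faster than they are closed.
function isBacklogGrowing(arrivalRatePerDay: number, throughputPerDay: number): boolean {
  return arrivalRatePerDay > throughputPerDay;
}
```

Passing `now` as a parameter rather than calling `new Date()` inside the filter keeps the classification deterministic, which makes the threshold easy to test.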
b93f573 to 381aae2
…edback loops

This update hardens the bot's reasoning and validation layers to stop thrashing and ensure technical quality:
- Mandates local validation (lint, build, test) in Brain and Critique prompts.
- Raises the cap on bottleneck metrics (zombie issues, priority distribution) to 1000 items.
- Enhances PR awareness to handle multiple bot identities and exclude release PRs.
- Formally defines closed (unmerged) PRs as explicit user rejection signals.
- Strengthens domain rotation and anti-pigeonholing enforcement.
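The PR-awareness rules in this commit can be read as a small classifier. The sketch below is one possible interpretation, not the bot's actual logic: the `PullRequest` shape, the bot logins in `BOT_LOGINS`, and the title-based release check are all hypothetical.

```typescript
// Hypothetical PR shape for illustration only.
interface PullRequest {
  author: string;
  title: string;
  state: "open" | "closed";
  merged: boolean;
}

// Placeholder logins; the bot's real identities are not listed in this PR.
const BOT_LOGINS = new Set<string>(["example-bot", "example-bot[bot]"]);

type Signal = "ignored" | "pending" | "accepted" | "rejected";

function classifyPullRequest(pr: PullRequest): Signal {
  // Only PRs opened under one of the bot's identities carry a signal.
  if (!BOT_LOGINS.has(pr.author)) return "ignored";
  // Assumed heuristic: release PRs are recognized by their title prefix.
  if (/^(chore\(release\)|release)\b/i.test(pr.title)) return "ignored";
  if (pr.state === "open") return "pending";
  // A closed PR that was never merged is treated as an explicit rejection.
  return pr.merged ? "accepted" : "rejected";
}
```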
…g CI

This update resolves the bot's persistent focus on already-completed tasks:
- Moves and syncs lessons-learned.md to tools/gemini-cli-bot/ to ensure persistent memory.
- Marks metrics fixes, prompt hardenings, and user rejection signals as DONE in the ledger.
- Implements the CI matrix optimization (Node 20.x for PRs) the bot was re-attempting.
- This forces the bot to rotate to a new domain in the next run by satisfying its current goals.
- Mandate the use of `gh run view` for empirical log verification rather than static code inspection.
- Update interactive mode prompt to allow the agent to retain task context and run the unblocking protocol when following up on its own PRs.
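To make the log-verification step concrete, here is a small sketch that builds the argument list for `gh run view <run-id> --log-failed`. The helper name `failedLogArgs` and the strict run-id check are assumptions for illustration; the resulting array could be passed to something like Node's `execFileSync("gh", args)`, which avoids shell interpolation of the run id.

```typescript
// Hypothetical helper: builds arguments for `gh run view <run-id> --log-failed`.
// `repo` is optional and uses the OWNER/REPO form accepted by gh's --repo flag.
function failedLogArgs(runId: string | number, repo?: string): string[] {
  const id = String(runId);
  // Defensive check: workflow run ids are numeric, so reject anything else.
  if (!/^\d+$/.test(id)) {
    throw new Error(`invalid run id: ${id}`);
  }
  const args = ["run", "view", id, "--log-failed"];
  if (repo !== undefined) {
    args.push("--repo", repo);
  }
  return args;
}
```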
Summary
This PR improves the Gemini CLI Bot's system prompts to explicitly identify and resolve architectural conflicts, restricts the critique agent to an evaluation-only role, implements a multi-iteration feedback loop, and improves its diagnostic rigor and context awareness.
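The multi-iteration feedback loop mentioned here could look roughly like the sketch below. It is an assumed shape, not the PR's implementation: `Brain`, `Critic`, `Verdict`, and `runFeedbackLoop` are hypothetical names, and the agents are modeled as synchronous functions for brevity, whereas real agent calls would likely be asynchronous.

```typescript
// Hypothetical types: the brain proposes a change, the critic only evaluates it.
interface Verdict {
  approved: boolean;
  feedback: string;
}
type Brain = (task: string, feedback: readonly string[]) => string;
type Critic = (proposal: string) => Verdict;

interface LoopResult {
  proposal: string | null; // null when no proposal was approved
  iterations: number;
  feedback: string[]; // everything the critic said along the way
}

function runFeedbackLoop(
  task: string,
  brain: Brain,
  critic: Critic,
  maxIterations = 3,
): LoopResult {
  const feedback: string[] = [];
  let iterations = 0;
  while (iterations < maxIterations) {
    iterations += 1;
    // The brain sees all feedback accumulated so far.
    const proposal = brain(task, feedback);
    // The critic evaluates only; it never edits the proposal itself.
    const verdict = critic(proposal);
    if (verdict.approved) {
      return { proposal, iterations, feedback };
    }
    feedback.push(verdict.feedback);
  }
  return { proposal: null, iterations, feedback };
}
```

Bounding the loop with `maxIterations` is a design choice in this sketch: it keeps a pair of agents that never agree from iterating indefinitely.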
Details
`Empirical Log Verification` rule to the Defensive Scripting section, mandating the use of `gh run view --log-failed` for diagnosing CI/workflow failures instead of relying on static code analysis.
Related Issues
Resolves a systemic issue discovered while reviewing PR #26302 and PR #26304, where the bot either ignored conflicting workflows or introduced structural bugs due to generative flaws in the critique phase. Also addresses the root cause of the bot's inability to diagnose CI failures on its own PRs as seen in #26513.
How to Validate
`while` loop logic successfully captures and passes feedback between the agents.
Pre-Merge Checklist