test(evals): add behavioral eval for file creation and write_file tool selection#26292
test(evals): add behavioral eval for file creation and write_file tool selection#26292
Conversation
Update instructions to prompt the agent to stage only task-specific changed files to prevent unprompted staging of untracked files. Fixes #24628
|
You already have 7 pull requests open. Please work on getting existing PRs merged before opening more. |
🛑 Action Required: Evaluation ApprovalSteering changes have been detected in this PR. To prevent regressions, a maintainer must approve the evaluation run before this PR can be merged. Maintainers:
Once approved, the evaluation results will be posted here automatically. |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the agent's behavioral evaluations and prompt instructions to improve its interaction with the file system and Git. The changes aim to ensure safer and more precise file operations, particularly regarding file creation and Git staging, by adding new tests and refining agent guidance. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
|
Size Change: +430 B (0%) Total Size: 33.9 MB
ℹ️ View Unchanged
|
There was a problem hiding this comment.
Code Review
This pull request introduces new behavioral evaluations for file creation and git repository management. It also updates the system prompts to explicitly discourage the use of broad git staging commands like git add . or git add -A, favoring specific file staging instead. Feedback on the new tests suggests parsing JSON tool arguments rather than using string inclusion checks to ensure the assertions are more robust and less prone to failure.
Note: Security Review is unavailable for this PR.
| const targetReadFileIndex = logs.findIndex( | ||
| (log) => | ||
| log.toolRequest?.name === 'read_file' && | ||
| log.toolRequest.args.includes('config.json'), | ||
| ); |
There was a problem hiding this comment.
Using String.prototype.includes() on a JSON string to check tool arguments is brittle. It's better to parse the JSON and inspect the properties directly. This makes the test more robust against formatting changes and less prone to being flaky, which is critical for maintaining a reliable CI pipeline.
| const targetReadFileIndex = logs.findIndex( | |
| (log) => | |
| log.toolRequest?.name === 'read_file' && | |
| log.toolRequest.args.includes('config.json'), | |
| ); | |
| const targetReadFileIndex = logs.findIndex((log) => { | |
| try { | |
| return log.toolRequest?.name === 'read_file' && JSON.parse(log.toolRequest.args).path === 'config.json'; | |
| } catch { return false; } | |
| }); |
References
- The toolRequest.args property is a JSON string and must be parsed using JSON.parse() before its properties can be accessed.
There was a problem hiding this comment.
Done. Updated to use JSON parsing instead of includes() string checks.
| const targetWriteFileIndex = logs.findIndex( | ||
| (log) => | ||
| log.toolRequest?.name === 'write_file' && | ||
| log.toolRequest.args.includes('config.json'), | ||
| ); |
There was a problem hiding this comment.
Similar to the read_file check, using String.prototype.includes() here is brittle. Parsing the JSON arguments for write_file will make this assertion more robust and reliable.
| const targetWriteFileIndex = logs.findIndex( | |
| (log) => | |
| log.toolRequest?.name === 'write_file' && | |
| log.toolRequest.args.includes('config.json'), | |
| ); | |
| const targetWriteFileIndex = logs.findIndex((log) => { | |
| try { | |
| return log.toolRequest?.name === 'write_file' && JSON.parse(log.toolRequest.args).path === 'config.json'; | |
| } catch { return false; } | |
| }); |
References
- The toolRequest.args property is a JSON string and must be parsed using JSON.parse() before its properties can be accessed.
There was a problem hiding this comment.
Done. Same change applied here to parse JSON arguments safely.
Fixes #24806