test(evals): add behavioral eval for file creation and write_file tool selection by akh64bit · Pull Request #26292 · google-gemini/gemini-cli

akh64bit · 2026-04-30T21:07:40Z

Update instructions to prompt the agent to stage only task-specific changed files to prevent unprompted staging of untracked files. Fixes #24628


github-actions · 2026-04-30T21:08:58Z

You already have 7 pull requests open. Please work on getting existing PRs merged before opening more.

github-actions · 2026-04-30T21:09:00Z

🛑 Action Required: Evaluation Approval

Steering changes have been detected in this PR. To prevent regressions, a maintainer must approve the evaluation run before this PR can be merged.

Maintainers:

Go to the Workflow Run Summary.
Click the yellow 'Review deployments' button.
Select the 'eval-gate' environment and click 'Approve'.

Once approved, the evaluation results will be posted here automatically.

gemini-code-assist · 2026-04-30T21:11:25Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the agent's behavioral evaluations and prompt instructions to improve its interaction with the file system and Git. The changes aim to ensure safer and more precise file operations, particularly regarding file creation and Git staging, by adding new tests and refining agent guidance.

Highlights

New Behavioral Evaluations: Added a new behavioral evaluation suite to test file creation and the safe use of the write_file tool, ensuring files are created correctly and existing ones are not inadvertently overwritten. Also, a new test was added to the gitRepo evaluation to prevent agents from using broad `git add .` or `git add -A` commands.
Prompt Snippet Updates: Modified prompt snippets in packages/core/src/prompts/snippets.ts and packages/core/src/prompts/snippets.legacy.ts to explicitly instruct agents against using `git add .` or `git add -A`, promoting more precise file staging with `git add <file>...`.
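The gitRepo eval itself is not shown on this page. As a rough illustration of the kind of check it implies, here is a minimal sketch of a predicate that flags broad staging in a shell command string. `isBroadGitAdd` is a hypothetical name; the sketch only recognizes `git add .`, `git add -A`, and the long form `git add --all` in simple `&&`/`||`/`;`-separated commands, ignores forms such as `git -C dir add .`, and is not the PR's actual assertion.

```typescript
// Hypothetical helper (not from the PR): true when a shell command stages
// everything with `git add .`, `git add -A`, or `git add --all`
// instead of naming specific files.
function isBroadGitAdd(command: string): boolean {
  // Split on common shell separators so `git status && git add .` is caught too.
  const segments = command.split(/&&|\|\||;|\n/);
  return segments.some((segment) => {
    const tokens = segment.trim().split(/\s+/);
    if (tokens[0] !== 'git' || tokens[1] !== 'add') {
      return false;
    }
    // Any of these pathspecs/flags stages the whole working tree.
    return tokens.slice(2).some((t) => t === '.' || t === '-A' || t === '--all');
  });
}

console.log(isBroadGitAdd('git add src/app.ts')); // false
console.log(isBroadGitAdd('git status && git add -A')); // true
```

Tokenizing per command segment, rather than searching the raw string for `git add .`, avoids flagging commands that merely mention the text, such as a commit message.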

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

github-actions · 2026-04-30T21:16:39Z

Size Change: +430 B (0%)

Total Size: 33.9 MB

Filename	Size	Change
`./bundle/chunk-4DGQCZNX.js`	0 B	-12.5 kB (removed)	🏆
`./bundle/chunk-IOFKZKEL.js`	0 B	-657 kB (removed)	🏆
`./bundle/chunk-LEY75LQP.js`	0 B	-2.72 MB (removed)	🏆
`./bundle/chunk-NF4DZIIK.js`	0 B	-3.43 kB (removed)	🏆
`./bundle/chunk-PZEZPDMI.js`	0 B	-3.8 kB (removed)	🏆
`./bundle/chunk-U4PBE6HP.js`	0 B	-19.5 kB (removed)	🏆
`./bundle/chunk-VA7435CI.js`	0 B	-49.2 kB (removed)	🏆
`./bundle/chunk-ZULJCKWJ.js`	0 B	-14.7 MB (removed)	🏆
`./bundle/core-IUJFYSPW.js`	0 B	-48.2 kB (removed)	🏆
`./bundle/devtoolsService-XSJB4FYG.js`	0 B	-28 kB (removed)	🏆
`./bundle/gemini-IMCJSXQJ.js`	0 B	-582 kB (removed)	🏆
`./bundle/interactiveCli-JWEBBIIH.js`	0 B	-1.32 MB (removed)	🏆
`./bundle/liteRtServerManager-CYCFDJV3.js`	0 B	-2.11 kB (removed)	🏆
`./bundle/oauth2-provider-NNI322CS.js`	0 B	-9.16 kB (removed)	🏆
`./bundle/chunk-6TTVQ74O.js`	657 kB	+657 kB (new file)	🆕
`./bundle/chunk-7G2LL3P2.js`	2.72 MB	+2.72 MB (new file)	🆕
`./bundle/chunk-FAGKMQRM.js`	49.2 kB	+49.2 kB (new file)	🆕
`./bundle/chunk-KRHNQDMF.js`	19.5 kB	+19.5 kB (new file)	🆕
`./bundle/chunk-NH6GW4O6.js`	3.8 kB	+3.8 kB (new file)	🆕
`./bundle/chunk-NI2ARPJU.js`	14.7 MB	+14.7 MB (new file)	🆕
`./bundle/chunk-QTQACY5S.js`	12.5 kB	+12.5 kB (new file)	🆕
`./bundle/chunk-VZP5AA7G.js`	3.43 kB	+3.43 kB (new file)	🆕
`./bundle/core-XIIOQWPY.js`	48.2 kB	+48.2 kB (new file)	🆕
`./bundle/devtoolsService-6MSKBSCA.js`	28 kB	+28 kB (new file)	🆕
`./bundle/gemini-WB4FMV3H.js`	582 kB	+582 kB (new file)	🆕
`./bundle/interactiveCli-NZQP3FUM.js`	1.32 MB	+1.32 MB (new file)	🆕
`./bundle/liteRtServerManager-KMRM4TZO.js`	2.11 kB	+2.11 kB (new file)	🆕
`./bundle/oauth2-provider-4LFR37R6.js`	9.16 kB	+9.16 kB (new file)	🆕

ℹ️ Unchanged files

Filename	Size	Change
`./bundle/bundled/third_party/index.js`	8 MB	0 B
`./bundle/chunk-34MYV7JD.js`	2.45 kB	0 B
`./bundle/chunk-533APETE.js`	1.97 MB	0 B
`./bundle/chunk-5AUYMPVF.js`	858 B	0 B
`./bundle/chunk-5PS3AYFU.js`	1.18 kB	0 B
`./bundle/chunk-664ZODQF.js`	124 kB	0 B
`./bundle/chunk-DAHVX5MI.js`	206 kB	0 B
`./bundle/chunk-IUUIT4SU.js`	56.5 kB	0 B
`./bundle/chunk-RJTRUG2J.js`	39.8 kB	0 B
`./bundle/cleanup-SIRDETV3.js`	0 B	-932 B (removed)	🏆
`./bundle/devtools-36NN55EP.js`	696 kB	0 B
`./bundle/dist-T73EYRDX.js`	356 B	0 B
`./bundle/events-XB7DADIJ.js`	418 B	0 B
`./bundle/examples/hooks/scripts/on-start.js`	188 B	0 B
`./bundle/examples/mcp-server/example.js`	1.43 kB	0 B
`./bundle/gemini.js`	5.14 kB	0 B
`./bundle/getMachineId-bsd-TXG52NKR.js`	1.55 kB	0 B
`./bundle/getMachineId-darwin-7OE4DDZ6.js`	1.55 kB	0 B
`./bundle/getMachineId-linux-SHIFKOOX.js`	1.34 kB	0 B
`./bundle/getMachineId-unsupported-5U5DOEYY.js`	1.06 kB	0 B
`./bundle/getMachineId-win-6KLLGOI4.js`	1.72 kB	0 B
`./bundle/memoryDiscovery-LIJKMASE.js`	980 B	0 B
`./bundle/multipart-parser-KPBZEGQU.js`	11.7 kB	0 B
`./bundle/node_modules/@google/gemini-cli-devtools/dist/client/main.js`	222 kB	0 B
`./bundle/node_modules/@google/gemini-cli-devtools/dist/src/_client-assets.js`	229 kB	0 B
`./bundle/node_modules/@google/gemini-cli-devtools/dist/src/index.js`	13.4 kB	0 B
`./bundle/node_modules/@google/gemini-cli-devtools/dist/src/types.js`	132 B	0 B
`./bundle/sandbox-macos-permissive-open.sb`	890 B	0 B
`./bundle/sandbox-macos-permissive-proxied.sb`	1.31 kB	0 B
`./bundle/sandbox-macos-restrictive-open.sb`	3.36 kB	0 B
`./bundle/sandbox-macos-restrictive-proxied.sb`	3.56 kB	0 B
`./bundle/sandbox-macos-strict-open.sb`	4.82 kB	0 B
`./bundle/sandbox-macos-strict-proxied.sb`	5.02 kB	0 B
`./bundle/src-QVCVGIUX.js`	47 kB	0 B
`./bundle/start-QADGKLJO.js`	0 B	-652 B (removed)	🏆
`./bundle/tree-sitter-7U6MW5PS.js`	274 kB	0 B
`./bundle/tree-sitter-bash-34ZGLXVX.js`	1.84 MB	0 B
`./bundle/cleanup-RQWTYYB7.js`	932 B	+932 B (new file)	🆕
`./bundle/start-3IMGVZEF.js`	652 B	+652 B (new file)	🆕

compressed-size-action

gemini-code-assist

Code Review

This pull request introduces new behavioral evaluations for file creation and git repository management. It also updates the system prompts to explicitly discourage the use of broad git staging commands like `git add .` or `git add -A`, favoring specific file staging instead. Feedback on the new tests suggests parsing JSON tool arguments rather than using string inclusion checks, to make the assertions more robust and less prone to failure.

Note: Security Review is unavailable for this PR.

gemini-code-assist · 2026-04-30T21:23:54Z

+      const targetReadFileIndex = logs.findIndex(
+        (log) =>
+          log.toolRequest?.name === 'read_file' &&
+          log.toolRequest.args.includes('config.json'),
+      );


Using String.prototype.includes() on a JSON string to check tool arguments is brittle. It's better to parse the JSON and inspect the properties directly. This makes the test more robust against formatting changes and less prone to being flaky, which is critical for maintaining a reliable CI pipeline.

Suggested change

-      const targetReadFileIndex = logs.findIndex(
-        (log) =>
-          log.toolRequest?.name === 'read_file' &&
-          log.toolRequest.args.includes('config.json'),
-      );
+      const targetReadFileIndex = logs.findIndex((log) => {
+        try {
+          return log.toolRequest?.name === 'read_file' && JSON.parse(log.toolRequest.args).path === 'config.json';
+        } catch { return false; }
+      });

References

The toolRequest.args property is a JSON string and must be parsed using JSON.parse() before its properties can be accessed.

Done. Updated to use JSON parsing instead of includes() string checks.
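To make the reviewer's point concrete, here is a minimal sketch of a reusable lookup that parses the JSON `args` string instead of substring-matching it. The `ToolLog` interface and `findToolCallIndex` helper are hypothetical and model only the fields used above. The argument name is a parameter because `path` comes from the bot's suggestion, and the later commit "fix argument parameter name filtering" suggests the name first used needed correcting; the real parameter name is not confirmed here.

```typescript
// Hypothetical log shape: only the fields this sketch reads are modeled.
interface ToolLog {
  toolRequest?: { name: string; args: string }; // args holds a JSON-encoded object
}

// Index of the first call to `toolName` whose parsed argument `argName` equals
// `expected`, or -1 if none. Entries whose args are not valid JSON are skipped.
function findToolCallIndex(
  logs: ToolLog[],
  toolName: string,
  argName: string,
  expected: string,
): number {
  return logs.findIndex((log) => {
    const req = log.toolRequest;
    if (!req || req.name !== toolName) {
      return false;
    }
    try {
      const parsed: unknown = JSON.parse(req.args);
      return (
        typeof parsed === 'object' &&
        parsed !== null &&
        (parsed as Record<string, unknown>)[argName] === expected
      );
    } catch {
      return false; // malformed JSON: treat as "no match" instead of crashing the test
    }
  });
}

// Why substring matching is fragile: 'old_config.json' contains 'config.json'.
const sampleLogs: ToolLog[] = [
  { toolRequest: { name: 'read_file', args: JSON.stringify({ path: 'old_config.json' }) } },
  { toolRequest: { name: 'read_file', args: JSON.stringify({ path: 'config.json' }) } },
];
console.log(sampleLogs.findIndex((log) => log.toolRequest?.args.includes('config.json'))); // 0 (wrong entry)
console.log(findToolCallIndex(sampleLogs, 'read_file', 'path', 'config.json')); // 1
```

Comparing a parsed field for equality avoids both the false positive shown above and any dependence on how the JSON string is formatted.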

gemini-code-assist · 2026-04-30T21:23:54Z

+      const targetWriteFileIndex = logs.findIndex(
+        (log) =>
+          log.toolRequest?.name === 'write_file' &&
+          log.toolRequest.args.includes('config.json'),
+      );


Similar to the read_file check, using String.prototype.includes() here is brittle. Parsing the JSON arguments for write_file will make this assertion more robust and reliable.

Suggested change

-      const targetWriteFileIndex = logs.findIndex(
-        (log) =>
-          log.toolRequest?.name === 'write_file' &&
-          log.toolRequest.args.includes('config.json'),
-      );
+      const targetWriteFileIndex = logs.findIndex((log) => {
+        try {
+          return log.toolRequest?.name === 'write_file' && JSON.parse(log.toolRequest.args).path === 'config.json';
+        } catch { return false; }
+      });

References

The toolRequest.args property is a JSON string and must be parsed using JSON.parse() before its properties can be accessed.

Done. Same change applied here to parse JSON arguments safely.
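The review threads show how each index is found but not how the two are used. Given the summary's goal that existing files are not inadvertently overwritten, one plausible use (an assumption, not confirmed by the PR) is to require the read of `config.json` to precede its write. A minimal sketch of that ordering check, with hypothetical `indexOfCall` and `readBeforeWrite` helpers and an assumed `path` argument name:

```typescript
// Hypothetical log shape mirroring the JSON-string `args` described in the review.
interface ToolLog {
  toolRequest?: { name: string; args: string };
}

// First index of a `toolName` call whose parsed `path` argument equals `target`.
// `path` is an assumed argument name, not confirmed by this PR.
function indexOfCall(logs: ToolLog[], toolName: string, target: string): number {
  return logs.findIndex((log) => {
    const req = log.toolRequest;
    if (!req || req.name !== toolName) {
      return false;
    }
    try {
      return JSON.parse(req.args)?.path === target;
    } catch {
      return false;
    }
  });
}

// findIndex returns -1 when nothing matches, and -1 is less than every valid
// index, so a missing read would otherwise look like "read came first".
function readBeforeWrite(logs: ToolLog[], target: string): boolean {
  const readIndex = indexOfCall(logs, 'read_file', target);
  const writeIndex = indexOfCall(logs, 'write_file', target);
  return readIndex !== -1 && writeIndex !== -1 && readIndex < writeIndex;
}

const call = (name: string, path: string): ToolLog => ({
  toolRequest: { name, args: JSON.stringify({ path }) },
});
console.log(readBeforeWrite([call('read_file', 'config.json'), call('write_file', 'config.json')], 'config.json')); // true
console.log(readBeforeWrite([call('write_file', 'config.json')], 'config.json')); // false: no read at all
```

The explicit `-1` checks are the important design choice: comparing the raw indices alone would pass a run in which the agent never read the file.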


akh64bit added 2 commits April 29, 2026 23:27

fix(core): discourage unprompted git add . in prompt snippets

e0bfc35

Update instructions to prompt the agent to stage only task-specific changed files to prevent unprompted staging of untracked files. Fixes #24628

test(evals): add behavioral eval for file creation and write_file tool selection

d3266d6

akh64bit requested review from a team as code owners April 30, 2026 21:07

github-actions Bot closed this Apr 30, 2026

akh64bit had a problem deploying to eval-gate April 30, 2026 21:09 — with GitHub Actions Error

akh64bit reopened this Apr 30, 2026

akh64bit had a problem deploying to eval-gate April 30, 2026 21:13 — with GitHub Actions Error

gemini-code-assist Bot reviewed Apr 30, 2026


gemini-cli Bot added the area/agent Issues related to Core Agent, Tools, Memory, Sub-Agents, Hooks, Agent Quality label Apr 30, 2026

test(evals): parse JSON arguments instead of using string inclusion

154e5d8

akh64bit had a problem deploying to eval-gate April 30, 2026 21:53 — with GitHub Actions Error

joshualitt approved these changes Apr 30, 2026


This was referenced May 1, 2026

📊 AI CLI Tools Community Updates Daily Report 2026-05-01 gsscsd/big_model_radar#275

Open

📊 AI CLI Tools Community Updates Daily Report 2026-05-01 borq168/big_model_radar#91

Open

akh64bit enabled auto-merge May 1, 2026 03:02

test(evals): fix argument parameter name filtering

e52a629

akh64bit requested a deployment to eval-gate May 1, 2026 03:30 — with GitHub Actions Waiting

akh64bit added this pull request to the merge queue May 1, 2026

Merged via the queue into main with commit b3e6c28 May 1, 2026
28 of 29 checks passed

akh64bit deleted the feature/file-creation-behavior-eval branch May 1, 2026 03:58

TirthNaik-99 pushed a commit to TirthNaik-99/gemini-cli that referenced this pull request May 4, 2026

test(evals): add behavioral eval for file creation and write_file tool selection (google-gemini#26292)

f74b9ca

kimjune01 pushed a commit to kimjune01/gemini-cli-claude that referenced this pull request May 6, 2026

test(evals): add behavioral eval for file creation and write_file tool selection (google-gemini#26292)

6cc82e2

akh64bit merged 4 commits into main from feature/file-creation-behavior-eval


2 participants
