feat(evals): add shell command safety evals#26528
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the agent's safety protocols by implementing a dedicated evaluation suite focused on shell command behavior. By establishing clear expectations for tool selection and safeguarding against potentially destructive operations, this change helps identify and address gaps in how the agent interacts with the underlying system environment. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a new evaluation suite, 'Shell Command Safety', designed to verify that the model uses appropriate tools for file operations and avoids executing destructive shell commands silently. The review feedback highlights the need for more robust argument handling in the getCommand helper to handle null or malformed input, suggests updating a test status to USUALLY_FAILS to prevent CI breakage, and identifies a logical inconsistency where a shell command is expected for file listing despite the existence of a dedicated tool.
| const getCommand = (call: any): string | undefined => { | ||
| let args = call.toolRequest.args; | ||
| if (typeof args === 'string') { | ||
| try { | ||
| args = JSON.parse(args); | ||
| } catch (e) { | ||
| // Ignore parse errors | ||
| } | ||
| } | ||
| return typeof args === 'string' ? args : (args as any)['command']; | ||
| }; |
There was a problem hiding this comment.
The getCommand helper function should handle cases where toolRequest.args is missing or null, and it must parse the args string as JSON before accessing its properties. This ensures robustness against malformed input and adheres to the repository's data handling standards regarding optional properties and tool request arguments.
| const getCommand = (call: any): string | undefined => { | |
| let args = call.toolRequest.args; | |
| if (typeof args === 'string') { | |
| try { | |
| args = JSON.parse(args); | |
| } catch (e) { | |
| // Ignore parse errors | |
| } | |
| } | |
| return typeof args === 'string' ? args : (args as any)['command']; | |
| }; | |
| const getCommand = (call: any): string | undefined => { | |
| let args = call.toolRequest?.args; | |
| if (args === null || args === undefined) return undefined; | |
| if (typeof args === 'string') { | |
| try { | |
| args = JSON.parse(args); | |
| } catch { | |
| // Ignore parse errors | |
| } | |
| } | |
| return typeof args === 'string' ? args : (args as any)?.command; | |
| }; |
References
- When consuming an object, if a property is optional in its type definition (interface), callers must handle the undefined case (e.g., by providing a default with ??). Do not rely on the implementation details of the function that creates the object to always provide a value, as this can change. Code against the interface contract.
- The toolRequest.args property is a JSON string, not an object. It must be parsed using JSON.parse() before its properties can be accessed.
| }); | ||
|
|
||
| // 2. Destructive Command Safeguards | ||
| evalTest('USUALLY_PASSES', { |
There was a problem hiding this comment.
This test is marked as USUALLY_PASSES, but the PR description states that it currently fails. Merging a failing test with a "passing" status will break CI pipelines that run these evaluations. If the fix for the safety gap is not included in this PR, this test should be marked with an appropriate status (e.g., USUALLY_FAILS) or skipped to avoid blocking the build.
| evalTest('USUALLY_PASSES', { | |
| evalTest('USUALLY_FAILS', { |
| evalTest('USUALLY_PASSES', { | ||
| suiteName: 'default', | ||
| suiteType: 'behavioral', | ||
| name: 'should use run_shell_command for file listing', |
There was a problem hiding this comment.
There is a logical inconsistency between Test 1 and Test 3. Test 1 enforces the use of the dedicated write_file tool over shell commands, yet Test 3 explicitly expects run_shell_command to be used for file listing, even though a dedicated ls tool exists in the core package. To maintain consistency and promote safer, structured tool usage, Test 3 should either verify that the ls tool is used or use a prompt for a task that truly requires a shell (e.g., a complex pipeline or a command without a dedicated tool).
|
Size Change: -4 B (0%) Total Size: 34 MB
ℹ️ View Unchanged
|
|
I have addressed the review comments:
|
🛑 Action Required: Evaluation ApprovalSteering changes have been detected in this PR. To prevent regressions, a maintainer must approve the evaluation run before this PR can be merged. Maintainers:
Once approved, the evaluation results will be posted here automatically. |
Samee24
left a comment
There was a problem hiding this comment.
LGTM, please address the GCA suggestion before merging!
| try { | ||
| args = JSON.parse(args); | ||
| } catch (e) { | ||
| // Ignore parse errors |
There was a problem hiding this comment.
Maybe dump something in the output here in case the eval fails?
| const options = { timeout: timeoutOverride ?? timeout, meta }; | ||
| if ( | ||
| if (policy === 'USUALLY_FAILS') { | ||
| it.fails(name, options, fn); |
There was a problem hiding this comment.
This doesn't look right, do you want to skip? If I read this correctly we will now automatically fail these evals (even when !process.env['RUN_EVALS']).
|
Hi @kschaab, I have resolved your comments on this PR:
|
Summary
This PR adds a new evaluation suite (
evals/shell_command_safety.eval.ts) to cover behavioral patterns related to shell command usage and safety, addressing issue #23920.Details
The new eval file includes three tests:
echoorcatwith redirection. (Passed)run_shell_commandfor legitimate shell tasks. (Passed)Related Issues
Closes #23920
How to Validate
Run the new eval file using Vitest:
Pre-Merge Checklist