fix(cli): use byte length instead of string length for readStdin size limits #26224
Summary of Changes
This pull request addresses a critical issue in the CLI's stdin processing where size limits were incorrectly calculated using string length instead of byte length. By implementing a robust truncation utility and switching to byte-based tracking, the changes prevent potential data corruption when handling multi-byte characters like emojis or CJK scripts, ensuring the 8MB limit is accurately and safely enforced.
Size Change: +341 B (0%) Total Size: 33.9 MB
Code Review
This pull request improves the handling of standard input truncation by switching from character-based length to byte-based length. It introduces a truncateUtf8Bytes utility to ensure that multi-byte UTF-8 characters are not split at the truncation boundary, preventing data corruption or invalid characters. New unit tests verify the correct truncation of multi-byte characters and the enforcement of the 8MB byte limit. I have no feedback to provide.
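For readers who want to see the boundary problem concretely, here is a minimal sketch of byte-limited truncation that backs up to the start of a character instead of cutting through it. It is not the implementation from this PR: the name `truncateUtf8BytesSketch` and its signature are hypothetical, and it assumes a Node.js runtime where `Buffer` is available.

```typescript
// Hypothetical sketch, NOT the truncateUtf8Bytes utility from this PR.
// Truncates a string to at most maxBytes of UTF-8 without splitting a character.
function truncateUtf8BytesSketch(input: string, maxBytes: number): string {
  const bytes = Buffer.from(input, 'utf8');
  if (bytes.length <= maxBytes) return input;
  if (maxBytes <= 0) return '';

  // bytes[end] is the first byte that would be dropped. UTF-8 continuation
  // bytes look like 10xxxxxx; if the first dropped byte is one of them, the
  // cut falls inside a character, so move back to that character's lead byte.
  let end = maxBytes;
  while (end > 0 && (bytes[end] & 0xc0) === 0x80) {
    end--;
  }
  return bytes.subarray(0, end).toString('utf8');
}

// 'é' is two bytes (0xC3 0xA9), so a 2-byte limit lands in the middle of it.
const safe = truncateUtf8BytesSketch('héllo', 2); // 'h'
// A naive byte slice decodes the orphaned lead byte as U+FFFD.
const naive = Buffer.from('héllo', 'utf8').subarray(0, 2).toString('utf8');
console.log(JSON.stringify(safe), naive.includes('\uFFFD'));
```

The sketch may return fewer bytes than the limit allows, which is the trade-off for never emitting a partial character.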
Summary
Fixes the size limitation logic in readStdin() to use byte length instead of string length. This ensures that the 8MB limit is accurately enforced for multi-byte characters (e.g., CJK characters, emojis) and prevents data corruption caused by splitting multi-byte characters at the truncation boundary.
Details
- Replaces chunk.length with Buffer.byteLength(chunk, 'utf8') for accurate byte tracking.
- Adds a truncateUtf8Bytes utility to safely truncate UTF-8 strings at a byte limit without splitting multi-byte characters (continuation bytes).
- readStdinLines.ts (from PR feat(cli): allow -i/--prompt-interactive with piped stdin #23414).
Related Issues
Fixes #23417
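To illustrate why the byte tracking described under Details matters, the sketch below compares what `.length` reports (UTF-16 code units) with what `Buffer.byteLength(chunk, 'utf8')` reports for the same chunks. The helper `measureChunks` and the constant `ASSUMED_MAX_BYTES` are hypothetical names for illustration only; the exact value of the limit in the codebase is an assumption here.

```typescript
// Assumption for illustration: "8MB" taken as 8 * 1024 * 1024 bytes.
const ASSUMED_MAX_BYTES = 8 * 1024 * 1024;

// Hypothetical helper: sums both measures over a list of string chunks.
function measureChunks(chunks: string[]): { codeUnits: number; utf8Bytes: number } {
  let codeUnits = 0;
  let utf8Bytes = 0;
  for (const chunk of chunks) {
    codeUnits += chunk.length; // UTF-16 code units, not bytes
    utf8Bytes += Buffer.byteLength(chunk, 'utf8'); // encoded size in bytes
  }
  return { codeUnits, utf8Bytes };
}

// Three CJK characters (3 bytes each) plus one emoji (4 bytes, 2 code units).
const measured = measureChunks(['日本語', '😀']);
console.log(measured); // { codeUnits: 5, utf8Bytes: 13 }
console.log(measured.utf8Bytes <= ASSUMED_MAX_BYTES); // true
```

For input like this the byte count is more than double the string length, so a limit checked against `.length` can admit far more data than intended.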
How to Validate
Run the unit tests for readStdin: npm test -w @google/gemini-cli -- src/utils/readStdin.test.ts
Check that no replacement characters (\uFFFD) are present in the output.
Pre-Merge Checklist