fix: prevent UTF-8 panics on multi-byte characters #93
Merged
pszymkowiak merged 4 commits into rtk-ai:master on Feb 13, 2026
Conversation
Collaborator
Please review the commits: there is a conflict. Resolve it so the PR can be reviewed.
7c5af55 to
74f40e7
Compare
pszymkowiak
reviewed
Feb 12, 2026
src/utils.rs
Outdated
    /// let s = "hello";
    /// assert_eq!(safe_char_boundary(s, 3), 3);
    /// ```
    pub fn safe_char_boundary(s: &str, byte_idx: usize) -> usize {
Collaborator
Is it used somewhere?
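For readers wondering what the helper under review might have looked like: only the signature and the `"hello"` doctest are visible in the diff, so the body below is a hypothetical reconstruction, assuming the helper snaps a byte index down to the nearest character boundary. The helper was later removed from this PR as unused.

```rust
/// Hypothetical reconstruction (body is an assumption, not the PR's code):
/// snap `byte_idx` down to the nearest UTF-8 char boundary in `s`.
pub fn safe_char_boundary(s: &str, byte_idx: usize) -> usize {
    // Indices past the end clamp to the string length, which is always a boundary.
    let mut idx = byte_idx.min(s.len());
    // Walk backward until `idx` sits on the first byte of a character.
    // Index 0 is always a boundary, so the loop terminates.
    while !s.is_char_boundary(idx) {
        idx -= 1;
    }
    idx
}
```

With this sketch, a slice such as `&s[..safe_char_boundary(s, n)]` cannot panic, at the cost of returning slightly fewer bytes than requested when `n` falls inside a character.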
pszymkowiak
reviewed
Feb 12, 2026
src/wget_cmd.rs
Outdated
    @@ -199,11 +199,9 @@ fn compact_url(url: &str) -> String {
        if without_proto.len() <= 50 {
Collaborator
    let char_count = without_proto.chars().count();
    if char_count <= 50 {
        without_proto.to_string()
    } else {
        let prefix: String = without_proto.chars().take(25).collect();
        let suffix: String = without_proto.chars().skip(char_count - 20).collect();
        format!("{}...{}", prefix, suffix)
    }
chars, chars?
Replace all byte-indexed string slicing (`&s[..n]`) with char-aware operations across 10 locations in 7 files. Rust strings are UTF-8, so byte slicing can land mid-character on Thai (3 bytes), emoji (4 bytes), or CJK text and panic at runtime.

Fixes applied:
- git.rs: filter_log_output, format_status_output
- log_cmd.rs: error/warning message truncation (2 locations)
- env_cmd.rs: long value display, mask_value
- parser/mod.rs: truncate_output
- grep_cmd.rs: clean_line (pattern context + fallback truncation)
- wget_cmd.rs: compact_url, truncate_line, parse_error

Also adds safe_char_boundary() utility and 20 regression tests covering Thai, emoji, and CJK input across all affected modules.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Address review feedback: collect chars into Vec once instead of calling .chars() multiple times. Also fixes byte-vs-char length check.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
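A minimal sketch of the "collect once" approach this commit describes, not the code actually merged. The function name `compact_middle` and its parameters are hypothetical; the values 50, 25, and 20 in the usage note come from the review suggestion above.

```rust
/// Hypothetical helper: shorten `text` to `head + "..." + tail` when it is
/// longer than `max_chars` characters, iterating the string only once.
fn compact_middle(text: &str, max_chars: usize, head: usize, tail: usize) -> String {
    // Decode the UTF-8 string into chars a single time.
    let chars: Vec<char> = text.chars().collect();
    // Compare the character count, not the byte length.
    if chars.len() <= max_chars {
        return text.to_string();
    }
    // Clamp so the slices below can never go out of bounds.
    let head = head.min(chars.len());
    let tail = tail.min(chars.len());
    let prefix: String = chars[..head].iter().collect();
    let suffix: String = chars[chars.len() - tail..].iter().collect();
    format!("{}...{}", prefix, suffix)
}
```

Calling `compact_middle(url, 50, 25, 20)` mirrors the limits in the reviewed snippet. Indexing a `Vec<char>` is safe here because each element is a whole character, so no index can land mid-character.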
5c3cfdb to
f25a2e5
Compare
Collaborator
Waiting for the functions to work purely on chars, not a mix of string slicing and chars.
Address review: function was defined but never called anywhere.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor
Author
Just to be clear, you want me to stop mixing string slicing with chars and just do everything through .chars() instead, right?
Collaborator
Yes, go full .chars().
Address review: use consistent .chars() everywhere instead of mixing byte-based is_char_boundary with char-based operations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
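As an illustration of the "full `.chars()`" style requested in review, here is a sketch of a truncation helper that never touches byte indices. The name `truncate_chars` and the `"..."` suffix are assumptions for illustration and may not match what rtk's `truncate_output` does.

```rust
/// Hypothetical helper: keep at most `max_chars` characters of `s`,
/// appending "..." only when something was actually cut off.
fn truncate_chars(s: &str, max_chars: usize) -> String {
    let mut chars = s.chars();
    // `by_ref()` lets us keep using the iterator after `take` is done.
    let head: String = chars.by_ref().take(max_chars).collect();
    // If any character remains, the input was longer than the limit.
    if chars.next().is_some() {
        format!("{}...", head)
    } else {
        head
    }
}
```

Note that `.chars()` counts Unicode scalar values, not grapheme clusters: truncating Thai text can still separate a combining vowel mark from its base consonant. That affects how the result looks, but it cannot cause a panic.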
FlorianBruniaux added a commit to FlorianBruniaux/rtk that referenced this pull request on Feb 14, 2026
Resolved conflicts:
- Version bumped to 0.15.4 (Cargo.toml, Cargo.lock, .release-please-manifest.json)
- CHANGELOG.md: Added upstream releases (0.15.4, 0.15.3, 0.15.2)
- Hooks: Adopted POSIX character classes ([[:space:]]) from upstream
- src/parser/mod.rs: Added multibyte UTF-8 tests from upstream
- src/ruff_cmd.rs: Kept functions public for lint/format dispatcher feature

Upstream changes integrated:
- rtk-ai#120: git status fix for non-repo folders
- rtk-ai#93: UTF-8 panic prevention on multibyte chars
- rtk-ai#98: POSIX grep compatibility in hooks
- rtk-ai#95, rtk-ai#92: CI reliability and hook coverage improvements

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ahundt pushed a commit to ahundt/rtk that referenced this pull request on Feb 23, 2026
* fix: prevent UTF-8 panics on multi-byte characters (Thai, emoji, CJK)

  Replace all byte-indexed string slicing (`&s[..n]`) with char-aware operations across 10 locations in 7 files. Rust strings are UTF-8, so byte slicing can land mid-character on Thai (3 bytes), emoji (4 bytes), or CJK text and panic at runtime.

  Fixes applied:
  - git.rs: filter_log_output, format_status_output
  - log_cmd.rs: error/warning message truncation (2 locations)
  - env_cmd.rs: long value display, mask_value
  - parser/mod.rs: truncate_output
  - grep_cmd.rs: clean_line (pattern context + fallback truncation)
  - wget_cmd.rs: compact_url, truncate_line, parse_error

  Also adds safe_char_boundary() utility and 20 regression tests covering Thai, emoji, and CJK input across all affected modules.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: collect chars once in compact_url to avoid redundant iteration

  Address review feedback: collect chars into Vec once instead of calling .chars() multiple times. Also fixes byte-vs-char length check.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove unused safe_char_boundary function and tests

  Address review: function was defined but never called anywhere.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: convert remaining is_char_boundary to full .chars() approach

  Address review: use consistent .chars() everywhere instead of mixing byte-based is_char_boundary with char-based operations.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- Fix byte-indexed string slicing (`&s[..n]`) that panics on multi-byte UTF-8 characters (Thai, emoji, CJK)
- Replacements use `chars().take()`, `.get()`, `is_char_boundary()`
- Add `safe_char_boundary()` utility in `utils.rs`

Problem
RTK crashes on non-ASCII content because Rust strings are UTF-8 encoded and slice indices are byte offsets. Thai characters are 3 bytes, emoji are 4 bytes. `&line[..77]` can land mid-character and panic with `byte index is not a char boundary`.

Files Changed
| File | Change |
| --- | --- |
| `src/git.rs` | `filter_log_output`: `chars().take(77)` instead of `&line[..77]`; `format_status_output`: `.get()` instead of `&line[0..2]` |
| `src/log_cmd.rs` | `chars().take(97)` instead of `&original[..97]` (2 locations) |
| `src/env_cmd.rs` | `chars().take(50)`; `mask_value`: char-based prefix/suffix |
| `src/parser/mod.rs` | `truncate_output`: `is_char_boundary()` loop instead of `&output[..max_chars]` |
| `src/grep_cmd.rs` | `clean_line`: char-boundary snapping for pattern context; `chars().take()` for fallback |
| `src/wget_cmd.rs` | `compact_url`, `truncate_line`, `parse_error`: all switched to `chars().take()` |
| `src/utils.rs` | `safe_char_boundary()` helper + multi-byte truncation tests |

Test plan
- `cargo test`: 284 tests pass (20 new UTF-8 regression tests)
- `cargo clippy`: no new warnings (all pre-existing)
- `cargo build --release`: builds successfully
- `rtk git status`: should not panic

🤖 Generated with Claude Code
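To make the failure mode in the Problem section concrete, the sketch below shows how to tell when `&line[..n]` would panic, alongside the non-panicking `.get()` form listed for `format_status_output`. Both function names are hypothetical and exist only for this illustration.

```rust
/// Hypothetical check: true when `&line[..n]` would panic, i.e. when `n`
/// is past the end or falls inside a multi-byte character.
fn slice_would_panic(line: &str, n: usize) -> bool {
    // `is_char_boundary` returns false for out-of-range indices as well.
    !line.is_char_boundary(n)
}

/// Hypothetical sketch of the `.get()` style: returns `None` instead of
/// panicking when the first two bytes are not a valid slice.
fn status_code(line: &str) -> Option<&str> {
    line.get(0..2)
}
```

For example, a line of 26 Thai characters is 78 bytes long, so byte index 77 sits inside the last character: `slice_would_panic` reports `true` for 77 and `false` for 75 or 78.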