fix: prevent UTF-8 panics on multi-byte characters #93

Merged
pszymkowiak merged 4 commits into rtk-ai:master from polaminggkub-debug:fix/utf8-panics
Feb 13, 2026

Conversation

@polaminggkub-debug
Contributor

Summary

  • Fix 10 byte-indexed string slicing locations (&s[..n]) that panic on multi-byte UTF-8 characters (Thai, emoji, CJK)
  • Replace with char-aware operations: chars().take(), .get(), is_char_boundary()
  • Add safe_char_boundary() utility in utils.rs
  • Add 20 regression tests covering Thai, emoji, and CJK input

Problem

RTK panics on non-ASCII content because several call sites slice strings by byte index. Rust strings are UTF-8 encoded: Thai characters take 3 bytes each and most emoji take 4, so a slice such as &line[..77] can land mid-character and panic with "byte index is not a char boundary".
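
To illustrate the failure mode described above, here is a minimal sketch. The helper names `take_chars` and `can_slice_to` are hypothetical and are not taken from the RTK source; they only show the difference between counting characters and counting bytes.

```rust
/// Hypothetical helper: keep at most `n` characters (not bytes) of `s`.
/// Because it iterates over `char`s, the cut always lands on a valid
/// UTF-8 boundary.
fn take_chars(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
}

/// Hypothetical helper: true when `&s[..byte_idx]` would NOT panic.
/// `str::get` returns `None` in exactly the cases where the equivalent
/// index expression panics (out of range or not on a char boundary).
fn can_slice_to(s: &str, byte_idx: usize) -> bool {
    s.get(..byte_idx).is_some()
}
```

For the Thai string "สวัสดี" (6 characters, 18 bytes), `can_slice_to(s, 4)` is false because byte 4 falls inside the second character, while `take_chars(s, 2)` safely yields the first two characters.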

Files Changed

| File | Fix |
| --- | --- |
| src/git.rs | filter_log_output: chars().take(77) instead of &line[..77]; format_status_output: .get() instead of &line[0..2] |
| src/log_cmd.rs | Error/warning truncation: chars().take(97) instead of &original[..97] (2 locations) |
| src/env_cmd.rs | Long value display: chars().take(50); mask_value: char-based prefix/suffix |
| src/parser/mod.rs | truncate_output: is_char_boundary() loop instead of &output[..max_chars] |
| src/grep_cmd.rs | clean_line: char-boundary snapping for pattern context; chars().take() for fallback |
| src/wget_cmd.rs | compact_url, truncate_line, parse_error: all switched to chars().take() |
| src/utils.rs | New safe_char_boundary() helper + multi-byte truncation tests |
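
Two of the patterns listed above, `.get()` in place of indexing and a char-based prefix/suffix, might look roughly like the following. This is a sketch under assumptions: the function bodies, the 4-character threshold, and the `****` mask format are illustrative guesses, not the actual code in src/git.rs or src/env_cmd.rs.

```rust
/// Hypothetical stand-in for the status-prefix lookup: returns the first
/// two bytes of a line as a &str, or None when the line is too short or
/// byte 2 is not a char boundary (where `&line[0..2]` would panic).
fn status_code(line: &str) -> Option<&str> {
    line.get(0..2)
}

/// Hypothetical char-based masking: show the first and last two
/// characters and hide the rest. Threshold and mask are assumptions.
fn mask_value(value: &str) -> String {
    let chars: Vec<char> = value.chars().collect();
    if chars.len() <= 4 {
        // Too short to reveal anything: mask every character.
        return "*".repeat(chars.len());
    }
    let prefix: String = chars[..2].iter().collect();
    let suffix: String = chars[chars.len() - 2..].iter().collect();
    format!("{}****{}", prefix, suffix)
}
```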

Test plan

  • cargo test — 284 tests pass (20 new UTF-8 regression tests)
  • cargo clippy — no new warnings (all pre-existing)
  • cargo build --release — builds successfully
  • Manual: create file with Thai name, run rtk git status — should not panic

🤖 Generated with Claude Code

@pszymkowiak
Collaborator

Please review the commit: there is a conflict. Resolve it so the PR can be reviewed.

src/utils.rs Outdated
/// let s = "hello";
/// assert_eq!(safe_char_boundary(s, 3), 3);
/// ```
pub fn safe_char_boundary(s: &str, byte_idx: usize) -> usize {
Collaborator

Is it used somewhere?
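
The body of the helper is not shown in this excerpt. A plausible sketch that matches the signature and the doc example above (`safe_char_boundary("hello", 3) == 3`) is given below; rounding down to the previous boundary and clamping to the string length are assumptions, not confirmed behavior.

```rust
/// Sketch only: returns the largest char boundary that is <= `byte_idx`,
/// clamped to `s.len()`. The rounding direction is an assumption.
pub fn safe_char_boundary(s: &str, byte_idx: usize) -> usize {
    if byte_idx >= s.len() {
        return s.len();
    }
    let mut idx = byte_idx;
    // Index 0 is always a char boundary, so this loop terminates.
    while !s.is_char_boundary(idx) {
        idx -= 1;
    }
    idx
}
```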

src/wget_cmd.rs Outdated
@@ -199,11 +199,9 @@ fn compact_url(url: &str) -> String {
if without_proto.len() <= 50 {
Collaborator

  let char_count = without_proto.chars().count();
  if char_count <= 50 {
      without_proto.to_string()
  } else {
      let prefix: String = without_proto.chars().take(25).collect();
      let suffix: String = without_proto.chars().skip(char_count - 20).collect();
      format!("{}...{}", prefix, suffix)
  }

This calls .chars() several times. Can it be done once?

polaminggkub-debug and others added 2 commits February 13, 2026 18:46
fix: prevent UTF-8 panics on multi-byte characters (Thai, emoji, CJK)

Replace all byte-indexed string slicing (`&s[..n]`) with char-aware
operations across 10 locations in 7 files. Rust strings are UTF-8, so
byte slicing can land mid-character on Thai (3 bytes), emoji (4 bytes),
or CJK text and panic at runtime.

Fixes applied:
- git.rs: filter_log_output, format_status_output
- log_cmd.rs: error/warning message truncation (2 locations)
- env_cmd.rs: long value display, mask_value
- parser/mod.rs: truncate_output
- grep_cmd.rs: clean_line (pattern context + fallback truncation)
- wget_cmd.rs: compact_url, truncate_line, parse_error

Also adds safe_char_boundary() utility and 20 regression tests covering
Thai, emoji, and CJK input across all affected modules.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: collect chars once in compact_url to avoid redundant iteration

Address review feedback: collect chars into Vec once instead of
calling .chars() multiple times. Also fixes byte-vs-char length check.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
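
The change this commit describes, collecting the characters once and comparing against the character count rather than the byte length, can be sketched as follows. The function name `compact` is hypothetical and stands in for the tail of compact_url after the protocol has been stripped; the 50/25/20 limits come from the snippet under review.

```rust
/// Sketch of the "collect once" approach. `without_proto` is assumed to
/// be the URL with its scheme already removed.
fn compact(without_proto: &str) -> String {
    // One pass over the string; all later indexing is by character.
    let chars: Vec<char> = without_proto.chars().collect();
    if chars.len() <= 50 {
        without_proto.to_string()
    } else {
        let prefix: String = chars[..25].iter().collect();
        let suffix: String = chars[chars.len() - 20..].iter().collect();
        format!("{}...{}", prefix, suffix)
    }
}
```

Indexing into a `Vec<char>` costs one allocation but avoids walking the UTF-8 data three times (`count`, `take`, `skip`), and the length check now measures characters rather than bytes.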
@pszymkowiak
Collaborator

Waiting for the functions to go fully through chars, not a mix of string slicing and chars.

fix: remove unused safe_char_boundary function and tests

Address review: function was defined but never called anywhere.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@polaminggkub-debug
Contributor Author

Just to be clear: you want me to stop mixing string slicing with chars and do everything through .chars() instead, right?

@pszymkowiak
Collaborator

Yes, go full .chars().

fix: convert remaining is_char_boundary to full .chars() approach

Address review: use consistent .chars() everywhere instead of
mixing byte-based is_char_boundary with char-based operations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
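
A truncation written entirely in terms of `.chars()`, with no byte offsets or is_char_boundary checks, might look like this. The name `truncate_chars` and the trailing "..." are assumptions for illustration; the actual truncate_output in src/parser/mod.rs is not shown in this conversation.

```rust
/// Sketch: keep at most `max_chars` characters and append "..." only
/// when something was actually cut off.
fn truncate_chars(s: &str, max_chars: usize) -> String {
    let mut chars = s.chars();
    // `by_ref` lets us keep using the iterator after `take` finishes.
    let head: String = chars.by_ref().take(max_chars).collect();
    if chars.next().is_some() {
        // At least one more character remained, so the input was longer.
        format!("{}...", head)
    } else {
        head
    }
}
```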
@pszymkowiak pszymkowiak merged commit 155e264 into rtk-ai:master Feb 13, 2026
2 checks passed
FlorianBruniaux added a commit to FlorianBruniaux/rtk that referenced this pull request Feb 14, 2026
Resolved conflicts:
- Version bumped to 0.15.4 (Cargo.toml, Cargo.lock, .release-please-manifest.json)
- CHANGELOG.md: Added upstream releases (0.15.4, 0.15.3, 0.15.2)
- Hooks: Adopted POSIX character classes ([[:space:]]) from upstream
- src/parser/mod.rs: Added multibyte UTF-8 tests from upstream
- src/ruff_cmd.rs: Kept functions public for lint/format dispatcher feature

Upstream changes integrated:
- rtk-ai#120: git status fix for non-repo folders
- rtk-ai#93: UTF-8 panic prevention on multibyte chars
- rtk-ai#98: POSIX grep compatibility in hooks
- rtk-ai#95, rtk-ai#92: CI reliability and hook coverage improvements

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ahundt pushed a commit to ahundt/rtk that referenced this pull request Feb 23, 2026
* fix: prevent UTF-8 panics on multi-byte characters (Thai, emoji, CJK)

Replace all byte-indexed string slicing (`&s[..n]`) with char-aware
operations across 10 locations in 7 files. Rust strings are UTF-8, so
byte slicing can land mid-character on Thai (3 bytes), emoji (4 bytes),
or CJK text and panic at runtime.

Fixes applied:
- git.rs: filter_log_output, format_status_output
- log_cmd.rs: error/warning message truncation (2 locations)
- env_cmd.rs: long value display, mask_value
- parser/mod.rs: truncate_output
- grep_cmd.rs: clean_line (pattern context + fallback truncation)
- wget_cmd.rs: compact_url, truncate_line, parse_error

Also adds safe_char_boundary() utility and 20 regression tests covering
Thai, emoji, and CJK input across all affected modules.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: collect chars once in compact_url to avoid redundant iteration

Address review feedback: collect chars into Vec once instead of
calling .chars() multiple times. Also fixes byte-vs-char length check.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove unused safe_char_boundary function and tests

Address review: function was defined but never called anywhere.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: convert remaining is_char_boundary to full .chars() approach

Address review: use consistent .chars() everywhere instead of
mixing byte-based is_char_boundary with char-based operations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2 participants