str::replace has a fast path for replacing a single ASCII byte with another single ASCII byte. To decide whether that path applies, it calls the safe Pattern::as_utf8_pattern() method and accepts any Utf8Pattern::StringPattern([from_byte]) as a one-byte pattern, without checking that from_byte is ASCII. Despite its name, Utf8Pattern::StringPattern stores &[u8] rather than &str, so a custom safe Pattern implementation can construct it with non-UTF-8 bytes. str::replace then treats that byte as ASCII, performs a raw byte substitution through replace_ascii, and returns a String whose contents are no longer valid UTF-8.
This breaks String’s UTF-8 invariant through safe API calls. Subsequent safe operations that rely on the invariant can reach unsafe standard-library internals with invalid data; the PoC below triggers Rust’s optional UB precondition check in char::from_u32_unchecked via String::chars().next().
Two possible fixes: make Pattern an unsafe trait, so that implementers explicitly take responsibility for as_utf8_pattern() returning valid UTF-8; or change Utf8Pattern::StringPattern to carry &str instead of &[u8], so that custom safe implementations cannot expose arbitrary non-UTF-8 bytes through this fast-path metadata.
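As a rough sketch of the second option, the snippet below uses a hypothetical stand-in enum, Utf8PatternStr, whose StringPattern variant carries &str, and a hypothetical fast_path_byte helper modeled on the selection logic in str::replace. It is not standard-library code, and it replaces the unstable char::as_ascii() with a stable is_ascii() check. Because a safe &str is always valid UTF-8, a one-byte payload can only be ASCII, so the StringPattern branch needs no extra test.

```rust
// Hypothetical mirror of the proposed change: `StringPattern` carries `&str`
// instead of `&[u8]`. `Utf8PatternStr` and `fast_path_byte` are illustrative
// names, not standard-library items.
#[derive(Copy, Clone, Debug)]
enum Utf8PatternStr<'a> {
    StringPattern(&'a str),
    CharPattern(char),
}

/// Returns the byte to use for the ASCII fast path, if any.
fn fast_path_byte(p: Utf8PatternStr<'_>) -> Option<u8> {
    match p {
        // A one-byte `&str` can only hold an ASCII character, because safe
        // code cannot build a `&str` from non-UTF-8 bytes.
        Utf8PatternStr::StringPattern(s) => match s.as_bytes() {
            [b] => Some(*b),
            _ => None,
        },
        // Stable stand-in for `c.as_ascii().map(|a| a.to_u8())`.
        Utf8PatternStr::CharPattern(c) => c.is_ascii().then(|| c as u8),
    }
}

fn main() {
    assert_eq!(fast_path_byte(Utf8PatternStr::StringPattern("a")), Some(b'a'));
    // "\u{80}" is two bytes (c2 80), so it does not qualify for the fast path.
    assert_eq!(fast_path_byte(Utf8PatternStr::StringPattern("\u{80}")), None);
    assert_eq!(fast_path_byte(Utf8PatternStr::CharPattern('\u{80}')), None);
    println!("only ASCII bytes reach the fast path");
}
```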
Security impact
Expected to be low.
This requires a custom Pattern implementation, which is only possible on nightly, to actively misbehave while still using safe Rust: it must return a one-byte Utf8Pattern::StringPattern containing a non-ASCII byte, the haystack must contain that byte inside a multi-byte UTF-8 sequence, and the replacement must be exactly one byte so that str::replace takes the ASCII fast path. This is not reachable through stable Rust or through the built-in pattern types.
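The claim that the built-in pattern types cannot reach this path can be sanity-checked under one assumption: that built-in StringPattern values come only from String and str, as the Utf8Pattern doc comment says. Any safe &str is valid UTF-8, and the exhaustive check below (plain stable Rust; the function names are illustrative) confirms that the only single bytes that are valid UTF-8 on their own are the 128 ASCII bytes.

```rust
/// Checks every possible single byte: if it forms valid UTF-8 on its own,
/// it must be ASCII.
fn one_byte_utf8_is_always_ascii() -> bool {
    (0u8..=255).all(|b| match std::str::from_utf8(&[b]) {
        Ok(s) => s.is_ascii(),
        // Not valid UTF-8 alone, so no `&str` can consist of just this byte.
        Err(_) => true,
    })
}

/// Counts the single bytes that are valid UTF-8 by themselves.
fn valid_one_byte_count() -> usize {
    (0u8..=255).filter(|&b| std::str::from_utf8(&[b]).is_ok()).count()
}

fn main() {
    assert!(one_byte_utf8_is_always_ascii());
    // Exactly the ASCII bytes 0x00..=0x7f qualify.
    assert_eq!(valid_one_byte_count(), 128);
}
```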
Data flow trace
str::replace → Pattern::as_utf8_pattern → replace_ascii → String::from_utf8_unchecked → Chars::next → char::from_u32_unchecked
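- Pattern::as_utf8_pattern (rust/library/core/src/str/pattern.rs, lines 99 to 178 at commit 4b0c9d7): a safe trait method returning Utf8Pattern. StringPattern is documented as the variant returned by String and str patterns, but nothing enforces this, because the method itself is not unsafe: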
```rust
pub trait Pattern: Sized {
    /// Associated searcher for this pattern
    type Searcher<'a>: Searcher<'a>;

    /// Constructs the associated searcher from
    /// `self` and the `haystack` to search in.
    fn into_searcher(self, haystack: &str) -> Self::Searcher<'_>;

    /// Checks whether the pattern matches anywhere in the haystack
    #[inline]
    fn is_contained_in(self, haystack: &str) -> bool {
        self.into_searcher(haystack).next_match().is_some()
    }

    /// Checks whether the pattern matches at the front of the haystack
    #[inline]
    fn is_prefix_of(self, haystack: &str) -> bool {
        matches!(self.into_searcher(haystack).next(), SearchStep::Match(0, _))
    }

    /// Checks whether the pattern matches at the back of the haystack
    #[inline]
    fn is_suffix_of<'a>(self, haystack: &'a str) -> bool
    where
        Self::Searcher<'a>: ReverseSearcher<'a>,
    {
        matches!(self.into_searcher(haystack).next_back(), SearchStep::Match(_, j) if haystack.len() == j)
    }

    /// Removes the pattern from the front of haystack, if it matches.
    #[inline]
    fn strip_prefix_of(self, haystack: &str) -> Option<&str> {
        if let SearchStep::Match(start, len) = self.into_searcher(haystack).next() {
            debug_assert_eq!(
                start, 0,
                "The first search step from Searcher \
                 must include the first character"
            );
            // SAFETY: `Searcher` is known to return valid indices.
            unsafe { Some(haystack.get_unchecked(len..)) }
        } else {
            None
        }
    }

    /// Removes the pattern from the back of haystack, if it matches.
    #[inline]
    fn strip_suffix_of<'a>(self, haystack: &'a str) -> Option<&'a str>
    where
        Self::Searcher<'a>: ReverseSearcher<'a>,
    {
        if let SearchStep::Match(start, end) = self.into_searcher(haystack).next_back() {
            debug_assert_eq!(
                end,
                haystack.len(),
                "The first search step from ReverseSearcher \
                 must include the last character"
            );
            // SAFETY: `Searcher` is known to return valid indices.
            unsafe { Some(haystack.get_unchecked(..start)) }
        } else {
            None
        }
    }

    /// Returns the pattern as utf-8 bytes if possible.
    fn as_utf8_pattern(&self) -> Option<Utf8Pattern<'_>> {
        None
    }
}

/// Result of calling [`Pattern::as_utf8_pattern()`].
/// Can be used for inspecting the contents of a [`Pattern`] in cases
/// where the underlying representation can be represented as UTF-8.
#[derive(Copy, Clone, Eq, PartialEq, Debug)]
pub enum Utf8Pattern<'a> {
    /// Type returned by String and str types.
    StringPattern(&'a [u8]),
    /// Type returned by char types.
    CharPattern(char),
}
```
- str::replace (rust/library/alloc/src/str.rs, lines 308 to 318 at commit 4b0c9d7): accepts Utf8Pattern::StringPattern([from_byte]) without validating that the byte is ASCII, while the CharPattern branch does validate through c.as_ascii():
```rust
pub fn replace<P: Pattern>(&self, from: P, to: &str) -> String {
    // Fast path for replacing a single ASCII character with another.
    if let Some(from_byte) = match from.as_utf8_pattern() {
        Some(Utf8Pattern::StringPattern([from_byte])) => Some(*from_byte),
        Some(Utf8Pattern::CharPattern(c)) => c.as_ascii().map(|ascii_char| ascii_char.to_u8()),
        _ => None,
    } {
        if let [to_byte] = to.as_bytes() {
            return unsafe { replace_ascii(self.as_bytes(), from_byte, *to_byte) };
        }
    }
    // ... (rest of the function not shown in this excerpt)
```
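To make the asymmetry concrete, the following stable-Rust sketch mirrors the fast-path selection above on a hypothetical local copy of the enum (Utf8PatternMirror, with the same &[u8] payload) and a hypothetical select_fast_path function; char::as_ascii() is replaced by a stable is_ascii() check. The StringPattern branch accepts the byte 0x80, while the CharPattern branch rejects a non-ASCII char such as '\u{80}'.

```rust
// Illustrative mirror of the current `Utf8Pattern` (payload `&[u8]`) and of
// the fast-path selection in `str::replace`. Names are hypothetical.
#[derive(Copy, Clone, Debug)]
enum Utf8PatternMirror<'a> {
    StringPattern(&'a [u8]),
    CharPattern(char),
}

fn select_fast_path(p: Option<Utf8PatternMirror<'_>>) -> Option<u8> {
    match p {
        // Mirrors the excerpt: any single byte is accepted, ASCII or not.
        Some(Utf8PatternMirror::StringPattern([from_byte])) => Some(*from_byte),
        // Mirrors `c.as_ascii().map(...)`: non-ASCII chars are rejected.
        Some(Utf8PatternMirror::CharPattern(c)) => c.is_ascii().then(|| c as u8),
        _ => None,
    }
}

fn main() {
    // A safe custom pattern can hand over 0x80, and it is accepted...
    assert_eq!(select_fast_path(Some(Utf8PatternMirror::StringPattern(&[0x80]))), Some(0x80));
    // ...while a non-ASCII char pattern is rejected.
    assert_eq!(select_fast_path(Some(Utf8PatternMirror::CharPattern('\u{80}'))), None);
}
```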
- replace_ascii (rust/library/alloc/src/str.rs, lines 884 to 890 at commit 4b0c9d7): rewrites raw bytes and reconstructs a String with String::from_utf8_unchecked; its safety comment assumes an ASCII-to-ASCII replacement on valid UTF-8 input:
```rust
/// Faster implementation of string replacement for ASCII to ASCII cases.
/// Should produce fast vectorized code.
unsafe fn replace_ascii(utf8_bytes: &[u8], from: u8, to: u8) -> String {
    let result: Vec<u8> = utf8_bytes.iter().map(|b| if *b == from { to } else { *b }).collect();
    // SAFETY: We replaced ascii with ascii on valid utf8 strings.
    unsafe { String::from_utf8_unchecked(result) }
}
```
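The effect of the byte rewrite can be reproduced without any unsafe code by performing the same mapping as replace_ascii but returning the raw bytes instead of an unchecked String (map_bytes is a hypothetical name). Replacing ASCII with ASCII keeps the buffer valid UTF-8; replacing the continuation byte 0x80 in ed 80 80 yields ed 3f 3f, which std::str::from_utf8 rejects, matching the PoC output.

```rust
/// Safe stand-in for the byte rewrite in `replace_ascii`: same mapping, but
/// the result is returned as raw bytes instead of an unchecked `String`.
/// `map_bytes` is a hypothetical name used only for this illustration.
fn map_bytes(utf8_bytes: &[u8], from: u8, to: u8) -> Vec<u8> {
    utf8_bytes.iter().map(|b| if *b == from { to } else { *b }).collect()
}

fn main() {
    // ASCII-to-ASCII replacement keeps the buffer valid UTF-8.
    let ok = map_bytes(b"a-b", b'-', b'_');
    assert_eq!(ok, b"a_b");
    assert!(std::str::from_utf8(&ok).is_ok());

    // "\u{d000}" is ed 80 80; replacing 0x80 overwrites both continuation bytes.
    let corrupted = map_bytes("\u{d000}".as_bytes(), 0x80, b'?');
    assert_eq!(corrupted, [0xed, 0x3f, 0x3f]);
    assert!(std::str::from_utf8(&corrupted).is_err());
}
```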
Demonstration
The PoC uses only safe Rust code. It defines a custom Pattern whose as_utf8_pattern() exposes the non-ASCII byte 0x80, then calls replace on "\u{d000}" (ed 80 80) with a one-byte replacement.
Run with:

```sh
cargo +nightly run --features unstable-pattern --bin str_replace_non_ascii_pattern_poc
```
```rust
// cargo +nightly run --features unstable-pattern --bin str_replace_non_ascii_pattern_poc
#![feature(pattern)]
#![deny(unsafe_code)]

use std::str::pattern::{Pattern, Utf8Pattern};

// U+D000 is encoded as ed 80 80; 0x80 is a continuation byte, not ASCII.
const HAYSTACK: &str = "\u{d000}";
const FROM_BYTE: u8 = 0x80;
const TO: &str = "?";

fn main() {
    // Takes the ASCII fast path: the pattern reports a one-byte
    // StringPattern and the replacement is one byte long.
    let corrupted = HAYSTACK.replace(NonAsciiBytePattern, TO);
    let bytes = corrupted.as_bytes();
    println!("haystack bytes: {:02x?}", HAYSTACK.as_bytes());
    println!("pattern exposed by as_utf8_pattern(): [{FROM_BYTE:#04x}]");
    println!("replacement bytes: {:02x?}", TO.as_bytes());
    println!("replace() result bytes: {bytes:02x?}");
    // The returned String no longer holds valid UTF-8.
    assert!(std::str::from_utf8(bytes).is_err());
    // A safe operation that relies on the UTF-8 invariant.
    corrupted.chars().next();
}

/// Safe custom pattern that misreports itself as the one-byte "string" [0x80].
struct NonAsciiBytePattern;

impl Pattern for NonAsciiBytePattern {
    // Searching is delegated to an empty &str pattern; only the
    // as_utf8_pattern() metadata matters for the fast path.
    type Searcher<'a> = <&'static str as Pattern>::Searcher<'a>;

    fn into_searcher(self, haystack: &str) -> Self::Searcher<'_> {
        <&'static str as Pattern>::into_searcher("", haystack)
    }

    fn as_utf8_pattern(&self) -> Option<Utf8Pattern<'_>> {
        static NEEDLE: [u8; 1] = [FROM_BYTE];
        Some(Utf8Pattern::StringPattern(&NEEDLE))
    }
}
```
Output
```
haystack bytes: [ed, 80, 80]
pattern exposed by as_utf8_pattern(): [0x80]
replacement bytes: [3f]
replace() result bytes: [ed, 3f, 3f]
thread 'main' (1058134) panicked at /rustc/4b0c9d76ae7d387229caea55cfa73c280b08b8a7/library/core/src/char/methods.rs:239:18:
unsafe precondition(s) violated: invalid value for `char`
This indicates a bug in the program. This Undefined Behavior check is optional, and cannot be relied on for safety.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread caused non-unwinding panic. aborting.
Aborted (core dumped)
```
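The panic message is consistent with how the corrupted bytes decode. The sketch below assumes a decoder that, like an unchecked UTF-8 decoder relying on String's invariant, combines the low 4 bits of a three-byte lead byte with the low 6 bits of each following byte without validating them; decode3_unchecked is a hypothetical helper, not the actual core implementation. Under that assumption ed 3f 3f decodes to 0xDFFF, a surrogate code point that char::from_u32 rejects, which is the kind of value the from_u32_unchecked precondition check reports as an invalid value for char.

```rust
/// Decodes a three-byte sequence without validation: keeps the low 4 bits of
/// the lead byte and the low 6 bits of each continuation byte.
/// Hypothetical helper for illustration only.
fn decode3_unchecked(b0: u8, b1: u8, b2: u8) -> u32 {
    ((b0 as u32 & 0x0F) << 12) | ((b1 as u32 & 0x3F) << 6) | (b2 as u32 & 0x3F)
}

fn main() {
    // The original bytes ed 80 80 decode to U+D000.
    assert_eq!(decode3_unchecked(0xed, 0x80, 0x80), 0xD000);
    // The corrupted bytes ed 3f 3f decode to 0xDFFF, a surrogate.
    let cp = decode3_unchecked(0xed, 0x3f, 0x3f);
    assert_eq!(cp, 0xDFFF);
    // Surrogates are not valid `char` values.
    assert!(char::from_u32(cp).is_none());
}
```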
Environment
```
$ uname -a
Linux Abraxas 6.6.114.1-microsoft-standard-WSL2 #1 SMP PREEMPT_DYNAMIC Mon Dec 1 20:46:23 UTC 2025 x86_64 GNU/Linux
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 26.04 LTS"
VERSION_ID="26.04"
VERSION="26.04 (Resolute Raccoon)"
$ rustc +nightly --version --verbose
rustc 1.97.0-nightly (4b0c9d76a 2026-05-10)
binary: rustc
commit-hash: 4b0c9d76ae7d387229caea55cfa73c280b08b8a7
commit-date: 2026-05-10
host: x86_64-unknown-linux-gnu
release: 1.97.0-nightly
LLVM version: 22.1.4
```
The initial discovery was made by AI. The analysis, exploitation, and this report have been reviewed and verified by human experts.
Reporting on behalf of Autonomous Code Security (ACS) team at Microsoft.