str::replace with a custom Pattern implementation can create a non-UTF-8 String through safe API calls #156491

@qwaz

Description

str::replace has a fast path for replacing a single ASCII byte with another single ASCII byte. To decide whether that path applies, it calls the safe Pattern::as_utf8_pattern() method and accepts Utf8Pattern::StringPattern([from_byte]) as a one-byte pattern. Despite the name, Utf8Pattern::StringPattern stores &[u8], so a custom safe Pattern implementation can construct it with non-UTF-8 bytes. str::replace then treats that byte as ASCII, performs a raw byte substitution through replace_ascii, and returns a String whose contents are no longer valid UTF-8.

This breaks String’s UTF-8 invariant through safe API calls. Subsequent safe operations that rely on the invariant can reach unsafe standard-library internals with invalid data; the PoC below triggers Rust’s optional UB precondition check in char::from_u32_unchecked via String::chars().next().

There are two possible fixes. The first is to make Pattern an unsafe trait, so that the invariant becomes an explicit obligation on implementors. The second is to change Utf8Pattern::StringPattern to carry &str instead of &[u8], so that custom safe implementations cannot expose arbitrary non-UTF-8 bytes through this fast-path metadata.
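To make the second option concrete, here is a minimal sketch that compiles on stable Rust. It does not use the real unstable Utf8Pattern type: Utf8PatternStr and fast_path_byte are hypothetical names introduced only for illustration. The point it shows is that a &str of length one byte is necessarily a single ASCII character, so the fast-path decision no longer depends on the implementor behaving well.

```rust
// Hypothetical stand-in for `Utf8Pattern` in which `StringPattern` carries
// `&str` instead of `&[u8]`. Not the real standard-library type.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
enum Utf8PatternStr<'a> {
    StringPattern(&'a str),
    CharPattern(char),
}

/// Returns the byte usable for a single-byte fast path, or `None` if the
/// pattern is not exactly one ASCII character. Hypothetical helper.
fn fast_path_byte(pattern: Utf8PatternStr<'_>) -> Option<u8> {
    match pattern {
        // A valid `&str` that is exactly one byte long must be ASCII,
        // because every non-ASCII code point is encoded as 2 to 4 bytes.
        Utf8PatternStr::StringPattern(s) => match s.as_bytes() {
            [b] => Some(*b),
            _ => None,
        },
        // Mirrors the ASCII check the existing `CharPattern` branch performs.
        Utf8PatternStr::CharPattern(c) => c.is_ascii().then(|| c as u8),
    }
}

fn main() {
    // One ASCII character: eligible for the fast path.
    assert_eq!(fast_path_byte(Utf8PatternStr::StringPattern("a")), Some(b'a'));
    // U+0080 is encoded as two bytes (c2 80), so it is not eligible.
    assert_eq!(fast_path_byte(Utf8PatternStr::StringPattern("\u{80}")), None);
    // Non-ASCII char: rejected by the ASCII check.
    assert_eq!(fast_path_byte(Utf8PatternStr::CharPattern('\u{d000}')), None);
}
```

With this shape there is no safe way to construct a StringPattern holding the lone byte 0x80, since [0x80] is not a valid &str.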

Security impact

Expected to be low.

This requires a nightly-only custom Pattern implementation to actively misbehave while still using safe Rust: it must return a non-ASCII one-byte Utf8Pattern::StringPattern, the haystack must contain that byte inside a multibyte UTF-8 code point, and the replacement must be one byte so str::replace takes the ASCII fast path. This is not reachable through stable Rust or the built-in pattern types.
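The conditions above can be checked on stable Rust with a byte-level model of the fast path. This is a sketch under assumptions, not the standard-library code: model_fast_path is a hypothetical helper that applies the same byte substitution but validates the result with the checked String::from_utf8, so corruption surfaces as an error rather than an invalid String.

```rust
use std::string::FromUtf8Error;

/// Hypothetical model of the single-byte fast path in `str::replace`.
/// Returns `None` when the replacement is not exactly one byte (the fast
/// path would not be taken), otherwise the checked result of the raw
/// byte substitution.
fn model_fast_path(
    haystack: &str,
    from_byte: u8,
    to: &str,
) -> Option<Result<String, FromUtf8Error>> {
    let to_byte = match to.as_bytes() {
        [b] => *b,
        _ => return None,
    };
    let replaced: Vec<u8> = haystack
        .bytes()
        .map(|b| if b == from_byte { to_byte } else { b })
        .collect();
    // Checked conversion: reports invalid UTF-8 instead of assuming validity.
    Some(String::from_utf8(replaced))
}

fn main() {
    // ASCII to ASCII: the substitution keeps the string valid.
    assert_eq!(model_fast_path("a-b", b'-', "_").unwrap().unwrap(), "a_b");

    // Non-ASCII byte 0x80 inside U+D000 (ed 80 80): the result is invalid.
    let err = model_fast_path("\u{d000}", 0x80, "?").unwrap().unwrap_err();
    assert_eq!(err.as_bytes(), &[0xed, 0x3f, 0x3f]);

    // Haystack does not contain the byte: nothing changes, still valid.
    assert_eq!(model_fast_path("abc", 0x80, "?").unwrap().unwrap(), "abc");

    // Two-byte replacement: the fast path is not taken at all.
    assert!(model_fast_path("\u{d000}", 0x80, "??").is_none());
}
```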

Data flow trace

str::replace → Pattern::as_utf8_pattern → replace_ascii → String::from_utf8_unchecked → Chars::next → char::from_u32_unchecked

  • Pattern::as_utf8_pattern: safe trait method returning Utf8Pattern; StringPattern is documented as bytes returned by string-like patterns, but the method itself is not unsafe:

    pub trait Pattern: Sized {
        /// Associated searcher for this pattern
        type Searcher<'a>: Searcher<'a>;
        /// Constructs the associated searcher from
        /// `self` and the `haystack` to search in.
        fn into_searcher(self, haystack: &str) -> Self::Searcher<'_>;
        /// Checks whether the pattern matches anywhere in the haystack
        #[inline]
        fn is_contained_in(self, haystack: &str) -> bool {
            self.into_searcher(haystack).next_match().is_some()
        }
        /// Checks whether the pattern matches at the front of the haystack
        #[inline]
        fn is_prefix_of(self, haystack: &str) -> bool {
            matches!(self.into_searcher(haystack).next(), SearchStep::Match(0, _))
        }
        /// Checks whether the pattern matches at the back of the haystack
        #[inline]
        fn is_suffix_of<'a>(self, haystack: &'a str) -> bool
        where
            Self::Searcher<'a>: ReverseSearcher<'a>,
        {
            matches!(self.into_searcher(haystack).next_back(), SearchStep::Match(_, j) if haystack.len() == j)
        }
        /// Removes the pattern from the front of haystack, if it matches.
        #[inline]
        fn strip_prefix_of(self, haystack: &str) -> Option<&str> {
            if let SearchStep::Match(start, len) = self.into_searcher(haystack).next() {
                debug_assert_eq!(
                    start, 0,
                    "The first search step from Searcher \
                     must include the first character"
                );
                // SAFETY: `Searcher` is known to return valid indices.
                unsafe { Some(haystack.get_unchecked(len..)) }
            } else {
                None
            }
        }
        /// Removes the pattern from the back of haystack, if it matches.
        #[inline]
        fn strip_suffix_of<'a>(self, haystack: &'a str) -> Option<&'a str>
        where
            Self::Searcher<'a>: ReverseSearcher<'a>,
        {
            if let SearchStep::Match(start, end) = self.into_searcher(haystack).next_back() {
                debug_assert_eq!(
                    end,
                    haystack.len(),
                    "The first search step from ReverseSearcher \
                     must include the last character"
                );
                // SAFETY: `Searcher` is known to return valid indices.
                unsafe { Some(haystack.get_unchecked(..start)) }
            } else {
                None
            }
        }
        /// Returns the pattern as utf-8 bytes if possible.
        fn as_utf8_pattern(&self) -> Option<Utf8Pattern<'_>> {
            None
        }
    }

    /// Result of calling [`Pattern::as_utf8_pattern()`].
    /// Can be used for inspecting the contents of a [`Pattern`] in cases
    /// where the underlying representation can be represented as UTF-8.
    #[derive(Copy, Clone, Eq, PartialEq, Debug)]
    pub enum Utf8Pattern<'a> {
        /// Type returned by String and str types.
        StringPattern(&'a [u8]),
        /// Type returned by char types.
        CharPattern(char),
    }

  • str::replace: accepts Utf8Pattern::StringPattern([from_byte]) without validating ASCII, while the CharPattern branch does validate through c.as_ascii():

    pub fn replace<P: Pattern>(&self, from: P, to: &str) -> String {
        // Fast path for replacing a single ASCII character with another.
        if let Some(from_byte) = match from.as_utf8_pattern() {
            Some(Utf8Pattern::StringPattern([from_byte])) => Some(*from_byte),
            Some(Utf8Pattern::CharPattern(c)) => c.as_ascii().map(|ascii_char| ascii_char.to_u8()),
            _ => None,
        } {
            if let [to_byte] = to.as_bytes() {
                return unsafe { replace_ascii(self.as_bytes(), from_byte, *to_byte) };
            }
        }

  • replace_ascii: rewrites raw bytes and reconstructs a String with String::from_utf8_unchecked; its safety comment assumes ASCII-to-ASCII replacement:

    /// Faster implementation of string replacement for ASCII to ASCII cases.
    /// Should produce fast vectorized code.
    unsafe fn replace_ascii(utf8_bytes: &[u8], from: u8, to: u8) -> String {
        let result: Vec<u8> = utf8_bytes.iter().map(|b| if *b == from { to } else { *b }).collect();
        // SAFETY: We replaced ascii with ascii on valid utf8 strings.
        unsafe { String::from_utf8_unchecked(result) }
    }
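The assumption behind that safety comment can be illustrated on stable Rust. The following is a minimal sketch, not the standard-library implementation: replace_byte_checked is a hypothetical helper that performs the same byte mapping as replace_ascii but validates the result with String::from_utf8. On one sample string it checks that every ASCII-to-ASCII substitution preserves validity, and that substituting any non-ASCII byte occurring in the sample does not.

```rust
use std::string::FromUtf8Error;

/// Hypothetical checked counterpart of `replace_ascii`: same byte mapping,
/// but the result is validated instead of assumed to be UTF-8.
fn replace_byte_checked(s: &str, from: u8, to: u8) -> Result<String, FromUtf8Error> {
    String::from_utf8(s.bytes().map(|b| if b == from { to } else { b }).collect())
}

fn main() {
    // Sample containing 1-, 2-, 3- and 4-byte code points.
    let sample = "a-b \u{e9} \u{d000} \u{1f600}";

    // ASCII bytes never occur inside a multibyte sequence, so swapping one
    // ASCII byte for another cannot break the encoding.
    for from in 0u8..128 {
        for to in 0u8..128 {
            assert!(replace_byte_checked(sample, from, to).is_ok());
        }
    }

    // Every non-ASCII byte in the sample is part of a multibyte sequence;
    // overwriting it with an ASCII byte leaves a broken sequence behind.
    for from in sample.bytes().filter(|b| !b.is_ascii()) {
        assert!(replace_byte_checked(sample, from, b'?').is_err());
    }
}
```

The first loop is the case the safety comment covers; the second is the case a custom as_utf8_pattern() can force onto the unchecked path.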

Demonstration

The PoC uses only safe Rust code. It defines a custom Pattern whose as_utf8_pattern() exposes the non-ASCII byte 0x80, then calls replace on "\u{d000}" (ed 80 80) with a one-byte replacement.

Run with:

cargo +nightly run --features unstable-pattern --bin str_replace_non_ascii_pattern_poc
// cargo +nightly run --features unstable-pattern --bin str_replace_non_ascii_pattern_poc
#![feature(pattern)]
#![deny(unsafe_code)]

use std::str::pattern::{Pattern, Utf8Pattern};

const HAYSTACK: &str = "\u{d000}";
const FROM_BYTE: u8 = 0x80;
const TO: &str = "?";

fn main() {
    let corrupted = HAYSTACK.replace(NonAsciiBytePattern, TO);
    let bytes = corrupted.as_bytes();

    println!("haystack bytes: {:02x?}", HAYSTACK.as_bytes());
    println!("pattern exposed by as_utf8_pattern(): [{FROM_BYTE:#04x}]");
    println!("replacement bytes: {:02x?}", TO.as_bytes());
    println!("replace() result bytes: {bytes:02x?}");

    assert!(std::str::from_utf8(bytes).is_err());
    corrupted.chars().next();
}

struct NonAsciiBytePattern;

impl Pattern for NonAsciiBytePattern {
    type Searcher<'a> = <&'static str as Pattern>::Searcher<'a>;

    fn into_searcher(self, haystack: &str) -> Self::Searcher<'_> {
        <&'static str as Pattern>::into_searcher("", haystack)
    }

    fn as_utf8_pattern(&self) -> Option<Utf8Pattern<'_>> {
        static NEEDLE: [u8; 1] = [FROM_BYTE];

        Some(Utf8Pattern::StringPattern(&NEEDLE))
    }
}

Output

haystack bytes: [ed, 80, 80]
pattern exposed by as_utf8_pattern(): [0x80]
replacement bytes: [3f]
replace() result bytes: [ed, 3f, 3f]

thread 'main' (1058134) panicked at /rustc/4b0c9d76ae7d387229caea55cfa73c280b08b8a7/library/core/src/char/methods.rs:239:18:
unsafe precondition(s) violated: invalid value for `char`

This indicates a bug in the program. This Undefined Behavior check is optional, and cannot be relied on for safety.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread caused non-unwinding panic. aborting.
Aborted                    (core dumped)

Environment

$ uname -a
Linux Abraxas 6.6.114.1-microsoft-standard-WSL2 #1 SMP PREEMPT_DYNAMIC Mon Dec  1 20:46:23 UTC 2025 x86_64 GNU/Linux

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 26.04 LTS"
VERSION_ID="26.04"
VERSION="26.04 (Resolute Raccoon)"

$ rustc +nightly --version --verbose
rustc 1.97.0-nightly (4b0c9d76a 2026-05-10)
binary: rustc
commit-hash: 4b0c9d76ae7d387229caea55cfa73c280b08b8a7
commit-date: 2026-05-10
host: x86_64-unknown-linux-gnu
release: 1.97.0-nightly
LLVM version: 22.1.4

The initial discovery was made by AI. The analysis, exploitation, and this report have been reviewed and verified by human experts.

Reporting on behalf of Autonomous Code Security (ACS) team at Microsoft.

Metadata

Labels

  • C-bug: Category: This is a bug.
  • I-unsound: Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness
  • T-libs: Relevant to the library team, which will review and decide on the PR/issue.
  • requires-nightly: This issue requires a nightly compiler in some way. When possible, use a F-* label instead.
