Make `Input::new` guard against incorrect `AsRef` implementations by SkiFire13 · Pull Request #1154 · rust-lang/regex

SkiFire13 · 2024-01-19T21:16:11Z

Currently Input::new calls haystack.as_ref() twice: once to get the actual haystack slice and a second time to get its length. It assumes that the second call will return the same slice, but malicious implementations of AsRef can return different slices and thus different lengths. This matters because there is unsafe code relying on the Input's span being in bounds with respect to the haystack, and if the second call to .as_ref() returns a bigger slice this won't be true.

For example, this snippet causes Miri to report UB on an unchecked slice access in find_fwd_imp (it also panics some time later when run normally, but by that point the UB has already happened):

use regex_automata::{Input, meta::{Builder, Config}};
use std::cell::Cell;

struct Bad(Cell<bool>);

impl AsRef<[u8]> for Bad {
    fn as_ref(&self) -> &[u8] {
        if self.0.replace(false) {
            &[]
        } else {
            &[0; 1000]
        }
    }
}

fn main() {
    let bad = Bad(Cell::new(true));
    let input = Input::new(&bad);
    let regex = Builder::new()
        // Not setting this causes some checked accesses to occur before
        // the unchecked ones, avoiding the UB.
        .configure(Config::new().auto_prefilter(false))
        .build("a+")
        .unwrap();
    regex.find(input);
}

The proposed fix is to just call .as_ref() once and use the length of the returned slice as the span's end value. A regression test has also been added.
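As a rough illustration of the fix described above, here is a minimal, standard-library-only sketch. `MiniInput`, `new_two_calls`, `new_one_call`, and `span_in_bounds` are hypothetical names invented for this example; they are not regex-automata's actual `Input` type or API, and the sketch contains no unsafe code. It only shows how two `as_ref()` calls can disagree and how a single call keeps the span end consistent with the stored slice.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a haystack plus the end of its span.
/// This is NOT regex-automata's `Input`; it only mirrors the idea.
struct MiniInput<'h> {
    haystack: &'h [u8],
    end: usize,
}

impl<'h> MiniInput<'h> {
    /// The problematic pattern: `as_ref()` is called twice, and the two
    /// calls are assumed (but not guaranteed) to return the same slice.
    fn new_two_calls<H: ?Sized + AsRef<[u8]>>(haystack: &'h H) -> Self {
        MiniInput { haystack: haystack.as_ref(), end: haystack.as_ref().len() }
    }

    /// The pattern described as the fix: call `as_ref()` once and take
    /// the length from the slice that is actually stored.
    fn new_one_call<H: ?Sized + AsRef<[u8]>>(haystack: &'h H) -> Self {
        let haystack = haystack.as_ref();
        MiniInput { haystack, end: haystack.len() }
    }

    /// True when the span end does not exceed the stored haystack.
    fn span_in_bounds(&self) -> bool {
        self.end <= self.haystack.len()
    }
}

/// Same inconsistent `AsRef` implementation as in the snippet above:
/// the first call returns an empty slice, later calls return 1000 bytes.
struct Bad(Cell<bool>);

impl AsRef<[u8]> for Bad {
    fn as_ref(&self) -> &[u8] {
        if self.0.replace(false) { &[] } else { &[0; 1000] }
    }
}

fn main() {
    // Two calls: the stored slice is empty but the recorded end is 1000.
    let bad = Bad(Cell::new(true));
    let input = MiniInput::new_two_calls(&bad);
    assert!(!input.span_in_bounds());

    // One call: the recorded end always matches the stored slice.
    let bad = Bad(Cell::new(true));
    let input = MiniInput::new_one_call(&bad);
    assert!(input.span_in_bounds());
}
```

With the single-call constructor, the in-bounds property no longer depends on how the caller's `AsRef` implementation behaves across calls, which is the property the unsafe code needs.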

BurntSushi

Nice fix! Out of curiosity, how did you find this?

I think this overall looks good, but I'd like to find a word other than "malicious." The issue here really isn't "malicious" per se, because the threat model here doesn't really involve some bad actor doing something sneaky (if a bad actor can insert a malicious AsRef impl, then they can do a whole bunch of other stuff without need for such things). Perhaps "guarding against incorrect AsRef impls" is a better way to phrase it.

SkiFire13 · 2024-01-20T15:42:11Z

> Nice fix! Out of curiosity, how did you find this?

I happened to take a quick look at Input::new's source code, and the two calls to .as_ref() reminded me of rust-lang/rust#80335, so I quickly checked whether there was unsafe code relying on the span's end, and there was.

Before this commit, Input::new calls haystack.as_ref() twice, once to get the actual haystack slice and the second time to get its length. It makes the assumption that the second call will return the same slice, but malicious implementations of AsRef can return different slices and thus different lengths. This is important because there's unsafe code relying on the Input's span being in bounds with respect to the haystack, but if the second call to .as_ref() returns a bigger slice this won't be true.

For example, this snippet causes Miri to report UB on an unchecked slice access in find_fwd_imp (though it will also panic sometime later when run normally, but at that point the UB already happened):

use regex_automata::{Input, meta::{Builder, Config}};
use std::cell::Cell;

struct Bad(Cell<bool>);

impl AsRef<[u8]> for Bad {
    fn as_ref(&self) -> &[u8] {
        if self.0.replace(false) {
            &[]
        } else {
            &[0; 1000]
        }
    }
}

let bad = Bad(Cell::new(true));
let input = Input::new(&bad);
let regex = Builder::new()
    // Not setting this causes some checked access to occur before
    // the unchecked ones, avoiding the UB
    .configure(Config::new().auto_prefilter(false))
    .build("a+")
    .unwrap();
regex.find(input);

This commit fixes the problem by just calling .as_ref() once and using the length of the returned slice as the span's end value. A regression test has also been added.

Closes rust-lang#1154

BurntSushi · 2024-01-21T14:09:44Z

This PR is on crates.io in regex 1.10.3.

BurntSushi requested changes Jan 20, 2024

SkiFire13 changed the title ~~Make Input::new robust against malicious AsRef implementations~~ Make Input::new guard against incorrect AsRef implementations Jan 20, 2024

BurntSushi force-pushed the fix-unsound-input-new branch from 1c2aa52 to 07246d4 Compare January 21, 2024 13:15

BurntSushi approved these changes Jan 21, 2024

BurntSushi merged commit fbd2537 into rust-lang:master Jan 21, 2024

SkiFire13 deleted the fix-unsound-input-new branch January 21, 2024 14:16

This was referenced Oct 13, 2025

chore(deps): bump regex from 1.11.1 to 1.12.1 dirvine/sb#40

Closed

chore(deps): bump regex from 1.11.1 to 1.12.2 dirvine/sb#42

Open
