Make Input::new guard against incorrect AsRef implementations#1154
Make Input::new guard against incorrect AsRef implementations#1154BurntSushi merged 1 commit intorust-lang:masterfrom
Input::new guard against incorrect AsRef implementations#1154Conversation
BurntSushi
left a comment
There was a problem hiding this comment.
Nice fix! Out of curiosity, how did you find this?
I think this overall looks good, but I'd like to find a word other than "malicious." The issue here really isn't "malicious" per se, because the threat model here doesn't really involve some bad actor doing something sneaky (if a bad actor can insert a malicious AsRef impl, then they can do a whole bunch of other stuff without need for such things). Perhaps "guarding against incorrect AsRef impls" is a better way to phrase it.
I happened to gave a quick look at |
Input::new robust against malicious AsRef implementationsInput::new guard against incorrect AsRef implementations
Before this commit, Input::new calls haystack.as_ref() twice, once to
get the actual haystack slice and the second time to get its length. It
makes the assumption that the second call will return the same slice,
but malicious implementations of AsRef can return different slices
and thus different lengths. This is important because there's unsafe
code relying on the Input's span being inbounds with respect to the
haystack, but if the second call to .as_ref() returns a bigger slice
this won't be true.
For example, this snippet causes Miri to report UB on an unchecked
slice access in find_fwd_imp (though it will also panic sometime later
when run normally, but at that point the UB already happened):
use regex_automata::{Input, meta::{Builder, Config}};
use std::cell::Cell;
struct Bad(Cell<bool>);
impl AsRef<[u8]> for Bad {
fn as_ref(&self) -> &[u8] {
if self.0.replace(false) {
&[]
} else {
&[0; 1000]
}
}
}
let bad = Bad(Cell::new(true));
let input = Input::new(&bad);
let regex = Builder::new()
// Not setting this causes some checked access to occur before
// the unchecked ones, avoiding the UB
.configure(Config::new().auto_prefilter(false))
.build("a+")
.unwrap();
regex.find(input);
This commit fixes the problem by just calling .as_ref() once and use
the length of the returned slice as the span's end value. A regression
test has also been added.
Closes rust-lang#1154
1c2aa52 to
07246d4
Compare
|
This PR is on crates.io in |
Currently
Input::newcallshaystack.as_ref()twice, once to get the actualhaystackslice and the second time to get its length. It makes the assumption that the second call will return the same slice, but malicious implementations ofAsRefcan return different slices and thus different lengths. This is important because there'sunsafecode relying on theInput's span being inbounds with respect to thehaystack, but if the second call to.as_ref()returns a bigger slice this won't be true.For example, this snippet causes MIRI to report UB on an unchecked slice access in
find_fwd_imp(though it will also panic sometime later when run normally, but at that point the UB already happened):The proposed fix is to just call
.as_ref()once and use the length of the returned slice as the span's end value. A regression test has also been added.