Skip to content

Byte pattern detection works inconsistently #1838

@bdlmt

Description

@bdlmt

What version of ripgrep are you using?

ripgrep 12.1.1 (rev 7cb2113)
-SIMD -AVX (compiled)
+SIMD +AVX (runtime)

Initially, I found the bug while using 11.0.2:

ripgrep 11.0.2
-SIMD -AVX (compiled)
+SIMD +AVX (runtime)

How did you install ripgrep?

sudo dpkg -i ./ripgrep_12.1.1_amd64.deb

What operating system are you using ripgrep on?

Ubuntu 20.04.2 LTS

Linux local 5.8.0-45-generic #51~20.04.1-Ubuntu SMP Tue Feb 23 13:46:31 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Describe your bug.

Using command line mode, escaping from Unicode mode in order to scan for bytes doesn't seem to work as documented.

What are the steps to reproduce the behavior?

Simple to reproduce. A scan of /usr/bin will suffice.

What is the actual behavior?

An rg scan of /usr/bin for ELF magic values returns 0 results, while there should be over 1000 matching files:

user@local:~$ rg -uuu '(?-u)\x7f\x45\x4c\x46\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00' /usr/bin/
user@local:~$ rg -uuu '(?-u)\x7f\x45\x4c\x46\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00' /usr/bin/*
user@local:~$

Using the same byte pattern, a direct scan of individual files works as expected:

user@local:~$ rg -uuu '(?-u)\x7f\x45\x4c\x46\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00' /usr/bin/ls
Binary file matches (found "\u{0}" byte around offset 7)
user@local:~$ rg -uuu '(?-u)\x7f\x45\x4c\x46\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00' /usr/bin/ed
Binary file matches (found "\u{0}" byte around offset 7)
user@local:~$ rg -uuu '(?-u)\x7f\x45\x4c\x46\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00' /usr/bin/file
Binary file matches (found "\u{0}" byte around offset 7)
user@local:~$ rg -uuu '(?-u)\x7f\x45\x4c\x46\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00' /usr/bin/find
Binary file matches (found "\u{0}" byte around offset 7)

Reducing the set of files scanned seems to change the results when using a filename wildcard, but not when just just specifying the directory:

user@local:~$ mkdir test
user@local:~/test$ cd test
user@local:~/test$ cp /usr/bin/ls .
user@local:~/test$ cp /usr/bin/ed . 
user@local:~/test$ cp /usr/bin/file .
user@local:~/test$ cp /usr/bin/find .
user@local:~/test$ rg -uuu '(?-u)\x7f\x45\x4c\x46\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00' ./*
Binary file ./ls matches (found "\u{0}" byte around offset 7)

Binary file ./find matches (found "\u{0}" byte around offset 7)

Binary file ./file matches (found "\u{0}" byte around offset 7)

Binary file ./ed matches (found "\u{0}" byte around offset 7)
user@local:~/test$ rg -uuu '(?-u)\x7f\x45\x4c\x46\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00' ./
user@local:~/test$ rg -uuu '(?-u)\x7f\x45\x4c\x46\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00' ./ls 
Binary file matches (found "\u{0}" byte around offset 7)
user@local:~/test$ rg -uuu '(?-u)\x7f\x45\x4c\x46\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00' ./ed
Binary file matches (found "\u{0}" byte around offset 7)
user@local:~/test$ rg -uuu '(?-u)\x7f\x45\x4c\x46\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00' ./file 
Binary file matches (found "\u{0}" byte around offset 7)
user@local:~/test$ rg -uuu '(?-u)\x7f\x45\x4c\x46\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00' ./find 
Binary file matches (found "\u{0}" byte around offset 7)

Removing the \x00 bytes from the tail of the pattern changes ripgrep's behavior, and seems to work as expected:

user@local:~/test$ rg -uuu '(?-u)\x7f\x45\x4c\x46\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00' /usr/bin/ | wc -l
0
user@local:~/test$ rg -uuu '(?-u)\x7f\x45\x4c\x46\x02\x01\x01' /usr/bin/ | wc -l
1092

Here are some debug outputs from the 4-file ~/test directory:

user@local:~/test$ rg -uuu '(?-u)\x7f\x45\x4c\x46\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00' ./ --debug
DEBUG|grep_regex::literal|crates/regex/src/literal.rs:58: literal prefixes detected: Literals { lits: [Complete(ELF)], limit_size: 250, limit_class: 10 }
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
user@local:~/test$ rg -uuu '(?-u)\x7f\x45\x4c\x46\x02\x01\x01' ./* --debug
DEBUG|grep_regex::literal|crates/regex/src/literal.rs:58: literal prefixes detected: Literals { lits: [Complete(ELF)], limit_size: 250, limit_class: 10 }
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
Binary file ./ls matches (found "\u{0}" byte around offset 7)
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes

Binary file ./find matches (found "\u{0}" byte around offset 7)

Binary file ./file matches (found "\u{0}" byte around offset 7)

Binary file ./ed matches (found "\u{0}" byte around offset 7)
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
user@local:~/test$ rg -uuu '(?-u)\x7f\x45\x4c\x46\x02\x01\x01' ./ --debug
DEBUG|grep_regex::literal|crates/regex/src/literal.rs:58: literal prefixes detected: Literals { lits: [Complete(ELF)], limit_size: 250, limit_class: 10 }
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
Binary file ./ls matches (found "\u{0}" byte around offset 7)
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes

Binary file ./find matches (found "\u{0}" byte around offset 7)

Binary file ./file matches (found "\u{0}" byte around offset 7)

Binary file ./ed matches (found "\u{0}" byte around offset 7)
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
user@local:~/test$ rg -uuu '(?-u)\x7f\x45\x4c\x46\x02\x01\x01' ./ls --debug
DEBUG|grep_regex::literal|crates/regex/src/literal.rs:58: literal prefixes detected: Literals { lits: [Complete(ELF)], limit_size: 250, limit_class: 10 }
DEBUG|globset|crates/globset/src/lib.rs:431: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
Binary file matches (found "\u{0}" byte around offset 7)

What is the expected behavior?

ripgrep should consistently match on files in a search path which contain a specified byte pattern, using the documented method.

Metadata

Metadata

Assignees

No one assigned

    Labels

    docAn issue with or an improvement to documentation.questionAn issue that is lacking clarity on one or more points.rollupA PR that has been merged with many others in a rollup.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions