emit log message when file is skipped because it was detected as "binary" data #2246

@asomers

What version of ripgrep are you using?

ripgrep 13.0.0
+SIMD -AVX (compiled)
+SIMD +AVX (runtime)

How did you install ripgrep?

Through the FreeBSD ports collection

What operating system are you using ripgrep on?

FreeBSD 14.0-CURRENT amd64. The file system is ZFS.

Describe your bug.

I have a directory with 552,900 entries, one for every version of every crate ever published to crates.io. If I run rg with no PATH arguments, it finishes in about a minute with no results. However, if I run it with specific PATH arguments then it finds plentiful results. The likeliest explanation I can think of is that when recursing through ., rg doesn't iterate through every child.

What are the steps to reproduce the behavior?

First, create a fresh file system with at least 100 GB of space. Then download every published crate, using a command similar to the following. Note that fetch is a FreeBSD-specific command, and may be replaced by curl or wget.

git clone https://github.com/rust-lang/crates.io-index index
grep -hr . index/*/ | jq '.name + "-" + .vers + ".tar.gz " + "https://crates.io/api/v1/crates/" + .name + "/" + .vers + "/download"' -r | xargs -P100 -n2 fetch -o
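For readers unfamiliar with jq, the filter above turns each JSON line of the index into an output file name followed by a download URL, which xargs then passes to fetch two at a time. A minimal Python sketch of the same mapping follows. The function name `download_args` and the sample entry are hypothetical and used only for illustration; the sketch assumes each index line is a JSON object with `name` and `vers` fields, as the jq filter does.

```python
import json


def download_args(index_line):
    """Map one JSON line from the index to (output file name, download URL)."""
    entry = json.loads(index_line)
    name, vers = entry["name"], entry["vers"]
    filename = f"{name}-{vers}.tar.gz"
    url = f"https://crates.io/api/v1/crates/{name}/{vers}/download"
    return filename, url


# Hypothetical index entry, reduced to the two fields the filter reads.
sample = '{"name": "libc", "vers": "0.2.126"}'
print(*download_args(sample))
```

Each pair corresponds to one `fetch -o <file> <url>` invocation.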

Run this command to show which files match. You can CTRL-C it after you're satisfied.

ls | xargs -n 1000 rg -l -z '\bsigevent\b'

Then run this command. It will wrongly produce no output.

rg -l -z '\bsigevent\b'

What is the actual behavior?

> rg --debug -l -z  '\bsigevent\b'
DEBUG|grep_regex::literal|crates/regex/src/literal.rs:180: required literal found: "sigevent"
DEBUG|grep_regex::matcher|crates/regex/src/matcher.rs:50: extracted fast line regex: (?-u:sigevent)
DEBUG|globset|crates/globset/src/lib.rs:421: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:421: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:421: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:421: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:421: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:421: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:421: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:421: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:421: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:421: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:421: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:421: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|crates/globset/src/lib.rs:421: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|ignore::walk|crates/ignore/src/walk.rs:1741: ignoring ./.affected-packages.txt.swp: Ignore(IgnoreMatch(Hidden))
DEBUG|ignore::walk|crates/ignore/src/walk.rs:1741: ignoring ./.summary.txt.swp: Ignore(IgnoreMatch(Hidden))
DEBUG|ignore::walk|crates/ignore/src/walk.rs:1741: ignoring ./index/.git: Ignore(IgnoreMatch(Hidden))
DEBUG|ignore::walk|crates/ignore/src/walk.rs:1741: ignoring ./index/.github: Ignore(IgnoreMatch(Hidden))

What is the expected behavior?

It should have returned about 1353 files.
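The issue title attributes the missing results to the files being detected as "binary" data. The sketch below illustrates why a decompressed .tar.gz could be classified that way, assuming the detection is a simple NUL-byte check. The helper `looks_binary` and the archive contents are hypothetical and are not ripgrep code; the point is only that a tar stream contains NUL bytes even when every file inside it is plain text.

```python
import gzip
import io
import tarfile


def looks_binary(data):
    """Hypothetical heuristic: treat data as binary if it contains a NUL byte."""
    return b"\x00" in data


def make_crate_archive(source):
    """Build an in-memory .tar.gz holding one text file (a stand-in for a crate)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name="demo-0.1.0/src/lib.rs")
        info.size = len(source)
        tar.addfile(info, io.BytesIO(source))
    return buf.getvalue()


source = b"pub struct sigevent;\n"
# Decompressing the gzip layer yields the raw tar stream, not the file contents.
tar_stream = gzip.decompress(make_crate_archive(source))

print(looks_binary(source))        # the source text alone has no NUL bytes
print(looks_binary(tar_stream))    # tar headers and padding contain NUL bytes
print(b"sigevent" in tar_stream)   # the match is still present in the stream
```

Under this assumption, the pattern is present in the decompressed stream, yet a searcher that stops at the first NUL byte would report nothing for the file.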

Metadata

Labels

enhancement: An enhancement to the functionality of the software.
rollup: A PR that has been merged with many others in a rollup.
