rg -c omits files when a NUL byte is encountered after an earlier match #3131

@ejh3

Please tick this box to confirm you have reviewed the above.

  • I have a different issue.

What version of ripgrep are you using?

ripgrep 14.1.1

features:+pcre2
simd(compile):+SSE2,+SSSE3,-AVX2
simd(runtime):+SSE2,+SSSE3,+AVX2

PCRE2 10.43 is available (JIT is available)

How did you install ripgrep?

> rpm -qf $(which rg) 2>/dev/null
ripgrep-14.1.1-1.el9.x86_64

What operating system are you using ripgrep on?

> uname -a
Linux PLACEHOLDER 5.10.228-41577284.AroraKernel510.el7.x86_64 #1 SMP Tue Apr 15 16:42:53 PDT 2025 x86_64 x86_64 x86_64 GNU/Linux

Describe your bug.

When a file contains a match and a NUL byte appears later in the file, rg -l correctly lists the file, but rg -c omits it entirely. This is inconsistent: -c should report the same set of files as -l, or as ripgrep with no options, even when binary-file detection stops the search early. Using -ca (which searches binary files as if they were text) gives the expected result.

What are the steps to reproduce the behavior?

ejhunter /tmp/rgBug > { echo "cat here"; yes "padding line" | head -n 150000; printf '\0'; } > file1.txt
ejhunter /tmp/rgBug > echo "cat here" > file2.txt

ejhunter /tmp/rgBug > rg 'cat' -l --no-config
file2.txt
file1.txt

ejhunter /tmp/rgBug > rg 'cat' -c --no-config
file2.txt:1

ejhunter /tmp/rgBug > rg 'cat' --no-config
file2.txt
1:cat here

file1.txt
1:cat here
file1.txt: WARNING: stopped searching binary file after match (found "\0" byte around offset 1950009)

ejhunter /tmp/rgBug > rg 'cat' -ca --no-config
file2.txt:1
file1.txt:1
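
The offset in the warning can be checked against the layout of file1.txt. The following is a minimal sketch, assuming a POSIX shell with `yes`, `head`, `wc`, and `mktemp` available; the variable names (`dir`, `expected`, `actual`) are illustrative and not part of ripgrep.

```shell
# Rebuild file1.txt exactly as in the reproduction steps, in a scratch directory.
dir=$(mktemp -d)
{ echo "cat here"; yes "padding line" | head -n 150000; printf '\0'; } > "$dir/file1.txt"

# Expected offset of the NUL byte: 9 bytes for "cat here\n"
# plus 150000 lines of "padding line\n" at 13 bytes each.
expected=$(( 9 + 150000 * 13 ))

# Actual offset: total file size minus the single trailing NUL byte.
actual=$(( $(wc -c < "$dir/file1.txt") - 1 ))

echo "expected=$expected actual=$actual"
```

Both values come out to 1950009, the offset ripgrep reports, so the match on line 1 sits well before the point where binary detection triggers.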

What is the actual behavior?

ejhunter /tmp/rgBug > rg 'cat' -c --no-config --debug
rg: DEBUG|rg::flags::parse|crates/core/flags/parse.rs:89: not reading config files because --no-config is present
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1083: number of paths given to search: 0
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1108: using heuristics to determine whether to read from stdin or search ./ (is_readable_stdin=false, stdin_consumed=false, mode=Search(Count))
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1118: heuristic chose to search ./
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1269: found hostname for hyperlink configuration: REDACTED
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1279: hyperlink format: ""
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:174: using 12 thread(s)
rg: DEBUG|grep_regex::config|/usr/share/cargo/registry/grep-regex-0.1.13/src/config.rs:175: assembling HIR from 1 fixed string literals
rg: DEBUG|globset|/usr/share/cargo/registry/globset-0.4.15/src/lib.rs:453: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
file2.txt:1
rg: DEBUG|grep_printer::summary|/usr/share/cargo/registry/grep-printer-0.2.2/src/summary.rs:698: ignoring file1.txt: found binary data at offset 1950009

ejhunter /tmp/rgBug > rg 'cat' -l --no-config --debug
rg: DEBUG|rg::flags::parse|crates/core/flags/parse.rs:89: not reading config files because --no-config is present
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1083: number of paths given to search: 0
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1108: using heuristics to determine whether to read from stdin or search ./ (is_readable_stdin=false, stdin_consumed=false, mode=Search(FilesWithMatches))
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1118: heuristic chose to search ./
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1269: found hostname for hyperlink configuration: REDACTED
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:1279: hyperlink format: ""
rg: DEBUG|rg::flags::hiargs|crates/core/flags/hiargs.rs:174: using 12 thread(s)
rg: DEBUG|grep_regex::config|/usr/share/cargo/registry/grep-regex-0.1.13/src/config.rs:175: assembling HIR from 1 fixed string literals
rg: DEBUG|globset|/usr/share/cargo/registry/globset-0.4.15/src/lib.rs:453: built glob set; 0 literals, 0 basenames, 12 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
file2.txt
file1.txt

What is the expected behavior?

The --count option should find the same matches as running ripgrep with no options. If one match is found before the search terminates early, rg -c should report that count. I don't have a strong opinion on whether it should also print a warning.
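
The expectation that -c and -l cover the same set of files can be checked mechanically. This is a rough sketch, not part of ripgrep: `missing_from_count` is a hypothetical helper that takes saved `rg -l` output and saved `rg -c` output and prints any file present in the first but absent from the second. The sample data is copied from the output shown in this report.

```shell
# Hypothetical helper: print files listed in saved `rg -l` output ($1)
# that do not appear in saved `rg -c` output ($2).
missing_from_count() {
  l_sorted=$(mktemp)
  c_sorted=$(mktemp)
  sort "$1" > "$l_sorted"
  # Strip the trailing ":N" count from each `rg -c` line, then sort.
  sed 's/:[0-9][0-9]*$//' "$2" | sort > "$c_sorted"
  # comm -23 keeps only lines unique to the first (sorted) input.
  comm -23 "$l_sorted" "$c_sorted"
  rm -f "$l_sorted" "$c_sorted"
}

# Sample data copied from the output in this report.
work=$(mktemp -d)
printf 'file2.txt\nfile1.txt\n' > "$work/l.out"
printf 'file2.txt:1\n' > "$work/c.out"

missing=$(missing_from_count "$work/l.out" "$work/c.out")
echo "$missing"
```

With the output from this report, the helper prints file1.txt. To run it against a live tree, save the output of `rg 'cat' -l --no-config` and `rg 'cat' -c --no-config` to files outside the searched directory and pass those two paths instead.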

Metadata

Labels

  • doc: An issue with or an improvement to documentation.
  • wontfix: A feature or bug that is unlikely to be implemented or fixed.