ripgrep is slower than grep when searching for whitespace between words guarded by unicode word boundaries #1760

@awalgarg

What version of ripgrep are you using?

$ rg --version
ripgrep 12.1.1
-SIMD -AVX (compiled)
+SIMD +AVX (runtime)

How did you install ripgrep?

# pacman -S ripgrep

What operating system are you using ripgrep on?

$ uname -mr
5.9.9-arch1-1 x86_64

Describe your bug.

When searching a file for a pattern of the following form, ripgrep is significantly slower than grep (about 1.6 s versus 79 ms in the run below) unless --no-unicode is used.

\bfoo\b \bbar\b

Removing any one of the \b assertions, or the space in the middle, makes the slowdown disappear. The issue also reproduces when the space is replaced with other whitespace patterns such as \t and \r.
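
To make the variants above easy to compare, here is a rough sketch of a timing loop. The function name `time_variants` is made up for illustration, and it assumes bash (for the `time` keyword) with `rg` on the PATH; if `rg` is missing it only prints the patterns. Which variants are fast or slow is exactly what the loop is meant to show, so no timings are claimed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ripgrep): run each pattern variant
# against FILE and time it. Usage: time_variants FILE
time_variants() {
  local file=$1 pat
  # Full pattern, then each \b removed in turn, then the space removed,
  # then the space replaced by \t.
  for pat in \
    '\bfoo\b \bbar\b' \
    'foo\b \bbar\b' \
    '\bfoo \bbar\b' \
    '\bfoo\b bar\b' \
    '\bfoo\b \bbar' \
    '\bfoo\b\bbar\b' \
    '\bfoo\b\t\bbar\b'
  do
    printf 'pattern: %s\n' "$pat"
    # Skip the timing step if rg is not installed.
    if command -v rg >/dev/null 2>&1; then
      time rg -ic "$pat" "$file"
    fi
  done
}
```

Against the file from the steps below this would be called as `time_variants foo`.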

What are the steps to reproduce the behavior?

$ git clone git@github.com:git/git.git
[...]
$ cd git
$ git ls-files | xargs cat > foo
[...]
$ ls -alh foo
-rw-r--r-- 1 user user 36M Dec 11 03:06 foo
$ file foo
foo: UTF-8 Unicode text
$ time grep -ic '\bfoo\b \bbar\b' foo
114

________________________________________________________
Executed in   79.27 millis    fish           external
   usr time   66.16 millis  1167.00 micros   65.00 millis
   sys time   13.09 millis  106.00 micros   12.98 millis

$ time rg -ic '\bfoo\b \bbar\b' foo
114

________________________________________________________
Executed in    1.60 secs   fish           external
   usr time  1589.21 millis  1117.00 micros  1588.09 millis
   sys time    6.66 millis    0.00 micros    6.66 millis

With --no-unicode, ripgrep is as fast as grep or faster.
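
For reference, a small sketch of the invocations I would compare. The helper name `compare_counts` is hypothetical. `--no-unicode` is the flag mentioned above; the inline `(?-u:...)` group is, as far as I understand the regex syntax, another way to turn off Unicode mode for just that part of the pattern, so treat that line as an assumption rather than a confirmed equivalent. The `rg` lines are skipped when `rg` is not installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print match counts from grep and from two
# non-Unicode rg invocations. Usage: compare_counts FILE
compare_counts() {
  local file=$1
  local pat='\bfoo\b \bbar\b'
  # Baseline: grep, case-insensitive, count of matching lines.
  printf 'grep: %s\n' "$(grep -ic "$pat" "$file")"
  if command -v rg >/dev/null 2>&1; then
    # Unicode mode disabled for the whole pattern via the CLI flag.
    printf 'rg --no-unicode: %s\n' "$(rg --no-unicode -ic "$pat" "$file")"
    # Assumed equivalent: Unicode mode disabled inside the group only.
    printf 'rg (?-u:...): %s\n' "$(rg -ic "(?-u:$pat)" "$file")"
  fi
}
```

Wrapping each call in `time`, as in the steps above, shows whether the slowdown is gone.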

Labels: bug, rollup
