
Ending newline character of a line seems to be exposed to the regex engine with -P option #1401

@learnbyexample

What version of ripgrep are you using?

$ rg --version
ripgrep 11.0.2 (rev 3de31f7527)
-SIMD -AVX (compiled)
+SIMD -AVX (runtime)

How did you install ripgrep?

Using ripgrep_11.0.2_amd64.deb

What operating system are you using ripgrep on?

Ubuntu 16.04 LTS

Describe your question, feature request, or bug.

When the -P option is used, \s seems to match the newline character that terminates each input line.

If this is a bug, what are the steps to reproduce the behavior?

Consider this sample input file:

$ printf 'foo 42\nxoyz\ncat\tdog\n' > ip.txt
$ cat ip.txt
foo 42
xoyz
cat	dog

Here's a minimal example showing the issue (I needed lookarounds, hence the -P switch):

$ # extract up to the last 'o' in the line, provided no whitespace follows that 'o'
$ # the issue is that characters after 'o' are also displayed despite the -o switch
$ rg -NoP '.*o(?!.*\s)' ip.txt
xoyz
cat	dog

$ # works if I replace \s with a space/tab character class or use GNU grep
$ # using \s(?!$) instead of \s is another workaround that works
$ rg -NoP '.*o(?!.*[ \t])' ip.txt
xo
cat	do
$ grep -oP '.*o(?!.*\s)' ip.txt
xo
cat	do
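
As a rough cross-check of the `\s(?!$)` workaround mentioned above, here is a minimal sketch using Python's `re` module. This is a stand-in chosen for convenience, not the PCRE2 engine behind `-P`, so it only illustrates the regex logic. It keeps the trailing newline on every line, on the assumption that this is what the engine sees; `extract` is a hypothetical helper name.

```python
import re

# Lines of ip.txt with their trailing newline kept, on the assumption
# that this is what the regex engine sees under -P.
LINES = ["foo 42\n", "xoyz\n", "cat\tdog\n"]

# Workaround from the report: only count whitespace that is not
# immediately followed by the end of the line.
WORKAROUND = re.compile(r".*o(?!.*\s(?!$))")


def extract(pattern, lines):
    """Return the first match on each line, skipping lines without one."""
    found = []
    for line in lines:
        m = pattern.search(line)
        if m:
            found.append(m.group())
    return found


# repr() makes the tab character visible in the output.
for text in extract(WORKAROUND, LINES):
    print(repr(text))
```

In this sketch the extracted strings are the same as the GNU grep output above, even though each line still carries its newline.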

Here is another example:

$ rg -No '.*\s' ip.txt
foo 
cat	

$ rg -NoP '.*\s' ip.txt
foo 42
cat	dog
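
The difference between these two outputs is consistent with `\s` matching the trailing newline under `-P`. The sketch below is an illustration with Python's `re` module, not ripgrep's own engines, and `first_match` is a hypothetical helper. It runs the same pattern on a line with and without its terminator, on the assumption that the only difference between the two runs is whether the newline is part of the searched text.

```python
import re

PATTERN = re.compile(r".*\s")


def first_match(line):
    """Return the first match of PATTERN in line, or None if there is none."""
    m = PATTERN.search(line)
    return m.group() if m else None


# Compare each line without and with its trailing newline.
# repr() makes tabs, spaces and newlines visible.
for text in ["foo 42", "cat\tdog"]:
    without_newline = first_match(text)
    with_newline = first_match(text + "\n")
    print(repr(without_newline), repr(with_newline))
```

Without the newline, the greedy `.*` has to backtrack to the last space or tab inside the line. With the newline present, `\s` can match the terminator itself, so the whole line is returned.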

Labels: bug, rollup
