Python: parse mode chars should not be considered chars #13975

yoff · 2023-08-15T19:27:20Z

This is a PR on top of #13779 to aid in the discussion around parse mode characters.

They are simply considered part of the group start.
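As an illustration in plain Python (a sketch using only the standard `re` module, not the CodeQL parser touched by this PR): an inline mode group such as `(?is)` sets flags on the compiled pattern and consumes no input, so its letters are not characters of the regex. The pattern and variable names below are made up for the example.

```python
import re

# `(?is)` switches on IGNORECASE and DOTALL; it matches no input itself.
pattern = re.compile(r"(?is)x.")

has_ignorecase = bool(pattern.flags & re.IGNORECASE)
has_dotall = bool(pattern.flags & re.DOTALL)

# The mode letters "i" and "s" are not part of what the pattern matches:
# the pattern accepts exactly two characters, an x (any case) and anything.
matches_x_newline = pattern.fullmatch("X\n") is not None
matches_mode_letters = pattern.fullmatch("isx\n") is not None

print(has_ignorecase, has_dotall, matches_x_newline, matches_mode_letters)
```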

python/ql/lib/semmle/python/regexp/internal/ParseRegExp.qll

+   * Holds if a parse mode starts between `start` and `end`.
+   */
+  private predicate flag_group_start(int start, int end) {
+    exists(int no_modes_end |


erik-krogh · 2023-08-15T20:28:42Z

python/ql/test/query-tests/Security/CWE-730-ReDoS/unittests.py

@@ -7,3 +7,6 @@
 # Treatment of line breaks
 re.compile(r'(?:.|\n)*b') # No ReDoS.
 re.compile(r'(?:.|\n)*b', re.DOTALL) # Has ReDoS.
+re.compile(r'(?i)(?:.|\n)*b') # No ReDoS.
+re.compile(r'(?s)(?:.|\n)*b') # Has ReDoS.
+re.compile(r'(?is)(?:.|\n)*b') # Has ReDoS.


Could you also add the following as a test:

re.compile(r'(?is)X(?:.|\n)*Y')

The message for that should ideally mention that the attack string should start with an X.
If the message doesn't say that, then the "start" of the regex is wrong.
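A small check with Python's own `re` module shows why the prefix matters for this pattern (a sketch only; the input strings are chosen for illustration and kept short so nothing backtracks for long). The repetition is reached only after the literal `X` has matched, and because of `(?i)` either case of that letter works.

```python
import re

pattern = re.compile(r"(?is)X(?:.|\n)*Y")

# The repetition is only reached once the literal X has matched, so an
# input needs that prefix before the newlines can matter.
with_prefix = pattern.match("X\n\n\nY")
without_prefix = pattern.match("\n\n\nY")

# Because of (?i), a lower-case x is an equally valid prefix.
lower_prefix = pattern.match("x\n\n\nY")

print(with_prefix is not None, without_prefix is not None, lower_prefix is not None)
```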

Excellent suggestion, thanks! The message does not mention X; I will look into why.

I suspect that (?is) is still considered a group, but with no children.
It shouldn't be a group; in fact, it shouldn't appear as a RegExpTerm at all.
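One way to picture the intended outcome, as a Python sketch rather than the QL implementation: peel the leading mode groups off the pattern and record their letters separately, so that the first term of what remains is the `X`. The helper `split_leading_modes` and the constant `LEADING_FLAG_GROUP` are hypothetical names, and the sketch assumes only simple leading groups like `(?is)`; scoped forms such as `(?i:...)` are not handled.

```python
import re

# Assumption: the inline mode letters accepted by Python's `re` module.
LEADING_FLAG_GROUP = re.compile(r"\(\?([aiLmsux]+)\)")


def split_leading_modes(pattern: str) -> tuple[str, str]:
    """Return (mode letters, remaining pattern) for leading flag groups."""
    modes = ""
    pos = 0
    # Consume consecutive flag groups; `(?:` does not match because `:`
    # is not a mode letter.
    while (m := LEADING_FLAG_GROUP.match(pattern, pos)) is not None:
        modes += m.group(1)
        pos = m.end()
    return modes, pattern[pos:]


modes, body = split_leading_modes(r"(?is)X(?:.|\n)*Y")
print(modes, body)
```

With the mode group handled this way, the remaining body starts at `X`, which is what a prefix computation would need to see.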

FirstItem does find the X..

Maybe, as a test, revert this commit: e2de0e6
That allows you to view the tree structure of the parsed regular expression in the ast-viewer in VSCode.

..but NfaUtils::prefix does not..

This reverts commit e2de0e6.

erik-krogh · 2023-08-16T11:58:16Z

python/ql/test/query-tests/Security/CWE-730-ReDoS/ReDoS.expected

@@ -105,3 +105,6 @@
 | redos.py:391:15:391:25 | (\\u0061\|a)* | This part of the regular expression may cause exponential backtracking on strings starting with 'X' and containing many repetitions of 'a'. |
 | unittests.py:5:17:5:23 | (\u00c6\|\\\u00c6)+ | This part of the regular expression may cause exponential backtracking on strings starting with 'X' and containing many repetitions of '\u00c6'. |
 | unittests.py:9:16:9:24 | (?:.\|\\n)* | This part of the regular expression may cause exponential backtracking on strings containing many repetitions of '\\n'. |
+| unittests.py:11:20:11:28 | (?:.\|\\n)* | This part of the regular expression may cause exponential backtracking on strings containing many repetitions of '\\n'. |
+| unittests.py:12:21:12:29 | (?:.\|\\n)* | This part of the regular expression may cause exponential backtracking on strings containing many repetitions of '\\n'. |
+| unittests.py:13:22:13:30 | (?:.\|\\n)* | This part of the regular expression may cause exponential backtracking on strings starting with 'x' and containing many repetitions of '\\n'. |


I love how the regex uses an upper-case X, but the alert message uses a lower-case x (because it's case-insensitive).

erik-krogh

The test outcomes look good to me now 👍

I can't really comment on the parser architecture.

geoffw0 and others added 6 commits July 19, 2023 18:44

Python: Test layout.

cb6276e

Python: Add test cases.

dbde99d

Python: Fix for multiple parse mode flags.

bb16731

Python: Change note.

aaf9907

Python: QLDoc.

a0b784e

Python: make mode characters not be characters

6385b3c

They are simply considered part of the group start.

github-actions bot added documentation Python labels Aug 15, 2023

yoff mentioned this pull request Aug 15, 2023

Java: Understand multiple parse mode flags specified in a regular expression string #13778

Open

github-code-scanning bot found potential problems Aug 15, 2023


erik-krogh reviewed Aug 15, 2023


yoff added 4 commits August 16, 2023 10:56

Python: Add test with prefix

fc468c3

Revert "Python: Remove RegExpTerm from PrintAST"

524233d

This reverts commit e2de0e6.

shared: handle empty groups in delta

a20c787

Python: fix test expectations

ee70643

erik-krogh reviewed Aug 16, 2023
