Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: parse mode chars should not be considered chars #13975

Draft
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

yoff
Copy link
Contributor

@yoff yoff commented Aug 15, 2023

This is a PR on top of #13779 to aid in the discussion around parse mode characters.

* Holds if a parse mode starts between `start` and `end`.
*/
private predicate flag_group_start(int start, int end) {
exists(int no_modes_end |

Check warning

Code scanning / CodeQL

Omittable 'exists' variable Warning

This exists variable can be omitted by using a don't-care expression
in this argument
.
@@ -7,3 +7,6 @@
# Treatment of line breaks
re.compile(r'(?:.|\n)*b') # No ReDoS.
re.compile(r'(?:.|\n)*b', re.DOTALL) # Has ReDoS.
re.compile(r'(?i)(?:.|\n)*b') # No ReDoS.
re.compile(r'(?s)(?:.|\n)*b') # Has ReDoS.
re.compile(r'(?is)(?:.|\n)*b') # Has ReDoS.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add the following as a test:

re.compile(r'(?is)X(?:.|\n)*Y')

The message for that should ideally mention that the attack string should start with an X.
If the message doesn't say that, then the "start" of the regex is wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent suggestion, thanks! The message does not mention X, I will look into why..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that (?is) is still considered a group, but with no children.
It shouldn't be a group, it shouldn't appear as a RegExpTerm at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FirstItem does find the X..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe as a test revert this commit: e2de0e6
That allows you to view the tree structure of the parsed regular expression in the ast-viewer in VSCode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

..but NfaUtils::prefix does not..

@@ -105,3 +105,6 @@
| redos.py:391:15:391:25 | (\\u0061\|a)* | This part of the regular expression may cause exponential backtracking on strings starting with 'X' and containing many repetitions of 'a'. |
| unittests.py:5:17:5:23 | (\u00c6\|\\\u00c6)+ | This part of the regular expression may cause exponential backtracking on strings starting with 'X' and containing many repetitions of '\u00c6'. |
| unittests.py:9:16:9:24 | (?:.\|\\n)* | This part of the regular expression may cause exponential backtracking on strings containing many repetitions of '\\n'. |
| unittests.py:11:20:11:28 | (?:.\|\\n)* | This part of the regular expression may cause exponential backtracking on strings containing many repetitions of '\\n'. |
| unittests.py:12:21:12:29 | (?:.\|\\n)* | This part of the regular expression may cause exponential backtracking on strings containing many repetitions of '\\n'. |
| unittests.py:13:22:13:30 | (?:.\|\\n)* | This part of the regular expression may cause exponential backtracking on strings starting with 'x' and containing many repetitions of '\\n'. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love how the regex uses an upper-case X, but the alert-message uses a lower-case x (because it's case-insensitive).

Copy link
Contributor

@erik-krogh erik-krogh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test outcomes look good to me now 👍

I can't really comment on the parser architecture.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants