Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Swift: Recognize regular expression parse mode flags #13715

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

geoffw0
Copy link
Contributor

@geoffw0 geoffw0 commented Jul 11, 2023

Recognize regular expression parse mode flags in Swift. For example in (?i)foo, the (?i) specifies a case-insensitive match of the rest of the regular expression foo.

This improves the accuracy of the REDOS and bad HTML filtering queries (and others to come).

Limitations:

  1. does not support alternative ways of setting the parse mode, i.e. using function calls / parameters outside of the regex string itself (this will be high priority follow-up work)
  2. does not support modes specified in regex literals (because we don't support regex literals yet)
  3. does not support setting the mode in the middle of a regex foo(?i)bar, which is allowed in Swift
  4. does not support setting the mode only for a portion of a regex (?i:foo)bar, which is allowed in Swift
  5. does not support turning flags off (?i-s), which is allowed in Swift

@erik-krogh @joefarebrother the initial version of this was based on code from Java and is also very similar to Python. I found that we were failing to recognize multiple mode flags (e.g. (?is)), which I fixed with 6e80021. Please could you review those changes with a view to both correctness for Swift, and whether or not it would be desirable for me to port that fix to Java and/or Python?

I'm also curious whether we support any of what I described in limitations 3-5 above in other languages. I suspect we do not, which makes it a pretty low priority for Swift where we're currently playing catch-up - but if we do it would be a starting point to solving them.

@geoffw0 geoffw0 added the Swift label Jul 11, 2023
@geoffw0 geoffw0 requested a review from a team as a code owner July 11, 2023 08:43
@erik-krogh
Copy link
Contributor

erik-krogh commented Jul 11, 2023

The multiple modes commit looks good to me, and I think the other languages could use a port of that fix.

I don't know of other languages that support limitations 3-5.
If they support it, then it isn't something I've encountered.

* MULTILINE
* UNICODE
*/
string getAMode() { result = this.getModeFromPrefix() }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the advantage of splitting the implementation out into the separate getModeFromPrefix?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants