Skip to content

ReDoS: restrict the edges considered in polynomial-redos for complex regular expressions#12543

Merged
erik-krogh merged 4 commits intogithub:mainfrom
erik-krogh:reg-perf
Mar 20, 2023
Merged

ReDoS: restrict the edges considered in polynomial-redos for complex regular expressions#12543
erik-krogh merged 4 commits intogithub:mainfrom
erik-krogh:reg-perf

Conversation

@erik-krogh
Copy link
Copy Markdown
Contributor

@erik-krogh erik-krogh commented Mar 15, 2023

Fixes #12528

The root cause is that some regular expressions just have a massive search space.
I've tried various solutions, but I couldn't find anything good.

So I've made an approach where a simplified search is done when a regular expression is determined to be sufficiently complex (currently measured by the amount of states in the NFA).
Consider this simplified regex:

/\w*(A|B)(C|D)(E|F)\w*Y/

There are many different paths from the first \w to the second in the NFA, and that blows up the complexity of the analysis.
When the complexity metric is hit the search will only consider one of the alternatives for each location where the regex is branching.
This causes us to miss some results, but it's very few results that are lost. E.g. the polynomial-redos in the above regex will still be flagged.

An evaluation on Ruby/Python/JavaScript (with the standard nightly source-suite) shows no lost results.
(The Ruby evaluation shows lost results, but we still flag other parts of the same regular expression).

I don't like using a complexity metric, but I didn't find any better approach.


Also drive-by fix that stops Python from parsing regular expression in extracted dependencies.
This made debugging way easier, and I feel that it is a good change.
(I'm quite sure I found an ugly way of achieving my goal, better solutions are very welcome).

@erik-krogh erik-krogh added the WIP This is a work-in-progress, do not merge yet! label Mar 15, 2023
@github-actions github-actions bot removed the Ruby label Mar 16, 2023
@erik-krogh erik-krogh changed the title ReDoS: performance thing ReDoS: restrict the edges considered in polynomial-redos for complex regular expressions Mar 16, 2023
@erik-krogh erik-krogh added Ruby and removed WIP This is a work-in-progress, do not merge yet! labels Mar 16, 2023
@erik-krogh erik-krogh marked this pull request as ready for review March 16, 2023 12:32
@erik-krogh erik-krogh requested review from a team as code owners March 16, 2023 12:32
@intrigus-lgtm
Copy link
Copy Markdown
Contributor

intrigus-lgtm commented Mar 16, 2023

If I remember correctly there are other tools that also try to find ReDoS. Do these tools also fail/take very long on this specific regex?
Basically, I'm just curious whether this is an inherent problem to ReDoS detection; if I were to guess, I'd assume that this problem is np complete (as most interesting problems are 😢 )?

@erik-krogh erik-krogh added the no-change-note-required This PR does not need a change note label Mar 16, 2023
@erik-krogh
Copy link
Copy Markdown
Contributor Author

If I remember correctly there are other tools that also try to find ReDoS. Do these tools also fail/take very long on this specific regex?

It is a problem with the implementation strategy I've taken in implementing this.
Which is a natural way of implementing it in CodeQL.

E.g. the Java implementation from the author of the research I've based my implementation on does not experience the same issue.

The bottom-up evaluation nature of CodeQL hurts here.

tausbn
tausbn previously approved these changes Mar 20, 2023
Copy link
Copy Markdown
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One typo, otherwise LGTM. 👍

Co-authored-by: Taus <tausbn@github.com>
@asgerf
Copy link
Copy Markdown
Contributor

asgerf commented Mar 20, 2023

LGTM if you want to merge like this.

But if you want to try and fix it in a more principled way I think this would work: (possibly for a future PR)

  • Include pivot in the StateTuple. It then represents a state tuple (q1, q2, q3) that is known to be reachable from (pivot, _, _).
  • Sharpen isFeasibleTuple to require that q1 can reach the associated pivot (rather than just any pivot)
  • Compute a variant of isReachableFromStartTuple that does not have trace and dist (the exponential number of traces is what caused this to blow up).
  • Compute dist using the shortestPath HOP over the subset of StateTuples that are reachable.
    • Here it's beneficial that pivot is part of the StateTuple, since we get an independent distance measure for each pivot.
  • Materialize traces based on the distance measure. In the recursive case, only extend a trace to the successor with the smallest distance from the pivot, breaking ties by picking the successor with the smallest InputSymbol.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

JS no-change-note-required This PR does not need a change note Python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python: polynomial-redos hangs in sonic-net/sonic-mgmt repo

4 participants