Skip to content

Swift: Query for REDOS (Regular Expression Denial Of Service)#13548

Merged
geoffw0 merged 8 commits into
github:mainfrom
geoffw0:redos
Jul 14, 2023
Merged

Swift: Query for REDOS (Regular Expression Denial Of Service)#13548
geoffw0 merged 8 commits into
github:mainfrom
geoffw0:redos

Conversation

@geoffw0

@geoffw0 geoffw0 commented Jun 23, 2023

Copy link
Copy Markdown
Contributor

Adds a query for REDOS (Regular Expression Denial Of Service), built atop the regular expressions library from #13470.

At present there are significant limitations (the largest of which being that the regex library doesn't support regular expression literals yet), but I believe the query adds value in its current state, and will find more results as we improve the library.

TODO:

  • team review
  • doc review
  • DCA run

@geoffw0 geoffw0 added the Swift label Jun 23, 2023
@geoffw0 geoffw0 requested a review from a team as a code owner June 23, 2023 15:51
@github-actions

github-actions Bot commented Jun 23, 2023

Copy link
Copy Markdown
Contributor

QHelp previews:

java/ql/src/Security/CWE/CWE-730/PolynomialReDoS.qhelp

Polynomial regular expression used on uncontrolled data

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engine provided by Java uses a backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Note that Java versions 9 and above have some mitigations against ReDoS; however they aren't perfect and more complex regular expressions can still be affected by this problem.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter. Alternatively, an alternate regex library that guarantees linear time execution, such as Google's RE2J, may be used.

Example

Consider this use of a regular expression, which removes all leading and trailing whitespace in a string:

Pattern.compile("^\\s+|\\s+$").matcher(text).replaceAll("") // BAD

The sub-expression "\\s+$" will match the whitespace characters in text from left to right, but it can start matching anywhere within a whitespace sequence. This is problematic for strings that do not end with a whitespace character. Such a string will force the regular expression engine to process each whitespace sequence once per whitespace character in the sequence.

This ultimately means that the time cost of trimming a string is quadratic in the length of the string. So a string like "a b" will take milliseconds to process, but a similar string with a million spaces instead of just one will take several minutes.

Avoid this problem by rewriting the regular expression to not contain the ambiguity about when to start matching whitespace sequences. For instance, by using a negative look-behind ("^\\s+|(?<!\\s)\\s+$"), or just by using the built-in trim method (text.trim()).

Note that the sub-expression "^\\s+" is not problematic as the ^ anchor restricts when that sub-expression can start matching, and as the regular expression engine matches from left to right.

Example

As a similar, but slightly subtler problem, consider the regular expression that matches lines with numbers, possibly written using scientific notation:

"^0\\.\\d+E?\\d+$"" 

The problem with this regular expression is in the sub-expression \d+E?\d+ because the second \d+ can start matching digits anywhere after the first match of the first \d+ if there is no E in the input string.

This is problematic for strings that do not end with a digit. Such a string will force the regular expression engine to process each digit sequence once per digit in the sequence, again leading to a quadratic time complexity.

To make the processing faster, the regular expression should be rewritten such that the two \d+ sub-expressions do not have overlapping matches: "^0\\.\\d+(E\\d+)?$".

Example

Sometimes it is unclear how a regular expression can be rewritten to avoid the problem. In such cases, it often suffices to limit the length of the input string. For instance, the following regular expression is used to match numbers, and on some non-number inputs it can have quadratic time complexity:

Pattern.matches("^(\\+|-)?(\\d+|(\\d*\\.\\d*))?(E|e)?([-+])?(\\d+)?$", str); 

It is not immediately obvious how to rewrite this regular expression to avoid the problem. However, you can mitigate performance issues by limiting the length to 1000 characters, which will always finish in a reasonable amount of time.

if (str.length() > 1000) {
    throw new IllegalArgumentException("Input too long");
}

Pattern.matches("^(\\+|-)?(\\d+|(\\d*\\.\\d*))?(E|e)?([-+])?(\\d+)?$", str); 

References

java/ql/src/Security/CWE/CWE-730/ReDoS.qhelp

Inefficient regular expression

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engine provided by Java uses a backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Note that Java versions 9 and above have some mitigations against ReDoS; however they aren't perfect and more complex regular expressions can still be affected by this problem.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter. Alternatively, an alternate regex library that guarantees linear time execution, such as Google's RE2J, may be used.

Example

Consider this regular expression:

^_(__|.)+_$

Its sub-expression "(__|.)+?" can match the string "__" either by the first alternative "__" to the left of the "|" operator, or by two repetitions of the second alternative "." to the right. Thus, a string consisting of an odd number of underscores followed by some other character will cause the regular expression engine to run for an exponential amount of time before rejecting the input.

This problem can be avoided by rewriting the regular expression to remove the ambiguity between the two branches of the alternative inside the repetition:

^_(__|[^_])+_$

References

javascript/ql/src/Performance/PolynomialReDoS.qhelp

Polynomial regular expression used on uncontrolled data

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engines provided by many popular JavaScript platforms use backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this use of a regular expression, which removes all leading and trailing whitespace in a string:

text.replace(/^\s+|\s+$/g, ''); // BAD

The sub-expression "\s+$" will match the whitespace characters in text from left to right, but it can start matching anywhere within a whitespace sequence. This is problematic for strings that do not end with a whitespace character. Such a string will force the regular expression engine to process each whitespace sequence once per whitespace character in the sequence.

This ultimately means that the time cost of trimming a string is quadratic in the length of the string. So a string like "a b" will take milliseconds to process, but a similar string with a million spaces instead of just one will take several minutes.

Avoid this problem by rewriting the regular expression to not contain the ambiguity about when to start matching whitespace sequences. For instance, by using a negative look-behind (/^\s+|(?<!\s)\s+$/g), or just by using the built-in trim method (text.trim()).

Note that the sub-expression "^\s+" is not problematic as the ^ anchor restricts when that sub-expression can start matching, and as the regular expression engine matches from left to right.

Example

As a similar, but slightly subtler problem, consider the regular expression that matches lines with numbers, possibly written using scientific notation:

/^0\.\d+E?\d+$/.test(str) // BAD

The problem with this regular expression is in the sub-expression \d+E?\d+ because the second \d+ can start matching digits anywhere after the first match of the first \d+ if there is no E in the input string.

This is problematic for strings that do not end with a digit. Such a string will force the regular expression engine to process each digit sequence once per digit in the sequence, again leading to a quadratic time complexity.

To make the processing faster, the regular expression should be rewritten such that the two \d+ sub-expressions do not have overlapping matches: ^0\.\d+(E\d+)?$.

Example

Sometimes it is unclear how a regular expression can be rewritten to avoid the problem. In such cases, it often suffices to limit the length of the input string. For instance, the following regular expression is used to match numbers, and on some non-number inputs it can have quadratic time complexity:

/^(\+|-)?(\d+|(\d*\.\d*))?(E|e)?([-+])?(\d+)?$/.test(str) // BAD

It is not immediately obvious how to rewrite this regular expression to avoid the problem. However, you can mitigate performance issues by limiting the length to 1000 characters, which will always finish in a reasonable amount of time.

if (str.length > 1000) {
    throw new Error("Input too long");
}

/^(\+|-)?(\d+|(\d*\.\d*))?(E|e)?([-+])?(\d+)?$/.test(str)

References

javascript/ql/src/Performance/ReDoS.qhelp

Inefficient regular expression

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engines provided by many popular JavaScript platforms use backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this regular expression:

/^_(__|.)+_$/

Its sub-expression "(__|.)+?" can match the string "__" either by the first alternative "__" to the left of the "|" operator, or by two repetitions of the second alternative "." to the right. Thus, a string consisting of an odd number of underscores followed by some other character will cause the regular expression engine to run for an exponential amount of time before rejecting the input.

This problem can be avoided by rewriting the regular expression to remove the ambiguity between the two branches of the alternative inside the repetition:

/^_(__|[^_])+_$/

References

python/ql/src/Security/CWE-730/PolynomialReDoS.qhelp

Polynomial regular expression used on uncontrolled data

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engine provided by Python uses a backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this use of a regular expression, which removes all leading and trailing whitespace in a string:

re.sub(r"^\s+|\s+$", "", text) # BAD

The sub-expression "\s+$" will match the whitespace characters in text from left to right, but it can start matching anywhere within a whitespace sequence. This is problematic for strings that do not end with a whitespace character. Such a string will force the regular expression engine to process each whitespace sequence once per whitespace character in the sequence.

This ultimately means that the time cost of trimming a string is quadratic in the length of the string. So a string like "a b" will take milliseconds to process, but a similar string with a million spaces instead of just one will take several minutes.

Avoid this problem by rewriting the regular expression to not contain the ambiguity about when to start matching whitespace sequences. For instance, by using a negative look-behind (^\s+|(?<!\s)\s+$), or just by using the built-in strip method (text.strip()).

Note that the sub-expression "^\s+" is not problematic as the ^ anchor restricts when that sub-expression can start matching, and as the regular expression engine matches from left to right.

Example

As a similar, but slightly subtler problem, consider the regular expression that matches lines with numbers, possibly written using scientific notation:

^0\.\d+E?\d+$ # BAD

The problem with this regular expression is in the sub-expression \d+E?\d+ because the second \d+ can start matching digits anywhere after the first match of the first \d+ if there is no E in the input string.

This is problematic for strings that do not end with a digit. Such a string will force the regular expression engine to process each digit sequence once per digit in the sequence, again leading to a quadratic time complexity.

To make the processing faster, the regular expression should be rewritten such that the two \d+ sub-expressions do not have overlapping matches: ^0\.\d+(E\d+)?$.

Example

Sometimes it is unclear how a regular expression can be rewritten to avoid the problem. In such cases, it often suffices to limit the length of the input string. For instance, the following regular expression is used to match numbers, and on some non-number inputs it can have quadratic time complexity:

match = re.search(r'^(\+|-)?(\d+|(\d*\.\d*))?(E|e)?([-+])?(\d+)?$', str) 

It is not immediately obvious how to rewrite this regular expression to avoid the problem. However, you can mitigate performance issues by limiting the length to 1000 characters, which will always finish in a reasonable amount of time.

if len(str) > 1000:
    raise ValueError("Input too long")

match = re.search(r'^(\+|-)?(\d+|(\d*\.\d*))?(E|e)?([-+])?(\d+)?$', str) 

References

python/ql/src/Security/CWE-730/ReDoS.qhelp

Inefficient regular expression

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engine provided by Python uses a backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this regular expression:

^_(__|.)+_$

Its sub-expression "(__|.)+?" can match the string "__" either by the first alternative "__" to the left of the "|" operator, or by two repetitions of the second alternative "." to the right. Thus, a string consisting of an odd number of underscores followed by some other character will cause the regular expression engine to run for an exponential amount of time before rejecting the input.

This problem can be avoided by rewriting the regular expression to remove the ambiguity between the two branches of the alternative inside the repetition:

^_(__|[^_])+_$

References

ruby/ql/src/queries/security/cwe-1333/PolynomialReDoS.qhelp

Polynomial regular expression used on uncontrolled data

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engine used by the Ruby interpreter (MRI) uses backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Note that Ruby 3.2 and later have implemented a caching mechanism that completely eliminates the worst-case time complexity for the regular expressions flagged by this query. The regular expressions flagged by this query are therefore only problematic for Ruby versions prior to 3.2.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this use of a regular expression, which removes all leading and trailing whitespace in a string:

text.gsub!(/^\s+|\s+$/, '') # BAD

The sub-expression "\s+$" will match the whitespace characters in text from left to right, but it can start matching anywhere within a whitespace sequence. This is problematic for strings that do not end with a whitespace character. Such a string will force the regular expression engine to process each whitespace sequence once per whitespace character in the sequence.

This ultimately means that the time cost of trimming a string is quadratic in the length of the string. So a string like "a b" will take milliseconds to process, but a similar string with a million spaces instead of just one will take several minutes.

Avoid this problem by rewriting the regular expression to not contain the ambiguity about when to start matching whitespace sequences. For instance, by using a negative look-behind (/^\s+|(?<!\s)\s+$/), or just by using the built-in strip method (text.strip!).

Note that the sub-expression "^\s+" is not problematic as the ^ anchor restricts when that sub-expression can start matching, and as the regular expression engine matches from left to right.

Example

As a similar, but slightly subtler problem, consider the regular expression that matches lines with numbers, possibly written using scientific notation:

/^0\.\d+E?\d+$/ # BAD

The problem with this regular expression is in the sub-expression \d+E?\d+ because the second \d+ can start matching digits anywhere after the first match of the first \d+ if there is no E in the input string.

This is problematic for strings that do not end with a digit. Such a string will force the regular expression engine to process each digit sequence once per digit in the sequence, again leading to a quadratic time complexity.

To make the processing faster, the regular expression should be rewritten such that the two \d+ sub-expressions do not have overlapping matches: /^0\.\d+(E\d+)?$/.

Example

Sometimes it is unclear how a regular expression can be rewritten to avoid the problem. In such cases, it often suffices to limit the length of the input string. For instance, the following regular expression is used to match numbers, and on some non-number inputs it can have quadratic time complexity:

is_matching = /^(\+|-)?(\d+|(\d*\.\d*))?(E|e)?([-+])?(\d+)?$/.match?(str)

It is not immediately obvious how to rewrite this regular expression to avoid the problem. However, you can mitigate performance issues by limiting the length to 1000 characters, which will always finish in a reasonable amount of time.

if str.length > 1000
    raise ArgumentError, "Input too long"
end

is_matching = /^(\+|-)?(\d+|(\d*\.\d*))?(E|e)?([-+])?(\d+)?$/.match?(str)

References

ruby/ql/src/queries/security/cwe-1333/ReDoS.qhelp

Inefficient regular expression

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engine used by the Ruby interpreter (MRI) uses backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Note that Ruby 3.2 and later have implemented a caching mechanism that completely eliminates the worst-case time complexity for the regular expressions flagged by this query. The regular expressions flagged by this query are therefore only problematic for Ruby versions prior to 3.2.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this regular expression:

/^_(__|.)+_$/

Its sub-expression "(__|.)+?" can match the string "__" either by the first alternative "__" to the left of the "|" operator, or by two repetitions of the second alternative "." to the right. Thus, a string consisting of an odd number of underscores followed by some other character will cause the regular expression engine to run for an exponential amount of time before rejecting the input.

This problem can be avoided by rewriting the regular expression to remove the ambiguity between the two branches of the alternative inside the repetition:

/^_(__|[^_])+_$/

References

swift/ql/src/queries/Security/CWE-1333/ReDoS.qhelp

Inefficient regular expression

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, and potentially allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engine used by Swift uses backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time complexity does not matter.

Example

Consider the following regular expression:

/^_(__|.)+_$/

Its sub-expression "(__|.)+" can match the string "__" either by the first alternative "__" to the left of the "|" operator, or by two repetitions of the second alternative "." to the right. Therefore, a string consisting of an odd number of underscores followed by some other character will cause the regular expression engine to run for an exponential amount of time before rejecting the input.

This problem can be avoided by rewriting the regular expression to remove the ambiguity between the two branches of the alternative inside the repetition:

/^_(__|[^_])+_$/

References


let nsregex3 = try? NSRegularExpression(pattern: ".*") // GOOD (safe regex)
_ = nsregex3?.stringByReplacingMatches(in: tainted, range: NSRange(location: 0, length: tainted.utf16.count), withTemplate: "")
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are many more test cases for the core logic of this query, i.e. identifying REDOS vulnerable regular expressions, in the library test file https://github.com/github/codeql/blob/main/swift/ql/test/library-tests/regex/redos_variants.swift . These include realistic cases and edge cases designed to test the limits of the engine.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. As long as this query delegates to the same implementation, there's no need to copy those test cases here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, the query tests mostly just confirm that everything is wired up correctly and presents the results in a reasonable manner.

@d10c d10c left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one minor suggestion. Will this be run as part of the standard query suite? If so, might want to do a DCA run before merging.

Comment thread swift/ql/src/queries/Security/CWE-1333/ReDoSIntroduction.inc.qhelp Outdated

let nsregex3 = try? NSRegularExpression(pattern: ".*") // GOOD (safe regex)
_ = nsregex3?.stringByReplacingMatches(in: tainted, range: NSRange(location: 0, length: tainted.utf16.count), withTemplate: "")
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. As long as this query delegates to the same implementation, there's no need to copy those test cases here.

@MathiasVP

MathiasVP commented Jul 3, 2023

Copy link
Copy Markdown
Contributor

LGTM with one minor suggestion. Will this be run as part of the standard query suite? If so, might want to do a DCA run before merging.

We should definitely have a DCA run for this. It'll go directly into the Code Scanning suite as it:

  • Is tagged as a security query, and
  • is tagged with @precision high.

This library also frequently appear as the cause of various timeouts in other languages 😄.

@geoffw0

geoffw0 commented Jul 3, 2023

Copy link
Copy Markdown
Contributor Author

Thanks for reviewing...

Will this be run as part of the standard query suite?

As @MathiasVP said this will run in the standard suite due to its metadata. We sometimes start queries at a lower @precision but this tends to be when there are known and common false positives, and I've not seen that.

If so, might want to do a DCA run before merging.

Definitely. I've just started a run.

This library also frequently appear as the cause of various timeouts in other languages 😄.

Yeah, we've already encountered (and fixed) a timeout in one of the test regexps in the library PR (#13470). I didn't encounter any problems on the MRVA top 1000 though. In any case I'm moderately confident any further problems we encounter will be isolated to specific forms of regex (ironically, given what this query is about) and can be investigated and solved.

@geoffw0 geoffw0 added the ready-for-doc-review This PR requires and is ready for review from the GitHub docs team. label Jul 3, 2023
@geoffw0

geoffw0 commented Jul 4, 2023

Copy link
Copy Markdown
Contributor Author

DCA LGTM. Analysis time is not measurably slower, which if anything is a slight surprise as this is the first query to use the new regex library.

d10c
d10c previously approved these changes Jul 4, 2023

@d10c d10c left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (Oh, actually let's wait for doc review)

@geoffw0

geoffw0 commented Jul 10, 2023

Copy link
Copy Markdown
Contributor Author

@github/docs-content-codeql please could you confirm this one hasn't been forgotten?

@geoffw0

geoffw0 commented Jul 10, 2023

Copy link
Copy Markdown
Contributor Author

Note that the qhelp preview generated all sorts of stuff, only the Swift .qhelp is under review here.

subatoi
subatoi previously approved these changes Jul 13, 2023

@subatoi subatoi left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍 —just had some very minor Docs suggestions

Comment thread swift/ql/src/queries/Security/CWE-1333/ReDoSIntroduction.inc.qhelp Outdated
Comment thread swift/ql/src/queries/Security/CWE-1333/ReDoS.qhelp Outdated
Comment thread swift/ql/src/queries/Security/CWE-1333/ReDoS.qhelp Outdated
@subatoi subatoi self-assigned this Jul 13, 2023
Co-authored-by: Ben Ahmady <32935794+subatoi@users.noreply.github.com>
@geoffw0 geoffw0 dismissed stale reviews from subatoi and d10c via 962c16d July 13, 2023 18:20
@geoffw0

geoffw0 commented Jul 13, 2023

Copy link
Copy Markdown
Contributor Author

Thanks @subatoi, all suggestions accepted!

Now this just needs a re-approval from someone and merging.

@geoffw0 geoffw0 merged commit 1c8297b into github:main Jul 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation ready-for-doc-review This PR requires and is ready for review from the GitHub docs team. Swift

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants