lexers: devicetree: Fix catastrophic backtracking bug #3057
Merged
birkenfeld merged 2 commits into pygments:master on Mar 7, 2026
The regex for parsing devicetree statements contains a property name lookahead that results in a near infinite loop if there are a lot of whitespace characters in the property value. Restructure the lookahead regex to avoid this scenario.
Member
Thanks for the PR! Can you add the example to the test cases?
Update the devicetree example file to include the problematic syntax that causes catastrophic backtracking.
Contributor
Author
I updated the test case to include the problematic syntax, thanks!
Member
Great! :)
The regex for parsing devicetree statements contains a property name lookahead that results in a near infinite loop if there are a lot of whitespace characters in the property value.
Problem can be seen by running this example script:
The script will appear to hang forever (technically it should eventually finish, maybe in a day or so).
Root Cause: Catastrophic Backtracking
The hang is caused by catastrophic backtracking in the property name lookahead regex inside `lexers/devicetree.py`.

The problematic pattern is the lookahead in the `statements` token rule. The second alternative inside its outer `*` is `\s*(?:/[*][^*/]*?[*]/\s*)*` (the `_ws` pattern), which can match an empty string. This creates the classic catastrophic backtracking pattern `(A|B)*` where `B` can match empty.

More details: In the example, the lookahead is applied to `\n\t\t\t 0x9 >;` (~28 whitespace characters before `0x9`). Since `0x9 >` is not `=` or `;`, the lookahead must fail, but the regex engine tries to split those 28 whitespace characters among the outer `*` iterations in 2²⁸ ≈ 268 million different ways before giving up. When `0x9` is on the same line without the extra indentation, the whitespace is just one space (2¹ = 2 attempts), so it completes instantly.

Fix applied: restructured the lookahead so the outer `*` can only match when a `,` is present (it can never match the empty string). This correctly handles all cases (comma-separated property names, comments before `=`/`;`) while eliminating the exponential backtracking.
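For illustration, the behaviour described above can be sketched with the standard `re` module alone. Only `_WS` is taken from the description; `SLOW` and `FAST` are hypothetical reconstructions of the "before" and "after" shapes, not the exact patterns in `lexers/devicetree.py`, and they are matched directly instead of inside a lookahead to keep the sketch short. The whitespace runs are kept small so that the slow pattern still finishes quickly.

```python
import re
import time

# Whitespace-and-comments pattern quoted in the description (`_ws`).
_WS = r'\s*(?:/[*][^*/]*?[*]/\s*)*'

# Hypothetical "before" shape: the second alternative can consume any part
# of a whitespace run, so the outer `*` can split a run of n characters
# across its iterations in exponentially many ways before the match fails.
SLOW = re.compile(r'(?:\s*,\s*|' + _WS + r')*\s*[=;]')

# Hypothetical "after" shape: every iteration of the outer `*` must consume
# a comma, so a whitespace run can no longer be split across iterations.
FAST = re.compile(r'(?:' + _WS + r',' + _WS + r')*' + _WS + r'[=;]')


def time_match(pattern, text):
    """Return (matched, elapsed_seconds) for an anchored match attempt."""
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start


# Whitespace followed by `0x9 >;` is neither `=` nor `;`, so both patterns
# must fail; only the time needed to give up differs.
for n in (4, 8, 12, 16):
    text = " " * n + "0x9 >;"
    _, slow_time = time_match(SLOW, text)
    _, fast_time = time_match(FAST, text)
    print(f"n={n:2d}  slow={slow_time:.6f}s  fast={fast_time:.6f}s")
```

On a typical run the `slow` column should grow roughly geometrically with `n` while the `fast` column stays flat, which is why ~28 whitespace characters are enough to make the lexer appear to hang.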