lexers: devicetree: Fix catastrophic backtracking bug by grahamroff-dev · Pull Request #3057 · pygments/pygments

grahamroff-dev · 2026-03-06T18:18:25Z

The regex that parses devicetree statements contains a property-name lookahead that backtracks exponentially, effectively hanging the lexer, when a property value contains a long run of whitespace characters.

The problem can be reproduced by running this example script:

from pygments import highlight
from pygments.lexers import DevicetreeLexer
from pygments.formatters import NullFormatter

def test():
    dts_contents = '''
/ {
	soc {
		plic: interrupt-controller@c000000 {
			interrupts-extended = < &hlic0
			                        0x9 >;
		};
    }
}
    '''
    print('Starting highlight')
    highlight(dts_contents, DevicetreeLexer(), NullFormatter())

if __name__ == "__main__":
    test()

The script will appear to hang forever (technically it should eventually finish, maybe in a day or so).

Root Cause: Catastrophic Backtracking

The hang is caused by catastrophic backtracking in the property name lookahead regex inside lexers/devicetree.py.

The problematic pattern in the statements token rule is:

(r'[a-zA-Z_][\w-]*(?=(?:\s*,\s*[a-zA-Z_][\w-]*|(?:' + _ws + r'))*\s*[=;])',
 Name),

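The second alternative inside the outer * is the _ws pattern, \s*(?:/[*][^*/]*?[*]/\s*)*, which can match the empty string as well as any run of whitespace. This creates the classic catastrophic backtracking shape (A|B)* where B can match empty, so a single run of whitespace can be divided among the iterations of the outer * in many different ways.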
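The claim about _ws can be checked directly. This is a minimal sketch: the `_WS` string is copied from the pattern quoted above, while the sample whitespace run and the variable names are made up for illustration.

```python
import re

# Copied from the _ws pattern quoted above: optional whitespace
# interleaved with /* ... */ style comments.
_WS = r'\s*(?:/[*][^*/]*?[*]/\s*)*'

# _ws accepts the empty string, so an alternative built from it inside
# (...)* can succeed without being forced to consume anything.
matches_empty = re.fullmatch(_WS, '') is not None

# It also accepts a whole run of whitespace and every prefix of that run,
# which is what lets the outer * divide one run among several iterations.
run = '\n\t\t\t    '  # illustrative run, not taken from the lexer tests
prefix_matches = [
    re.fullmatch(_WS, run[:k]) is not None for k in range(len(run) + 1)
]

print('matches empty string:', matches_empty)
print('matches every prefix of the run:', all(prefix_matches))
```

Because every prefix is accepted, the engine has a choice at each whitespace character about where one iteration ends and the next begins.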

More details: In the example the lookahead is applied to \n\t\t\t 0x9 >; (~28 whitespace characters before 0x9). Since 0x9 > is not = or ;, the lookahead must fail — but the regex engine tries to split those 28 whitespace characters among the outer * iterations in 2²⁸ ≈ 268 million different ways before giving up. When 0x9 is on the same line without the extra indentation, the whitespace is just one space (2¹ = 2 attempts), so it completes instantly.
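The growth described above can be observed on small inputs. The following is a rough sketch, not a benchmark: the pattern is assembled from the pre-fix regex quoted in this description, while `time_failed_match` and the input shape are hypothetical helpers for illustration. The whitespace counts are kept well below 28 so the script finishes quickly.

```python
import re
import time

# Copied from the patterns quoted in this description.
_WS = r'\s*(?:/[*][^*/]*?[*]/\s*)*'
NAME = r'[a-zA-Z_][\w-]*'

# The property-name rule as it was before the fix.
OLD = re.compile(
    NAME + r'(?=(?:\s*,\s*' + NAME + r'|(?:' + _WS + r'))*\s*[=;])'
)


def time_failed_match(n_whitespace):
    """Return (match result, seconds) for a name followed by whitespace."""
    # Illustrative input shaped like the reproducer: a name, a newline plus
    # indentation, then a value that starts with neither '=' nor ';'.
    text = 'hlic0\n' + ' ' * (n_whitespace - 1) + '0x9 >;'
    start = time.perf_counter()
    result = OLD.match(text)
    return result, time.perf_counter() - start


# Deliberately small counts: the work is expected to grow roughly
# exponentially with the length of the whitespace run.
timings = [(n, *time_failed_match(n)) for n in (4, 8, 12, 16)]
for n, result, seconds in timings:
    print(f'{n:2d} whitespace chars: match={result!r} in {seconds:.4f}s')
```

Absolute timings depend on the machine and Python version; the point of the sketch is only the trend as the whitespace run gets longer.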

Fix applied: the lookahead is restructured so that each iteration of the outer * must consume a , (it can never match the empty string):

(r'[a-zA-Z_][\w-]*(?=(?:\s*,' + _ws + r'[a-zA-Z_][\w-]*)*' + _ws + r'[=;])',
 Name),

This correctly handles all cases (comma-separated property names, comments before =/;) while eliminating the exponential backtracking.
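A small comparison of the two patterns on the cases mentioned above, as a sketch under stated assumptions: both regexes are assembled from the snippets quoted in this description, and the sample inputs and the `matched_name` helper are hypothetical, written only for this illustration.

```python
import re

# Copied from the patterns quoted in this description.
_WS = r'\s*(?:/[*][^*/]*?[*]/\s*)*'
NAME = r'[a-zA-Z_][\w-]*'

OLD = re.compile(
    NAME + r'(?=(?:\s*,\s*' + NAME + r'|(?:' + _WS + r'))*\s*[=;])'
)
NEW = re.compile(
    NAME + r'(?=(?:\s*,' + _WS + NAME + r')*' + _WS + r'[=;])'
)

# Illustrative inputs, one per case named above, plus the failing shape
# (kept short so the old pattern also returns quickly).
samples = [
    'bool-prop;',                   # name directly before ';'
    'vendor,custom-prop = <1>;',    # comma-separated property name
    'status /* note */ = "okay";',  # comment before '='
    'hlic0\n      0x9 >;',          # value continuation: not a property
]


def matched_name(pattern, text):
    """Return the text matched at the start of `text`, or None."""
    m = pattern.match(text)
    return m.group(0) if m else None


results = [(matched_name(OLD, s), matched_name(NEW, s)) for s in samples]
for sample, (old, new) in zip(samples, results):
    print(f'{sample!r}: old={old!r} new={new!r}')
```

On these inputs the two patterns are expected to agree, with the last sample producing no match for either; the difference between them is only in how much work the failing case costs as the whitespace run grows.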

The regex for parsing devicetree statements contains a property name lookahead that results in a near infinite loop if there are a lot of whitespace characters in the property value. Restructure the lookahead regex to avoid this scenario.

birkenfeld · 2026-03-07T08:02:04Z

Thanks for the PR! Can you add the example to the test cases?

Update the devicetree example file to include the problematic syntax that causes catastrophic backtracking.

grahamroff-dev · 2026-03-07T17:21:08Z

> Thanks for the PR! Can you add the example to the test cases?

I updated the test case to include the problematic syntax, thanks!

birkenfeld · 2026-03-07T17:39:11Z

Great! :)

grahamroff-dev mentioned this pull request Mar 6, 2026

scripts: Add HTML build dashboard script zephyrproject-rtos/zephyr#104431

Merged

tests: devicetree: Update example file

8d50cdc

Update the devicetree example file to include the problematic syntax that causes catastrophic backtracking.

birkenfeld merged commit 524b5d3 into pygments:master Mar 7, 2026
15 checks passed

grahamroff-dev mentioned this pull request Mar 19, 2026

Request for permissions to support maintainance of scripts/dashboard zephyrproject-rtos/zephyr#105883

Closed

Anteru added this to the 2.20.0 milestone Mar 26, 2026

Anteru added the A-lexing area: changes to individual lexers label Mar 26, 2026

birkenfeld merged 2 commits into pygments:master from grahamroff-dev:devicetree-backtracking-fix
