Fix catastrophic backtracking in Lua/Luau lexer #3047
Merged
birkenfeld merged 1 commit into pygments:master on Feb 22, 2026
The _s pattern (which matches whitespace, single-line comments, and
multi-line block comments) was used in lookaheads with a * quantifier:
(?={_s}*[.:])
(?={_s}*\()
The _comment_multiline component contains [\w\W]*? which, combined with
the outer * quantifier, creates catastrophic backtracking (exponential
time) on inputs with multiple consecutive comments.
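To see why the work multiplies, here is a minimal Python sketch. It uses a simplified block-comment pattern (an assumption: the lexer's real _comment_multiline also handles levelled brackets such as --[==[ ... ]==]) and a hypothetical helper, count_partitions, that counts how many ways a run of consecutive comments can be split into block-comment matches. Each split is a path a backtracking engine may have to explore before the lookahead finally fails.

```python
import re
from functools import lru_cache

# Simplified stand-in for a Lua block comment (assumption, not the
# lexer's actual pattern): "--[[", a lazy body, then "]]".
BLOCK_COMMENT = re.compile(r'--\[\[[\w\W]*?\]\]')


def count_partitions(text: str) -> int:
    """Count the ways `text` splits into consecutive block-comment matches."""

    @lru_cache(maxsize=None)
    def ways(start: int) -> int:
        if start == len(text):
            return 1  # consumed everything: one complete partition
        total = 0
        for end in range(start + 1, len(text) + 1):
            # The lazy body can stop at any later "]]", so one comment
            # may swallow several adjacent ones.
            if BLOCK_COMMENT.fullmatch(text, start, end):
                total += ways(end)
        return total

    return ways(0)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, count_partitions("--[[x]]" * n))
```

For n consecutive `--[[x]]` comments the count is 2^(n-1), so every additional comment doubles the number of candidate partitions.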
Fix: introduce _s_la (lookahead-safe) that matches only whitespace (\s),
avoiding the ambiguous backtracking between the comment pattern and the
outer repetition. This changes tokenization only for the rare case of
block comments between an identifier and its following [.:] or ( -- those
identifiers are now classified as plain Name.Variable instead of being
tracked through the varname/funcname state.
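The behavioural trade-off described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the lexer's actual rules: _name is a simplified identifier pattern, and starts_accessor_chain is a hypothetical helper. Only _s_la = r'\s' is taken from the change itself.

```python
import re

_name = r'[^\W\d]\w*'  # simplified identifier pattern (assumption)
_s_la = r'\s'          # lookahead-safe: whitespace only, as in the fix

# An identifier followed, after optional whitespace, by "." or ":".
followed_by_accessor = re.compile(rf'{_name}(?={_s_la}*[.:])')


def starts_accessor_chain(src: str) -> bool:
    """Return True if `src` begins with a name that the lookahead accepts."""
    return followed_by_accessor.match(src) is not None


if __name__ == "__main__":
    print(starts_accessor_chain("foo.bar"))           # plain accessor
    print(starts_accessor_chain("foo   :bar()"))      # whitespace is still skipped
    print(starts_accessor_chain("foo --[[c]] .bar"))  # comment is no longer skipped
```

Because the lookahead body is a single character class under one quantifier, there is only one way to match any run of whitespace, so a failed lookahead is rejected in linear time.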
Fixes pygments#3036
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Member
Looks good to me! Thanks for the fix.
Summary
Fixes #3036 — the Lua and Luau lexers hang indefinitely on certain inputs due to catastrophic regex backtracking.
Root cause: The _s pattern matches whitespace, single-line comments, and multi-line block comments. Its _comment_multiline component contains [\w\W]*?. When _s is used in lookaheads with a * quantifier, e.g. (?={_s}*[.:]), the regex engine tries all possible partitions of the input between [\w\W]*? and the outer *, causing exponential backtracking.

Fix: Introduce _s_la (lookahead-safe) that matches only whitespace (\s), avoiding the ambiguous alternation entirely. This changes tokenization only for the rare case of multi-line block comments between an identifier and its following . / : / (; those identifiers are now classified as plain Name.Variable instead of being tracked through the varname/funcname state.

Changes:
- Add _s_la = r'\s' to both LuaLexer and LuauLexer
- Replace _s with _s_la in all lookahead patterns (6 locations in LuaLexer, 2 in LuauLexer, 1 in _luau_make_expression)
- Add a regression test (tests/snippets/lua/test_no_hang_comments.txt)

Before: The reproduction case from #3036 hangs indefinitely
After: Tokenizes in <1ms (201 tokens)
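A before/after figure like the one above could be measured with a small harness along these lines. This is a sketch only: time_tokenize is a hypothetical helper, not part of Pygments or of this PR, and it accepts any callable that turns source text into an iterable of tokens.

```python
import time
from typing import Callable, Iterable, Tuple


def time_tokenize(tokenize: Callable[[str], Iterable],
                  source: str) -> Tuple[int, float]:
    """Return (token count, elapsed seconds) for one tokenization pass."""
    start = time.perf_counter()
    # Consume the iterable fully so that lazy lexers do all their work.
    count = sum(1 for _ in tokenize(source))
    return count, time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in tokenizer so the sketch runs without Pygments installed.
    count, elapsed = time_tokenize(str.split, "local x = 1")
    print(f"{count} tokens in {elapsed * 1000:.3f} ms")
```

With Pygments installed, passing `LuaLexer().get_tokens` as the `tokenize` argument would time the real lexer. On a version without the fix, the reproduction input would not return, so such a call is best run with an external timeout.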