
Fix catastrophic backtracking in Lua/Luau lexer #3047

Merged
birkenfeld merged 1 commit into pygments:master from worksbyfriday:fix-lua-lexer-backtracking
Feb 22, 2026

@worksbyfriday
Contributor

Summary

Fixes #3036 — the Lua and Luau lexers hang indefinitely on certain inputs due to catastrophic regex backtracking.

Root cause: The _s pattern matches whitespace, single-line comments, and multi-line block comments. Its _comment_multiline component contains [\w\W]*?. When _s is used in lookaheads with a * quantifier — e.g. (?={_s}*[.:]) — the regex engine tries all possible partitions of the input between [\w\W]*? and the outer *, causing exponential backtracking.
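The ambiguity can be seen in isolation with Python's re module. The patterns below are simplified stand-ins for _comment_multiline and _s (assumed shapes, not copied from the Pygments source). They show that two adjacent block comments can be consumed either as two repetitions or as a single comment whose lazy body stretches across the first closing bracket.

```python
import re

# Simplified stand-ins (assumptions, not the exact Pygments definitions).
COMMENT_MULTILINE = r'(?:--\[\[[\w\W]*?\]\])'
COMMENT_SINGLE = r'(?:--.*$)'
SPACE = r'(?:\s+(?!\s))'
S = rf'(?:{COMMENT_MULTILINE}|{COMMENT_SINGLE}|{SPACE})'

two_comments = "--[[a]]--[[b]]"

# Reading 1: the lazy body stops at the first "]]", giving the comment "--[[a]]".
first = re.match(COMMENT_MULTILINE, two_comments).group(0)

# Reading 2: forced to cover the whole string, the same pattern also accepts
# both comments as a single one, with body "a]]--[[b".
whole = re.fullmatch(COMMENT_MULTILINE, two_comments).group(0)

# Under an outer "*", every additional comment offers both readings again,
# so a lookahead that ultimately fails may have to try a number of paths
# that grows rapidly with the number of consecutive comments.
as_repetition = re.fullmatch(rf'{S}*', two_comments, re.MULTILINE).group(0)
```

Both readings are valid matches, which is what gives the engine so many paths to retry when the character after the comments is not the one the lookahead expects.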

Fix: Introduce _s_la (lookahead-safe) that matches only whitespace (\s), avoiding the ambiguous alternation entirely. This changes tokenization only in the rare case where a multi-line block comment sits between an identifier and the [.:] or ( that follows it. Such identifiers are now classified as plain Name.Variable instead of being tracked through the varname/funcname state.
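The tokenization change can be illustrated with similar simplified patterns. NAME_OLD and NAME_NEW are hypothetical names for this sketch, not rules taken from the lexer, and the comment patterns are assumed stand-ins. With the comment-aware lookahead, an identifier followed by a block comment and then a dot is still recognized. With the whitespace-only lookahead it is not, and that is the case that now falls through to plain Name.Variable.

```python
import re

# Simplified stand-ins (assumptions, not the exact Pygments definitions).
COMMENT_MULTILINE = r'(?:--\[\[[\w\W]*?\]\])'
COMMENT_SINGLE = r'(?:--.*$)'
SPACE = r'(?:\s+(?!\s))'
S = rf'(?:{COMMENT_MULTILINE}|{COMMENT_SINGLE}|{SPACE})'
S_LA = r'\s'  # lookahead-safe: whitespace only

# Hypothetical "identifier followed by . or :" rules, before and after the fix.
NAME_OLD = re.compile(rf'[A-Za-z_]\w*(?={S}*[.:])', re.MULTILINE)
NAME_NEW = re.compile(rf'[A-Za-z_]\w*(?={S_LA}*[.:])', re.MULTILINE)

def leading_name(pattern, text):
    """Return the identifier matched at the start of text, or None."""
    m = pattern.match(text)
    return m.group(0) if m else None

plain = "foo  .bar"                 # only whitespace before the dot
commented = "foo --[[ c ]] .bar"    # block comment before the dot

plain_old = leading_name(NAME_OLD, plain)
plain_new = leading_name(NAME_NEW, plain)
commented_old = leading_name(NAME_OLD, commented)
commented_new = leading_name(NAME_NEW, commented)
```

In this sketch the two patterns agree on the whitespace-only input and differ only on the input with a comment, where NAME_NEW does not match.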

Changes:

  • Added _s_la = r'\s' to both LuaLexer and LuauLexer
  • Replaced _s with _s_la in all lookahead patterns (6 locations in LuaLexer, 2 in LuauLexer, 1 in _luau_make_expression)
  • Updated golden test outputs for the minor reclassification
  • Added regression test snippet (tests/snippets/lua/test_no_hang_comments.txt)
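A check in the spirit of the regression test might look like the sketch below. The input is a made-up stress case, not the contents of tests/snippets/lua/test_no_hang_comments.txt, and NAME_BEFORE_CALL is a hypothetical stand-in for a lookahead rule that uses _s_la.

```python
import re

S_LA = r'\s'  # whitespace only, as in the fix

# Hypothetical stand-in for an "identifier before a call" rule.
NAME_BEFORE_CALL = re.compile(rf'[A-Za-z_]\w*(?={S_LA}*\()')

def names_before_call(source):
    """Return identifiers followed by optional whitespace and '('."""
    return [m.group(0) for m in NAME_BEFORE_CALL.finditer(source)]

# Many consecutive block comments after an identifier, then a real call.
stress_input = "f " + "--[[ c ]] " * 200 + "g(1)"
result = names_before_call(stress_input)
```

Because the lookahead contains no alternation to backtrack through, each candidate identifier is accepted or rejected after scanning only the whitespace that follows it.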

Before: The reproduction case from #3036 hangs indefinitely
After: Tokenizes in <1ms (201 tokens)

The _s pattern (which matches whitespace, single-line comments, and
multi-line block comments) was used in lookaheads with a * quantifier:
  (?={_s}*[.:])
  (?={_s}*\()

The _comment_multiline component contains [\w\W]*? which, combined with
the outer * quantifier, creates catastrophic backtracking (exponential
time) on inputs with multiple consecutive comments.

Fix: introduce _s_la (lookahead-safe) that matches only whitespace (\s),
avoiding the ambiguous backtracking between the comment pattern and the
outer repetition. This changes tokenization only for the rare case of
block comments between an identifier and its following [.:] or ( -- those
identifiers are now classified as plain Name.Variable instead of being
tracked through the varname/funcname state.

Fixes pygments#3036

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@birkenfeld
Member

Looks good to me! Thanks for the fix.

birkenfeld merged commit 8e4ff26 into pygments:master Feb 22, 2026
15 checks passed
Anteru added this to the 2.20.0 milestone Mar 26, 2026
Anteru added the "A-lexing area: changes to individual lexers" label Mar 26, 2026

Development

Successfully merging this pull request may close these issues.

Lua lexer bug

3 participants