Fix Haskell lexer: handle escape sequences in character literals #3069

Merged
birkenfeld merged 3 commits into pygments:master from
mvanhorn:osc/1795-fix-haskell-char-escape
Mar 28, 2026

Conversation

@mvanhorn
Contributor

Summary

Fixes incorrect tokenization of Haskell escape character literals like '\n', '\t', '\\'.

Why this matters

The root-state regex '[^\\]' only matched character literals containing a single non-backslash character. Escape sequences like '\n' were split across tokens: '\ became Keyword.Type and n' became Name, which produced incorrect highlighting.
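
A minimal sketch of the mismatch, using Python's `re` module directly rather than the Pygments lexer machinery. The pattern string is the one quoted above; the sample inputs and variable names are illustrative assumptions.

```python
import re

# Pattern quoted in the PR description: quote, one non-backslash character, quote.
old_char = re.compile(r"'[^\\]'")

simple = "'a'"       # Haskell source text: 'a'
escaped = r"'\n'"    # Haskell source text: '\n' (quote, backslash, n, quote)

simple_match = old_char.match(simple)
escaped_match = old_char.match(escaped)

print(simple_match is not None)   # True: 'a' is a single non-backslash character
print(escaped_match is not None)  # False: the backslash is excluded by [^\\]
```

Since the literal rule fails at the opening quote, other rules get the chance to consume the pieces of '\n' separately, which is the split described above.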

Changes

pygments/lexers/haskell.py line 57: Added pattern '\\.' to match escape character literals. Placed after the existing non-escape pattern so simple chars like 'a' still match first.
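
The effect of the added rule can be sketched with a small first-match-wins scanner. This is a simplified stand-in written with `re`, not the Pygments `RegexLexer` implementation: the `RULES` list, the `scan` helper, and the catch-all `Other` rule are hypothetical, and the escape pattern is assumed to include the surrounding quotes.

```python
import re

# Hypothetical, simplified rule list: tried in order, first match wins.
RULES = [
    ("Char", re.compile(r"'[^\\]'")),        # existing rule: one non-backslash character
    ("Char", re.compile(r"'\\.'")),          # added rule: backslash plus one character
    ("Other", re.compile(r".", re.DOTALL)),  # catch-all so the scan always advances
]

def scan(text):
    """Return (token_name, lexeme) pairs for text, using RULES in order."""
    pos, out = 0, []
    while pos < len(text):
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m:
                out.append((name, m.group()))
                pos = m.end()
                break
    return out

print(scan(r"'\n'"))  # the whole literal comes back as one Char token
```

In this sketch the two `Char` rules cannot both match at the same position, because one requires a non-backslash and the other a backslash right after the opening quote. Simple literals such as 'a' are therefore unaffected by the added rule.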

Testing

Verified all common cases tokenize as Token.Literal.String.Char:

  • '\n' (newline), '\t' (tab), '\\' (backslash), 'a' (simple), 'A' (uppercase)

Fixes #1795

This contribution was developed with AI assistance (Claude Code).

The root-state pattern for character literals only matched single
non-backslash characters like 'a'. Escape sequences like '\n', '\t',
'\\' were incorrectly tokenized as Keyword.Type + Name fragments.

Added a pattern for '\.' (backslash + any char) to match escape
character literals, placed after the existing non-escape pattern.

Fixes pygments#1795

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@birkenfeld
Member

Thanks for the PR, can you add a test case?

Covers '\n', '\t', '\\', and 'a' tokenization as Literal.String.Char.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mvanhorn
Contributor Author

Added in 0088b19 - snippet test covering '\n', '\t', '\\', and 'a' all tokenizing as Literal.String.Char.

@birkenfeld
Member

Looks like outputs for the existing tests need to be adjusted as well.

Update expected token outputs for example.hs and Sudoku.lhs
to reflect the new escape character literal tokenization.
@mvanhorn
Contributor Author

Regenerated the golden outputs for example.hs and Sudoku.lhs in 35edbc8. All Haskell tests passing locally.

@birkenfeld
Member

LGTM now, thanks!

@birkenfeld birkenfeld merged commit e3a3c54 into pygments:master Mar 28, 2026
15 checks passed
@Anteru Anteru added this to the 2.20.0 milestone Mar 29, 2026
@Anteru Anteru added the A-lexing area: changes to individual lexers label Mar 29, 2026
@mvanhorn
Contributor Author

Thanks for the merge!


Development

Successfully merging this pull request may close these issues.

Haskell lexer doesn't work for '\n'