Fix Haskell lexer: handle escape sequences in character literals#3069
Merged
birkenfeld merged 3 commits intopygments:masterfrom Mar 28, 2026
Merged
Conversation
The root-state pattern for character literals only matched single non-backslash characters like 'a'. Escape sequences like '\n', '\t', '\\' were incorrectly tokenized as Keyword.Type + Name fragments. Added a pattern for '\.' (backslash + any char) to match escape character literals, placed after the existing non-escape pattern. Fixes pygments#1795 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Member
|
Thanks for the PR, can you add a test case? |
Covers '\n', '\t', '\\', and 'a' tokenization as Literal.String.Char. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor
Author
|
Added in 0088b19 - snippet test covering |
Member
|
Looks like outputs for the existing tests need to be adjusted as well. |
Update expected token outputs for example.hs and Sudoku.lhs to reflect the new escape character literal tokenization.
Contributor
Author
|
Regenerated the golden outputs for example.hs and Sudoku.lhs in 35edbc8. All Haskell tests passing locally. |
Member
|
LGTM now, thanks! |
Contributor
Author
|
Thanks for the merge! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes incorrect tokenization of Haskell escape character literals like
'\n','\t','\\'.Why this matters
The root-state regex
'[^\\]'only matched character literals containing a single non-backslash character. Escape sequences like'\n'were split across tokens -'\becameKeyword.Typeandn'becameName- producing wrong highlighting.Changes
pygments/lexers/haskell.pyline 57: Added pattern'\\.'to match escape character literals. Placed after the existing non-escape pattern so simple chars like'a'still match first.Testing
Verified all common cases tokenize as
Token.Literal.String.Char:'\n'(newline),'\t'(tab),'\\'(backslash),'a'(simple),'A'(uppercase)Fixes #1795
This contribution was developed with AI assistance (Claude Code).