fix(highlight): account for carriage return at EOF and chunk ends #4375
Merged
amaanq merged 2 commits into tree-sitter:master on Jun 5, 2025
Highlighting a carriage return is only attempted when it is not the last character in a `Source` chunk. This skips highlighting for CR in two cases: CR as the last character in a file, and CR just before a highlight event.

For the first, although EOF can in a sense be considered a line ending, CR+EOF shouldn't count as a CRLF line ending, so such a CR should be highlighted.

The second is more complicated. Since text is highlighted as a stream of events (`HighlightStart`, `HighlightEnd`, and `Source`), we cannot know, while handling a `Source` event, whether it is the last chunk without peeking forward past any `HighlightStart` and `HighlightEnd` events to the next `Source`. The API takes an `Iterator`, so this isn't possible. Luckily, it probably doesn't matter in practice: grammars are highly unlikely to want to split CR and LF with highlighting, so the next `Source` won't start with LF anyway.

The updated test covers both of these cases. Commit 422e74f (chore: update javascript-relevant tests, 2024-02-02) added the FIXME after updating the JavaScript grammar, which caused `b` to be parsed as a variable, triggering the second case.
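To make the two cases concrete, here is a minimal sketch of per-chunk CR handling as described above. It is not the code from this PR: `render_chunk` and `CR_MARK` are hypothetical names, the placeholder markup is an assumption, and HTML escaping is left out. A CR followed by LF in the same chunk is dropped, while a lone CR, including one at the very end of the chunk, gets the marker.

```rust
/// Placeholder markup for a highlighted lone CR.
/// Purely illustrative; an assumption, not what tree-sitter emits.
const CR_MARK: &str = "<span class=\"cr\"></span>";

/// Hypothetical helper: renders one `Source` chunk into `out`.
/// A CR that is immediately followed by LF is dropped (CRLF becomes LF);
/// any other CR is replaced by `CR_MARK`.
fn render_chunk(out: &mut String, chunk: &str) {
    let mut pending_cr = false;
    for c in chunk.chars() {
        if pending_cr {
            pending_cr = false;
            if c != '\n' {
                // The previous CR was not part of a CRLF pair.
                out.push_str(CR_MARK);
            }
        }
        if c == '\r' {
            // Defer the decision until the next character is seen.
            pending_cr = true;
            continue;
        }
        out.push(c);
    }
    if pending_cr {
        // CR was the last character of the chunk: with no lookahead
        // available, treat it as a lone CR and highlight it.
        out.push_str(CR_MARK);
    }
}
```

Flushing at the end of every chunk is what makes CR+EOF work, and it is also why a CRLF split across two `Source` events would be treated as a lone CR in this simplified version.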
If a CRLF is split between two `HighlightEvent`s, the previous commit would highlight the CR, although it shouldn't. Fix that by tracking the offset of the last CR in the rendered HTML and inserting the highlight there once it is known whether an LF follows.
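The offset-tracking idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the PR's implementation: the `Event` enum only mimics the shape of the highlight event stream, and `Renderer`, `render`, and `CR_MARK` are hypothetical names. HTML escaping is omitted.

```rust
/// Simplified stand-in for the highlight event stream (an assumption,
/// not the real `HighlightEvent` type).
enum Event<'a> {
    HighlightStart(&'a str), // carries a CSS class name in this sketch
    HighlightEnd,
    Source(&'a str),
}

/// Placeholder markup for a highlighted lone CR (illustrative only).
const CR_MARK: &str = "<span class=\"cr\"></span>";

#[derive(Default)]
struct Renderer {
    out: String,
    /// Byte offset in `out` where a CR was seen but not yet resolved.
    pending_cr_at: Option<usize>,
}

impl Renderer {
    fn source(&mut self, chunk: &str) {
        for c in chunk.chars() {
            if let Some(at) = self.pending_cr_at.take() {
                if c != '\n' {
                    // The CR turned out to be a lone CR: insert the marker
                    // where the CR was, even if tags were emitted since.
                    self.out.insert_str(at, CR_MARK);
                }
            }
            if c == '\r' {
                // Remember the position instead of deciding now.
                self.pending_cr_at = Some(self.out.len());
                continue;
            }
            self.out.push(c);
        }
    }

    fn finish(mut self) -> String {
        // A CR still pending at EOF is a lone CR.
        if let Some(at) = self.pending_cr_at.take() {
            self.out.insert_str(at, CR_MARK);
        }
        self.out
    }
}

fn render(events: &[Event<'_>]) -> String {
    let mut r = Renderer::default();
    for ev in events {
        match ev {
            Event::HighlightStart(class) => {
                r.out.push_str("<span class=\"");
                r.out.push_str(class);
                r.out.push_str("\">");
            }
            Event::HighlightEnd => r.out.push_str("</span>"),
            Event::Source(chunk) => r.source(chunk),
        }
    }
    r.finish()
}
```

Because the pending offset survives across `HighlightStart` and `HighlightEnd` events, no lookahead on the iterator is needed: the decision is simply postponed until the next `Source` character or the end of input arrives.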
Contributor
Author
The second commit builds upon the first to fix CR and LF straddling two
Contributor
Author
The failed test is due to a timeout:
It seems spurious.
noorosa added a commit to noorosa/tree-sitter that referenced this pull request on May 8, 2025
Closed
Member
Nice, thank you!
amaanq approved these changes on Jun 5, 2025
Successfully created backport PR for
Git push to origin failed for release-0.25 with exitcode 1
Highlighting a carriage return is only attempted when it is not the last character in a `Source` chunk. This skips highlighting for CR in two cases: CR as the last character in a file and CR just before a highlight event.

For the first, although EOF can be considered a line ending, in a sense, CR+EOF shouldn't count as a CRLF line ending, so such a CR should be highlighted.

For the second, it is more complicated. Since text is highlighted as a stream of events (`HighlightStart`, `HighlightEnd`, and `Source`), we cannot know while in a `Source` event whether we are the last chunk without peeking forward through `HighlightStart` and `HighlightEnd` events to the next `Source`. Store the offset in the rendered HTML of the last CR so we can render it once we get to the next `Source`, if it's not part of a CRLF.

The updated test tests both of these cases. Commit 422e74f (chore: update javascript-relevant tests, 2024-02-02) added the FIXME after updating the JavaScript grammar, which caused `b` to become parsed as a variable, triggering the second case.