This repository was archived by the owner on Sep 30, 2024. It is now read-only.
Syntax highlighting for CodeMirror based blob view #39116
Closed
fkling wants to merge 7 commits into
This commit hacks the syntax-highlighter server and Go backend code to return token ranges instead of highlighted HTML. The frontend then uses these ranges to generate decorations.
Still overloads the `html` property in the response to return the JSON encoded range data...
2e5d784 to d8221dd
This commit updates the syntax highlighting extension to only highlight the lines that are currently rendered by CodeMirror. A binary search is used to find the ranges that apply to the rendered lines.
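The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the `TokenRange` shape and both function names are hypothetical, and it assumes the ranges are already expressed as document offsets, sorted by start, and non-overlapping.

```typescript
// Hypothetical token range in document offsets; not the PR's actual type.
interface TokenRange {
    from: number // inclusive start offset
    to: number // exclusive end offset
    kind: string
}

// Binary search for the index of the first range whose end lies after `offset`.
// Assumes `ranges` is sorted by `from` and non-overlapping, so `to` is sorted too.
function firstRangeEndingAfter(ranges: readonly TokenRange[], offset: number): number {
    let low = 0
    let high = ranges.length
    while (low < high) {
        const mid = (low + high) >>> 1
        if (ranges[mid].to <= offset) {
            low = mid + 1
        } else {
            high = mid
        }
    }
    return low
}

// Collect all ranges that intersect the rendered span [viewportFrom, viewportTo).
function rangesInViewport(
    ranges: readonly TokenRange[],
    viewportFrom: number,
    viewportTo: number
): TokenRange[] {
    const result: TokenRange[] = []
    for (
        let i = firstRangeEndingAfter(ranges, viewportFrom);
        i < ranges.length && ranges[i].from < viewportTo;
        i++
    ) {
        result.push(ranges[i])
    }
    return result
}
```

Only the starting position needs a search; from there the ranges can be walked linearly until one starts past the end of the rendered span.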
… range CodeMirror counts UTF-16 characters (code units), so we also need to count UTF-16 characters in the syntax highlighter. However, there is still a problem: after a line with a Unicode character, all ranges seem to be off by 1.
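To illustrate why the counting unit matters, here is a small sketch that converts a column counted in Unicode code points into a column counted in UTF-16 code units. It assumes, purely for illustration, that the highlighter reports columns in code points; the commit does not say which unit the server used, and the function name is hypothetical.

```typescript
// Convert a column counted in Unicode code points to a column counted in
// UTF-16 code units, which is how JavaScript strings (and CodeMirror) index text.
// Assumption for this sketch: the server counts code points.
function codePointColumnToUtf16(line: string, codePointColumn: number): number {
    let utf16Column = 0
    for (let seen = 0; seen < codePointColumn && utf16Column < line.length; seen++) {
        const unit = line.charCodeAt(utf16Column)
        const isHighSurrogate = unit >= 0xd800 && unit <= 0xdbff
        const next = utf16Column + 1 < line.length ? line.charCodeAt(utf16Column + 1) : 0
        const isLowSurrogate = next >= 0xdc00 && next <= 0xdfff
        // Characters outside the Basic Multilingual Plane take two code units.
        utf16Column += isHighSurrogate && isLowSurrogate ? 2 : 1
    }
    return utf16Column
}
```

For a line such as `'a\ud83d\ude00b'` (an `a`, an emoji stored as a surrogate pair, then a `b`), the `b` is at code point column 2 but at UTF-16 column 3, so every range after the emoji shifts by one unless both sides count the same way.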
Some callsites, such as the blob view, need to update the value together with additional editor state. At the moment this is not possible and thus the editor is temporarily in an inconsistent state where the highlighting information doesn't belong to the loaded content.
This was referenced Jul 25, 2022
fkling added a commit that referenced this pull request on Jul 26, 2022
This is the follow-up/improved version of #39116.

This commit adds support for syntax highlighting from lsif/scip data that is returned for some languages. Updating the backend to return lsif/scip data for every language is done in a separate PR (#39264), but this PR does not depend on it (if treesitter highlighting is not configured there simply won't be any highlighting).

Syntax highlighting is done by an extension that takes the JSON-encoded scip data and converts it to something CodeMirror can understand. Decorations are only generated for the lines that are currently rendered.

I originally converted line/column ranges to document-offset ranges and used binary search to find the relevant ranges for the currently rendered lines. However, the overhead of the line/column -> offset conversion was noticeable for large documents, and it is especially unnecessary for those because only a small subset of lines is visible at any time.

I then changed to an approach that uses the data sent from the server as is (to avoid creating additional objects in memory and GC-ing the JSON-decoded data) and only adds a simple line index. The conversion from line/column to document offset is now delayed until the moment the decorations are created. This only works under the assumption that the server sends back the ranges in order and without overlap.

Some of the changes I made (e.g. exporting the replaceValue function) are not relevant anymore for the final version, but I'll leave them in for completeness.

Other auxiliary changes in this commit:

- Changed the base CodeMirror hook to allow for "manually" dispatching a transaction to update the value. Without this we would trigger at least two transactions when loading a file: one for updating the document and one for updating the syntax highlighting. Now we can do both in a single transaction.
- Added a folder for CodeMirror blob extensions and moved the line numbers extension there as well.
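The line-index approach from the commit message can be sketched roughly like this. It is a simplified illustration under stated assumptions, not the committed implementation: the `Occurrence` shape (a four-element `[startLine, startCharacter, endLine, endCharacter]` range plus a `kind` string), the `lineStarts` array, and all function names are hypothetical.

```typescript
// Assumed shape of a decoded occurrence; the real payload may differ.
interface Occurrence {
    // [startLine, startCharacter, endLine, endCharacter], all zero-based
    range: [number, number, number, number]
    kind: string
}

interface OffsetRange {
    from: number
    to: number
    kind: string
}

// lineIndex[line] is the index of the first occurrence starting on `line`,
// or undefined if none starts there. Relies on the occurrences being sorted
// by position and non-overlapping, as the commit message assumes.
function buildLineIndex(occurrences: readonly Occurrence[]): (number | undefined)[] {
    const lineIndex: (number | undefined)[] = []
    for (let i = 0; i < occurrences.length; i++) {
        const startLine = occurrences[i].range[0]
        if (lineIndex[startLine] === undefined) {
            lineIndex[startLine] = i
        }
    }
    return lineIndex
}

// Convert only the occurrences that start on lines [fromLine, toLine] to
// document offsets. `lineStarts[line]` is the offset of the line's first character.
// Simplification: an occurrence that starts before `fromLine` is not picked up.
function decorationsForLines(
    occurrences: readonly Occurrence[],
    lineIndex: readonly (number | undefined)[],
    lineStarts: readonly number[],
    fromLine: number,
    toLine: number
): OffsetRange[] {
    // Find the first rendered line that has an occurrence.
    let start: number | undefined
    for (let line = fromLine; line <= toLine && start === undefined; line++) {
        start = lineIndex[line]
    }
    const result: OffsetRange[] = []
    if (start === undefined) {
        return result
    }
    for (let i = start; i < occurrences.length; i++) {
        const [startLine, startChar, endLine, endChar] = occurrences[i].range
        if (startLine > toLine) {
            break
        }
        // The line/column -> offset conversion happens here, only for rendered lines.
        result.push({
            from: lineStarts[startLine] + startChar,
            to: lineStarts[endLine] + endChar,
            kind: occurrences[i].kind,
        })
    }
    return result
}
```

In a CodeMirror 6 extension the line start would more likely come from the document itself (for example `state.doc.line(n).from`, which uses one-based line numbers) than from a precomputed array; the array just keeps the sketch free of dependencies.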
efritz pushed a commit that referenced this pull request on Jul 26, 2022
With the CodeMirror-based blob view we don't need syntax highlighting as annotated HTML; we need the raw token ranges so that we can create CodeMirror decorations from them.
Comparison of the current view (left) vs. CodeMirror (right) when opening a ~15k line document:
sg-cm-blob-view-syntax.mp4
CAVEAT: I have no idea what I'm doing because I'm not familiar with these parts of the code base and I'm not familiar with Rust.
This PR adds a new query parameter that informs the syntax highlighter to return the ranges only. At the moment the response is returned in the same HTML field, which is not great. Also there is no way to let the client set the query parameter (it's hardcoded).
I've implemented syntax highlighting by only creating decorations for the lines that are currently rendered. Compared to creating all decorations up-front I feel like the "on demand" version loads a tad faster when opening a ~15k line file.
… `html` field)
Test plan