Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

Syntax highlighting for CodeMirror based blob view#39116

Closed
fkling wants to merge 7 commits into
mainfrom
fkling/cm-file-view-syntax-highlighting
Closed

Syntax highlighting for CodeMirror based blob view#39116
fkling wants to merge 7 commits into
mainfrom
fkling/cm-file-view-syntax-highlighting

Conversation

@fkling

@fkling fkling commented Jul 19, 2022

Copy link
Copy Markdown
Contributor

With the CodeMirror-based blob view we don't need syntax highlighting as annotated HTML, we need the raw token ranges so that we can created CodeMirror decorations from them.

Comparison of of current view (left) vs CodeMirror (right) when opening a ~15k line document:

sg-cm-blob-view-syntax.mp4

CAVEAT: I have no idea what I'm doing because I'm not familiar with these parts of the code base and I'm not familiar with Rust.

This PR adds a new query parameter that informs the syntax highlighter to return the ranges only. At the moment the response is returned in the same HTML field, which is not great. Also there is no way to let the client set the query parameter (it's hardcoded).

I've implemented syntax highlighting by only creating decorations for the lines that are currently rendered. Compared to creating all decorations up-front I feel like the "on demand" version loads a tad faster when opening a ~15k line file.

  • Extend GraphQL to allow passing the parameter (unless there is a better way)
  • Extend the response type to include token ranges directly (vs piggybacking on the html field)
  • Fix tests (where necessary)
  • Fix ranges (there seems to be an issue with ranges following a line that contains a unicode character; see screenshot)

2022-07-20_14-00

Test plan

This commit hacks the syntax-highligher server and Go backend code to
return token ranges instead of highlighted HTML. This is then used in
the frontend to generate decorations.
@cla-bot cla-bot Bot added the cla-signed label Jul 19, 2022
Still overloads the `html` property in the response to return the JSON
encoded range data...
@fkling fkling force-pushed the fkling/cm-file-view-syntax-highlighting branch from 2e5d784 to d8221dd Compare July 19, 2022 23:15
fkling added 4 commits July 20, 2022 11:09
This commit updates the syntax highlighting extension to only highlight
the lines that are currently rendered by CodeMirror. A binary search is
used to find the ranges that that apply to the rendered lines.
… range

CodeMirror works counts UTF-16 characters, so we also need to count
characters in the syntax highlighter. However there is still a
problem that after a line with a unicode charater, all ranges seem to be
off by 1.
Some callsites, such as the blob view, need to update the value together
with additional editor state. At the moment this is not possible and
thus the editor is temporarily in an inconsistent state where the
highlighting information doesn't belong to the loaded content.
fkling added a commit that referenced this pull request Jul 26, 2022
This is the follow up/improved version of #39116

This commit adds support for syntax highlighting from lsif/scip data that is
returned for some languages. Updating the backend to return lsif/scip
data for every language is done in a separate PR (#39264), but this
PR does not depend on it (if treesitter highlighting is not configured
there simply won't be any highlighting).

Syntax highlighting is done by an extension that takes the JSON encoded
scip data and converts it to something CodeMirror can understand.
Decorations are only generated for the lines that are currently
rendered.

I originally converted line/column ranges to document-offset ranges and
used binary search to find the relevant ranges for the currently
rendered lines. However the overhead of doing the line/column -> offset
conversion was noticeable for large documents, but especially unnecessary
for those because only a small subset of lines would be visible.

Then I changed to an approach that would use the data sent from the
server as is (to avoid creating additional objects in memory and GC-ing
the JSON decoded data) and only add a simple line index. The conversion
from line/column to document offset is now delayed until the moment the
decorations are created. This only works under the assumption that the
server sends back the ranges order and without overlap.

Some of the changes I made (e.g. exporting the replaceValue function)
are not relevant anymore for the final version, but I'll leave them in
for completeness.

Other auxiliary changes in this commit:

Changed the base CodeMirror hook to allow for "manually" dispatching a
transaction to update the value. Without this we would trigger at
least two transactions when loading a file: One for updating the
document and one for updating the syntax highlighting.
Now we can do both in a single transaction.
Added a folder for CodeMirror blob extensions and moved the line numbers
extension there as well.
efritz pushed a commit that referenced this pull request Jul 26, 2022
This is the follow up/improved version of #39116

This commit adds support for syntax highlighting from lsif/scip data that is
returned for some languages. Updating the backend to return lsif/scip
data for every language is done in a separate PR (#39264), but this
PR does not depend on it (if treesitter highlighting is not configured
there simply won't be any highlighting).

Syntax highlighting is done by an extension that takes the JSON encoded
scip data and converts it to something CodeMirror can understand.
Decorations are only generated for the lines that are currently
rendered.

I originally converted line/column ranges to document-offset ranges and
used binary search to find the relevant ranges for the currently
rendered lines. However the overhead of doing the line/column -> offset
conversion was noticeable for large documents, but especially unnecessary
for those because only a small subset of lines would be visible.

Then I changed to an approach that would use the data sent from the
server as is (to avoid creating additional objects in memory and GC-ing
the JSON decoded data) and only add a simple line index. The conversion
from line/column to document offset is now delayed until the moment the
decorations are created. This only works under the assumption that the
server sends back the ranges order and without overlap.

Some of the changes I made (e.g. exporting the replaceValue function)
are not relevant anymore for the final version, but I'll leave them in
for completeness.

Other auxiliary changes in this commit:

Changed the base CodeMirror hook to allow for "manually" dispatching a
transaction to update the value. Without this we would trigger at
least two transactions when loading a file: One for updating the
document and one for updating the syntax highlighting.
Now we can do both in a single transaction.
Added a folder for CodeMirror blob extensions and moved the line numbers
extension there as well.
@fkling fkling closed this Jul 26, 2022
@fkling

fkling commented Jul 26, 2022

Copy link
Copy Markdown
Contributor Author

This is being implemented as part of #39386 and #39264

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants