Syntax highlighting for CodeMirror based blob view by fkling · Pull Request #39116 · sourcegraph/sourcegraph-public-snapshot

fkling · 2022-07-19T23:14:20Z

With the CodeMirror-based blob view we don't need syntax highlighting as annotated HTML, we need the raw token ranges so that we can created CodeMirror decorations from them.

Comparison of of current view (left) vs CodeMirror (right) when opening a ~15k line document:

sg-cm-blob-view-syntax.mp4

CAVEAT: I have no idea what I'm doing because I'm not familiar with these parts of the code base and I'm not familiar with Rust.

This PR adds a new query parameter that informs the syntax highlighter to return the ranges only. At the moment the response is returned in the same HTML field, which is not great. Also there is no way to let the client set the query parameter (it's hardcoded).

I've implemented syntax highlighting by only creating decorations for the lines that are currently rendered. Compared to creating all decorations up-front I feel like the "on demand" version loads a tad faster when opening a ~15k line file.

Extend GraphQL to allow passing the parameter (unless there is a better way)
Extend the response type to include token ranges directly (vs piggybacking on the html field)
Fix tests (where necessary)
Fix ranges (there seems to be an issue with ranges following a line that contains a unicode character; see screenshot)

Test plan

This commit hacks the syntax-highligher server and Go backend code to return token ranges instead of highlighted HTML. This is then used in the frontend to generate decorations.

Still overloads the `html` property in the response to return the JSON encoded range data...

This commit updates the syntax highlighting extension to only highlight the lines that are currently rendered by CodeMirror. A binary search is used to find the ranges that that apply to the rendered lines.

… range CodeMirror works counts UTF-16 characters, so we also need to count characters in the syntax highlighter. However there is still a problem that after a line with a unicode charater, all ranges seem to be off by 1.

Some callsites, such as the blob view, need to update the value together with additional editor state. At the moment this is not possible and thus the editor is temporarily in an inconsistent state where the highlighting information doesn't belong to the loaded content.

This is the follow up/improved version of #39116 This commit adds support for syntax highlighting from lsif/scip data that is returned for some languages. Updating the backend to return lsif/scip data for every language is done in a separate PR (#39264), but this PR does not depend on it (if treesitter highlighting is not configured there simply won't be any highlighting). Syntax highlighting is done by an extension that takes the JSON encoded scip data and converts it to something CodeMirror can understand. Decorations are only generated for the lines that are currently rendered. I originally converted line/column ranges to document-offset ranges and used binary search to find the relevant ranges for the currently rendered lines. However the overhead of doing the line/column -> offset conversion was noticeable for large documents, but especially unnecessary for those because only a small subset of lines would be visible. Then I changed to an approach that would use the data sent from the server as is (to avoid creating additional objects in memory and GC-ing the JSON decoded data) and only add a simple line index. The conversion from line/column to document offset is now delayed until the moment the decorations are created. This only works under the assumption that the server sends back the ranges order and without overlap. Some of the changes I made (e.g. exporting the replaceValue function) are not relevant anymore for the final version, but I'll leave them in for completeness. Other auxiliary changes in this commit: Changed the base CodeMirror hook to allow for "manually" dispatching a transaction to update the value. Without this we would trigger at least two transactions when loading a file: One for updating the document and one for updating the syntax highlighting. Now we can do both in a single transaction. Added a folder for CodeMirror blob extensions and moved the line numbers extension there as well.

fkling · 2022-07-26T17:38:52Z

This is being implemented as part of #39386 and #39264

Hack: Support syntax highlighting in CodeMirror file view

092a0e1

This commit hacks the syntax-highligher server and Go backend code to return token ranges instead of highlighted HTML. This is then used in the frontend to generate decorations.

cla-bot Bot added the cla-signed label Jul 19, 2022

Slightly less horrible implementation

d8221dd

Still overloads the `html` property in the response to return the JSON encoded range data...

fkling force-pushed the fkling/cm-file-view-syntax-highlighting branch from 2e5d784 to d8221dd Compare July 19, 2022 23:15

fkling added 4 commits July 20, 2022 11:09

Add comments

a7318b9

Only highlight lines which are currently rendered

a9dc1d3

This commit updates the syntax highlighting extension to only highlight the lines that are currently rendered by CodeMirror. A binary search is used to find the ranges that that apply to the rendered lines.

fkling requested a review from olafurpg July 20, 2022 13:19

olafurpg assigned fkling Jul 20, 2022

olafurpg added the team/code-navigation label Jul 20, 2022

WIP

8dd1df1

This was referenced Jul 25, 2022

blob: Syntax highlighting for CodeMirror blob view #39386

Merged

Validate CodeMirror as alternative blob view implementation #39408

Closed

fkling closed this Jul 26, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Syntax highlighting for CodeMirror based blob view#39116

Syntax highlighting for CodeMirror based blob view#39116
fkling wants to merge 7 commits into
mainfrom
fkling/cm-file-view-syntax-highlighting

fkling commented Jul 19, 2022 •

edited

Loading

Uh oh!

fkling commented Jul 26, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

fkling commented Jul 19, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test plan

Uh oh!

fkling commented Jul 26, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

fkling commented Jul 19, 2022 •

edited

Loading