emit SCIP from syntect#39264
Conversation
This is the follow up/improved version of #39116 This commit adds support for syntax highlighting from lsif/scip data that is returned for some languages. Updating the backend to return lsif/scip data for every language is done in a separate PR (#39264), but this PR does not depend on it (if treesitter highlighting is not configured there simply won't be any highlighting). Syntax highlighting is done by an extension that takes the JSON encoded scip data and converts it to something CodeMirror can understand. Decorations are only generated for the lines that are currently rendered. I originally converted line/column ranges to document-offset ranges and used binary search to find the relevant ranges for the currently rendered lines. However the overhead of doing the line/column -> offset conversion was noticeable for large documents, but especially unnecessary for those because only a small subset of lines would be visible. Then I changed to an approach that would use the data sent from the server as is (to avoid creating additional objects in memory and GC-ing the JSON decoded data) and only add a simple line index. The conversion from line/column to document offset is now delayed until the moment the decorations are created. This only works under the assumption that the server sends back the ranges order and without overlap. Some of the changes I made (e.g. exporting the replaceValue function) are not relevant anymore for the final version, but I'll leave them in for completeness. Other auxiliary changes in this commit: Changed the base CodeMirror hook to allow for "manually" dispatching a transaction to update the value. Without this we would trigger at least two transactions when loading a file: One for updating the document and one for updating the syntax highlighting. Now we can do both in a single transaction. Added a folder for CodeMirror blob extensions and moved the line numbers extension there as well.
This is the follow up/improved version of #39116 This commit adds support for syntax highlighting from lsif/scip data that is returned for some languages. Updating the backend to return lsif/scip data for every language is done in a separate PR (#39264), but this PR does not depend on it (if treesitter highlighting is not configured there simply won't be any highlighting). Syntax highlighting is done by an extension that takes the JSON encoded scip data and converts it to something CodeMirror can understand. Decorations are only generated for the lines that are currently rendered. I originally converted line/column ranges to document-offset ranges and used binary search to find the relevant ranges for the currently rendered lines. However the overhead of doing the line/column -> offset conversion was noticeable for large documents, but especially unnecessary for those because only a small subset of lines would be visible. Then I changed to an approach that would use the data sent from the server as is (to avoid creating additional objects in memory and GC-ing the JSON decoded data) and only add a simple line index. The conversion from line/column to document offset is now delayed until the moment the decorations are created. This only works under the assumption that the server sends back the ranges order and without overlap. Some of the changes I made (e.g. exporting the replaceValue function) are not relevant anymore for the final version, but I'll leave them in for completeness. Other auxiliary changes in this commit: Changed the base CodeMirror hook to allow for "manually" dispatching a transaction to update the value. Without this we would trigger at least two transactions when loading a file: One for updating the document and one for updating the syntax highlighting. Now we can do both in a single transaction. Added a folder for CodeMirror blob extensions and moved the line numbers extension there as well.
olafurpg
left a comment
There was a problem hiding this comment.
The comments are mostly for improvements that we can do in a separate PR. LGTM 👍 since there's already a ton of great improvements in the diff that we can go ahead and merge. It should be safe to merge this since it only adds the new /scip endpoint
|
This doesn't actually call the "sciptect" implementation from the HTTP endpoint yet, right? Or am I missing something? How do you plan to decide between using treesitter and sciptect (or how/where do you do that if I'm missing it)? |
|
|
||
| #[post("/scip", format = "application/json", data = "<q>")] | ||
| fn scip(q: Json<SourcegraphQuery>) -> JsonValue { | ||
| match sg_syntax::scip_highlight(q.into_inner()) { |
There was a problem hiding this comment.
IIUC, the /scip endpoint doesn't use syntect yet. Should we add a new property to SourcegraphQuery to allow the client to determine whether to use treesitter or syntect?
28a234b to
fdc52a8
Compare
|
Ok @fkling / @olafurpg I connected all the endpoints for Codemirror and tried to separate out as much logic as I could for the old blob view so that I we could have two separate code paths as much as possible (without directly copying & pasting tons of stuff). Let me know if I did the graphql stuff right. I ran this locally and was able to try:
and got the results I was expecting. However, there is definitely quite a bit of work left to twiddle the highlighting bits (which we could do in this PR or just do in a follow PR, as that's pretty easy to do). I won't be here friday, but we can probably merge this monday (assuming no one sees anything that looks off to them) |
olafurpg
left a comment
There was a problem hiding this comment.
I got this working locally 👍 The only blocking questions we should resolve before merging relate to the new GraphQL endpoint. Is this an API we are comfortable supporting indefinitely? Are the names future proof and self-explanatory?
LGTM but will leave it to somebody else to stamp an approval since I won't be around to comment on PRs until I'm back from PTO.
| """ | ||
| Base64 encoded JSON payload of a SCIP Document | ||
| """ | ||
| scip: String! |
There was a problem hiding this comment.
What do you think about including the encoding in the field name? That allows us to return a more efficient encoding in the future.
| scip: String! | |
| base64_scip: String! |
|
|
||
| var document *scip.Document | ||
| if resp.Scip != "" { | ||
| fmt.Println("=============================") |
There was a problem hiding this comment.
Delete printlns
| fmt.Println("=============================") |
|
Sorry, I won't get to reviewing this anymore today. I'll read up on some GraphQL schema design best practices and come back to this on Monday. I almost think we don't want to introduce a new field for that, at least not this way. |
|
Alright, I didn't actually end up reading too much about GraphQL. On the other PR you said you don't know how important backwards compatibility is and I agree. Still, I think we already have all the fields we need ( If you don't want to look into that I'm happy to do it. In that case we can merge this PR without the GraphQL changes. The PR is still in draft mode, is there anything else you want to do? I can't comment on the Rust implementation unfortunately. |
|
Codenotify: Notifying subscribers in CODENOTIFY files for diff 8b1dce7...6a929bd.
|
|
@tjdevries I guess this can be closed? |
Test plan