Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

emit SCIP from syntect#39264

Closed
tjdevries wants to merge 31 commits into
mainfrom
tjdevries/syntax-scip
Closed

emit SCIP from syntect#39264
tjdevries wants to merge 31 commits into
mainfrom
tjdevries/syntax-scip

Conversation

@tjdevries

Copy link
Copy Markdown
Contributor

Test plan

  • Multi-byte stuff
  • other languages
  • more files
  • etc :)

@cla-bot cla-bot Bot added the cla-signed label Jul 21, 2022

@olafurpg olafurpg left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great!

Comment thread docker-images/syntax-highlighter/crates/sg-syntax/src/sg_sciptect.rs Outdated
Comment thread docker-images/syntax-highlighter/crates/sg-syntax/src/sg_sciptect.rs Outdated
Comment thread docker-images/syntax-highlighter/crates/sg-syntax/src/sg_sciptect.rs Outdated
Comment thread docker-images/syntax-highlighter/crates/sg-syntax/src/sg_sciptect.rs Outdated
fkling added a commit that referenced this pull request Jul 26, 2022
This is the follow up/improved version of #39116

This commit adds support for syntax highlighting from lsif/scip data that is
returned for some languages. Updating the backend to return lsif/scip
data for every language is done in a separate PR (#39264), but this
PR does not depend on it (if treesitter highlighting is not configured
there simply won't be any highlighting).

Syntax highlighting is done by an extension that takes the JSON encoded
scip data and converts it to something CodeMirror can understand.
Decorations are only generated for the lines that are currently
rendered.

I originally converted line/column ranges to document-offset ranges and
used binary search to find the relevant ranges for the currently
rendered lines. However the overhead of doing the line/column -> offset
conversion was noticeable for large documents, but especially unnecessary
for those because only a small subset of lines would be visible.

Then I changed to an approach that would use the data sent from the
server as is (to avoid creating additional objects in memory and GC-ing
the JSON decoded data) and only add a simple line index. The conversion
from line/column to document offset is now delayed until the moment the
decorations are created. This only works under the assumption that the
server sends back the ranges order and without overlap.

Some of the changes I made (e.g. exporting the replaceValue function)
are not relevant anymore for the final version, but I'll leave them in
for completeness.

Other auxiliary changes in this commit:

Changed the base CodeMirror hook to allow for "manually" dispatching a
transaction to update the value. Without this we would trigger at
least two transactions when loading a file: One for updating the
document and one for updating the syntax highlighting.
Now we can do both in a single transaction.
Added a folder for CodeMirror blob extensions and moved the line numbers
extension there as well.
efritz pushed a commit that referenced this pull request Jul 26, 2022
This is the follow up/improved version of #39116

This commit adds support for syntax highlighting from lsif/scip data that is
returned for some languages. Updating the backend to return lsif/scip
data for every language is done in a separate PR (#39264), but this
PR does not depend on it (if treesitter highlighting is not configured
there simply won't be any highlighting).

Syntax highlighting is done by an extension that takes the JSON encoded
scip data and converts it to something CodeMirror can understand.
Decorations are only generated for the lines that are currently
rendered.

I originally converted line/column ranges to document-offset ranges and
used binary search to find the relevant ranges for the currently
rendered lines. However the overhead of doing the line/column -> offset
conversion was noticeable for large documents, but especially unnecessary
for those because only a small subset of lines would be visible.

Then I changed to an approach that would use the data sent from the
server as is (to avoid creating additional objects in memory and GC-ing
the JSON decoded data) and only add a simple line index. The conversion
from line/column to document offset is now delayed until the moment the
decorations are created. This only works under the assumption that the
server sends back the ranges order and without overlap.

Some of the changes I made (e.g. exporting the replaceValue function)
are not relevant anymore for the final version, but I'll leave them in
for completeness.

Other auxiliary changes in this commit:

Changed the base CodeMirror hook to allow for "manually" dispatching a
transaction to update the value. Without this we would trigger at
least two transactions when loading a file: One for updating the
document and one for updating the syntax highlighting.
Now we can do both in a single transaction.
Added a folder for CodeMirror blob extensions and moved the line numbers
extension there as well.

@olafurpg olafurpg left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments are mostly for improvements that we can do in a separate PR. LGTM 👍 since there's already a ton of great improvements in the diff that we can go ahead and merge. It should be safe to merge this since it only adds the new /scip endpoint

@fkling

fkling commented Jul 26, 2022

Copy link
Copy Markdown
Contributor

This doesn't actually call the "sciptect" implementation from the HTTP endpoint yet, right? Or am I missing something? How do you plan to decide between using treesitter and sciptect (or how/where do you do that if I'm missing it)?

@olafurpg olafurpg left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fkling this PR only updates syntax-highlighter, and we can followup later by adding support to actually start using the new /scip endpoint


#[post("/scip", format = "application/json", data = "<q>")]
fn scip(q: Json<SourcegraphQuery>) -> JsonValue {
match sg_syntax::scip_highlight(q.into_inner()) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, the /scip endpoint doesn't use syntect yet. Should we add a new property to SourcegraphQuery to allow the client to determine whether to use treesitter or syntect?

@tjdevries tjdevries force-pushed the tjdevries/syntax-scip branch from 28a234b to fdc52a8 Compare July 29, 2022 02:55
@tjdevries

Copy link
Copy Markdown
Contributor Author

Ok @fkling / @olafurpg I connected all the endpoints for Codemirror and tried to separate out as much logic as I could for the old blob view so that I we could have two separate code paths as much as possible (without directly copying & pasting tons of stuff). Let me know if I did the graphql stuff right.

I ran this locally and was able to try:

  • code mirror w/out tree-sitter
  • code mirror w/ tree-sitter
  • no codemirror w/out tree-sitter
  • no codemirror w/ tree-sitter

and got the results I was expecting. However, there is definitely quite a bit of work left to twiddle the highlighting bits (which we could do in this PR or just do in a follow PR, as that's pretty easy to do).

I won't be here friday, but we can probably merge this monday (assuming no one sees anything that looks off to them)

@olafurpg olafurpg left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got this working locally 👍 The only blocking questions we should resolve before merging relate to the new GraphQL endpoint. Is this an API we are comfortable supporting indefinitely? Are the names future proof and self-explanatory?

LGTM but will leave it to somebody else to stamp an approval since I won't be around to comment on PRs until I'm back from PTO.

"""
Base64 encoded JSON payload of a SCIP Document
"""
scip: String!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about including the encoding in the field name? That allows us to return a more efficient encoding in the future.

Suggested change
scip: String!
base64_scip: String!


var document *scip.Document
if resp.Scip != "" {
fmt.Println("=============================")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete printlns

Suggested change
fmt.Println("=============================")

@olafurpg olafurpg requested a review from fkling July 29, 2022 10:28
@fkling

fkling commented Jul 29, 2022

Copy link
Copy Markdown
Contributor

Sorry, I won't get to reviewing this anymore today. I'll read up on some GraphQL schema design best practices and come back to this on Monday. I almost think we don't want to introduce a new field for that, at least not this way.
We can also remove the GraphQL changes for now and just merge the highlighting changes.

@fkling

fkling commented Aug 1, 2022

Copy link
Copy Markdown
Contributor

Alright, I didn't actually end up reading too much about GraphQL. On the other PR you said you don't know how important backwards compatibility is and I agree. Still, I think we already have all the fields we need (highlight {html, lsif}) we would just have to update the implementation not generate both the frontend to request only one or the other.

If you don't want to look into that I'm happy to do it. In that case we can merge this PR without the GraphQL changes.

The PR is still in draft mode, is there anything else you want to do? I can't comment on the Rust implementation unfortunately.

@tjdevries tjdevries marked this pull request as ready for review August 2, 2022 20:08
@sourcegraph-bot

sourcegraph-bot commented Aug 2, 2022

Copy link
Copy Markdown
Contributor

Codenotify: Notifying subscribers in CODENOTIFY files for diff 8b1dce7...6a929bd.

Notify File(s)
@efritz lib/codeintel/highlights/types.go

@tjdevries tjdevries mentioned this pull request Aug 2, 2022
3 tasks
@fkling

fkling commented Aug 11, 2022

Copy link
Copy Markdown
Contributor

@tjdevries I guess this can be closed?

@tjdevries tjdevries closed this Aug 14, 2022
@eseliger eseliger deleted the tjdevries/syntax-scip branch February 4, 2024 20:38
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants