Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

perf(scip-syntax): uses the same parse tree for globals and locals#63564

Merged
kritzcreek merged 2 commits into
mainfrom
christoph/scip-syntax-single-parse
Jul 1, 2024
Merged

perf(scip-syntax): uses the same parse tree for globals and locals#63564
kritzcreek merged 2 commits into
mainfrom
christoph/scip-syntax-single-parse

Conversation

@kritzcreek

Copy link
Copy Markdown
Contributor

Closes https://linear.app/sourcegraph/issue/GRAPH-718/dont-parse-every-file-twice-in-scip-syntax

While working on parallel scip-syntax I noticed we're re-parsing every file for both globals and locals generation. This PR makes it so we use the same parse tree for both. I've also made it so we don't re-initialize the language parsers for every file. The thread-local-storage aspect of this matters after merging #63536

Test plan

spring-framework on main [?]
❯ hyperfine --warmup 1 'git archive HEAD | scip-syntax-main index tar - -l java -o main.scip' 'git archive HEAD | scip-syntax index tar - -l java -o single-parse.scip'
Benchmark 1: git archive HEAD | scip-syntax-main index tar - -l java -o main.scip
  Time (mean ± σ):      6.894 s ±  0.033 s    [User: 6.877 s, System: 0.093 s]
  Range (min … max):    6.814 s …  6.921 s    10 runs

Benchmark 2: git archive HEAD | scip-syntax index tar - -l java -o single-parse.scip
  Time (mean ± σ):      4.675 s ±  0.008 s    [User: 4.693 s, System: 0.084 s]
  Range (min … max):    4.663 s …  4.688 s    10 runs

Summary
  git archive HEAD | scip-syntax index tar - -l java -o single-parse.scip ran
    1.47 ± 0.01 times faster than git archive HEAD | scip-syntax-main index tar - -l java -o main.scip

spring-framework on  main [?]
❯ scip-syntax scip-evaluate --ground-truth main.scip --candidate single-parse.scip
{"precision_percent":"100.0","recall_percent":"100.0","true_positives":"569982.0","false_positives":"0.0","false_negatives":"0.0"}

@cla-bot cla-bot Bot added the cla-signed label Jul 1, 2024
@github-actions github-actions Bot added team/graph Graph Team (previously Code Intel/Language Tools/Language Platform) team/product-platform labels Jul 1, 2024
@kritzcreek kritzcreek force-pushed the christoph/scip-syntax-single-parse branch from c90a6dc to 4d80665 Compare July 1, 2024 03:19
Comment on lines 289 to 295

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My typical rule of thumb is that if both branches of a pattern-match are being modified through some methods, it's clearer to replace that with an explicit pattern match. So something like:

let parser = match parser.entry(parser_id) {
    Occupied(p) => p.reset(); ...
    Vacant(v) => v.insert(parser_id.get_parser())
}

Otherwise, a method name like and_modify doesn't make it super clear what is happening, and a comment shouldn't be necessary for what is otherwise boring code.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to make the change, though the comment wasn't really about and_modify and more about the tree-sitter API requiring the use of .reset() when parsing a new document.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, a clearer comment would be something like Tree-sitter parsing is stateful, so reset the parser state explicitly.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I take that back. This is one of the cases where the entry API seems to be the easiest way to make the lifetimes work out. (I can't get a mutable reference from Occupied which I need to call .reset() on it)

I've removed the comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Damnit, the UI didn't reload so I missed your comment.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work? https://doc.rust-lang.org/std/collections/hash_map/struct.OccupiedEntry.html#method.get_mut

It does give you a mutable reference, but you can't return it from the match after resetting, as it's bound to the lifetime of the pattern matched entry value.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat, that one worked.

@kritzcreek kritzcreek force-pushed the christoph/scip-syntax-single-parse branch from 4d80665 to 0a6222c Compare July 1, 2024 04:34
@kritzcreek kritzcreek force-pushed the christoph/scip-syntax-single-parse branch from 0a6222c to 5e0d179 Compare July 1, 2024 04:35
@kritzcreek kritzcreek enabled auto-merge (squash) July 1, 2024 04:36
@kritzcreek kritzcreek disabled auto-merge July 1, 2024 04:43
@kritzcreek kritzcreek enabled auto-merge (squash) July 1, 2024 04:46
@kritzcreek kritzcreek merged commit afea138 into main Jul 1, 2024
@kritzcreek kritzcreek deleted the christoph/scip-syntax-single-parse branch July 1, 2024 04:51
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

cla-signed team/graph Graph Team (previously Code Intel/Language Tools/Language Platform) team/product-platform

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants