Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

codeintel: Speed up syntactic and search-based usages using batch APIs#64078

Merged
kritzcreek merged 5 commits into
mainfrom
christoph/chunk-and-batch-syntactic-usages
Jul 29, 2024
Merged

codeintel: Speed up syntactic and search-based usages using batch APIs#64078
kritzcreek merged 5 commits into
mainfrom
christoph/chunk-and-batch-syntactic-usages

Conversation

@kritzcreek

Copy link
Copy Markdown
Contributor

Closes https://linear.app/sourcegraph/issue/GRAPH-771/chunk-syntactic-usage-tasks-and-use-batch-apis-to-handle-chunks

This is the last step for properly implementing #63971. We split the candidateFiles search produces into chunks and process these in parallel using the new batch api on MappedIndex.

Test plan

Existing tests continue passing

@cla-bot cla-bot Bot added the cla-signed label Jul 25, 2024
@github-actions github-actions Bot added team/graph Graph Team (previously Code Intel/Language Tools/Language Platform) team/product-platform labels Jul 25, 2024
@kritzcreek kritzcreek force-pushed the christoph/chunk-and-batch-syntactic-usages branch 2 times, most recently from c53e4b8 to 5603258 Compare July 25, 2024 12:02
Christoph Hegemann added 3 commits July 25, 2024 14:05
@kritzcreek kritzcreek force-pushed the christoph/chunk-and-batch-syntactic-usages branch from 2154933 to 1fdb30d Compare July 25, 2024 12:05
Comment thread internal/codeintel/codenav/syntactic.go Outdated
)

// SU_CHUNK_SIZE is the batch size for SCIP documents and git diffs we load at a time.
var SU_CHUNK_SIZE = 20

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. This is not used in a lot of places, please use a clearer name instead of an abbreviation like SU_.
  2. This should be a const not a var unless it needs to be modified somewhere.
  3. Please document the reason for choosing this value. Did you do some napkin math/apply some heuristics? Did you do any benchmarks? Was it arbitrarily chosen?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not used in a lot of places, please use a clearer name instead of an abbreviation like SU_.

👍 I thought I'd match the SU_ prefix on the SyntacticUsages error enum, but I can spell out SYNTACTIC_USAGES instead.

Please document the reason for choosing this value. Did you do some napkin math/apply some heuristics? Did you do any benchmarks? Was it arbitrarily chosen?

I collected traces for various sizes (on my local machine) and 20 gave me "nice looking" ones. In general I expect 100 documents to be on the "higher end" of the number of documents to retrieve for a single syntactic usage search and 5 concurrent queries and git requests seems like a reasonable trade-off for concurrency vs load.
Happy to tweak this number. I don't think it makes sense to configure this at the instance level, but maybe we could accept a query parameter to let us easily test different amount of concurrency on S2.

Comment thread internal/codeintel/codenav/syntactic.go Outdated
Christoph Hegemann and others added 2 commits July 29, 2024 03:18
Co-authored-by: Varun Gandhi <varun.gandhi@sourcegraph.com>
@kritzcreek kritzcreek merged commit 3f86205 into main Jul 29, 2024
@kritzcreek kritzcreek deleted the christoph/chunk-and-batch-syntactic-usages branch July 29, 2024 02:09
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

cla-signed team/graph Graph Team (previously Code Intel/Language Tools/Language Platform) team/product-platform

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants