Skip to content

bindings/go: Add memory usage test for streaming parser#181

Merged
varungandhi-src merged 4 commits into
mainfrom
vg/memtest
Jul 3, 2023
Merged

bindings/go: Add memory usage test for streaming parser#181
varungandhi-src merged 4 commits into
mainfrom
vg/memtest

Conversation

@varungandhi-src

Copy link
Copy Markdown
Contributor

We rely on low memory usage in Sourcegraph. https://github.com/sourcegraph/sourcegraph/pull/53828

So add a test to avoid regressing memory usage.

Test plan

Added a new test

@varungandhi-src varungandhi-src force-pushed the vg/memtest branch 2 times, most recently from 926abaa to c3efd5a Compare June 28, 2023 05:02
@varungandhi-src varungandhi-src requested a review from efritz June 28, 2023 05:02
@varungandhi-src varungandhi-src changed the title test: Add memory usage test for streaming parser bindings/go: Add memory usage test for streaming parser Jun 28, 2023

@efritz efritz left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments are just doc nits

Comment thread bindings/go/scip/memtest/low_mem_test.go
Comment thread bindings/go/scip/memtest/low_mem_test.go
Comment thread bindings/go/scip/memtest/low_mem_test.go
@varungandhi-src varungandhi-src merged commit ca0c0bc into main Jul 3, 2023
@varungandhi-src varungandhi-src deleted the vg/memtest branch July 3, 2023 02:05
coury-clark pushed a commit to sourcegraph/sourcegraph-public-snapshot that referenced this pull request Jul 24, 2023
…55160)

This should reduce memory usage significantly, as in the common
case (no two documents share the same relative path), we will
end up processing one document at a time.

I've tested the new code with a 5.7GB Chromium index, and we're
able to process it with even 100MB of memory (at the cost of
increased GC pressure).

We need to iterate over the index twice, first to get all external
symbols,
and then to process documents, as document processing requires access
to the external symbols list. This means we need the ability
to seek to the start again. I've implemented that as follows:
- For small indexes, just read the index into a slice.
- For large indexes, save the compressed index to a temporary file
   on disk, and rely on the GC and page cache to transparently
   drop pages earlier in the file when under memory pressure.
   I figured decompressing is cheap enough, that it doesn't make
   sense to have the extra I/O overhead of reading/writing the
   uncompressed index.

Other changes:
- Documents are no longer sorted by relative path during iteration
  The order of iteration is still deterministic though as it matches the
  order of documents in the index.

Questions:
- Should we add instrumentation see the memory usage at different
  stages of processing an index?
- How do we add a memory usage test (either here or in the upstream
scip bindings)? ([internal Slack
discussion](https://sourcegraph.slack.com/archives/C3B3SDBMY/p1687751363185919))

## Test plan

- [x] Update existing tests
- [x] Add low memory usage test -- added upstream.
scip-code/scip#181 <br> Backport
b98ca76 from #53828

Co-authored-by: Varun Gandhi <varun.gandhi@sourcegraph.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants