Embeddings: optimize memory usage during decode by camdencheek · Pull Request #54972 · sourcegraph/sourcegraph-public-snapshot

camdencheek · 2023-07-14T18:58:26Z

This makes memory usage ~constant during the decode process. Previously, we were undersizing the large allocation (which caused it to reallocate ~10 extra times) for the index and making a bunch of additional smaller allocations per loop. This meant we were spiking memory usage during decoding, and even if the garbage collector could keep up (which I wouldn't expect it to), we'd still need 2x the index size at the peak for live memory. This was causing an OOM.

Before:

BenchmarkCustomRepoEmbeddingIndexDownload-10                   1        1212576041 ns/op        1299284528 B/op    96393 allocs/op

After:

BenchmarkCustomRepoEmbeddingIndexDownload-10                   1        1193088500 ns/op        589103720 B/op     65660 allocs/op

That's a 2.2x reduction in allocated bytes and a 1.4x reduction in allocation count. This is quite significant when our indexes are many GBs in memory.

Test plan

A new benchmark to demonstrate better memory behavior, a new quick test on the quantize changes, existing tests on encode/decode, and manual E2E tests for generating and searching with an embeddings index.

camdencheek · 2023-07-14T19:11:30Z

 			return nil, err
 		}

-		ei.Embeddings = make([]int8, 0, numChunks*ei.ColumnDimension)


First change: this we should be using embeddingsChunkSize instead of ColumnDimension because each chunk is not a vector, but rather a chunk of 10_000 floats.

camdencheek · 2023-07-14T19:12:09Z


-		ei.Embeddings = make([]int8, 0, numChunks*ei.ColumnDimension)
+		ei.Embeddings = make([]int8, 0, numChunks*embeddingsChunkSize)
+		embeddingsBuf := make([]float32, 0, embeddingsChunkSize)


Second change: move the decode buffer outside the loop so we don't allocate on every iteration.

camdencheek · 2023-07-14T19:12:40Z

 				return nil, err
 			}
-			ei.Embeddings = append(ei.Embeddings, Quantize(embeddingSlice)...)
+			ei.Embeddings = append(ei.Embeddings, Quantize(embeddingsBuf, quantizeBuf)...)


Third change: update Quantize to accept an optional buffer, which it will use if it's large enough to fit the output.

camdencheek · 2023-07-14T19:13:45Z

 	}
 }
+
+func BenchmarkCustomRepoEmbeddingIndexDownload(b *testing.B) {


I added this benchmark to prove to myself that this cuts down the allocations. See PR description for results.

keegancsmith

I guess on the switch to a third party vector store we won't keep this store around? IE it isn't worth making improvements to how we store stuff?

camdencheek · 2023-07-17T14:05:38Z

Yeah, there is a lot more room for improvement here, but hopefully (🤞) this won't be around too much longer

github-actions · 2023-07-17T14:48:04Z

The backport to 5.1 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-5.1 5.1
# Navigate to the new working tree
cd .worktrees/backport-5.1
# Create a new branch
git switch --create backport-54972-to-5.1
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 2880b4ffa0b8591d7a46ace259c8f522931dd617
# Push it to GitHub
git push --set-upstream origin backport-54972-to-5.1
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-5.1

Then, create a pull request where the base branch is 5.1 and the compare/head branch is backport-54972-to-5.1.

This makes memory usage ~constant during the decode process. Previously, we were undersizing the large allocation (which caused it to reallocate ~10 extra times) for the index and making a bunch of additional smaller allocations per loop. This meant we were spiking memory usage during decoding, and even if the garbage collector could keep up (which I wouldn't expect it to), we'd still need 2x the index size at the peak for live memory. This was causing an OOM. This PR accounts for a 2.2x reduction in allocated bytes and a 1.4x reduction in allocation count. This is quite significant when our indexes are many GBs in memory.

camdencheek added 2 commits July 14, 2023 12:55

optimize memory during decoding

8a59ca9

add comment

40c597c

cla-bot Bot added the cla-signed label Jul 14, 2023

camdencheek changed the title ~~Cc/optimize decode memory~~ Embeddings: optimize memory usage during decode Jul 14, 2023

update naming

571e662

camdencheek commented Jul 14, 2023

View reviewed changes

camdencheek added 2 commits July 14, 2023 13:34

add quantize quick tests

b93c72d

bazel configure

1d10960

camdencheek requested a review from a team July 14, 2023 19:38

camdencheek marked this pull request as ready for review July 14, 2023 19:39

gl-srgr approved these changes Jul 17, 2023

View reviewed changes

keegancsmith approved these changes Jul 17, 2023

View reviewed changes

camdencheek merged commit 2880b4f into main Jul 17, 2023

camdencheek deleted the cc/optimize-decode-memory branch July 17, 2023 14:06

camdencheek added the backport 5.1 label Jul 17, 2023

github-actions Bot added backports failed-backport-to-5.1 release-blocker Prevents us from releasing: https://about.sourcegraph.com/handbook/engineering/releases labels Jul 17, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Embeddings: optimize memory usage during decode#54972

Embeddings: optimize memory usage during decode#54972
camdencheek merged 5 commits into
mainfrom
cc/optimize-decode-memory

camdencheek commented Jul 14, 2023 •

edited

Loading

Uh oh!

camdencheek Jul 14, 2023

Uh oh!

camdencheek Jul 14, 2023

Uh oh!

camdencheek Jul 14, 2023

Uh oh!

camdencheek Jul 14, 2023

Uh oh!

keegancsmith left a comment

Uh oh!

camdencheek commented Jul 17, 2023

Uh oh!

github-actions Bot commented Jul 17, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

camdencheek commented Jul 14, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test plan

Uh oh!

camdencheek Jul 14, 2023

Choose a reason for hiding this comment

Uh oh!

camdencheek Jul 14, 2023

Choose a reason for hiding this comment

Uh oh!

camdencheek Jul 14, 2023

Choose a reason for hiding this comment

Uh oh!

camdencheek Jul 14, 2023

Choose a reason for hiding this comment

Uh oh!

keegancsmith left a comment

Choose a reason for hiding this comment

Uh oh!

camdencheek commented Jul 17, 2023

Uh oh!

github-actions Bot commented Jul 17, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

camdencheek commented Jul 14, 2023 •

edited

Loading