Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

fix: De-dup and concurrent-ify file content requests & splitting#64169

Merged
varungandhi-src merged 3 commits into
mainfrom
vg/batch-fetches
Jul 31, 2024
Merged

fix: De-dup and concurrent-ify file content requests & splitting#64169
varungandhi-src merged 3 commits into
mainfrom
vg/batch-fetches

Conversation

@varungandhi-src

Copy link
Copy Markdown
Contributor

This speeds up precise find usages from about 600ms to 400ms
for a test workload involving 16 precise usages across 5 files.

(Specifically for the codenav.Service type)

Fixes GRAPH-781

Stacked on top of:

I have some more thoughts on speeding this up further, but
I'll likely look into that after adding tests.

Test plan

Manually tested. Will add automated tests in a follow-up PR.

@cla-bot cla-bot Bot added the cla-signed label Jul 31, 2024
@github-actions github-actions Bot added team/graph Graph Team (previously Code Intel/Language Tools/Language Platform) team/product-platform labels Jul 31, 2024

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the iterator (and mixing the grouping and fetching logic) makes this very hard to read.

Can't we split out the grouping logic as its own synchronous step and then iterate over neededContents with conc.Map?

Any worries that accumulating whole files for all usages before starting to slice the contents could be a problem? If we grouped usages by file before we should be able to resolve the lines as soon as we get the file content and deallocate the remaining file.

@varungandhi-src varungandhi-src Jul 31, 2024

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've simplified the iterator logic.

Can't we split out the grouping logic as its own synchronous step and then iterate over neededContents with conc.Map?

We can't iterate over a map using conc.Map, so we'd need to maintain a separate slice (I was initially going down that route). So it has its own downside where you have the state in two different places, so I chose to implement it this way.

Any worries that accumulating whole files for all usages before starting to slice the contents could be a problem?

I don't think so, no, unless we get a lot of large files. Most files should be a few KB, multiply that by 1K tops you get a few MB at most. We can pass in the lines of interest if needed. If we do that, doing a 2-pass approach would be simpler using conc.Map.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave it a quick try. Shouldn't this do what we need?

func NewPreciseUsageResolvers(ctx context.Context, gitserverClient gitserver.Client, usages []shared.UploadUsage) ([]resolverstubs.UsageResolver, error) {
	uniquePaths := genslices.Uniq(genslices.Map(usages, func(usage shared.UploadUsage) types.RepoCommitPath {
		return types.RepoCommitPath{
			Repo:   usage.Upload.RepositoryName,
			Commit: usage.TargetCommit,
			Path:   usage.Path.RawValue(),
		}
	}))
	contentMap := map[types.RepoCommitPath][]string{}
	contentMapLock := sync.Mutex{}
	conciter.Iterator[types.RepoCommitPath]{MaxGoroutines: 4}.ForEach(uniquePaths, func(rcp *types.RepoCommitPath) {
		byteContents, err := gitresolvers.GetFullContents(ctx, gitserverClient,
			api.RepoName(rcp.Repo), api.CommitID(rcp.Commit), core.NewRepoRelPathUnchecked(rcp.Path))
		if err != nil {
			return
		}
		contentMapLock.Lock()
		contentMap[*rcp] = strings.Split(string(byteContents), "\n")
		contentMapLock.Unlock()
	})
	out := make([]resolverstubs.UsageResolver, 0, len(usages))
	for _, usage := range usages {
		contents, ok := contentMap[types.RepoCommitPath{
			Repo:   usage.Upload.RepositoryName,
			Commit: usage.TargetCommit,
			Path:   usage.Path.RawValue(),
		}]
		if !ok {
			continue
		}
		...
	}

@varungandhi-src varungandhi-src Jul 31, 2024

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but (1) that needlessly does two passes with genslices.Uniq(genslices.Map and (2) it's tied to the implementation of constructing precise resolvers.

Honestly, at this point, I feel like there isn't a big difference except what I've written is a bit more low-level/imperative, whereas your implementation is more functional.

That said, I will change the code because I don't think we should spend more time debating this.

Base automatically changed from vg/more-docs to main July 31, 2024 06:53
@varungandhi-src varungandhi-src force-pushed the vg/batch-fetches branch 2 times, most recently from befd877 to fe5531e Compare July 31, 2024 09:18

@kritzcreek kritzcreek left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@varungandhi-src varungandhi-src merged commit ab2697a into main Jul 31, 2024
@varungandhi-src varungandhi-src deleted the vg/batch-fetches branch July 31, 2024 09:46
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

cla-signed team/graph Graph Team (previously Code Intel/Language Tools/Language Platform) team/product-platform

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants