searcher: use the same gitserver fetch timeout as symbols#47469
Conversation
Searcher has a hardcoded timeout of 10 minutes. Interestingly symbols has a hardcoded timeout of 2 hours. Now this is likely due to fetching more history for a repository for rockskip. However, I suspect 10 minutes is too aggressive for large monorepos. As such we use a much more conservative timeout of 2h. The downside of this approach is we have a relatively naive semaphore controlling concurrent fetches. So if we have a large burst of activity that never gets retried it will take a lot longer for us to cancel out the backlog. However, in practice this feels like the better tradeoff as well as this being consistent with symbols. Test Plan: go test
|
@chwarwick I got started on implementing the stub API I promised, but noticed this along the way. This may be the root cause of why we never end up processing historical insights on large monorepos sometimes. |
Thanks @keegancsmith. Do you have any recommendations about what would typically be the best way to parallelize the insights historic searches? We can pretty easily change between the following:
|
There is no good answer here since it depends on how many searchers they have/etc. I would make your semaphore scale based on the number of searchers there are. This is something we do today https://sourcegraph.com/github.com/sourcegraph/sourcegraph@51442394e48f51334677bc91d69fbf4a7909e76f/-/blob/internal/search/searcher/search.go?L85-92 Let N be the number of searchers. Then I would do multiple repos in parallel, searching N revision at a time. |
Searcher has a hardcoded timeout of 10 minutes. Interestingly symbols has a hardcoded timeout of 2 hours. Now this is likely due to fetching more history for a repository for rockskip. However, I suspect 10 minutes is too aggressive for large monorepos. As such we use a much more conservative timeout of 2h.
The downside of this approach is we have a relatively naive semaphore controlling concurrent fetches. So if we have a large burst of activity that never gets retried it will take a lot longer for us to cancel out the backlog. However, in practice this feels like the better tradeoff as well as this being consistent with symbols.
Test Plan: go test