Optimising sorted scroll requests #23022
Closed
Labels: :Search/Search, >enhancement, Team:Search, help wanted, adoptme, high hanging fruit
The performance of sorted scroll requests can be dominated by the time it takes to sort all documents on each tranche of hits. This can partially be amortised by increasing the `size` of the scroll request, but that strategy soon starts to fail for other reasons. Ultimately the more documents you have, the longer it takes to sort them.
When sorting by e.g. date, it can be much more efficient to break a single scroll request up into chunks, so that each scroll request deals with a subset of docs within a certain date range. Anecdotal evidence on an index of 50M docs reports an improvement from 7h to 10 mins!
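
To illustrate the manual chunking described above, here is a minimal Python sketch that splits a date range into fixed windows and builds one search body per window. The field name `timestamp`, the window length, and the helper names are assumptions for illustration only; each body would be sent as its own sorted scroll request, which is not shown here.

```python
from datetime import datetime, timedelta

def date_range_chunks(start, end, step):
    """Yield half-open [lo, hi) windows that together cover [start, end)."""
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        yield lo, hi
        lo = hi

def chunk_query(field, lo, hi, size=1000):
    """Build a search body limited to one window and sorted on the same field."""
    return {
        "size": size,
        "sort": [{field: "asc"}],
        "query": {"range": {field: {"gte": lo.isoformat(), "lt": hi.isoformat()}}},
    }

# Hypothetical field name and dates, three one-day windows.
bodies = [
    chunk_query("timestamp", lo, hi)
    for lo, hi in date_range_chunks(
        datetime(2017, 1, 1), datetime(2017, 1, 4), timedelta(days=1)
    )
]
```

Because each window only has to sort the documents that fall inside it, the per-request sort cost depends on the window's size rather than on the whole index.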
It would be nice to be able to automate this internally within a single scroll request. The trickiest part is to figure out how big a chunk should be, given that data can be non-uniform. Simply asking the user wouldn't be sufficient as they may set a chunk of 1 hour, but an hour of missing data would simply return no results, indicating the end of the scroll request.
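
The empty-chunk problem can be shown with a small in-memory simulation. This is only a sketch of one possible heuristic (doubling the window when it is sparse, halving it when it is dense), not something proposed in this issue; the integer sort keys, the `target` parameter, and the function name are all assumptions. The point it makes is that an empty window has to be treated as "keep going", not as the end of the scroll.

```python
from bisect import bisect_left

def adaptive_windows(sorted_keys, start, end, step, target=2):
    """Walk [start, end) in windows, adapting step toward roughly `target` hits."""
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        # Hits are the keys in the half-open window [lo, hi).
        i, j = bisect_left(sorted_keys, lo), bisect_left(sorted_keys, hi)
        hits = sorted_keys[i:j]
        yield lo, hi, hits
        if len(hits) < target:
            step *= 2                 # sparse (or empty) window: widen the next one
        elif len(hits) > 2 * target:
            step = max(1, step // 2)  # dense window: narrow the next one
        lo = hi

# Non-uniform data: a cluster, a long gap, then another cluster.
keys = [0, 1, 2, 50, 51, 52, 53]
windows = list(adaptive_windows(keys, start=0, end=60, step=1))
collected = [k for _, _, hits in windows for k in hits]
```

Several windows in the gap return no hits, yet the walk only stops once the upper bound of the overall range is reached.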
Here are a few possibilities:
- `gt` but not `lt` - the deeper you get, the fewer documents you would match
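
A rough sketch of the `gt`-only idea, under stated assumptions: each request filters on values greater than the last sort value already returned and has no upper bound. The helper names and the `timestamp` field are hypothetical, and the in-memory simulation assumes unique sort values (with ties, a plain `gt` on the last value could skip documents).

```python
def gt_only_body(field, last_value, size):
    """Build a search body that only matches docs after the last seen sort value."""
    if last_value is None:
        query = {"match_all": {}}
    else:
        query = {"range": {field: {"gt": last_value}}}
    return {"size": size, "sort": [{field: "asc"}], "query": query}

def simulate(values, size):
    """Page through values with a gt-only filter, recording matches per request."""
    last, matched_per_page, out = None, [], []
    while True:
        matching = [v for v in values if last is None or v > last]
        matched_per_page.append(len(matching))
        page = sorted(matching)[:size]
        if not page:
            break
        out.extend(page)
        last = page[-1]
    return out, matched_per_page

out, matched = simulate([5, 3, 9, 1, 7, 2], size=2)
body = gt_only_body("timestamp", 5, 2)
```

In this run the number of matching documents drops with every request, which is the effect the bullet describes: early requests still sort almost everything, while later ones sort progressively less.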