Synthetic source numbers in columns#88025
Conversation
|
Pinging @elastic/es-analytics-geo (Team:Analytics) |
romseygeek
left a comment
There was a problem hiding this comment.
LGTM, one nit and one question left.
| /** | ||
| * Load all values for all docs up front. This should be much more | ||
| * disk and cpu-friendly than {@link ImmediateLeaf} because it resolves | ||
| * the values all at once, keeping the disk . |
There was a problem hiding this comment.
keeping the disk ... in suspense?
There was a problem hiding this comment.
| @Override | ||
| public boolean advanceToDoc(int docId) throws IOException { | ||
| return hasValue = leaf.advanceExact(docId); | ||
| idx = Arrays.binarySearch(docIdsInLeaf, docId); |
There was a problem hiding this comment.
Do we need to do a binary search here, given that we know we're visiting the docs from docIdsInLeaf in-order?
There was a problem hiding this comment.
Are you saying everytime folks call this we just advance idx and check? I think that's fine, yes.
There was a problem hiding this comment.
Yes, exactly. Won't make a big difference for small sets of documents, but with size=1000 and lots of fields then I think avoiding the binary search is a win.
We always call in increasing order
|
Pinging @elastic/es-search (Team:Search) |
This speeds up synthetic source for numbers and dates by loading them
column by column. On cached disk blocks, on average it's only 1ms per
for 1k documents, but it seems to help a fair bit in the worst case
and I expect it'll help much more on non-cached disk blocks.