Improvements skip index final exact mode (follow up to PR #78350)#79661
Conversation
|
Where this optimization helps - As an example, assume key range 30-60 is shortlisted by secondary index. Now extrapolate above for 1000's of parts and millions of ranges. The first cut implementation considered all ranges from all parts for intersection evaluation. This follow up PR keeps in mind typical real-world table layout and improves latency significantly by cutting down memory copies, sorting time, and range intersection evaluation. |
|
No code changes, just refreshed to master. |
|
Too many failures, hopefully nothing related to PR because it is self-contained. I will investigate shortly! |
|
None of the failures are related to the PR's code location, I am re-running failed jobs. |
|
Only failure after rerun is in AST fuzzer (amd_ubsan) - flaky, tracked by #79451 |
|
@nickitat Please review! |
| /// intersection boundary. | ||
| auto candidates_start = std::upper_bound(part_candidate_ranges.begin(), part_candidate_ranges.end(), | ||
| selected_lower_bound); | ||
| if (unlikely(candidates_start == part_candidate_ranges.end())) |
There was a problem hiding this comment.
| if (unlikely(candidates_start == part_candidate_ranges.end())) | |
| if (candidates_start == part_candidate_ranges.end()) |
| } | ||
|
|
||
| Values selected_highest_value; | ||
| std::vector<std::set<size_t>> part_selected_ranges(ranges_in_data_parts.size(), std::set<size_t>()); |
There was a problem hiding this comment.
As far as I see, we only use this DS to determine whether the particular granule was selected. Do we really have to store a boolean for each granule, or could we just do a similar binary search over selected ranges?
There was a problem hiding this comment.
This DS will only grow in size as per the number of granules was selected by the skip index. Even if doing binary search on selected ranges, we have to somewhere expand the selected ranges into granules, else code becomes complex. My first version of code did not have this DS and I was doing the 2 for loops in a single loop and the code was very messy and difficult to comprehend. The DS explains the code well and should grow just be a small subset of the total granules in the table.
There was a problem hiding this comment.
AFAIU, we would expand for only (candidates_end - candidates_start) granules. Maybe it is indeed not so important.
Then, I'd propose replacing set with vector + sort, at least to avoid the large number of small dynamic allocations that a set would make.
|
|
||
| const PartsRangesIterator selected_lower_bound = selected_ranges[0]; | ||
| /// highest value is important, range & part can be any | ||
| const PartsRangesIterator selected_upper_bound{selected_highest_value, false, MarkRange{0, 1}, 0, |
There was a problem hiding this comment.
Maybe it'd make sense to provide additional overloads to comparison functions to avoid such "hacks"
| selected_lower_bound); | ||
| if (unlikely(candidates_start == part_candidate_ranges.end())) | ||
| continue; /// no intersection possible in this part | ||
| auto candidates_end = std::upper_bound(part_candidate_ranges.begin(), part_candidate_ranges.end(), |
There was a problem hiding this comment.
IndexAccess has methods like findRightmostMarkLessThanValueInRange that seem to do something very similar - can we use them?
|
Clean CI Run! |
|
Only failure in |
741609f
Background in PR #78350. This PR implements couple of optimizations to reduce query execution time for typical primary key layout in parts (merged or unmerged) -
std::upper_bound()with the selected lower/upper bound to identify the appropriate sub-range of the part that will be further evaluated for intersection.Both the above steps will significantly reduce the number of memory copies and reduce the number of ranges in the sort step.
Changelog category (leave one):