Skip to content

Improvements skip index final exact mode (follow up to PR #78350)#79661

Merged
shankar-iyer merged 11 commits intoClickHouse:masterfrom
shankar-iyer:improvements_skip_index_final_exact_mode
Jun 3, 2025
Merged

Improvements skip index final exact mode (follow up to PR #78350)#79661
shankar-iyer merged 11 commits intoClickHouse:masterfrom
shankar-iyer:improvements_skip_index_final_exact_mode

Conversation

@shankar-iyer
Copy link
Copy Markdown
Member

Background in PR #78350. This PR implements couple of optimizations to reduce query execution time for typical primary key layout in parts (merged or unmerged) -

  • From the ranges selected by the skip index, determine lowest bound and upper bound of the primary key. Now for each part, consider the part's ranges for intersection only if selected lower/upper bound are within the part's primary key lower/upper bound.
  • If a part's ranges have to be considered for intersection (after above check), use std::upper_bound() with the selected lower/upper bound to identify the appropriate sub-range of the part that will be further evaluated for intersection.

Both the above steps will significantly reduce the number of memory copies and reduce the number of ranges in the sort step.

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Apr 28, 2025

Workflow [PR], commit [50f7207]

@clickhouse-gh clickhouse-gh bot added the pr-not-for-changelog This PR should not be mentioned in the changelog label Apr 28, 2025
@shankar-iyer
Copy link
Copy Markdown
Member Author

shankar-iyer commented May 9, 2025

Where this optimization helps -

1. Assume a table #1 with many parts and all are merged parts with no PK overlap and the PK ranges are -
        part #1 -> 1..100, part #2 -> 101..200, part #3 -> 201..300.
 2. Assume a table #2 with many parts and there is unmerged update activity and the parts look like this -
        part #1 -> 1..100, part #2 -> 60..200, part #3 -> 40..300., part #4 -> 250..400

As an example, assume key range 30-60 is shortlisted by secondary index.

In table #1, the PR ensures that ranges from part #2, part #3 and beyond are just eliminated from
intersection evaluation with a couple of checks
In table #2, the PR ensures that only a small subset of ranges from part #2 and part #3 
encompassing PK range 30-60 are included for intersection evaluation and same applies
for other parts where intersection is feasible. 

Now extrapolate above for 1000's of parts and millions of ranges. The first cut implementation considered all ranges from all parts for intersection evaluation. This follow up PR keeps in mind typical real-world table layout and improves latency significantly by cutting down memory copies, sorting time, and range intersection evaluation.

@shankar-iyer
Copy link
Copy Markdown
Member Author

No code changes, just refreshed to master.

@shankar-iyer
Copy link
Copy Markdown
Member Author

Too many failures, hopefully nothing related to PR because it is self-contained. I will investigate shortly!

@shankar-iyer
Copy link
Copy Markdown
Member Author

None of the failures are related to the PR's code location, I am re-running failed jobs.

@shankar-iyer
Copy link
Copy Markdown
Member Author

Only failure after rerun is in AST fuzzer (amd_ubsan) - flaky, tracked by #79451

@shankar-iyer
Copy link
Copy Markdown
Member Author

@nickitat Please review!

/// intersection boundary.
auto candidates_start = std::upper_bound(part_candidate_ranges.begin(), part_candidate_ranges.end(),
selected_lower_bound);
if (unlikely(candidates_start == part_candidate_ranges.end()))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (unlikely(candidates_start == part_candidate_ranges.end()))
if (candidates_start == part_candidate_ranges.end())

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

}

Values selected_highest_value;
std::vector<std::set<size_t>> part_selected_ranges(ranges_in_data_parts.size(), std::set<size_t>());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I see, we only use this DS to determine whether the particular granule was selected. Do we really have to store a boolean for each granule, or could we just do a similar binary search over selected ranges?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This DS will only grow in size as per the number of granules was selected by the skip index. Even if doing binary search on selected ranges, we have to somewhere expand the selected ranges into granules, else code becomes complex. My first version of code did not have this DS and I was doing the 2 for loops in a single loop and the code was very messy and difficult to comprehend. The DS explains the code well and should grow just be a small subset of the total granules in the table.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIU, we would expand for only (candidates_end - candidates_start) granules. Maybe it is indeed not so important.
Then, I'd propose replacing set with vector + sort, at least to avoid the large number of small dynamic allocations that a set would make.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack, done!


const PartsRangesIterator selected_lower_bound = selected_ranges[0];
/// highest value is important, range & part can be any
const PartsRangesIterator selected_upper_bound{selected_highest_value, false, MarkRange{0, 1}, 0,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it'd make sense to provide additional overloads to comparison functions to avoid such "hacks"

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, reworked.

selected_lower_bound);
if (unlikely(candidates_start == part_candidate_ranges.end()))
continue; /// no intersection possible in this part
auto candidates_end = std::upper_bound(part_candidate_ranges.begin(), part_candidate_ranges.end(),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IndexAccess has methods like findRightmostMarkLessThanValueInRange that seem to do something very similar - can we use them?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done!

@shankar-iyer
Copy link
Copy Markdown
Member Author

Clean CI Run!

@shankar-iyer
Copy link
Copy Markdown
Member Author

Only failure in "[Stateless tests (tsan, s3 storage, 3/3)]" is data-race, tracked by #80866

@shankar-iyer shankar-iyer enabled auto-merge June 3, 2025 15:50
@shankar-iyer shankar-iyer added this pull request to the merge queue Jun 3, 2025
Merged via the queue into ClickHouse:master with commit 741609f Jun 3, 2025
118 of 121 checks passed
@shankar-iyer shankar-iyer deleted the improvements_skip_index_final_exact_mode branch June 3, 2025 16:06
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants