Sparse serialization and ColumnSparse #22535
Merged
alesapin merged 102 commits into ClickHouse:master on Dec 17, 2021
alesapin approved these changes on Dec 3, 2021
/// Convert to full column, because sparse column has
/// access to element in O(log(K)), where K is number of non-default rows,
/// which can be inefficient.
convertToFullIfSparse(chunk);
Member
I think it would be better to add such a comment at every place where we use convertToFullIfSparse. Otherwise it looks like the code here works with some column internals and is simply not ready for the sparse format.
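To illustrate the O(log(K)) access cost mentioned in the code comment above, here is a minimal sketch of a sparse column. It assumes the column stores only the sorted row numbers of non-default values together with those values. `SparseColumnSketch`, `get` and `convertToFull` are hypothetical names used for illustration; this is not the actual `ColumnSparse` class or the `convertToFullIfSparse` implementation from this PR.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified sparse column (illustration only, not ColumnSparse).
struct SparseColumnSketch
{
    size_t size = 0;               // total number of rows, including default ones
    std::vector<size_t> offsets;   // sorted row numbers of the K non-default values
    std::vector<int64_t> values;   // values[i] belongs to row offsets[i]

    // Random access needs a binary search over K offsets: O(log(K)) per call.
    int64_t get(size_t row) const
    {
        auto it = std::lower_bound(offsets.begin(), offsets.end(), row);
        if (it != offsets.end() && *it == row)
            return values[static_cast<size_t>(it - offsets.begin())];
        return 0; // the default value
    }

    // A single linear pass builds a full column with O(1) access afterwards.
    std::vector<int64_t> convertToFull() const
    {
        std::vector<int64_t> full(size, 0);
        for (size_t i = 0; i < offsets.size(); ++i)
            full[offsets[i]] = values[i];
        return full;
    }
};
```

Code that reads many elements by index pays the binary search on every access in the sparse form, which is why converting once up front can be cheaper in such places.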
alesapin reviewed on Dec 8, 2021
Member
Tests OK, let's merge!
Contributor
Internal documentation ticket: DOCSUP-20369
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Implemented sparse serialization. It can reduce disk space usage and improve the performance of some queries for columns that contain a lot of default (zero) values. It can be enabled with the setting ratio_of_defaults_for_sparse_serialization. Sparse serialization is chosen dynamically for a column if the ratio of the number of default values to the number of all values in it is above that threshold. The serialization (default or sparse) is fixed for every column within a part, but may vary between parts.
Detailed description / Documentation draft:
Second part of #19953.
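The threshold rule from the changelog entry can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: `SerializationKindSketch` and `chooseSerialization` are hypothetical names, the column is modelled as a plain vector of integers whose default value is zero, and "above that threshold" is read as a strict comparison.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names for illustration; not the ClickHouse API.
enum class SerializationKindSketch { Default, Sparse };

// Choose the serialization for one column of one part.
// `ratio_threshold` plays the role of ratio_of_defaults_for_sparse_serialization.
SerializationKindSketch chooseSerialization(
    const std::vector<int64_t> & column_in_part, double ratio_threshold)
{
    if (column_in_part.empty())
        return SerializationKindSketch::Default;

    size_t num_defaults = 0;
    for (int64_t value : column_in_part)
        if (value == 0) // zero is the default value of an integer column
            ++num_defaults;

    double ratio = static_cast<double>(num_defaults)
        / static_cast<double>(column_in_part.size());

    // Assumption: "above that threshold" means strictly greater.
    return ratio > ratio_threshold ? SerializationKindSketch::Sparse
                                   : SerializationKindSketch::Default;
}
```

Because the decision is made per part, the same column can end up sparse in a part that is mostly zeros and default in a part that is mostly non-zero, which matches the note that the serialization may vary between parts.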
TODO:
ColumnSparse.