Skip to content

Enable compact parts by default for small parts#11913

Merged
alexey-milovidov merged 17 commits intomasterfrom
compact-parts-by-default
Sep 19, 2020
Merged

Enable compact parts by default for small parts#11913
alexey-milovidov merged 17 commits intomasterfrom
compact-parts-by-default

Conversation

@alexey-milovidov
Copy link
Copy Markdown
Member

@alexey-milovidov alexey-milovidov commented Jun 24, 2020

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Enable compact parts by default for small parts. This will allow to process frequent inserts slightly more efficiently (4..100 times).

@blinkov blinkov added the pr-performance Pull request with some performance improvements label Jun 24, 2020
@alexey-milovidov
Copy link
Copy Markdown
Member Author

Performance tests:

merge_tree_many_partitions_2

Sped up two times.

codecs...

Various difference.

@CurtizJ
Copy link
Copy Markdown
Member

CurtizJ commented Jul 3, 2020

codecs...

Various difference.

It's unrelated, because compact parts don't support columns' special codecs. Need to force creation of wide parts in those tests.

@alexey-milovidov
Copy link
Copy Markdown
Member Author

alexey-milovidov commented Jul 3, 2020

compact parts don't support columns' special codecs

But why not?
It's reasonable to use codecs even for compact parts.
What if I want NONE... or if I know that Delta, LZ4 is better?

@CurtizJ
Copy link
Copy Markdown
Member

CurtizJ commented Jul 3, 2020

It's possible, but we have to write compressed block for every column in granule. Won't blocks be too small? Also, in that case it will be possible to count per column sizes.

@alexey-milovidov
Copy link
Copy Markdown
Member Author

Ok, I understand the reason.

@alexey-milovidov
Copy link
Copy Markdown
Member Author

There are plans to use ClickHouse with high-latency filesystem (e.g. S3). For that case the users may want to enable 100% compact parts. And if codecs won't work, it's a huge drawback.

@alexey-milovidov
Copy link
Copy Markdown
Member Author

alexey-milovidov commented Jul 29, 2020

@CurtizJ Do you want to take over this PR?

@CurtizJ CurtizJ self-assigned this Jul 30, 2020
@CurtizJ
Copy link
Copy Markdown
Member

CurtizJ commented Sep 3, 2020

I want to finish #12183 first, and after it will be merged enable compact parts by default.

@CurtizJ
Copy link
Copy Markdown
Member

CurtizJ commented Sep 10, 2020

I don't understand why huge part of tests from skip list of check with enabled compact parts is skipping now in all checks. Skip list is unchanged in this PR.

@CurtizJ
Copy link
Copy Markdown
Member

CurtizJ commented Sep 11, 2020

It's because of code in clickhouse-test script.

@CurtizJ
Copy link
Copy Markdown
Member

CurtizJ commented Sep 14, 2020

00926_adaptive_index_granularity_pk: [ FAIL ] 670.21 sec. - Timeout!

In this test a table with index_granularity_bytes=40 is created, which leads to index_granularity=1. With compact parts this configuration is extremely inefficient, because of writing compressed block for every granule of column. It is not a regular case and performance tests didn't show regressions with default settings, when this behaviour was introduced and I think it's ok.

Maybe we need introduce a setting to allow set behaviour, when compressed block is written only when threshold is exceded.

@CurtizJ
Copy link
Copy Markdown
Member

CurtizJ commented Sep 14, 2020

The same about 00804_test_alter_compression_codecs.

@CurtizJ
Copy link
Copy Markdown
Member

CurtizJ commented Sep 14, 2020

Other fixed tests checked some implementation defined things like checksums or number of files or hardcoded paths to files. Also some compatibility tests, which run cluster with different versions older than 20.3, were failed, because replication protocol is incompatible with such old versions, when polymorphic parts are used.
Didn't find any real bugs.

@alexey-milovidov
Copy link
Copy Markdown
Member Author

@akuzm AST Fuzzer:
2020-09-18 08:00:36 ERROR 404: Not Found.

@alexey-milovidov alexey-milovidov merged commit eb9ee72 into master Sep 19, 2020
@alexey-milovidov alexey-milovidov deleted the compact-parts-by-default branch September 19, 2020 12:31
@alexey-milovidov
Copy link
Copy Markdown
Member Author

@CurtizJ I suspect that this PR has broke integration tests:
eb9ee72 (look at this commit in master)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-performance Pull request with some performance improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants