
Compression for Memory tables#20168

Merged
alexey-milovidov merged 30 commits into master from in-memory-compression on Feb 20, 2021


@alexey-milovidov
Member

@alexey-milovidov alexey-milovidov commented Feb 7, 2021

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add compress setting for Memory tables. If it's enabled, the table will use less RAM. On some machines and datasets it can also work faster on SELECT, but that is not always the case. This closes #20093. Note: there are reasons why Memory tables can work slower than MergeTree: (1) lack of compression, (2) static size of blocks, (3) lack of indices and prewhere...
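The trade-off described in the changelog entry (less RAM, but extra work on every read) can be illustrated with a minimal Python sketch. It is not ClickHouse's implementation: `MemoryTableSketch` is a hypothetical class, the single UInt64-like column is an assumption, and zlib is an arbitrary stand-in for whatever codec the server actually uses.

```python
import array
import zlib


class MemoryTableSketch:
    """Hypothetical stand-in for a Memory table with one UInt64-like column.

    With compress=True each inserted block is compressed once (zlib is an
    arbitrary codec chosen for this sketch) and decompressed on every read.
    """

    def __init__(self, compress: bool = False) -> None:
        self.compress = compress
        self.blocks = []  # one bytes object per inserted block

    def insert(self, values) -> None:
        # Serialize the block as unsigned 64-bit integers.
        raw = array.array("Q", values).tobytes()
        # Compression is paid once, at insert time.
        self.blocks.append(zlib.compress(raw) if self.compress else raw)

    def stored_bytes(self) -> int:
        # Bytes of column data held in memory (ignores Python object overhead).
        return sum(len(block) for block in self.blocks)

    def sum_column(self) -> int:
        # Rough equivalent of SELECT sum(x): compressed blocks are decompressed first.
        total = 0
        for block in self.blocks:
            raw = zlib.decompress(block) if self.compress else block
            total += sum(array.array("Q", raw))
        return total


plain = MemoryTableSketch(compress=False)
packed = MemoryTableSketch(compress=True)
for _ in range(10):
    block = [1] * 65536  # highly repetitive data, so it compresses very well
    plain.insert(block)
    packed.insert(block)

print(plain.stored_bytes(), packed.stored_bytes())
print(plain.sum_column(), packed.sum_column())
```

The sketch only shows the shape of the trade-off; whether it pays off depends on the data and the machine, as the changelog entry says.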

@robot-clickhouse robot-clickhouse added the pr-performance Pull request with some performance improvements label Feb 7, 2021
@alexey-milovidov
Member Author

```
milovidov-desktop :) SELECT sum(x) FROM test_memory

SELECT sum(x)
FROM test_memory

Query id: 14833c4a-436f-40d6-8cce-f1a09db81158

┌─────sum(x)─┐
│ 1000000000 │
└────────────┘

1 rows in set. Elapsed: 0.140 sec. Processed 1.00 billion rows, 8.00 GB (7.13 billion rows/s., 57.04 GB/s.)

milovidov-desktop :) SELECT sum(x) FROM test_memory_uncompressed

SELECT sum(x)
FROM test_memory_uncompressed

Query id: d11ce092-72a8-46c8-a75a-7275c45f1145

┌─────sum(x)─┐
│ 1000000000 │
└────────────┘

1 rows in set. Elapsed: 0.313 sec. Processed 1.00 billion rows, 8.00 GB (3.19 billion rows/s., 25.55 GB/s.)
```
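The rates in both outputs follow from the row count, byte count and elapsed time. A quick Python check using only the numbers printed above (the dictionary keys are just labels for the two tables):

```python
# Figures copied from the two query outputs above.
ROWS = 1.00e9
BYTES = 8.00e9
elapsed = {"test_memory": 0.140, "test_memory_uncompressed": 0.313}

bytes_per_row = BYTES / ROWS  # size of one value of x
rows_per_sec = {t: ROWS / s for t, s in elapsed.items()}
gb_per_sec = {t: BYTES / s / 1e9 for t, s in elapsed.items()}
# How many times faster the compressed table answered the same query.
speedup = elapsed["test_memory_uncompressed"] / elapsed["test_memory"]

for t in elapsed:
    print(f"{t}: {rows_per_sec[t] / 1e9:.2f} billion rows/s, {gb_per_sec[t]:.2f} GB/s")
print(f"bytes per row: {bytes_per_row:.0f}, speedup: {speedup:.2f}x")
```

The computed values can differ from the printed ones in the last digit (7.14 versus 7.13 billion rows/s), presumably because the client rounds the displayed elapsed time. The 8 bytes per row is consistent with x being an 8-byte numeric column, and in this run the compressed table answers about 2.2 times faster.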

@alexey-milovidov
Member Author

There is up to a 5x improvement in perf tests.

@kitaisreal kitaisreal self-assigned this Feb 7, 2021
@alexey-milovidov
Member Author

01396_inactive_replica_cleanup_nodes_zookeeper

Broken after merging NuRaft.

@alexey-milovidov
Member Author

#20494

@alexey-milovidov
Member Author

AST fuzzer (ASan) — Received signal -3

OOM

@alexey-milovidov alexey-milovidov changed the title (prototype) Compression for Memory tables Compression for Memory tables Feb 17, 2021
@alexey-milovidov alexey-milovidov marked this pull request as draft February 18, 2021 00:03
@alexey-milovidov alexey-milovidov marked this pull request as ready for review February 18, 2021 00:54
@alexey-milovidov
Member Author

I found another case where MergeTree tables can work better than Memory tables.

It happens due to dynamic block size selection in MergeTree (Memory tables keep a static block size).
Example:

```
milovidov-desktop :) SELECT avg(blockSize()) FROM test.hits_memory WHERE NOT ignore(URL)

SELECT avg(blockSize())
FROM test.hits_memory
WHERE NOT ignore(URL)

Query id: 6b374325-c2dc-4693-8927-da1bdcd92f35

┌──avg(blockSize())─┐
│ 65419.37380889436 │
└───────────────────┘

1 rows in set. Elapsed: 0.071 sec. Processed 8.87 million rows, 767.95 MB (124.56 million rows/s., 10.78 GB/s.)

milovidov-desktop :) SELECT avg(blockSize()) FROM test.hits WHERE NOT ignore(URL)

SELECT avg(blockSize())
FROM test.hits
WHERE NOT ignore(URL)

Query id: c7d4073d-e360-4427-90dd-410241a74944

┌──avg(blockSize())─┐
│ 9279.622728591201 │
└───────────────────┘

1 rows in set. Elapsed: 0.054 sec. Processed 8.87 million rows, 767.95 MB (163.13 million rows/s., 14.12 GB/s.)

milovidov-desktop :) SELECT avg(length(URL)) FROM test.hits

SELECT avg(length(URL))
FROM test.hits

Query id: 653b288b-eac0-4aed-b5b7-a39390870930

┌──avg(length(URL))─┐
│ 77.54074297450794 │
└───────────────────┘

1 rows in set. Elapsed: 0.053 sec. Processed 8.87 million rows, 767.95 MB (165.86 million rows/s., 14.35 GB/s.)
```
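Multiplying the average block size by the average URL length puts the two block sizes in bytes. A rough Python calculation from the figures above; it assumes the avg(length(URL)) measured on test.hits also holds for test.hits_memory (both queries processed the same 8.87 million rows and 767.95 MB), and it ignores per-row string offsets. ClickHouse's length() counts bytes for String values.

```python
# Figures copied from the three query outputs above.
AVG_URL_BYTES = 77.54074297450794  # avg(length(URL)) over test.hits
TOTAL_ROWS = 8.87e6                # rows processed by each query
avg_block_rows = {
    "test.hits_memory (Memory)": 65419.37380889436,
    "test.hits (MergeTree)": 9279.622728591201,
}

# Approximate URL payload per block in MB (string offsets ignored).
url_mb = {t: rows * AVG_URL_BYTES / 1e6 for t, rows in avg_block_rows.items()}
# Approximate number of blocks each query processed.
block_count = {t: TOTAL_ROWS / rows for t, rows in avg_block_rows.items()}

for t in avg_block_rows:
    print(f"{t}: ~{url_mb[t]:.2f} MB of URL data per block, ~{block_count[t]:.0f} blocks")

ratio = avg_block_rows["test.hits_memory (Memory)"] / avg_block_rows["test.hits (MergeTree)"]
print(f"Memory blocks hold ~{ratio:.1f}x more rows")
```

So a Memory block carries roughly 5 MB of URL data, about seven times as much as a MergeTree block (roughly 0.7 MB).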


@alexey-milovidov
Member Author

@amosbird If we lower the block size to 8192 rows for this Memory table, the query will be faster (down from 0.071 sec): about 30 ms, if I remember correctly.
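What lowering the block size to 8192 rows means for the stored data can be sketched in Python: one large block is cut into consecutive pieces of at most 8192 rows. `split_block` is a hypothetical helper for illustration only; the comment does not say how the block size of the Memory table was actually changed, and the dummy URLs are made up.

```python
from typing import Iterator, List


def split_block(rows: List[str], max_rows: int = 8192) -> Iterator[List[str]]:
    """Hypothetical helper: yield consecutive slices of at most max_rows rows."""
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    for start in range(0, len(rows), max_rows):
        yield rows[start:start + max_rows]


# A block of about the average size seen in test.hits_memory, with dummy URLs.
big_block = [f"https://example.com/{i}" for i in range(65419)]
small_blocks = list(split_block(big_block))
print(len(small_blocks), len(small_blocks[-1]))
```

Every piece except the last has exactly 8192 rows, and no rows are lost or reordered.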

@alexey-milovidov
Member Author

This PR looks ready, but I cannot enable compression by default:

  • sometimes it improves performance and sometimes not; it depends on the queries, the dataset, and the specific server configuration.

@alexey-milovidov alexey-milovidov merged commit b4196c8 into master Feb 20, 2021
@alexey-milovidov alexey-milovidov deleted the in-memory-compression branch February 20, 2021 06:49

Labels

pr-performance Pull request with some performance improvements

Development

Successfully merging this pull request may close these issues.

Compression for Memory tables

3 participants