Roadmap 2022 (discussion) #32513
Description
This is ClickHouse open-source roadmap 2022.
Descriptions and links to be filled.
This roadmap does not cover the tasks related to infrastructure, orchestration, documentation, marketing, integrations, SaaS, drivers, etc.
See also:
Roadmap 2021: #17623
Roadmap 2020: in Russian
Main Tasks
✔️ Make clickhouse-keeper Production Ready
✔️ It is already feature-complete and being used in production.
✔️ Update documentation to replace ZooKeeper with clickhouse-keeper everywhere.
✔️ Support for Backup and Restore
✔️ Backup of tables, databases, servers and clusters.
✔️ Incremental backups. Support for partial restore.
✔️ Support for pluggable backup storage options.
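Incremental backups can be reasoned about in terms of immutable data parts. The sketch below assumes a hypothetical model where each part is identified by its name and a content checksum. Under that assumption, a new backup only needs to copy parts that are missing from, or changed since, the base backup. This illustrates the idea and is not ClickHouse's backup implementation.

```python
from typing import Dict, List


def plan_incremental_backup(
    current_parts: Dict[str, str],
    base_backup_parts: Dict[str, str],
) -> List[str]:
    """Return the (hypothetical) part names that must be copied into a new backup.

    A part whose name and checksum already appear in the base backup can be
    referenced instead of copied. Anything else is new or has changed.
    """
    return sorted(
        name
        for name, checksum in current_parts.items()
        if base_backup_parts.get(name) != checksum
    )


# Hypothetical part names and checksums, for illustration only.
base = {"all_1_1_0": "a1", "all_2_2_0": "b2"}
current = {"all_1_1_0": "a1", "all_2_2_0": "b2", "all_3_3_0": "c3"}
print(plan_incremental_backup(current, base))  # ['all_3_3_0']
```

A partial restore would work in the opposite direction. It would select a subset of part names from the backup and resolve each one to the backup that physically holds it.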
✔️ Semistructured Data
✔️ JSON data type with automatic type inference and dynamic subcolumns.
✔️ Sparse column format and optimization of functions for sparse columns. #22535
Dynamic selection of column format: full, const, sparse, or low cardinality.
✔️ Hybrid wide/compact data part format for a huge number of columns.
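A sparse column saves space when most values equal the type's default. The sketch below is a hypothetical encoding in which only the row offsets and values of non-default entries are stored. It also shows why functions optimized for sparse columns can be cheaper: they touch only the stored values. The class and function names are illustrative and are not ClickHouse's internal types.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SparseColumn:
    """Hypothetical sparse encoding of an integer column."""
    size: int            # total number of rows
    offsets: List[int]   # ascending row indices that hold non-default values
    values: List[int]    # the non-default values, aligned with offsets
    default: int = 0


def to_sparse(column: List[int], default: int = 0) -> SparseColumn:
    offsets = [i for i, v in enumerate(column) if v != default]
    return SparseColumn(len(column), offsets, [column[i] for i in offsets], default)


def to_full(sparse: SparseColumn) -> List[int]:
    full = [sparse.default] * sparse.size
    for i, v in zip(sparse.offsets, sparse.values):
        full[i] = v
    return full


def sparse_sum(sparse: SparseColumn) -> int:
    # Sum only the stored values, then account for all default rows at once.
    return sum(sparse.values) + sparse.default * (sparse.size - len(sparse.values))


col = [0, 0, 5, 0, 0, 0, 7, 0]
s = to_sparse(col)
print(s.offsets, s.values, sparse_sum(s))  # [2, 6] [5, 7] 12
```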
✔️ Type Inference for Data Import
✔️ Allow skipping column names and types if the data format already contains a schema (e.g. Parquet, Avro).
✔️ Allow inferring types for text formats (e.g. CSV, TSV, JSONEachRow).
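Type inference for text formats can be sketched as follows. The code guesses the narrowest type for each field, widens it across all rows of a column, and wraps the result in Nullable when a field is empty. The widening order Int64 → Float64 → String and the empty-field rule are assumptions made for this illustration. ClickHouse's actual inference rules are more detailed and configurable.

```python
import csv
import io
from typing import List, Optional

WIDENING_ORDER = ["Int64", "Float64", "String"]  # assumed order, narrowest first


def infer_value_type(value: str) -> Optional[str]:
    """Guess a type for one text field; None means the field is empty."""
    if value == "":
        return None
    try:
        int(value)
        return "Int64"
    except ValueError:
        pass
    try:
        float(value)
        return "Float64"
    except ValueError:
        return "String"


def infer_column_type(values: List[str]) -> str:
    best: Optional[str] = None
    has_empty = False
    for v in values:
        t = infer_value_type(v)
        if t is None:
            has_empty = True
        elif best is None or WIDENING_ORDER.index(t) > WIDENING_ORDER.index(best):
            best = t
    best = best or "String"  # an all-empty column falls back to String
    return f"Nullable({best})" if has_empty else best


def infer_schema(text: str) -> List[str]:
    rows = list(csv.reader(io.StringIO(text)))
    return [infer_column_type(list(col)) for col in zip(*rows)]


print(infer_schema("1,a,2.5\n2,b,\n"))  # ['Int64', 'String', 'Nullable(Float64)']
```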
Support for Transactions
Atomic insert of more than one block, or into more than one partition, of MergeTree and ReplicatedMergeTree tables.
Atomic insert into table and dependent materialized views. Atomic insert into multiple tables.
Multiple SELECTs from one consistent snapshot.
Atomic insert into distributed table.
✔️ Lightweight DELETE
✔️ Make mutations more lightweight by using delete masks.
✔️ It won't enable frequent UPDATE/DELETE as in OLTP databases, but it will bring ClickHouse closer to them.
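A delete mask makes DELETE cheap: matching rows are only marked as invisible, and the column data is rewritten later during a merge. The sketch below models one data part with a per-row visibility flag. The class and method names are hypothetical. It illustrates the technique and does not reproduce ClickHouse's storage format.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DataPart:
    """Hypothetical immutable data part plus a per-row visibility mask."""
    columns: Dict[str, List[int]]
    mask: List[bool] = field(default_factory=list)  # True = row is visible

    def __post_init__(self) -> None:
        if not self.mask:
            n = len(next(iter(self.columns.values()), []))
            self.mask = [True] * n

    def lightweight_delete(self, predicate: Callable[[Dict[str, int]], bool]) -> int:
        """Hide matching rows without touching column data; return how many were hidden."""
        deleted = 0
        for i, visible in enumerate(self.mask):
            row = {name: col[i] for name, col in self.columns.items()}
            if visible and predicate(row):
                self.mask[i] = False
                deleted += 1
        return deleted

    def select(self, column: str) -> List[int]:
        """Reads skip the rows hidden by the mask."""
        return [v for v, keep in zip(self.columns[column], self.mask) if keep]

    def compact(self) -> "DataPart":
        """A later merge physically drops the masked rows."""
        kept = {
            name: [v for v, keep in zip(col, self.mask) if keep]
            for name, col in self.columns.items()
        }
        return DataPart(kept)


part = DataPart({"id": [1, 2, 3, 4], "v": [10, 20, 30, 40]})
part.lightweight_delete(lambda row: row["id"] % 2 == 0)
print(part.select("v"))  # [10, 30]
```

The original column lists stay unchanged until `compact()` runs. This is what makes the delete cheap compared with a full mutation that rewrites the part.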
✔️ SQL Compatibility Improvements
✔️ Untangle name resolution and query analysis.
✔️ Initial support for correlated subqueries.
✔️ Allow using window functions inside expressions.
✔️ Add compatibility aliases for some window functions, etc.
✔️ Support for GROUPING SETS.
JOIN Improvements
✔️ Support for join reordering.
✔️ Extend the cases when condition pushdown is applicable.
Convert anti-join to NOT IN.
✔️ Use table sorting for DISTINCT optimization.
✔️ Use table sorting for merge JOIN.
✔️ Grace hash join algorithm.
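Grace hash join partitions both join inputs by a hash of the join key. Matching keys always land in the same bucket, so the join can be performed one bucket at a time, and only that bucket's hash table has to fit in memory. The sketch below keeps buckets in lists. A real engine would spill them to disk. The function name and row layout are assumptions made for this illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (join key, payload)


def grace_hash_join(left: List[Row], right: List[Row], num_buckets: int = 4) -> List[Tuple[int, str, str]]:
    # Phase 1: partition both sides by hash(key). In a real engine these
    # buckets would be written to temporary files.
    left_buckets: Dict[int, List[Row]] = defaultdict(list)
    right_buckets: Dict[int, List[Row]] = defaultdict(list)
    for key, payload in left:
        left_buckets[hash(key) % num_buckets].append((key, payload))
    for key, payload in right:
        right_buckets[hash(key) % num_buckets].append((key, payload))

    # Phase 2: an ordinary in-memory hash join, one bucket at a time.
    result: List[Tuple[int, str, str]] = []
    for b in range(num_buckets):
        table: Dict[int, List[str]] = defaultdict(list)
        for key, payload in right_buckets[b]:  # build side
            table[key].append(payload)
        for key, payload in left_buckets[b]:   # probe side
            for match in table.get(key, []):
                result.append((key, payload, match))
    return result


left = [(1, "a"), (2, "b"), (3, "c")]
right = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
print(sorted(grace_hash_join(left, right)))  # [(2, 'b', 'x'), (3, 'c', 'y'), (3, 'c', 'z')]
```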
Resource Management
✔️ Memory overcommit (soft and hard memory limits).
✔️ Enable external GROUP BY and ORDER BY by default.
✔️ IO operations scheduler with priorities.
✔️ Make scalar subqueries accountable.
✔️ CPU and network priorities.
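The soft/hard limit split behind memory overcommit can be modeled as follows. A query may grow past its soft limit while the server-wide hard limit still has room. Once an allocation would exceed the hard limit, the query that is furthest over its soft limit is chosen for cancellation. The class name and the victim rule, which selects the highest used/soft-limit ratio, are assumptions for this sketch. See the ClickHouse documentation for the actual policy and its settings.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class QueryMemory:
    soft_limit: int  # bytes the query is nominally entitled to (must be > 0)
    used: int = 0    # bytes currently allocated


class MemoryOvercommitModel:
    """Hypothetical model of soft per-query limits under one hard server limit."""

    def __init__(self, hard_limit: int) -> None:
        self.hard_limit = hard_limit
        self.queries: Dict[str, QueryMemory] = {}

    def register(self, query_id: str, soft_limit: int) -> None:
        self.queries[query_id] = QueryMemory(soft_limit)

    def total_used(self) -> int:
        return sum(q.used for q in self.queries.values())

    def allocate(self, query_id: str, size: int) -> Optional[str]:
        """Grant the allocation and return None if the hard limit allows it.
        Otherwise leave usage unchanged and return the id of the query to cancel."""
        if self.total_used() + size <= self.hard_limit:
            self.queries[query_id].used += size
            return None
        return max(
            self.queries,
            key=lambda qid: self.queries[qid].used / self.queries[qid].soft_limit,
        )


m = MemoryOvercommitModel(hard_limit=100)
m.register("a", soft_limit=30)
m.register("b", soft_limit=30)
m.allocate("a", 50)         # over its soft limit, but allowed
m.allocate("b", 40)         # total is now 90
print(m.allocate("b", 20))  # 'a' (ratio 50/30 beats 40/30)
```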
Separation of Storage and Compute
✔️ Parallel reading from replicas.
✔️ Dynamic cluster configuration with service discovery.
✔️ Caching of data from object storage.
Simplification of ReplicatedMergeTree.
✔️ Shared metadata storage.
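Parallel reading from replicas requires every range of a table to be read by exactly one replica. The sketch below splits a table into fixed-size ranges and assigns them round-robin. The granule unit, task size, and round-robin policy are all hypothetical simplifications. A real coordinator could instead hand out ranges dynamically to whichever replica asks next.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # half-open [begin, end) in hypothetical granules


def split_ranges(total_granules: int, task_size: int) -> List[Range]:
    return [
        (begin, min(begin + task_size, total_granules))
        for begin in range(0, total_granules, task_size)
    ]


def assign_to_replicas(tasks: List[Range], replicas: List[str]) -> Dict[str, List[Range]]:
    """Round-robin assignment: each range goes to exactly one replica."""
    plan: Dict[str, List[Range]] = {r: [] for r in replicas}
    for i, task in enumerate(tasks):
        plan[replicas[i % len(replicas)]].append(task)
    return plan


print(assign_to_replicas(split_ranges(10, 4), ["r1", "r2"]))
# {'r1': [(0, 4), (8, 10)], 'r2': [(4, 8)]}
```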
Experimental and Intern Tasks
Streaming Queries
Fix POPULATE for materialized views.
Unification of materialized views, live views and window views.
Allow setting up subscriptions on top of all tables, including Merge and Distributed.
✔️ Normalization of Kafka tables with storing offsets in ClickHouse.
✔️ Support for exactly once consumption from Kafka, non-consuming reads and multiple consumers.
Streaming queries with GROUP BY and ORDER BY with windowing criteria.
Persistent queues on top of ClickHouse tables.
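Exactly-once consumption becomes possible when consumer offsets are stored in ClickHouse itself and committed together with the data. After a redelivery, messages below the stored offset are recognized as duplicates and skipped. The sketch below models this with a hypothetical in-memory table. It assumes that each batch is ordered by offset. In a real system, appending the rows and advancing the offset would have to be a single atomic write.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class OffsetTable:
    """Hypothetical table that stores rows and consumer offsets together."""
    rows: List[str] = field(default_factory=list)
    offsets: Dict[Tuple[str, int], int] = field(default_factory=dict)  # (topic, partition) -> next offset

    def commit(self, topic: str, partition: int, batch: List[Tuple[int, str]]) -> int:
        """Apply messages not yet consumed and advance the stored offset.

        Returns the number of newly applied messages. The two updates
        below stand in for what must be one atomic write.
        """
        start = self.offsets.get((topic, partition), 0)
        new = [(off, msg) for off, msg in batch if off >= start]
        if not new:
            return 0
        self.rows.extend(msg for _, msg in new)
        self.offsets[(topic, partition)] = new[-1][0] + 1
        return len(new)


t = OffsetTable()
t.commit("events", 0, [(0, "a"), (1, "b")])
t.commit("events", 0, [(1, "b"), (2, "c")])  # redelivery: offset 1 is skipped
print(t.rows)  # ['a', 'b', 'c']
```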
Integration with ML/AI
🗑️ Integration with Tensorflow
🗑️ Integration with MADLib
GPU Support
🗑️ Compile expressions to GPU
Unique Key Constraint
User-Defined Data Types
Incremental aggregation in memory
Key-value data marts
Text Classification
Graph Processing
Foreign SQL Dialects in ClickHouse
🗑️ Support for MySQL dialect or Apache Calcite as an option.
✔️ Batch Jobs and Refreshable Materialized Views
✔️ Embedded ClickHouse Engine
Data Hub
Build And Testing Improvements
Testing
✔️ Add tests for AArch64 builds.
✔️ Automated tests for backward compatibility.
✔️ Server-side query fuzzer for all kinds of tests.
✔️ Fuzzing of query settings in functional tests.
SQL function-based fuzzer.
Fuzzer of data formats.
✔️ Integrate with SQLogicTest.
Import obfuscated queries from Yandex Metrica.
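Fuzzing query settings in functional tests can be as simple as running each test query with a randomly chosen `SETTINGS` clause. The sketch below does this with a seeded random generator so that failures can be reproduced. The setting names are real ClickHouse settings. The candidate values, the dictionary `SETTINGS_SPACE`, and both helper functions are arbitrary, hypothetical choices for this illustration.

```python
import random
from typing import Dict, List

# Hypothetical search space: real setting names, arbitrary candidate values.
SETTINGS_SPACE: Dict[str, List[int]] = {
    "max_threads": [1, 2, 4, 16],
    "max_block_size": [1, 100, 65536],
    "optimize_read_in_order": [0, 1],
}


def random_settings(rng: random.Random) -> Dict[str, int]:
    return {name: rng.choice(values) for name, values in SETTINGS_SPACE.items()}


def with_settings(query: str, settings: Dict[str, int]) -> str:
    clause = ", ".join(f"{name} = {value}" for name, value in settings.items())
    return f"{query} SETTINGS {clause}"


rng = random.Random(42)  # fixed seed so a failing combination can be replayed
print(with_settings("SELECT count() FROM numbers(10)", random_settings(rng)))
```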
Builds
✔️ Docker images for AArch64.
✔️ Enable missing libraries for AArch64 builds.
✔️ Add and explore Musl builds.
Build all libraries with our own CMake files.
Embed root certificates into the binary.
Embed a DNS resolver into the binary.
Add ClickHouse to Snap so that people do not accidentally install obsolete versions.