Roadmap 2021 (discussion) #17623

@alexey-milovidov

This is ClickHouse roadmap 2021.
Descriptions and links are to be filled in.
It will be published in documentation in December.

Main tasks

✔️ Provide an alternative to ZooKeeper

Implementation of a server with ZooKeeper interface inside ClickHouse.

Done, @alesapin

#15090 #16877 #19580 #20585 #21425 #21677 #21593 #21690 #22274 #26150 #28981 #31150 #30880 #30678 #30372 #30170 #29417 #29367 #29268 #29223 #29071 #29030 #28526 #28519 #28360 #28152 #28190 #28197 #28143 #28080 #27818 #27125 #26874 #25428 #25421 #24533 #24499 #24448 #24412 #24059 #24017 #23077 #23038 #22992 #22743 #22707 #22470 #22373

✔️ Nested and semistructured data

In progress, @CurtizJ

Reading of subcolumns from tables.
Nested type with an arbitrary nesting level.
Unify Nested and named tuples.
Better syntax support for nested and named tuples.
Naturally map Nested to JSON format.
Map data type.
Move serialization/deserialization methods from DataType to Column.
Allow different column representations to store the same DataType.
ColumnSparse.
Codec inference from data.
Dynamic columns in tables.

#21562
#21157
#21699
#14196
#17310
#14963
#15806
#1841

Limited support for transactions

Atomic inserts into a table and all of its dependent materialized views. Atomic inserts of more than one block.
Acquire a snapshot to use in multiple SELECT queries.
In progress, @tavplubix

#22086

✔️ Backups

#13953
@vitlibar
#21945

✔️ Hedged requests

#19291
Done, @Avogar

✔️ Window functions

Experimental support, @akuzm
#18222
#18455
#19022
#19299
#19921
#19951
#20041
#20060
#20111
#20284
#20293
#20337
#21895

✔️ Separation of storage and compute

✔️ Object storage for Replicated tables: #16240
✔️ Support for partitions in file-like engines
✔️ Distributed INSERT and SELECT over file-like engines, @nikitamikhaylov #22012
✔️ Remove messy code and general inefficiencies in reading from remote storage.
Remote filesystem over ClickHouse server
✔️ Distributed SELECT over MergeTree on shared filesystem, @nikitamikhaylov #29279

✔️ Short-circuit evaluation

Done, @Avogar

✔️ Projections

Experimental stage.
#20202
@amosbird

ALTER PRIMARY KEY

In progress, @amosbird

✔️ Lightweight DELETE/UPDATE

#24755

✔️ Workload management

Add an async method for processors. Use a shared thread pool for all queries.
@KochetovNicolai

✔️ User-Defined Functions

Done, @kitaisreal
SQL UDFs - done!
Executable UDFs - done!

Simplify replication

JOIN improvements

#18672
@vdimir

Embedded documentation

In progress, @FArthur-cmd

Pluggable auth with tokens

Experimental and intern tasks

🗑️ Calculation of test coverage on a per-query basis

Dropped.
@myrrc
#20539

Limited support for correlated subqueries

Postponed.

✔️ PostgreSQL table engine.

Done.
@kssenii
#18554

✔️ Streaming replication from PostgreSQL.

#20470, Done.
@kssenii

✔️ Implement the SQL/JSON standard.

Done, #24148

✔️ Table constraints and hypothesis on data for query optimization

Done, #18787, #31476
@nikvas0

✔️ Schema inference for text formats

Done, @Avogar

🗑️ Advanced compression methods

Cancelled.

🗑️ Integration of ClickHouse with Tensorflow

Cancelled.

✔️ Integration of more streaming data sketches in ClickHouse

Two new sketches were added.

✔️ Data processing with external tools in a streaming fashion, aka ClickHouse MR

Done, @kitaisreal

🗑️ Caching of deserialized data in memory on MergeTree part level

Cancelled.

✔️ Subquery operators: INTERSECT/EXCEPT, ANY/ALL/EXISTS.

Done.

✔️ Implementation of GROUPING SETS.

In progress.

✔️ Refreshable materialized views and cron jobs.

In progress.

User-defined data types

In progress.

Limited support for unique key constraints.

✔️ YAML configuration

Done, scheduled for release in 21.7.
#21858
@BoloniniD

Incremental data aggregation in memory

In progress.

✔️ Natural language processing functions

Done, @evillique.

✔️ Implementation of a table engine to consume application log files

Done, @ucasfl, @kssenii

✔️ Collection of common system metrics

Done, @alexey-milovidov

✔️ Integration of S2 geometry library

Done.

SQL functions for compatibility with MySQL

A few functions were added. Review stage.

Data formats for fast import of nested JSON and XML

In progress.

✔️ Text classification

Done, @evillique

✔️ Data encryption at rest

Done, @alexelex, @vitlibar

🗑️ NEAR modifier for GROUP BY

Cancelled.

🗑️ Specialized precompression codecs

Moved to 2022.

✔️ Integration of SQLite as database engine and data format

Done.

✔️ Query cache for result datasets

Postponed.

✔️ Support for INFORMATION SCHEMA

Done by @tavplubix

🗑️ Arrow Flight interface

Cancelled.

✔️ Functions and data types for geospatial data

Experimental stage.

✔️ User-Agent parsing functions

#21694

Integrate a novel optimization for GROUP BY

✔️ Descriptive analysis of datasets

Done.

🗑️ Learning of vector embeddings for table rows

Cancelled.

🗑️ Userspace RAID

Postponed.

✔️ VFS over HDFS

#11058

🗑️ Etcd instead of ZooKeeper

#17495 Cancelled.

🗑️ GPU accelerated aggregate functions

NVIDIA
Cancelled.

✔️ Rewrite type inference and identifiers analysis

For example, a way to analyze a query such as

WITH b + 1 AS c
SELECT a AS b, *, t.*, n.b, a -> a = b + 1 AS func, arrayMap(func, n.c)
FROM mysql(...) RIGHT JOIN (SELECT ...) t ARRAY JOIN nest AS n

in a generic rather than ad hoc fashion.

In progress, @kitaisreal

Tech debt and small tasks

✔️ Fix low performance of encrypt/decrypt functions

Done. @alexey-milovidov

✔️ Fix the remaining issues with in-memory parts and WAL

@CurtizJ
In-memory parts and the WAL were removed.

✔️ Continue to support play.clickhouse.com

The source code is not available, the ClickHouse version is too old, and there are multiple bugs.
Alternatively, remove it completely. @qoega

✔️ Fix issues with Postgres via ODBC

Done @kssenii

✔️ User roles from LDAP

Done. @traceon, @vitlibar

✔️ Remove DataStreams

Done, @KochetovNicolai

🗑️ Incremental data clustering

Cancelled, @KochetovNicolai

✔️ Min-hash, Sim-hash support

Done. @KochetovNicolai, @alexey-milovidov

✔️ Enable compile_expressions by default

Done. @kitaisreal

✔️ Z-order indexing

Progressing slowly.

✔️ Low performance of DataType serialization/deserialization functions

Caused by the introduction of "Data type domains".

✔️ Library dictionary bridge

Done, @kssenii
#21509

✔️ Versioning of aggregate function states

Done.
@kssenii

✔️ Type conversions for IN, JOIN

Done. @vdimir
#16724
#18672

✔️ Support for all types in CASE operator with values

✔️ Extended range for DateTime64

Done, @Enmk, @alexey-milovidov
#9404

✔️ Improve logic of priorities of background merges

@nikitamikhaylov #22381
Done.

✔️ Better criteria for Too Many Parts

✔️ Speed-up ODBC table engine

Done, @kssenii

✔️ Replace OpenSSL with BoringSSL

Done. @alexey-milovidov
#16043
#18129

Enable pk-aware GROUP BY by default

#19401

✔️ Deduplication for non-replicated MergeTree on block level

Done, @yuzhichang, @alesapin: #8467

✔️ Pre-configured named connections in config

Avoids having to specify the user and password for each external storage.
Done, @kssenii

Testing improvements

✔️ Automated tests for AArch64 builds

#15174
#22534
#22580
#22582
#22590
#22595
#22596
#22632

✔️ Add Query Fuzzer for Stress Tests

Done.

✔️ Add Thread Fuzzer for flaky tests checking

Done, #18299

Import obfuscated queries from Yandex.Metrica production

#29672

Fuzzing of cluster configurations

Fuzzing of ClickHouse versions in tests with distributed queries, to check compatibility

✔️ Integrate SQLancer

Done, @qoega
#19006
#19077

However, it has since been abandoned and no longer works.

Integrate SQLLogicTest

#15112
#18706
#18701
#18707

✔️ More intensive fuzzing of newly added tests

Done, @alexey-milovidov
#18916

🗑️ Network replay server

Moved to next year.

✔️ Add PowerPC cross-builds

#25486
#30010

✔️ Add Darwin/AArch64 cross-builds

✔️ Ensure that no source files from OS are used during build

#18915
#29974
#30011
