Intern Tasks 2021/2022 #29601

@alexey-milovidov

This is the list of proposed tasks. It will be extended. You can propose more tasks.
You can also find the previous lists here:

2020/2021: #15065
2019/2020: https://gist.github.com/alexey-milovidov/4251f71275f169d8fd0867e2051715e9
2018/2019: https://gist.github.com/alexey-milovidov/6735d193762cab1ad3b6e6af643e3a43
2017/2018: https://gist.github.com/alexey-milovidov/26cc3862eb87e52869b9dac64ab99156

The tasks should be:

  • not too hard (doable within about a month), but usually taking no less than a week;
  • not altering core components of the system;
  • mostly isolated, not requiring full knowledge of the system;
  • somewhat interesting to implement, or having some research value;
  • not on the critical path of our roadmap (OK to be thrown away after a year);
  • mostly for C++ developers, but there should also be tasks for frontend developers, and tools/research tasks that only require Go/Python/whatever;
  • some tasks should allow teamwork;
  • covering various skills, e.g. systems programming, algorithm knowledge, etc.

This is a draft. Descriptions are to be filled in.

⚙️ Aggregate functions for graph processing

Booked by @ElderlyPassionFruit and @Hattonuri

#29701

⚙️ Aggregate functions for statistical tests

Booked by @trenin17

#3266

Normality tests and similar.

✔️ Representation of ZooKeeper data model as a flat table in ClickHouse

Booked by @punkmunk
But implemented by completely different people.

#25645 #22130

🗑️ Network replay server for testing

Booked by @zhukowladimir

#11481

✔️ Evaluation and testing of non-cryptographic hash functions in ClickHouse

Booked by @olevino

Integrate wyhash, meowhash, aquahash, farsh, t1ha and highwayhash.

⚙️ Parallel compression for data export

Booked by @kavladst

Allow parallelizing data export into gz, xz and bzip2 formats.
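One possible approach for gz relies on the fact that a concatenation of independent gzip members is itself a valid gzip stream, so chunks can be compressed separately and written out in order. Below is a minimal sketch in Python, not the ClickHouse implementation; the function name `parallel_gzip`, the chunk size and the worker count are assumptions made for illustration only.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor


def parallel_gzip(data: bytes, chunk_size: int = 1 << 20, workers: int = 4) -> bytes:
    """Compress `data` as a sequence of independent gzip members.

    Hypothetical helper for illustration: each chunk becomes its own
    gzip member, and the members are concatenated in the original order.
    """
    if not data:
        return gzip.compress(b"")
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map() returns results in input order,
        # so the output stream keeps the original byte order.
        members = list(pool.map(gzip.compress, chunks))
    return b"".join(members)


if __name__ == "__main__":
    payload = b"ClickHouse\n" * 100_000
    compressed = parallel_gzip(payload, chunk_size=64 * 1024)
    # A standard decoder reads the multi-member stream as one file.
    assert gzip.decompress(compressed) == payload
```

The trade-off is a slightly worse compression ratio, because every chunk starts with an empty dictionary. Python's `bz2` and `lzma` modules also decode concatenated streams, so the same chunking idea may carry over to bzip2 and xz.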

🗑️ ClickHouse in a web browser with WebAssembly

Booked by @Alucardik
An experiment has been finished successfully and the outcome is a demonstration of why this task is unrealistic.

⚙️ Implementation of Graphite Carbon API (Graphite Web) in ClickHouse

Booked by @qwertBR

🗑️ Implementation of Prometheus querying API in ClickHouse

Booked by @gitnabi

✔️ Minimal plotting capabilities in ClickHouse

Booked by @vlerdman
Reimplemented by @alexey-milovidov as /dashboard UI.

⚙️ Collecting of Linux Perf data in ClickHouse

Booked by @rubin-do
A prototype has been demonstrated, but it has low applicability.

✔️ Integration of ClickHouse with MeiliSearch

Booked by @Michicosun

🗑️ Time series analysis with window functions

Booked by @mathalex Alexey Boykov.

Simple moving average. Holt-Winters forecast. ARIMA. Discovery of "shock events".
A modifier for ORDER BY WITH FILL or similar to fill data with extrapolation.
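As a point of reference for the simplest item above, a simple moving average over a trailing window can be sketched as follows. This is an illustrative Python sketch only; the function name `simple_moving_average` is an assumption, and the semantics mirror a window frame of the form "N - 1 preceding rows and the current row", with shorter windows at the start of the series.

```python
from collections import deque
from typing import Iterable, Iterator


def simple_moving_average(values: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the last `window` values for every input value.

    Hypothetical helper for illustration. The first `window - 1` results
    are averaged over the values seen so far.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    buf: deque = deque()
    total = 0.0
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            # Drop the value that just left the window.
            total -= buf.popleft()
        yield total / len(buf)


if __name__ == "__main__":
    print(list(simple_moving_average([1, 2, 3, 4, 5], window=3)))
    # prints [1.0, 1.5, 2.0, 3.0, 4.0]
```

Holt-Winters and ARIMA need fitted parameters and are not covered by this sketch.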

🗑️ API endpoints based on parametrized views in ClickHouse

Booked by @Fancy2000

Manage HTTP handlers (API endpoints) with SQL queries (creating parametrized views a.k.a. table functions).

⚙️ Embedded ClickHouse as a Python module

Booked by @LGrishin

⚙️ Compilation of expressions to GPU code

Booked by @evillique

✔️ Integrating Rust code into ClickHouse

Booked by @BoloniniD

With BLAKE3 hash function as an example.

⚙️ Key value data marts in ClickHouse

Booked by @dankondr

#33581

⚙️ Improvements of ClickHouse integration with foreign databases

Booked by @aapetrenko and @kate1mag

Table functions to access MongoDB, Redis, and Cassandra. Integration with ElasticSearch.
Unrestricted reads from ZooKeeper.

⚙️ Improvements of ClickHouse integration with data streams

Booked by @tchepavel

Integration with Apache Pulsar, Redis Streams, NATS or Kinesis, SQS.
NATS was successfully merged and is used in production.
Redis Streams is at the pull request stage.

🗑️ Integration of ClickHouse with embedded key-value stores

Booked by @nautaa

Integration with TerarkDB, libfpta, FASTER.

🗑️ Integration of ClickHouse with MADLib

Booked by @antikvist, @sabinadayanova
#4425

✔️ Schema inference for data formats. Support for new input/output formats in ClickHouse

Booked by @Avogar

Flatbuffers, HDF5 and sas7bdat.

🗑️ Advanced compression methods in ClickHouse

Booked by @takashirei
bsc, csc and bcm.

✔️ ClickHouse as a backend for Istio Telemetry

Booked by @Romanchenko
https://github.com/Romanchenko/telemetry_broker
Limited applicability.

✔️ Versions Playground for ClickHouse

Booked by @darkkeks
https://fiddle.clickhouse.com/

⚙️ Integration of ClickHouse with MySQL Parser

Booked by @mrworker27

🗑️ Tamper-proof data storage with blockchain

Booked by @Justarone

🗑️ Isolation of user-defined functions with Firecracker VM

Booked by @ivolff

🗑️ Direct import from files inside tar/zip/7z archives

Booked by @0442A403

⚙️ SQL functions for compatibility with MySQL dialect

#7320

Booked by @Shuba-Buba, @evlampiy-lavrentiev, @psevdoinsaf

✔️ Porting ClickHouse SIMD optimizations to ARM NEON

Booked by @chalice19, preliminary

🗑️ Limited support for correlated subqueries in ClickHouse

Booked by @Amesaru

🗑️ User Defined Functions with Julia, R or SciPy

Booked by @vvd170501

⚙️ Functions to extract data from HTML with CSS selectors

Booked by @zdikov

⚙️ Improvements of PREWHERE operator in ClickHouse

Booked by @nikvas0

⚙️ Implicit user credentials and TOTP for authentication

Booked by @kam3nskii

🗑️ Parallel execution of Distributed DDL queries

Booked by @shaprunovk

🗑️ Fuzzy GROUP BY for data clustering

Booked by @umchemurziev

✔️ Optimization of caching strategies in ClickHouse

Booked by @alexX512

⚙️ Optimization of queries with ordering by sublinear aggregate functions

Booked by @dimarub2000

✔️ Extensions of ZooKeeper protocol for transactions

Booked by @asokol123
Finished by @antonio2368, #41410.

⚙️ Improvements for CASE operator and transform function

Booked by @pmimanukyan

⚙️ Comparison of Snap, AppImage and Flatpak formats on ClickHouse builds

Booked by @TrueAstralpirate

✔️ Integration of ClickHouse with Observable and Falcon

Booked by @DotJason

⚙️ Implementation of GWP-Asan and comparison of memory allocators in ClickHouse

**Booked by **

⚙️ Implementation of aggregate function combinators: TOTAL, BY and ORDER BY

**Booked by **

🗑️ Improvements of ClickHouse fuzzing

Booked by @mark-polokhov

✔️ Optimizations of ClickHouse for cloud infrastructure

Booked by @nikitamikhaylov

⚙️ Extending Date and Time Functions in ClickHouse

Booked by @elevankoff

🗑️ Extended Temporary Tables in ClickHouse

**Booked by **

✔️ Specialized compression codecs for floating point data

Booked by @koloshmet

⚙️ Grace hash JOIN

Booked by Sergei Skvortsov, @BigRedEye

Probabilistic data structures for approximate (range) filtering in ClickHouse queries.

For example: "SuRF: Practical Range Query Filtering with Fast Succinct Tries" (2018) and "Proteus: A Self-Designing Range Filter" (2022).

Contact: @rschu1ze

Booked by @ruct

Investigate last-level cache partitioning for ClickHouse queries (Intel Cache Allocation Technology)

For example: "Accelerating Concurrent Workloads with CPU Cache Partitioning" (2018) and "Data Processing on Modern Hardware".

Contact: @rschu1ze

Entropy-learned Hashing

Try out "Entropy-Learned Hashing: Constant Time Hashing with Controllable Uniformity" (2022) in ClickHouse's hash aggregation.

Contact: @rschu1ze
