-
Notifications
You must be signed in to change notification settings - Fork 8.3k
Allow to explicitly pass query string to table functions for federated queries. #46758
Description
Use case
We have a bunch of table functions to integrate with external databases:
- mysql;
- postgresql;
- mongodb;
- odbc;
- jdbc;
- hive;
- sqlite;
- meilisearch;
- duckdb (under development);
We implement automatic query rewrite and condition pushdown to query these external storages.
Typically, it rewrites a query in the form of
SELECT columns... FROM table WHERE compatible expressions
Sometimes the user might want full control of the query that is being pushed to the external storage.
Describe the solution you'd like
For both table functions and the corresponding table engines, allow providing the query instead of database and table names in the following possible forms:
mysql('address', (SELECT ...), ...)- when the external query can be transparently parsed by ClickHouse;mysql('address', 'SELECT ...', ...)- compatible with any query dialects, but less convenient;
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
It can be extended to support the S3 SELECT feature by AWS and Yandex Cloud: https://docs.aws.amazon.com/AmazonS3/latest/API/API_SelectObjectContent.html
For variant 1, it's unclear if we should keep the lexer's output and pass it directly to the storage or if we should obtain the AST and pass the reformatted AST. The latter can be better due to support for ClickHouse-style string literal escaping and numeric literals, but the former is more direct.
A similar syntax is already supported for Meilisearch.
It should look consistent with the executable table function and the upcoming feature of HTTP services as UDTFs: #45994