Extending schemas and relative URLs in the URL engine #59617
Description
- The URL table function should support file, s3, and other schemas available for similar engines.
Examples:
file://data.csv - file path relative to the user_files directory (which is the current directory for clickhouse-local).
file:///home/alexey/data.csv - absolute path.
s3://clickhouse-public-datasets/hits_compatible/hits.csv - a URL with the s3 schema; currently works for environment credentials.
Implementation: consider the URL engine and table function as a wrapper on top of other engines.
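A rough sketch of the wrapper idea, written in Python for illustration only: inspect the schema of the URL and delegate to the matching engine. The function name pick_engine and the schema-to-engine mapping are assumptions made for this example, not existing ClickHouse code.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from URL schema to the engine that would handle it.
ENGINE_BY_SCHEMA = {
    "file": "File",
    "s3": "S3",
    "http": "URL",
    "https": "URL",
}


def pick_engine(url: str) -> str:
    """Return the name of the engine the URL wrapper would delegate to."""
    schema = urlsplit(url).scheme
    if schema not in ENGINE_BY_SCHEMA:
        raise ValueError(f"unsupported schema: {schema!r}")
    return ENGINE_BY_SCHEMA[schema]


print(pick_engine("file:///home/alexey/data.csv"))
print(pick_engine("s3://clickhouse-public-datasets/hits_compatible/hits.csv"))
```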
- The URL table function should support relative URLs.
Examples:
data.csv - a path-relative URL.
/test/data.csv - a host-relative URL.
//example.com/test/data.csv - a schema-relative URL.
All of them should be controlled by the setting url_base.
For example, if url_base is https://abc.xyz/def/, the path-relative URL will resolve to https://abc.xyz/def/data.csv, the host-relative URL will resolve to https://abc.xyz/test/data.csv, and the schema-relative URL will resolve to https://example.com/test/data.csv.
The base URL can have a foreign schema like file:// or s3://.
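The resolution rules in the example above match standard relative URL resolution. A minimal Python sketch using urljoin from the standard library reproduces the three cases; the helper name resolve is an assumption, and url_base here is just a Python variable standing in for the proposed setting.

```python
from urllib.parse import urljoin


def resolve(url: str, url_base: str) -> str:
    """Resolve a possibly relative URL against url_base (hypothetical helper)."""
    return urljoin(url_base, url)


url_base = "https://abc.xyz/def/"

# Path-relative, host-relative, and schema-relative URLs, in that order.
for relative in ("data.csv", "/test/data.csv", "//example.com/test/data.csv"):
    print(relative, "->", resolve(relative, url_base))
```

This only illustrates the intended semantics for an https base; how a base with a file:// or s3:// schema should be resolved is left to the implementation.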
- S3-like table functions should support relative URLs.
Similarly to the previous item, it can be controlled by the setting s3_base.
For example, if s3_base is s3://clickhouse-public-datasets/, you can write SELECT * FROM s3('hits_compatible/hits.csv').
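A minimal sketch of how the path-relative case could resolve, in Python for illustration. The helper name resolve_s3 is an assumption, and only path-relative inputs are handled; anything that already carries a schema is returned unchanged.

```python
def resolve_s3(path: str, s3_base: str) -> str:
    """Resolve a path-relative S3 path against s3_base (hypothetical helper)."""
    if "://" in path:
        # Already a full URL: leave it untouched.
        return path
    # Join with exactly one slash between the base and the relative path.
    return s3_base.rstrip("/") + "/" + path


print(resolve_s3("hits_compatible/hits.csv", "s3://clickhouse-public-datasets/"))
```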
- Support URL as a database engine (to use inside the Overlay engine).
If it has file://path/to/current/dir/ as its base URL, it will allow handling file, url, and s3 uniformly.
The default database engine in clickhouse-local should be switched to the Overlay with URL (currently it is Overlay with File).