Extending schemas and relative URLs in the URL engine #59617

@alexey-milovidov

  1. The URL table function should support file, s3, and other URL schemes available for similar engines.

Examples:

file://data.csv - a file path relative to the user_files directory (which is the current directory for clickhouse-local).
file:///home/alexey/data.csv - an absolute path.
s3://clickhouse-public-datasets/hits_compatible/hits.csv - a URL with the s3 scheme; currently works with environment credentials.

Implementation: consider the URL engine and table function as a wrapper on top of other engines.
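
To illustrate the "wrapper on top of other engines" idea, here is a minimal Python sketch of dispatching on the URL scheme. The mapping table and the `pick_engine` helper are hypothetical names made up for this illustration; they are not ClickHouse code, and the real implementation would live in the C++ engine.

```python
# Hypothetical mapping from URL scheme to the engine that would serve it.
ENGINE_BY_SCHEME = {
    "file": "File",
    "s3": "S3",
    "http": "URL",
    "https": "URL",
}


def pick_engine(url: str) -> tuple[str, str]:
    """Return (engine name, remainder of the URL after '://')."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError(f"no scheme in {url!r}")
    try:
        return ENGINE_BY_SCHEME[scheme.lower()], rest
    except KeyError:
        raise ValueError(f"unsupported scheme {scheme!r}") from None


for example in (
    "file://data.csv",
    "file:///home/alexey/data.csv",
    "s3://clickhouse-public-datasets/hits_compatible/hits.csv",
):
    print(example, "->", pick_engine(example))
```

In this sketch the remainder after `file://` is passed through as-is, so `data.csv` stays relative and `/home/alexey/data.csv` stays absolute, matching the examples above.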

  2. The URL table function should support relative URLs.

Examples:

data.csv - a path-relative URL.
/test/data.csv - a host-relative URL.
//example.com/test/data.csv - a scheme-relative URL.

All of them should be controlled by the setting url_base.
For example, if url_base is https://abc.xyz/def/, the path-relative URL will resolve to https://abc.xyz/def/data.csv, the host-relative URL will resolve to https://abc.xyz/test/data.csv, and the scheme-relative URL will resolve to https://example.com/test/data.csv.
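
The resolution rules in this example match standard relative-reference resolution, which can be checked with Python's `urllib.parse.urljoin`. This is only an illustration of the expected results; the `resolve` helper is a hypothetical name and `url_base` here is a plain Python variable standing in for the proposed setting.

```python
from urllib.parse import urljoin

# Stand-in for the proposed url_base setting.
url_base = "https://abc.xyz/def/"


def resolve(reference: str, base: str = url_base) -> str:
    """Resolve a possibly relative URL against the base URL."""
    return urljoin(base, reference)


print(resolve("data.csv"))                     # path-relative
print(resolve("/test/data.csv"))               # host-relative
print(resolve("//example.com/test/data.csv"))  # scheme-relative
```

Note that `urljoin` only resolves relative references for schemes in its built-in list, which includes file but not s3, so a base URL with the s3 scheme would need separate handling in a sketch like this.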

The base URL can have a foreign scheme like file:// or s3://.

  3. S3-like table functions should support relative URLs.

Similarly to the previous item, it can be controlled by the setting s3_base.

For example, if s3_base is s3://clickhouse-public-datasets/, you can write SELECT * FROM s3('hits_compatible/hits.csv').
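
A minimal Python sketch of how such a path could be resolved against the base, covering only the path-relative case. The `resolve_s3` helper is hypothetical, and the assumption that a reference containing `://` is already absolute is mine, not part of the proposal.

```python
def resolve_s3(reference: str, s3_base: str) -> str:
    """Resolve an S3 path against s3_base (path-relative case only)."""
    # Assumption: anything that already carries a scheme is left untouched.
    if "://" in reference:
        return reference
    # Join with exactly one slash between the base and the relative path.
    return s3_base.rstrip("/") + "/" + reference.lstrip("/")


print(resolve_s3("hits_compatible/hits.csv", "s3://clickhouse-public-datasets/"))
```

With the base from the example above, this yields the same full URL as in item 1: s3://clickhouse-public-datasets/hits_compatible/hits.csv.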

  4. Support URL as a database engine (to use inside the Overlay engine).

If it has file://path/to/current/dir/ as its base URL, it will allow handling file, url, and s3 uniformly.

The default database engine in clickhouse-local should be switched to the Overlay with URL (currently it is Overlay with File).

Labels: feature, usability, warmup task (the task for new ClickHouse team members; low risk, moderate complexity, no urgency).
