chdb-ds: UrlTableFunction.to_sql() emits HEADERS outside url() call #564

@nklmish

Repository: chdb-ds (datastore package)
File: datastore/table_functions.py, class UrlTableFunction, method to_sql() (~line 244)
Severity: LOW — workaround available via raw SQL
Environment: Python 3.13, macOS arm64, pip install chdb==4.1.6
Pull Request: chdb-io/chdb#563

Description

UrlTableFunction.to_sql() appends a HEADERS(...) clause after the closing parenthesis of the url() call instead of passing headers(...) as an argument inside it. The result is invalid ClickHouse SQL that raises a SYNTAX_ERROR at query time.

-- Actual (broken):
url('https://example.com/data.parquet', 'Parquet') HEADERS('User-Agent: Mozilla/5.0')

-- Expected (per ClickHouse docs):
url('https://example.com/data.parquet', 'Parquet', 'auto', headers('User-Agent'='Mozilla/5.0'))

The ClickHouse url() table function accepts headers as the 4th positional argument inside the function call:

url(URL [,format] [,structure] [,headers])

Reference: https://clickhouse.com/docs/sql-reference/table-functions/url

Reproduction

from chdb.datastore import DataStore

ds = DataStore.from_url(
    "https://example.com/data.parquet",
    format="Parquet",
    headers=["User-Agent: Mozilla/5.0"],
)
print(ds._table_function.to_sql())
# Actual:   url('https://example.com/data.parquet', 'Parquet') HEADERS('User-Agent: Mozilla/5.0')
# Expected: url('https://example.com/data.parquet', 'Parquet', 'auto', headers('User-Agent'='Mozilla/5.0'))

chdb parse error confirming the SQL is rejected:

Code: 62. DB::Exception: Syntax error: failed at position 74 ('User-Agent: Mozilla/5.0'):
'User-Agent: Mozilla/5.0'). Expected one of: list of aliases expressions, list of elements,
identifier. (SYNTAX_ERROR)

Impact

DataStore.from_url(headers=...) does not work: the generated SQL is rejected by ClickHouse with a SYNTAX_ERROR at query time. This blocks the documented API path for passing custom HTTP headers (e.g., User-Agent, Authorization) when fetching remote Parquet/CSV files via url().

Root Cause

In UrlTableFunction.to_sql() (table_functions.py ~line 244), the headers parameter is appended after the closing ) of url() instead of being passed as the 4th positional argument inside it:

# Current (broken):
sql = f"url({', '.join(sql_params)})"
if headers:
    headers_sql = ...
    sql += f" HEADERS({headers_sql})"  # appended OUTSIDE url()

The desired output shape is:

url('https://...', 'Parquet', 'auto', headers('User-Agent'='Mozilla/5.0'))

Here headers('key'='value', ...) is the 4th positional argument, and the structure argument (3rd) should be 'auto' rather than an empty string: chdb raises Code: 36 "Table structure is empty" if structure is '' when headers are present.
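A minimal sketch of how the desired output shape could be generated, for illustration only. This is not the actual UrlTableFunction code: build_url_sql and quote are hypothetical standalone helpers, the "Name: Value" input format is taken from the reproduction above, and the backslash escaping of quotes is an assumption about how literals should be rendered.

```python
from typing import Optional, Sequence


def quote(value: str) -> str:
    """Render a single-quoted SQL string literal (assumed escaping rules)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_url_sql(
    url: str,
    fmt: str,
    headers: Optional[Sequence[str]] = None,
    structure: Optional[str] = None,
) -> str:
    """Hypothetical helper: build url(...) with headers(...) INSIDE the call."""
    params = [quote(url), quote(fmt)]
    if headers:
        pairs = []
        for line in headers:
            # Split "Name: Value" on the first colon only.
            name, sep, value = line.partition(":")
            if not sep or not name.strip():
                raise ValueError(f"expected 'Name: Value', got {line!r}")
            pairs.append(f"{quote(name.strip())}={quote(value.strip())}")
        # The 3rd argument must be non-empty when headers are present.
        params.append(quote(structure or "auto"))
        # headers(...) goes in as the 4th positional argument.
        params.append(f"headers({', '.join(pairs)})")
    elif structure:
        params.append(quote(structure))
    return f"url({', '.join(params)})"


print(build_url_sql(
    "https://example.com/data.parquet",
    "Parquet",
    headers=["User-Agent: Mozilla/5.0"],
))
```

With the inputs from the reproduction, this prints the expected string from the top of this report. A real fix would need to fit the existing sql_params handling in to_sql(), which is not shown here.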

Workaround

Use raw SQL via Connection.execute() directly:

from datastore.connection import Connection

conn = Connection(database=db_path)  # db_path: database path, defined elsewhere by the caller
conn.connect()
sql = (
    "SELECT * FROM url("
    "  'https://example.com/data.parquet',"
    "  'Parquet',"
    "  'auto',"
    "  headers('User-Agent'='Mozilla/5.0 (compatible; MyApp/1.0)')"
    ")"
)
df = conn.execute(sql, output_format="Dataframe").to_df()
conn.close()

Related

  • ClickHouse issue #34239: "forbidden requests due to missing user-agent header in URL engine" — motivating use case for custom headers support
  • ClickHouse issue #37897: "Add an ability to pass headers to url table function" — the feature request that introduced headers() as the 4th positional arg in ClickHouse's url() function

Verification

chdb version:              4.1.6
datastore package version: 4.1.6 (bundled)
Python:                    3.13
Platform:                  macOS arm64

Captured UrlTableFunction SQL (broken):
  url('https://example.com/data.parquet', 'Parquet') HEADERS('User-Agent: Mozilla/5.0')

Captured DataStore.from_url SQL (broken, same bug):
  url('https://example.com/data.parquet', 'Parquet') HEADERS('User-Agent: Mozilla/5.0')

chdb parse error on broken SQL:
  Code: 62. DB::Exception: Syntax error: failed at position 74 (SYNTAX_ERROR)

chdb default User-Agent (captured via httpbin.org echo):
  ClickHouse/26.1.2.1

Verified at: 2026-04-24T08:00Z
