Repository: chdb-ds (datastore package)
File: datastore/table_functions.py, class UrlTableFunction, method to_sql() (~line 244)
Severity: LOW — workaround available via raw SQL
Environment: Python 3.13, macOS arm64, pip install chdb==4.1.6
Pull Request: chdb-io/chdb#563
Description
UrlTableFunction.to_sql() appends a HEADERS(...) clause after the closing parenthesis of the url() function call instead of passing headers(...) as an argument inside it. The resulting ClickHouse SQL is invalid and raises a SYNTAX_ERROR at query time.
-- Actual (broken):
url('https://example.com/data.parquet', 'Parquet') HEADERS('User-Agent: Mozilla/5.0')
-- Expected (per ClickHouse docs):
url('https://example.com/data.parquet', 'Parquet', 'auto', headers('User-Agent'='Mozilla/5.0'))
The ClickHouse url() table function accepts headers as the 4th positional argument inside the function call:
url(URL [,format] [,structure] [,headers])
Reference: https://clickhouse.com/docs/sql-reference/table-functions/url
Reproduction
from chdb.datastore import DataStore

ds = DataStore.from_url(
    "https://example.com/data.parquet",
    format="Parquet",
    headers=["User-Agent: Mozilla/5.0"],
)
print(ds._table_function.to_sql())
# Actual: url('https://example.com/data.parquet', 'Parquet') HEADERS('User-Agent: Mozilla/5.0')
# Expected: url('https://example.com/data.parquet', 'Parquet', 'auto', headers('User-Agent'='Mozilla/5.0'))
chdb parse error confirming the SQL is rejected:
Code: 62. DB::Exception: Syntax error: failed at position 74 ('User-Agent: Mozilla/5.0'):
'User-Agent: Mozilla/5.0'). Expected one of: list of aliases expressions, list of elements,
identifier. (SYNTAX_ERROR)
Impact
DataStore.from_url(headers=...) does not work: the generated SQL is rejected by ClickHouse with a SYNTAX_ERROR as soon as the query runs. This blocks the documented API path for passing custom HTTP headers (e.g., User-Agent, Authorization) when fetching remote Parquet/CSV files via url().
Root Cause
In UrlTableFunction.to_sql() (table_functions.py ~line 244), the headers parameter is appended after the closing ) of url() instead of being passed as the 4th positional argument inside it:
# Current (broken):
sql = f"url({', '.join(sql_params)})"
if headers:
    headers_sql = ...
    sql += f" HEADERS({headers_sql})"  # appended OUTSIDE url()
The desired output shape is:
url('https://...', 'Parquet', 'auto', headers('User-Agent'='Mozilla/5.0'))
Here headers('key'='value', ...) is the 4th positional argument, and the 3rd argument (structure) should be 'auto' rather than an empty string: chdb raises Code: 36 Table structure is empty if structure is '' when headers are present.
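The desired shape can be sketched in Python as follows. This is only an illustration of the argument ordering, not the actual patch: the names build_url_sql and quote_sql_string are hypothetical and do not exist in the datastore package, and the sketch assumes headers arrive as "Name: value" strings, as in the reproduction above.

```python
from typing import Optional, Sequence


def quote_sql_string(value: str) -> str:
    """Wrap a value in single quotes, escaping backslashes and single quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_url_sql(
    url: str,
    fmt: Optional[str] = None,
    structure: Optional[str] = None,
    headers: Optional[Sequence[str]] = None,
) -> str:
    """Hypothetical helper: build url(...) with headers(...) INSIDE the call."""
    if (structure or headers) and fmt is None:
        # Assumption of this sketch: require a format so that structure and
        # headers stay in the 3rd and 4th positions described in the report.
        raise ValueError("format is required when structure or headers are given")

    params = [quote_sql_string(url)]
    if fmt is not None:
        params.append(quote_sql_string(fmt))
    if structure or headers:
        # Use 'auto' instead of '' so chdb does not raise
        # "Code: 36 Table structure is empty".
        params.append(quote_sql_string(structure or "auto"))
    if headers:
        pairs = []
        for header in headers:
            # Split "Name: value" on the first colon only.
            name, _, value = header.partition(":")
            pairs.append(
                f"{quote_sql_string(name.strip())}={quote_sql_string(value.strip())}"
            )
        params.append(f"headers({', '.join(pairs)})")
    return f"url({', '.join(params)})"


sql = build_url_sql(
    "https://example.com/data.parquet",
    fmt="Parquet",
    headers=["User-Agent: Mozilla/5.0"],
)
print(sql)
```

For the inputs from the reproduction, this prints the expected string shown above. Whether the real fix should reject a missing format or handle it differently is left to the maintainers.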
Workaround
Use raw SQL via Connection.execute() directly:
from datastore.connection import Connection

# db_path is the path to your chdb database; it must be defined by the caller.
conn = Connection(database=db_path)
conn.connect()
sql = (
    "SELECT * FROM url("
    "  'https://example.com/data.parquet',"
    "  'Parquet',"
    "  'auto',"
    "  headers('User-Agent'='Mozilla/5.0 (compatible; MyApp/1.0)')"
    ")"
)
df = conn.execute(sql, output_format="Dataframe").to_df()
conn.close()
Related
- ClickHouse issue #34239: "forbidden requests due to missing user-agent header in URL engine" — motivating use case for custom headers support
- ClickHouse issue #37897: "Add an ability to pass headers to url table function" — the feature request that introduced headers() as the 4th positional arg in ClickHouse's url() function
Verification
chdb version: 4.1.6
datastore package version: 4.1.6 (bundled)
Python: 3.13
Platform: macOS arm64
Captured UrlTableFunction SQL (broken):
url('https://example.com/data.parquet', 'Parquet') HEADERS('User-Agent: Mozilla/5.0')
Captured DataStore.from_url SQL (broken, same bug):
url('https://example.com/data.parquet', 'Parquet') HEADERS('User-Agent: Mozilla/5.0')
chdb parse error on broken SQL:
Code: 62. DB::Exception: Syntax error: failed at position 74 (SYNTAX_ERROR)
chdb default User-Agent (captured via httpbin.org echo):
ClickHouse/26.1.2.1
Verified at: 2026-04-24T08:00Z