Skip to content

chdb-io/chdb-core

ย 
ย 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

3,373 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Build X86 PyPI Downloads Discord Twitter

chdb-core

chdb-core is the foundational engine of the chDB ecosystem โ€” an in-process SQL OLAP Engine powered by ClickHouse 1

Table of Contents


chDB Ecosystem

The chDB project is split into two packages:

Package Role Install
chdb-core (this repo) C++ engine + Session / Connection / DB-API interfaces pip install chdb-core
chDB Pandas-compatible DataStore API built on top of chdb-core pip install chdb
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚          chDB (pip install chdb)          โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”‚
โ”‚  โ”‚  DataStore: pandas-like lazy API    โ”‚  โ”‚
โ”‚  โ”‚  QueryPlanner / dual-engine exec    โ”‚  โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ”‚
โ”‚                     โ”‚                     โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”‚
โ”‚  โ”‚  chdb-core (pip install chdb-core)  โ”‚  โ”‚
โ”‚  โ”‚  C++ ClickHouse Engine              โ”‚  โ”‚
โ”‚  โ”‚  Session / Connection / DB-API      โ”‚  โ”‚
โ”‚  โ”‚  query() / UDF / Stream             โ”‚  โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

chdb-core provides everything you need to run SQL queries with ClickHouse performance โ€” no server required. If you want a higher-level pandas-compatible DataFrame API, install chDB instead.


Features

  • In-process SQL OLAP Engine, powered by ClickHouse
  • No need to install ClickHouse
  • Minimized data copy from C++ to Python with python memoryview
  • Input & Output support Parquet, CSV, JSON, Arrow, ORC and 60+ more formats
  • Session and Connection management with stateful queries
  • Streaming query support for constant-memory processing
  • Python DB-API 2.0 compliance
  • User Defined Functions (UDF) support
  • AI-assisted SQL generation

Architecture


Installation

Currently, chdb-core supports Python 3.9+ on macOS and Linux (x86_64 and ARM64).

pip install chdb-core

Quick Start

import chdb

result = chdb.query("SELECT version()", "Pretty")
print(result)

API Reference

One-shot Query

The simplest way to run SQL โ€” no session or connection needed:

import chdb

# Basic query with CSV output (default)
result = chdb.query("SELECT 1, 'hello'")
print(result)

# Pandas DataFrame output
df = chdb.query("SELECT number, number * 2 AS double FROM numbers(10)", "DataFrame")
print(df)

# Parameterized queries
df = chdb.query(
    "SELECT toDate({base_date:String}) + number AS date "
    "FROM numbers({total_days:UInt64}) "
    "LIMIT {items_per_page:UInt64}",
    "DataFrame",
    params={"base_date": "2025-01-01", "total_days": 10, "items_per_page": 5},
)
print(df)
Query on Files (Parquet, CSV, JSON, Arrow, ORC and 60+)
import chdb

res = chdb.query('SELECT * FROM file("data.parquet", Parquet)', "JSON")
print(res)

res = chdb.query('SELECT * FROM file("data.csv", CSV)', "CSV")
print(res)

# Query result statistics
print(f"SQL read {res.rows_read()} rows, {res.bytes_read()} bytes, "
      f"storage read {res.storage_rows_read()} rows, {res.storage_bytes_read()} bytes, "
      f"elapsed {res.elapsed()} seconds")

# Pandas DataFrame output
chdb.query('SELECT * FROM file("data.parquet", Parquet)', "Dataframe")
Connection API

Connection-based API for cursor-style interaction, supporting both in-memory and file-based databases:

import chdb

conn = chdb.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE test (id UInt32, name String) ENGINE = Memory")
cur.execute("INSERT INTO test VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie')")
cur.execute("SELECT * FROM test ORDER BY id")

print(cur.fetchone())      # (1, 'Alice')
print(cur.fetchmany(2))    # ((2, 'Bob'), (3, 'Charlie'))
print(cur.column_names())  # ['id', 'name']
print(cur.column_types())  # ['UInt32', 'String']

# Use the cursor as an iterator
cur.execute("SELECT number FROM system.numbers LIMIT 3")
for row in cur:
    print(row)

# Always close resources when done
cur.close()
conn.close()

For more details, see examples/connect.py.

# File-based persistent database
conn = chdb.connect("mydata.db")
conn.query("CREATE TABLE IF NOT EXISTS logs (ts DateTime, msg String) ENGINE = MergeTree ORDER BY ts")
conn.query("INSERT INTO logs VALUES (now(), 'started')")
result = conn.query("SELECT * FROM logs", "Pretty")
print(result)
conn.close()
Stateful Session

Sessions provide a higher-level API with automatic resource management:

from chdb import session as chs

sess = chs.Session()
sess.query("CREATE DATABASE IF NOT EXISTS db_xxx ENGINE = Atomic")
sess.query("CREATE TABLE IF NOT EXISTS db_xxx.log_table (x String, y Int) ENGINE = Log")
sess.query("INSERT INTO db_xxx.log_table VALUES ('a', 1), ('b', 3), ('c', 2), ('d', 5)")
sess.query("CREATE VIEW db_xxx.view_xxx AS SELECT * FROM db_xxx.log_table LIMIT 4")

print(sess.query("SELECT * FROM db_xxx.view_xxx", "Pretty"))

see also: test_stateful.py.

Streaming Query

Process large datasets with constant memory usage through chunked streaming:

from chdb import session as chs

sess = chs.Session()

rows_cnt = 0
with sess.send_query("SELECT * FROM numbers(200000)", "CSV") as stream_result:
    for chunk in stream_result:
        rows_cnt += chunk.rows_read()

print(rows_cnt) # 200000

# Example 2: Manual iteration with fetch()
rows_cnt = 0
stream_result = sess.send_query("SELECT * FROM numbers(200000)", "CSV")
while True:
    chunk = stream_result.fetch()
    if chunk is None:
        break
    rows_cnt += chunk.rows_read()

print(rows_cnt) # 200000

For more details, see test_streaming_query.py.

Python DB-API 2.0
import chdb.dbapi as dbapi
print("chdb driver version: {0}".format(dbapi.get_client_info()))

conn1 = dbapi.connect()
cur1 = conn1.cursor()
cur1.execute('select version()')
print("description: ", cur1.description)
print("data: ", cur1.fetchone())
cur1.close()
conn1.close()
Query on Table (Pandas DataFrame, Parquet file/bytes, Arrow bytes)
import chdb.dataframe as cdf
import pandas as pd
# Join 2 DataFrames
df1 = pd.DataFrame({'a': [1, 2, 3], 'b': ["one", "two", "three"]})
df2 = pd.DataFrame({'c': [1, 2, 3], 'd': ["โ‘ ", "โ‘ก", "โ‘ข"]})
ret_tbl = cdf.query(sql="select * from __tbl1__ t1 join __tbl2__ t2 on t1.a = t2.c",
                  tbl1=df1, tbl2=df2)
print(ret_tbl)
# Query on the DataFrame Table
print(ret_tbl.query('select b, sum(a) from __table__ group by b'))
# Pandas DataFrames are automatically registered as temporary tables in ClickHouse
chdb.query("SELECT * FROM Python(df1) t1 JOIN Python(df2) t2 ON t1.a = t2.c").show()
Python Table Engine

Query on Pandas DataFrame

import chdb
import pandas as pd
df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5, 6],
        "b": ["tom", "jerry", "auxten", "tom", "jerry", "auxten"],
    }
)

chdb.query("SELECT b, sum(a) FROM Python(df) GROUP BY b ORDER BY b").show()

Query on Arrow Table

import chdb
import pyarrow as pa
arrow_table = pa.table(
    {
        "a": [1, 2, 3, 4, 5, 6],
        "b": ["tom", "jerry", "auxten", "tom", "jerry", "auxten"],
    }
)

chdb.query("SELECT b, sum(a) FROM Python(arrow_table) GROUP BY b ORDER BY b").show()

see also: test_query_py.py.

User Defined Functions (UDF)

chDB supports native Python UDFs that run in-process with full type safety.

import chdb
from chdb.session import Session
from chdb.sqltypes import INT64, STRING

sess = Session()

# Using the @chdb.func decorator
@chdb.func([INT64, INT64], INT64)
def add(a, b):
    return a + b

print(sess.query("SELECT add(12, 22)"))

# With type annotations (types inferred automatically)
@chdb.func()
def multiply(a: int, b: int) -> int:
    return a * b

print(sess.query("SELECT multiply(3, 7)"))

# Using chdb.create_function directly
chdb.create_function("strlen", len, arg_types=[STRING], return_type=INT64)
print(sess.query("SELECT strlen('hello')"))

# Remove a registered function
chdb.drop_function("strlen")

Key features:

  • Type-safe: supports INT64, FLOAT64, STRING, BOOL, DATETIME64, etc. See chdb.sqltypes for full list.
  • Type inference: automatically inferred from Python annotations (int, str, bool, etc.)
  • NULL handling: on_null=NullHandling.SKIP (default) skips the function call and returns NULL; NullHandling.PASS passes None to the function.
  • Exception handling: on_error=ExceptionHandling.PROPAGATE (default) raises the error to the caller; ExceptionHandling.IGNORE returns NULL for that row and continues.

See also: test_func_udf.py, test_func_udf_types.py.

Query Progress
import chdb

# Auto-detect: TTY progress in terminal, progress bar in notebook
conn = chdb.connect(":memory:?progress=auto")
conn.query("SELECT sum(number) FROM numbers_mt(1e10) GROUP BY number % 10 SETTINGS max_threads=4")

Progress options: progress=auto | progress=tty | progress=err | progress=off

AI-assisted SQL Generation
import chdb

conn = chdb.connect("file::memory:?ai_provider=openai&ai_model=gpt-4o-mini")
conn.query("CREATE TABLE nums (n UInt32) ENGINE = Memory")
conn.query("INSERT INTO nums VALUES (1), (2), (3)")

sql = conn.generate_sql("Select all rows from nums ordered by n desc")
print(sql)  # SELECT * FROM nums ORDER BY n DESC

print(conn.ask("List the numbers table", format="Pretty"))
Command Line

python3 -m chdb SQL [OutputFormat]

python3 -m chdb "SELECT 1,'abc'" Pretty

For more examples, see examples and tests.


Demos and Examples


Benchmark

---

Documentation


AI Coding Agent Skill

chDB provides an AI Skill that teaches AI coding agents (Cursor, Claude Code, etc.) chDB's multi-source data analytics API. Install it so your AI assistant can write correct chDB code out of the box:

curl -sL https://raw.githubusercontent.com/chdb-io/chdb/main/install_skill.sh | bash

Events


Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  • Help test and report bugs
  • Help improve documentation
  • Help improve code quality and performance

Bindings

We welcome bindings for other languages, please refer to bindings for more details.


Version Guide

Please refer to VERSION-GUIDE.md for more details.


Paper


License

Apache 2.0, see LICENSE for more information.


Acknowledgments

chDB is mainly based on ClickHouse 1 for trade mark and other reasons, I named it chDB.


Contact


Footnotes

  1. ClickHouseยฎ is a trademark of ClickHouse Inc. All trademarks, service marks, and logos mentioned or depicted are the property of their respective owners. The use of any third-party trademarks, brand names, product names, and company names does not imply endorsement, affiliation, or association with the respective owners. โ†ฉ โ†ฉ2

About

chDB is an in-process OLAP SQL Engine ๐Ÿš€ powered by ClickHouse

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages

  • C++ 77.0%
  • Python 9.0%
  • Assembly 7.9%
  • C 2.8%
  • CMake 1.3%
  • HTML 0.6%
  • Other 1.4%