Skip to content

feat(chdb): implement chdb - the embedded clickhouse - backend#8497

Closed
kszucs wants to merge 11 commits into
ibis-project:mainfrom
kszucs:chdb-backend
Closed

feat(chdb): implement chdb - the embedded clickhouse - backend#8497
kszucs wants to merge 11 commits into
ibis-project:mainfrom
kszucs:chdb-backend

Conversation

@kszucs

@kszucs kszucs commented Feb 29, 2024

Copy link
Copy Markdown
Member

A big chunk of the backends testing suite is already passing, but there is still work to do:

  1. Support passing in memory tables, this can be done using the same "pass as a file" trick used in chdb.dataframe
  2. Better error handling
  3. Pass the right dialect everywhere

Test suite is currently at 247 failed, 1067 passed, 28 skipped, 29854 deselected, 131 xfailed, 6 errors

Comment thread ibis/backends/chdb/__init__.py Outdated

"""
with contextlib.suppress(AttributeError):
query = query.sql(dialect="clickhouse", pretty=True)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
query = query.sql(dialect="clickhouse", pretty=True)
query = query.sql(dialect=self.dialect, pretty=True)

Comment thread ibis/backends/chdb/__init__.py Outdated
Comment on lines +97 to +98
names = table.column("name").to_pylist()
types = table.column("type").to_pylist()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
names = table.column("name").to_pylist()
types = table.column("type").to_pylist()
names = table["name"].to_pylist()
types = table["type"].to_pylist()

Comment thread ibis/backends/chdb/__init__.py Outdated
if external_tables:
raise NotImplementedError("External tables are not yet supported")

df = self.con.query(sql, "arrowtable").to_pandas()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
df = self.con.query(sql, "arrowtable").to_pandas()
df = self.con.query(sql, fmt="arrowtable").to_pandas()

Comment thread ibis/backends/chdb/__init__.py Outdated
if external_tables:
raise NotImplementedError("External tables are not yet supported")

table = self.con.query(sql, "arrowtable")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
table = self.con.query(sql, "arrowtable")
table = self.con.query(sql, fmt="arrowtable")

Comment thread ibis/backends/chdb/tests/conftest.py Outdated
Comment on lines +34 to +35
# [(value,)] = self.connection.con.query("SELECT true").result_set
# return isinstance(value, bool)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# [(value,)] = self.connection.con.query("SELECT true").result_set
# return isinstance(value, bool)

We definitely will not be supporting versions of chdb that don't support native booleans.

def ddl_script(self) -> Iterable[str]:
parquet_dir = self.data_dir / "parquet"
for sql in super().ddl_script:
yield sql.format(parquet_dir=parquet_dir)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not introduce templating if we can avoid it. How about hard coding the path to ci/ibis-testing-data/... like we do with everything else?

"""
con = self.connection.con

con.query(f"CREATE DATABASE {database} ENGINE = Atomic")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work with our process-based parallel testing setup (i.e., pytest-xdist)?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not there yet :)

Comment thread ibis/config.py
default_backend: Optional[Any] = None
context_adjustment: ContextAdjustment = ContextAdjustment()
sql: SQL = SQL()
chdb: Optional[Config] = None

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this?

@kszucs

kszucs commented May 15, 2024

Copy link
Copy Markdown
Member Author

I won't have time to keep working on this in the near future, so closing it.

@kszucs kszucs closed this May 15, 2024
@deepyaman deepyaman reopened this Jul 8, 2024
@zhenzhongxu

Copy link
Copy Markdown
Contributor

@auxten we've reopened the PR based on your feedback.
@kszucs, we'd like to move the Ibis/chDB integration forward. I'm wondering if you have any useful context to whoever will pick this up.

@jcrist

jcrist commented Aug 12, 2024

Copy link
Copy Markdown
Member

Going to reclose this for now. Can always reopen if we want to continue the work, but no need to keep it open if we're not hacking on it proactively.

@jcrist jcrist closed this Aug 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants