Add chdb vs pandas peak-memory benchmark #557

Merged
auxten merged 1 commit into chdb-io:main from wudidapaopao:add_benchmark_tests on Apr 1, 2026
@wudidapaopao

Contributor

Self-contained benchmark that auto-generates test data (default 10M rows) and compares chdb SQL-pushdown vs pandas across 10 scenarios including filter, groupby, join, window functions, quantiles, and time-series. Measures peak memory via VmHWM (Linux) or ru_maxrss (macOS) in isolated subprocesses.
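For readers unfamiliar with the measurement approach, here is a minimal sketch of reading peak memory from an isolated subprocess. It is not the benchmark code from this PR. The names `measure_peak_kib`, `CHILD_HELPERS`, and `CHILD_REPORT` are hypothetical, and it assumes that `VmHWM` is read from `/proc/self/status` on Linux, with `ru_maxrss` from `resource.getrusage` used where that file is unavailable (for example on macOS).

```python
import json
import subprocess
import sys

# Helper code that runs inside the child process (hypothetical, for illustration).
CHILD_HELPERS = r'''
import json, resource, sys

def peak_rss_kib():
    # Linux: VmHWM in /proc/self/status is the peak resident set size, in kB.
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmHWM:"):
                    return int(line.split()[1])
    except OSError:
        pass
    # Fallback: ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return rss // 1024 if sys.platform == "darwin" else rss
'''

# Final statement of the child: report its own peak as one JSON line.
CHILD_REPORT = 'print(json.dumps({"peak_kib": peak_rss_kib()}))'


def measure_peak_kib(workload: str) -> int:
    """Run `workload` (Python source) in a fresh interpreter and return its peak RSS in KiB."""
    script = CHILD_HELPERS + "\n" + workload + "\n" + CHILD_REPORT
    result = subprocess.run(
        [sys.executable, "-c", script],
        check=True,
        capture_output=True,
        text=True,
    )
    # The workload may print too, so only the last line is parsed.
    last_line = result.stdout.strip().splitlines()[-1]
    return json.loads(last_line)["peak_kib"]


if __name__ == "__main__":
    baseline = measure_peak_kib("pass")
    # b"x" * n writes every byte, so the pages actually become resident.
    loaded = measure_peak_kib("data = b'x' * (64 * 1024 * 1024)")
    print(f"baseline: {baseline} KiB, with 64 MiB buffer: {loaded} KiB")
```

Running each scenario in its own interpreter matters because a peak-RSS counter never decreases within a process, so a heavy scenario would otherwise inflate the reading of every scenario that follows it.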

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

CI Settings

NOTE: If you merge the PR with modified CI you MUST KNOW what you are doing
NOTE: Checked options will be applied if set before CI RunConfig/PrepareRunConfig step

Run these jobs only (required builds will be added automatically):

  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Unit tests
  • Performance tests
  • All with aarch64
  • All with ASAN
  • All with TSAN
  • All with Analyzer
  • All with Azure
  • Add your option here

Deny these jobs:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64

Extra options:

  • do not test (only style check)
  • disable merge-commit (no merge from master before tests)
  • disable CI cache (job reuse)

Only specified batches in multi-batch jobs:

  • 1
  • 2
  • 3
  • 4

@auxten merged commit 2e61cc5 into chdb-io:main on Apr 1, 2026
6 checks passed
2 participants