Skip to content

Fix profiling not recording data for remote SQL and metadata queries#530

Merged
auxten merged 6 commits into
mainfrom
fix/profiling-remote-sql-coverage
Mar 5, 2026
Merged

Fix profiling not recording data for remote SQL and metadata queries#530
auxten merged 6 commits into
mainfrom
fix/profiling-remote-sql-coverage

Conversation

@auxten

@auxten auxten commented Mar 5, 2026

Copy link
Copy Markdown
Member

Summary

  • _remote_sql() called chdb.query() directly, bypassing the profiler entirely. Users calling session.sql("SELECT ...") on remote ClickHouse connections got "No profiling data recorded" from profiler.report().
  • _execute_metadata_query() had the same issue — databases(), tables(), and _remote_describe() were invisible to the profiler.
  • Added profiler.step() with timing and SQL preview metadata to both methods, consistent with the existing pattern in Connection.execute() / Connection.query_df().
  • Added 12 unit tests covering all profiling paths: local execution, remote SQL, metadata queries, disabled profiling, and the exact user-reported scenario.

Test plan

  • 12 new tests in datastore/tests/test_profiling.py — all passing
  • Remote SQL and metadata paths use unittest.mock.patch("chdb.query") so no real server needed
  • Verified profiling steps include correct metadata (sql, time_ms)
  • Verified profiling records nothing when disable_profiling() is active

auxten added 6 commits March 5, 2026 14:57
_remote_sql() and _execute_metadata_query() called chdb.query() directly,
bypassing the profiler. This caused profiler.report() to return
"No profiling data recorded" when users used session.sql() on remote
connections or called databases()/tables().
UNION ALL does not guarantee row order in ClickHouse. The bare
ORDER BY after UNION ALL only applied to the last SELECT, not the
full result. Wrap all UNION ALL queries in a subquery so ORDER BY
applies to the combined result.
- Rename test methods to reflect exactly what is verified
- Verify specific step names and exact counts instead of just len() > 0
- Validate time_ms is non-negative and sql preview is non-empty
- Add missing _remote_describe() profiling coverage
- Add len(profiler.steps) == 0 assertion for disabled state
- Add descriptive error messages to all assertions
Column assignment is an order-preserving operation — adding sort_values
in tests masks potential row-ordering bugs in DataStore. The fixture
already provides ordered data via ORDER BY, so tests should verify
DataStore preserves that order without extra sorting.

Also verify actual values in test_json_array_extraction_in_pure_chdb
instead of only checking len().
TestRemoteClickHouseIntegration was hitting the public demo server
(sql-clickhouse.clickhouse.com) which has a 200 queries/hour quota.
CI runs across multiple platforms easily exceed this, causing flaky
failures unrelated to code changes.

Switch to the local ClickHouse test server managed by conftest.py
(clickhouse_server fixture), which is already used by
test_remote_connection_integration.py.
@auxten auxten merged commit 1ef7e2a into main Mar 5, 2026
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant