Optimize tupleElement(dictGet(..., tuple_of_attrs), N) into single-attribute dictGet#100186
Optimize tupleElement(dictGet(..., tuple_of_attrs), N) into single-attribute dictGet#100186alexey-milovidov wants to merge 16 commits intomasterfrom
Conversation
…ngle-attribute `dictGet`
When `dictGet` or `dictGetOrDefault` is called with a tuple of attribute names
and the result is immediately indexed with `tupleElement` (positional like `.1`
or named like `.country`), rewrite to fetch only the needed attribute.
This avoids fetching unnecessary dictionary attributes, improving performance
for queries like:
`dictGet('dict', ('country', 'city', 'population'), key).1`
which becomes:
`dictGet('dict', 'country', key)`
Controlled by the `optimize_dictget_tuple_element` setting (default: true).
Closes: #50167
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Workflow [PR], commit [f011c8d] Summary: ❌
AI ReviewSummaryThis PR adds an analyzer optimization that rewrites Findings
ClickHouse Rules
Final Verdict
|
| size_t element_index = *maybe_index; | ||
|
|
||
| /// Replace the tuple of attribute names with a single attribute name | ||
| dict_get_args[1] = std::make_shared<ConstantNode>(attribute_names[element_index]); |
There was a problem hiding this comment.
❌ This mutates dict_get_args[1] before the optimization is guaranteed to succeed.
In the dictGetOrDefault branch below, we can return early when the default tuple is shorter than the selected element (element_index >= default_tuple.size()). In that case, the function exits with a partially rewritten tree: attribute list already changed to a single string, but tupleElement(...) wrapper is still present and dictGet is not re-resolved.
That can leave the query tree inconsistent and may change diagnostics/behavior for invalid queries.
Please defer mutation until all preconditions are validated (or roll back on failure). For example: compute new_attr_arg and new_default_arg first, verify both, then apply both mutations and call resolveOrdinaryFunctionNodeByName.
…ry.cpp` and make `dictGetOrDefault` rewrite atomic - Added the new setting to the 26.4 settings changes history to fix the 02995_new_settings_history test. - Refactored the `dictGetOrDefault` path to stage all replacements before mutating the tree, preventing partial rewrites on early return. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|
||
| -- Test with dictGetOrDefault | ||
| SELECT 'dictGetOrDefault'; | ||
| SELECT dictGetOrDefault('default.test_dict', ('country', 'city'), id, ('Unknown', 'Unknown')).1 FROM test_keys ORDER BY id; |
There was a problem hiding this comment.
dictGetOrDefault coverage here only uses existing keys, so the default branch is never exercised. That means this test does not validate the rewritten default-value path in DictGetTupleElementPass (the path that slices tuple defaults to a single element).
Please add at least one missing-key case (for example id = 999) and assert both positional and/or named access return the expected default element after optimization.
| /// Now apply all mutations atomically | ||
| dict_get_args[1] = std::move(new_attr_arg); | ||
|
|
||
| if (is_dict_get_or_default && new_default_arg) |
There was a problem hiding this comment.
❌ This rewrite can still apply dictGetOrDefault without rewriting the default argument, which may break valid queries.
When default_arg is neither a constant tuple nor a tuple(...) function (for example, an arbitrary tuple-typed expression), new_default_arg stays null, but the code still rewrites attr_names to a single string and replaces the whole tupleElement(...) with dictGetOrDefault(...).
That produces a mismatched call shape (dictGetOrDefault(dict, 'attr', key, <tuple_default>)) and can introduce a new exception during analysis for queries that were previously valid.
Please bail out unless both pieces are rewritable for dictGetOrDefault:
- single attribute name, and
- matching single-element default expression (or an equivalent
tupleElement(default_arg, idx)rewrite).
… be extracted Previously, when `dictGetOrDefault` was called with a default argument that was neither a constant tuple nor a `tuple()` function, the optimization would still rewrite the attribute names to a single string while leaving the original default argument untouched. This produced a mismatched call signature that could break valid queries. Now the pass requires both the attribute name and the default value to be rewritable before applying any mutations. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… add missing-key coverage - Replace hardcoded `'default.test_dict'` with `currentDatabase() || '.test_dict'` so the test works in CI where each test runs in its own database. - Replace full EXPLAIN QUERY TREE output matching with predicate checks (presence/absence of `tupleElement` in the query tree) to avoid database-name-dependent reference output. - Add `dictGetOrDefault` tests with missing keys (id=999) to exercise the default value rewrite path, as requested in review. - Add test for `dictGetOrDefault` with `tuple()` function as default. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The `DictGetTupleElementPass` is an Analyzer pass and `EXPLAIN QUERY TREE` also requires the analyzer. Add `SET enable_analyzer = 1` at the top of the test to fix failures in the "old analyzer" CI configuration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| -- Test dictGetOrDefault with tuple() function as default | ||
| SELECT 'dictGetOrDefault with tuple function default'; | ||
| SELECT dictGetOrDefault(currentDatabase() || '.test_dict', ('country', 'city'), toUInt64(999), tuple('FuncCountry', 'FuncCity')).1; | ||
| SELECT dictGetOrDefault(currentDatabase() || '.test_dict', ('country', 'city'), toUInt64(999), tuple('FuncCountry', 'FuncCity')).2; |
There was a problem hiding this comment.
dictGetOrDefault default-path rewrite for positional access, but the pass also handles named tupleElement access (.country → string-index form). Please add one missing-key case for named access, for example:
SELECT dictGetOrDefault(currentDatabase() || '.test_dict', ('country', 'city'), toUInt64(999), ('DefaultCountry', 'DefaultCity')).country;This ensures both default rewrite branches are covered: positional and named.
…parents The AST fuzzer found a segfault triggered by queries like `SELECT dictGet(...).1 FROM t ORDER BY ALL` where `ORDER BY ALL` causes the `tupleElement` node to be shared between the SELECT list and the ORDER BY clause. Using `std::move` to extract the inner `dictGet` node left a null child pointer in the `tupleElement` node, crashing the second visit. Use copy instead of move so the node remains valid for all parents. Also add named-access test coverage for `dictGetOrDefault` with missing keys, as requested in review. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|
||
| -- Test dictGetOrDefault with tuple() function as default | ||
| SELECT 'dictGetOrDefault with tuple function default'; | ||
| SELECT dictGetOrDefault(currentDatabase() || '.test_dict', ('country', 'city'), toUInt64(999), tuple('FuncCountry', 'FuncCity')).1; |
There was a problem hiding this comment.
tuple(...) defaults, but we still miss a regression for the non-rewritable dictGetOrDefault default-value path.
DictGetTupleElementPass now correctly bails out when the default argument is not a constant tuple / tuple(...). Please add one case where the default tuple comes from an alias/expression (for example WITH ('DefaultCountry','DefaultCity') AS d ... dictGetOrDefault(..., d).1) and assert behavior stays correct.
Without this, we don't lock in the fix for the prior mismatch bug on non-rewritable defaults.
Address review feedback: add a test case where the default argument is a non-constant expression (`materialize(...)`) to verify the pass correctly bails out without breaking the query. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| DECLARE(Bool, optimize_rewrite_array_exists_to_has, false, R"( | ||
| Rewrite arrayExists() functions to has() when logically equivalent. For example, arrayExists(x -> x = 1, arr) can be rewritten to has(arr, 1) | ||
| )", 0) \ | ||
| DECLARE(Bool, optimize_dictget_tuple_element, true, R"( |
There was a problem hiding this comment.
This PR introduces a new user-facing setting optimize_dictget_tuple_element in Settings.cpp, but the PR template still has Documentation is written unchecked and there is no docs update in this change set.
For ClickHouse, new settings should be documented so users can discover behavior, defaults, and compatibility implications. Please add docs for this setting (and mark the PR template checkbox accordingly), or explicitly justify why docs are unnecessary.
LLVM Coverage Report
Changed lines: 76.47% (117/153) · Uncovered code |
When
dictGetordictGetOrDefaultis called with a tuple of attribute names and the result is immediately indexed withtupleElement(positional like.1or named like.country), rewrite to fetch only the needed attribute. This avoids fetching unnecessary dictionary attributes.Example:
Controlled by the
optimize_dictget_tuple_elementsetting (default:true).Closes #50167
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Added a query optimization that rewrites
tupleElement(dictGet('dict', ('a', 'b', 'c'), key), N)intodictGet('dict', 'a', key), avoiding fetching unnecessary dictionary attributes. Controlled by theoptimize_dictget_tuple_elementsetting (enabled by default).Documentation entry for user-facing changes
New setting
optimize_dictget_tuple_element(Bool, defaulttrue): when enabled, rewritesdictGet/dictGetOrDefaultcalls that fetch a tuple of attributes and immediately index a single element, into adictGetcall that fetches only that single attribute. Supports both positional (.1,.2) and named (.country) access patterns.