-
Notifications
You must be signed in to change notification settings - Fork 8.3k
[Umbrella] Analyzer, Planner migration #42648
Copy link
Copy link
Closed
Labels
analyzerIssues and pull-requests related to new analyzerIssues and pull-requests related to new analyzer
Description
After #31796 was merged now we have new infrastructure for query analysis and planning. There are a lot of things that new infrastructure brings to us:
- Better performance of queries that spend considerable time during query analysis, planning stage. Better performance of queries with a lot of JOINs, a lot of subqueries. Example: Performance degradation on query interpretation #39996.
- A lot of new SQL features that are now possible to implement (better scalar subqueries, correlated subqueries).
- Better support for JOINs (Support for indexes, FINAL, SAMPLE).
- Ability to implement full featured query optimizer on top of new infrastructure.
- Better usability, better exception messages.
- Full SQL specificiation of our SQL extensions is now possible to write.
This ticket will contain tasks that are necessary to implement before we can enable new infrastructure by default.
Analyzer
- SELECT (compound_expression).*, (compound_expression).COLUMNS are not supported on parser level. Developer @evillique. Improve Asterisk and ColumnMatcher parsers #42884
- SELECT a.b.c.*, a.b.c.COLUMNS. Qualified matcher where identifier size is greater than 2 are not supported on parser level. Developer @evillique. Improve Asterisk and ColumnMatcher parsers #42884
- Support function identifier resolve from parent query scope, if lambda in parent scope does not capture any columns.
- Support group_by_use_nulls. Developer @novikd. Support
group_by_use_nullssetting in new analyzer #45910 - Support cache for scalar subqueries. Developer @SmitaRKulkarni. Support scalar subqueries cache #43640
- Support query tree in table functions. Developer @kitaisreal. Analyzer planner fixes before enable by default #46471
- Support
SELECT * FROM table AS tablefor table expressions. Developer @kitaisreal. Analyzer improve aliases support for table expressions #43089. - Support typo corrections for identifiers. Support typo corrections for functions. Developer @kitaisreal. Analyzer added identifier typo corrections #43002.
- Support storage Merge. Developer @kitaisreal. Analyzer planner fixes before enable by default #46471
- Support storage Materialized View. Developer @kitaisreal. Analyzer planner fixes before enable by default #46471
- Support storage Live View. Developer @kitaisreal. Analyzer planner fixes before enable by default #46471
- Support storage Window View.
- Support transform query for external database. Developer @vdimir. Support transformQueryForExternalDatabase for analyzer #47316
- Join columns refactoring. Developer @novikd Analyzer: support functional arguments in USING clause #58317 @vdimir Analyzer: fixes for JOIN columns resolution #59007
- Constants refactoring. Developer @kitaisreal Analyzer constants refactoring #59523
- Execution names refactoring. Developer @novikd Analyzer: always qualify execution names #53705
QueryTreePassManager
- Support logical expressions optimizer. Developer @antonio2368 Logical expression optimizer in new analyzer #46644
- Support fuse sum count optimize_fuse_sum_count_avg, optimize_syntax_fuse_functions. Developer @vdimir. Support optimize_syntax_fuse_functions for sum/count/avg via analyzer #42689
- Support setting convert_query_to_cnf. Developer @antonio2368 CNF/Constraint optimizer in new analyzer #47617
- Support setting optimize_using_constraints. Developer @antonio2368 CNF/Constraint optimizer in new analyzer #47617
- Support setting optimize_substitute_columns. Developer @antonio2368 CNF/Constraint optimizer in new analyzer #47617
- Support GROUP BY injective function elimination. Developer @novikd Analyzer: Support GROUP BY injective function elimination #58919
- Support GROUP BY functions of other keys elimination. Developer @novikd. Support optimize_group_by_function_keys on top of QueryTree #43261
- Support setting optimize_move_functions_out_of_any. Developer @JackyWoo. New analyzer: add move functions out of any pass #52082
- Support setting optimize_aggregators_of_group_by_keys. Developer @JackyWoo @novikd. New analyzer: add eliminate min/max/any/anyLast aggregators pass #52230
- Support setting optimize_redundant_functions_in_order_by. Developer @novikd. Implement optimize_redundant_functions_in_order_by on top of QueryTree. #42970
- Support setting optimize_monotonous_functions_in_order_by. Developer @novikd.
- Support setting optimize_if_transform_strings_to_enum. Developer @antonio2368. Support
optimize_if_transform_strings_to_enumin new analyzer #43999 - Support settings.optimize_or_like_chain. Developer @novikd. Support optimize_or_like_chain in QueryTreePassManager #42797
- Add optimizations based on function semantics. Example: SELECT * FROM test_table WHERE id != id. (id is not nullable column).
- Support CROSS JOIN to INNER JOIN rewrite. Developer @vdimir. Add CrossToInnerJoinPass #46408
- Support _shard_num identifier to shardNum() function rewriting. Developer @kitaisreal. Analyzer Planner fixes before enable by default #47101
- Support remove unused projection columns in subqueries. Developer @kitaisreal. Analyzer remove unused projection columns #56499
Planner
- Support JOIN with JOIN engine. Developer @vdimir. Join engine works with analyzer #43897
- JOIN drop unnecessary columns after ON, USING section. @vdimir Drop unused columns after join on/using #44545
- Support RBAC. Developer @kitaisreal. Analyzer planner fixes before enable by default #47622
- Support distributed query processing. Developer @kitaisreal. Analyzer support distributed queries processing #44540 Analyzer planner fixes before enable by default #47383
- Support projections. Developer @KochetovNicolai.
- Support UNION storage limits. Support storage limits in general. Need to check all places in InterpreterSelectQuery. Developer @yakov-olkhovskiy. Analyzer - storage limits - propagate limits into subquery #46306
- Support indexes. Support indexes for JOINS. Developer @kitaisreal. Analyzer added indexes support #43341 Analyzer support Set index #44097
- Support experimental hypothesis indexes. Developer @antonio2368. Hypothesis index in the analyzer #48381
- Support experimental ANN indexes. Developer @kitaisreal.
- Support indexes for IN function. Developer @kitaisreal.
- Support settings limit and offset. Developer @yakov-olkhovskiy. Analyzer - apply limit and offset settings - second attempt #46632
- Support trivial count optimization. Developer @devcrafter. Planner: trivial count optimization #46923
- StorageMerge refactoring. Developer @novikd
QueryPlan optimizations
- Support read in order optimizations. Developer @KochetovNicolai. Read-in-order over query plan #42829 Aggregation-in-order from query plan #43592
- Support ORDER BY read in order optimization. Developer @KochetovNicolai. Read-in-order over query plan #42829
- Support GROUP BY read in order optimization. Developer @KochetovNicolai. Aggregation-in-order from query plan #43592
- Support PREWHERE. Developer @kitaisreal. Analyzer planner fixes before enable by default #47622
- Support setting optimize_duplicate_order_by_and_distinct. Developer @devcrafter.
[Analyzer migration] Enable optimize_duplicate_order_by_and_distinct (ORDER BY part) #43229Remove redundant sorting #43905 Remove redundant DISTINCT #44176
Backward compatibility
- Added setting
single_join_prefer_left_table. Developer @kitaisreal. Analyzer planner fixes before enable by default #46471
Additional tasks
- Make full SQL + ClickHouse extensions specification. Developer @kitaisreal. #58027
- Tune performance of Analyzer using different low level optimizations.
- Add performance tests in our CI/CD for big queries with a lot of JOINS.
- Add performance tests for high concurrent queries workload scenario.
Broken tests
Grouped
- distributed_unused_shards @kitaisreal @KochetovNicolai
00754_distributed_optimize_skip_select_on_unused_shards00754_distributed_optimize_skip_select_on_unused_shards_with_prewhere01071_force_optimize_skip_unused_shards01072_optimize_skip_unused_shards_const_expr_eval01211_optimize_skip_unused_shards_type_mismatch01213_optimize_skip_unused_shards_DISTINCT01247_optimize_distributed_group_by_sharding_key_dist_on_dist01270_optimize_skip_unused_shards_low_cardinality01319_optimize_skip_unused_shards_nesting01527_dist_sharding_key_dictGet_reload01528_allow_nondeterministic_optimize_skip_unused_shards01563_distributed_query_finish01756_optimize_skip_unused_shards_rewrite_in01757_optimize_skip_unused_shards_limit01758_optimize_skip_unused_shards_once01759_optimize_skip_unused_shards_zero_shards01901_in_literal_shard_prune01930_optimize_skip_unused_shards_rewrite_inSupport skip_unused_shards in Analyzer #53282 - projections
01576_alias_column_rewrite01710_projection_additional_filtersFix some more analyzer tests #54128 - subquery_in_index @KochetovNicolai
01700_system_zookeeper_path_in 00562_in_subquery_merge_tree00673_subquery_prepared_set_performance00945_bloom_filter_index00981_in_subquery_with_tuple01583_const_column_in_set_index01585_use_index_for_global_in01585_use_index_for_global_in_with_null01681_bloom_filter_nullable_column01739_index_hint - rewrite
00597_push_down_predicate_long(Fix 00597_push_down_predicate_long for analyzer #49551)01655_plan_optimizations(already fixed),01655_plan_optimizations_optimize_read_in_window_order(Fix 01655_plan_optimizations_optimize_read_in_window_order for analyzer #49565)01951_distributed_push_down_limit,02343_aggregation_pipeline(already fixed),02377_optimize_sorting_by_input_stream_properties_explain(Fix for analyzer: 02377_ optimize_sorting_by_input_stream_properties_e… #49943),02493_inconsistent_hex_and_binary_number02511_complex_literals_as_aggregate_function_parameters(Analyzer: apply _CAST to constants only once #49800) @devcrafter - storage_materialized_view
01155_rename_move_materialized_view01268_mv_scalars01947_mv_subquery02139_MV_with_scalar_subquery02174_cte_scalar_cache_mv02479_race_condition_between_insert_and_droppin_mv@yakov-olkhovskiy - storage_merge
01083_expressions_in_engine_arguments01214_test_storage_merge_aliases_with_where01560_merge_distributed_join01925_test_storage_merge_aliases02402_merge_engine_with_view02575_merge_prewhere_different_default_kind02763_row_policy_storage_merge_alias@novikd Analyzer: support aliases and distributed JOINs in StorageMerge #50894 - join_use_nulls
01142_join_lc_and_nullable_in_key01142_merge_join_lc_and_nullable_in_key01476_right_full_join_switch01477_lc_in_merge_join_left_key02024_join_on_or_longFix test reference files for join using nullable column #48893 and Fix join_use_nulls in analyzer #49359 @vdimir - join_types
01049_join_low_card_bug_long01353_low_cardinality_join_types01721_join_implicit_cast_long02242_join_rocksdb02267_join_dup_columns_issue36199Fixing join tests with analyzer #49555 @vdimir -
01062_pm_all_join_with_block_continuation@vdimir Analyzer: compatibility setting to support JOIN USING to level column #56458 -
02481_merge_array_join_sample_byAnalyzer: fix column not found for optimized prewhere with sample by #49784 @vdimir - distributed_different_databases
01152_cross_replication01232_extremes01487_distributed_in_not_default_db01615_random_one_shard_insertion@novikd Analyzer: WIP on distributed queries #50097 Analyzer: fix 01487_distributed_in_not_default_db #50587 - invalid_set
00700_decimal_compare02421_decimal_in_precision_issue_4112502428_decimal_in_floating_point_literal@vdimir Fix IN with decimal in analyzer #48754 - transform_query_for_external_database
01086_odbc_roundtripFixing 01086_odbc_roundtrip with analyzer. #54133 - partition_pruning
01540_verbatim_partition_pruning01748_partition_id_pruningFix partition id pruning for analyzer. #54185 - subquery_remove_unused_columns
00593_union_all_assert_columns_removed01455_shard_leaf_max_rows_bytes_to_read01586_columns_pruning@kitaisreal Analyzer remove unused projection columns #56499 - key_condition
01651_bugs_from_1588901786_explain_merge_tree01889_key_condition_function_chainsEasy tests fix for analyzer #56211 - additional_filters
02346_additional_filters_distr - set_index
00838_unique_index00979_set_index_notAnalyzer: fix 00979_set_index_not.sql #52754 @devcrafter - use_structure_from_insertion_table_in_table_functions
02458_use_structure_from_insertion_table - storage_key_value
01504_rocksdb02375_rocksdb_with_filtersFix rocksdb with analyzer. #56391 - transactions
01173_transaction_control_queries02345_implicit_transaction@kitaisreal - optimize_using_constraints
01624_soft_constraintsHypothesis index in the analyzer #48381 - read_in_order
00940_order_by_read_in_order_query_planAnalyzer ORDER BY read in order query plan add test #56532 @kitaisreal - dependend_objects
01747_join_view_filter_dictionary@kitaisreal - projection_names
02426_orc_bugAnalyzer special functions projection names fix #56514 @kitaisreal - alias_columns
02494_optimize_group_by_function_keys_and_alias_columns - parameterized_view
02428_parameterized_view02818_parameterized_view_with_cte_multiple_usage@novikd @SmitaRKulkarni Refactor parameterized view & add support for new analyzer #54211 - allow_compound_columns
00927_asof_joins@vdimir - allow_aggregate_partitions_independently
02521_aggregation_by_partitionsProperly process aliases for aggregation-by-partition optimization. #56254 - join_with_constants
02382_join_and_filtering_set@vdimir - cte_distributed
02341_global_join_cte - remove_join
01890_materialized_distributed_join@vdimir Analyzer: fix block stucture mismatch in matview with engine distributed #55741 - distributed_conversion
01268_shard_avgweighted/// Not found column avgWeighted(100_Int8, -1_Int8) in block. There are only columns: avgWeighted(100_UInt8, -1_Int8)@kitaisreal - invalid_view_subquery_parsing
02366_union_decimal_conversionFix incomplete query result for UNION in view() function. #56274 - prewhere
00717_merge_and_distributedVirtual columns for merge :Column _table not found in table _table_function.merge - any_join_const
02000_join_on_const@vdimir - recursive_cte
01495_subqueries_in_with_statementInfinit cte substitution?WITH test1 AS (SELECT i + 1, j + 1 FROM test1) SELECT * FROM test1->Too deep subqueries - used_functions_log
01656_test_query_log_factories_infoWe create some helper functions (or,if,multiif,multiMatchAny), in passes even we don't need it. Alos there is a difference inCRC32case. - memory_bound_merging
02404_memory_bound_mergingFix 02404_memory_bound_merging with analyzer. #56419 - row_policy
02131_used_row_policies_in_query_logAnalyzer log used row policies #56531 @kitaisreal - distributed
00223_shard_distributed_aggregation_memory_efficientselect randConstant(), randConstant(), randConstant()->Block structure mismatch - asof_join_nullable
01428_nullable_asof_joinAllow ASOF JOIN over nullable right column #49826 @vdimir - storage_s3
02302_s3_file_pruning@vdimir Analyzer: filtering by virtual columns for StorageS3 #56668 - distinct_in_order
02317_distinct_in_order_optimization_explain(Update sort desc in ReadFromMergeTree after applying PREWHERE info #48669) @devcrafter - distributed_aggregation_memory_efficient
01231_distributed_aggregation_memory_efficient_mix_levels - cast_null
01761_cast_to_enum_nullable@kitaisreal - grouping_sets
02554_fix_grouping_sets_predicate_push_downFix 02554_fix_grouping_sets_predicate_push_down with analyzer. #56595 - input_function
00952_input_function - sorting in distributed queries
02001_shard_num_shard_counttest_parallel_replicas_skip_shards/test.py::test_skip_unavailable_shards - invalid_aggregation
02352_grouby_shadows_arg - merge_tree_index
01064_incremental_streaming_from_2_src_with_feedback - storage_buffer
01584_distributed_buffer_cannot_find_column - big_constants
00725_memory_tracking@kitaisreal - join_column_cast
01925_join_materialized_columns@vdimir -
01739_index_hint02880_indexHint__partition_id@KochetovNicolai
Integrations tests
-
test_wrong_db_or_table_name/test.py::test_wrong_table_name,test_sql_user_defined_functions_on_cluster/test.py::test_sql_user_defined_functions_on_cluster,test_user_defined_object_persistence/test.py::test_persistence,test_dictionaries_update_and_reload/test.py::test_reload_after_fail_in_cache_dictionary,test_access_for_functions/test.py::test_access_rights_for_function,test_select_access_rights -
test_profile_events_s3/test.py::test_profile_events -
test_mysql_database_engine/test.py::test_mysql_ddl_for_mysql_database -
test_executable_table_function/test.py::test_executable_function_input_python -
test_cluster_copier -
test_backward_compatibility,test_distributed_backward_compatability/test.py::test_distributed_in_tupledistributed between nodes with old and new analyzer -
test_user_defined_object_persistence/test.py::test_persistence -
test_storage_rabbitmq/test.py::test_rabbitmq_materialized_viewSELECT * FROM test.rabbitmq GROUP BY (key, value) -
test_mutations_with_merge_tree/test.py::test_mutations_with_merge_background_task -
test_build_sets_from_multiple_threads/test.py::test_set
Performance tests
Separate issue was created for this topic: #62245.
-
direct_dictionary,set_index@kitaisreal -
push_down_limit -
injective_functions_inside_uniq -
agg_functions_min_max_any -
norm_distance_float,norm_distance -
array_fillbig constants @kitaisreal -
logical_functions_medium -
if_transform_strings_to_enum -
distinct_combinator -
arithmetic_operations_in_aggr_func -
aggregate_functions_of_group_by_keys -
order_by_read_in_order -
aggregate_with_serialized_method -
materialized_view_parallelize_output_from_storages -
if_transform_strings_to_enum -
low_cardinality_query -
uniqExactIf -
parallel_index -
randrewrite -
group_by_fixed_keysrewrite -
logical_functions_small,logical_functions_medium,logical_functions_largerewrite -
hash_table_sizes_statsrewrite -
joins_in_memoryrewrite
Other
-
02713_array_low_cardinality_string -
02707_skip_index_with_in -
02707_complex_query_fails_analyzerRemove 02707_complex_query_fails_analyzer test #49403 -
02680_mysql_ast_logical_err@vdimir Fix 02680_mysql_ast_logical_err in analyzer #49362 -
02324_map_combinator_bug -
02241_join_rocksdb_bs@vdimir -
02003_WithMergeableStateAfterAggregationAndLimit_LIMIT_BY_LIMIT_OFFSET -
01626_cnf_fuzz_longCorrectly stop CNF convert for too many atomics in new analyzer #49402 -
01115_join_with_dictionary@vdimir -
01009_global_array_join_names@vdimir Analyzer: fix logical error with set in array join #56587 -
00917_multiple_joins_denny_crane@vdimir -
00725_join_on_bug_1@vdimir -
00636_partition_key_parts_pruning -
00261_storage_aliases_and_array_joinRemove passed tests fromanalyzer_tech_debt.txt#55618 -
02701_non_parametric_functionThrow an exception for non-parametric functions in new analyzer #49419 -
01825_type_json_multiple_files -
01281_group_by_limit_memory_tracking -
02723_zookeeper_name@vdimir -
01019_alter_materialized_view_consistent01019_alter_materialized_view_consistentfailed #49261 Analyzer: fix 01019_alter_materialized_view_consistent #56246 -
01244_optimize_distributed_group_by_sharding_key01952_optimize_distributed_group_by_sharding_key -
01287_max_execution_speedFlaky test01287_max_execution_speedwith analyzer #56052 -
02725_agg_projection_resprect_PK@KochetovNicolai -
02784_parallel_replicas_automatic_decision_join@Algunenano (broken test added in Automatic decision of nº of parallel replicas #51692) @KochetovNicolai Allow parallel replicas for JOIN with analyzer [part 1]. #58838
Later
- experimental_ann_index
02354_annoy
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
analyzerIssues and pull-requests related to new analyzerIssues and pull-requests related to new analyzer