After #31796 was merged now we have new infrastructure for query analysis and planning. There are a lot of things that new infrastructure brings to us:
Better performance of queries that spend considerable time during query analysis, planning stage. Better performance of queries with a lot of JOINs, a lot of subqueries. Example: Performance degradation on query interpretation #39996 .
A lot of new SQL features that are now possible to implement (better scalar subqueries, correlated subqueries).
Better support for JOINs (Support for indexes, FINAL, SAMPLE).
Ability to implement full featured query optimizer on top of new infrastructure.
Better usability, better exception messages.
Full SQL specificiation of our SQL extensions is now possible to write.
This ticket will contain tasks that are necessary to implement before we can enable new infrastructure by default.
Analyzer
QueryTreePassManager
Planner
QueryPlan optimizations
Backward compatibility
Additional tasks
Broken tests
Grouped
distributed_unused_shards @kitaisreal @KochetovNicolai 00754_distributed_optimize_skip_select_on_unused_shards 00754_distributed_optimize_skip_select_on_unused_shards_with_prewhere 01071_force_optimize_skip_unused_shards 01072_optimize_skip_unused_shards_const_expr_eval 01211_optimize_skip_unused_shards_type_mismatch 01213_optimize_skip_unused_shards_DISTINCT 01247_optimize_distributed_group_by_sharding_key_dist_on_dist 01270_optimize_skip_unused_shards_low_cardinality 01319_optimize_skip_unused_shards_nesting 01527_dist_sharding_key_dictGet_reload 01528_allow_nondeterministic_optimize_skip_unused_shards 01563_distributed_query_finish 01756_optimize_skip_unused_shards_rewrite_in 01757_optimize_skip_unused_shards_limit 01758_optimize_skip_unused_shards_once 01759_optimize_skip_unused_shards_zero_shards 01901_in_literal_shard_prune 01930_optimize_skip_unused_shards_rewrite_in Support skip_unused_shards in Analyzer #53282
projections 01576_alias_column_rewrite 01710_projection_additional_filters Fix some more analyzer tests #54128
subquery_in_index @KochetovNicolai 01700_system_zookeeper_path_in 00562_in_subquery_merge_tree 00673_subquery_prepared_set_performance 00945_bloom_filter_index 00981_in_subquery_with_tuple 01583_const_column_in_set_index 01585_use_index_for_global_in 01585_use_index_for_global_in_with_null 01681_bloom_filter_nullable_column 01739_index_hint
rewrite 00597_push_down_predicate_long(Fix 00597_push_down_predicate_long for analyzer #49551 ) 01655_plan_optimizations (already fixed), 01655_plan_optimizations_optimize_read_in_window_order(Fix 01655_plan_optimizations_optimize_read_in_window_order for analyzer #49565 ) 01951_distributed_push_down_limit, 02343_aggregation_pipeline (already fixed), 02377_optimize_sorting_by_input_stream_properties_explain(Fix for analyzer: 02377_ optimize_sorting_by_input_stream_properties_e… #49943 ), 02493_inconsistent_hex_and_binary_number 02511_complex_literals_as_aggregate_function_parameters (Analyzer: apply _CAST to constants only once #49800 ) @devcrafter
storage_materialized_view 01155_rename_move_materialized_view 01268_mv_scalars 01947_mv_subquery 02139_MV_with_scalar_subquery 02174_cte_scalar_cache_mv 02479_race_condition_between_insert_and_droppin_mv @yakov-olkhovskiy
storage_merge 01083_expressions_in_engine_arguments 01214_test_storage_merge_aliases_with_where 01560_merge_distributed_join 01925_test_storage_merge_aliases 02402_merge_engine_with_view 02575_merge_prewhere_different_default_kind 02763_row_policy_storage_merge_alias @novikd Analyzer: support aliases and distributed JOINs in StorageMerge #50894
join_use_nulls 01142_join_lc_and_nullable_in_key 01142_merge_join_lc_and_nullable_in_key 01476_right_full_join_switch 01477_lc_in_merge_join_left_key 02024_join_on_or_long Fix test reference files for join using nullable column #48893 and Fix join_use_nulls in analyzer #49359 @vdimir
join_types 01049_join_low_card_bug_long 01353_low_cardinality_join_types 01721_join_implicit_cast_long 02242_join_rocksdb 02267_join_dup_columns_issue36199 Fixing join tests with analyzer #49555 @vdimir
01062_pm_all_join_with_block_continuation @vdimir Analyzer: compatibility setting to support JOIN USING to level column #56458
02481_merge_array_join_sample_by Analyzer: fix column not found for optimized prewhere with sample by #49784 @vdimir
distributed_different_databases 01152_cross_replication 01232_extremes 01487_distributed_in_not_default_db 01615_random_one_shard_insertion @novikd Analyzer: WIP on distributed queries #50097 Analyzer: fix 01487_distributed_in_not_default_db #50587
invalid_set 00700_decimal_compare 02421_decimal_in_precision_issue_41125 02428_decimal_in_floating_point_literal @vdimir Fix IN with decimal in analyzer #48754
transform_query_for_external_database 01086_odbc_roundtrip Fixing 01086_odbc_roundtrip with analyzer. #54133
partition_pruning 01540_verbatim_partition_pruning 01748_partition_id_pruning Fix partition id pruning for analyzer. #54185
subquery_remove_unused_columns 00593_union_all_assert_columns_removed 01455_shard_leaf_max_rows_bytes_to_read 01586_columns_pruning @kitaisreal Analyzer remove unused projection columns #56499
key_condition 01651_bugs_from_15889 01786_explain_merge_tree 01889_key_condition_function_chains Easy tests fix for analyzer #56211
additional_filters 02346_additional_filters_distr
set_index 00838_unique_index 00979_set_index_not Analyzer: fix 00979_set_index_not.sql #52754 @devcrafter
use_structure_from_insertion_table_in_table_functions 02458_use_structure_from_insertion_table
storage_key_value 01504_rocksdb 02375_rocksdb_with_filters Fix rocksdb with analyzer. #56391
transactions 01173_transaction_control_queries 02345_implicit_transaction @kitaisreal
optimize_using_constraints 01624_soft_constraints Hypothesis index in the analyzer #48381
read_in_order 00940_order_by_read_in_order_query_plan Analyzer ORDER BY read in order query plan add test #56532 @kitaisreal
dependend_objects 01747_join_view_filter_dictionary @kitaisreal
projection_names 02426_orc_bug Analyzer special functions projection names fix #56514 @kitaisreal
alias_columns 02494_optimize_group_by_function_keys_and_alias_columns
parameterized_view 02428_parameterized_view 02818_parameterized_view_with_cte_multiple_usage @novikd @SmitaRKulkarni Refactor parameterized view & add support for new analyzer #54211
allow_compound_columns 00927_asof_joins @vdimir
allow_aggregate_partitions_independently 02521_aggregation_by_partitions Properly process aliases for aggregation-by-partition optimization. #56254
join_with_constants 02382_join_and_filtering_set @vdimir
cte_distributed 02341_global_join_cte
remove_join 01890_materialized_distributed_join @vdimir Analyzer: fix block stucture mismatch in matview with engine distributed #55741
distributed_conversion 01268_shard_avgweighted /// Not found column avgWeighted(100_Int8, -1_Int8) in block. There are only columns: avgWeighted(100_UInt8, -1_Int8) @kitaisreal
invalid_view_subquery_parsing 02366_union_decimal_conversion Fix incomplete query result for UNION in view() function. #56274
prewhere 00717_merge_and_distributed Virtual columns for merge : Column _table not found in table _table_function.merge
any_join_const 02000_join_on_const @vdimir
recursive_cte 01495_subqueries_in_with_statement Infinit cte substitution? WITH test1 AS (SELECT i + 1, j + 1 FROM test1) SELECT * FROM test1 -> Too deep subqueries
used_functions_log 01656_test_query_log_factories_info We create some helper functions (or, if, multiif, multiMatchAny), in passes even we don't need it. Alos there is a difference in CRC32 case.
memory_bound_merging 02404_memory_bound_merging Fix 02404_memory_bound_merging with analyzer. #56419
row_policy 02131_used_row_policies_in_query_log Analyzer log used row policies #56531 @kitaisreal
distributed 00223_shard_distributed_aggregation_memory_efficient select randConstant(), randConstant(), randConstant() -> Block structure mismatch
asof_join_nullable 01428_nullable_asof_join Allow ASOF JOIN over nullable right column #49826 @vdimir
storage_s3 02302_s3_file_pruning @vdimir Analyzer: filtering by virtual columns for StorageS3 #56668
distinct_in_order 02317_distinct_in_order_optimization_explain (Update sort desc in ReadFromMergeTree after applying PREWHERE info #48669 ) @devcrafter
distributed_aggregation_memory_efficient 01231_distributed_aggregation_memory_efficient_mix_levels
cast_null 01761_cast_to_enum_nullable @kitaisreal
grouping_sets 02554_fix_grouping_sets_predicate_push_down Fix 02554_fix_grouping_sets_predicate_push_down with analyzer. #56595
input_function 00952_input_function
sorting in distributed queries 02001_shard_num_shard_count test_parallel_replicas_skip_shards/test.py::test_skip_unavailable_shards
invalid_aggregation 02352_grouby_shadows_arg
merge_tree_index 01064_incremental_streaming_from_2_src_with_feedback
storage_buffer 01584_distributed_buffer_cannot_find_column
big_constants 00725_memory_tracking @kitaisreal
join_column_cast 01925_join_materialized_columns @vdimir
01739_index_hint 02880_indexHint__partition_id @KochetovNicolai
Integrations tests
Performance tests
Separate issue was created for this topic: #62245 .
Other
Later
After #31796 was merged now we have new infrastructure for query analysis and planning. There are a lot of things that new infrastructure brings to us:
This ticket will contain tasks that are necessary to implement before we can enable new infrastructure by default.
Analyzer
group_by_use_nullssetting in new analyzer #45910SELECT * FROM table AS tablefor table expressions. Developer @kitaisreal. Analyzer improve aliases support for table expressions #43089.QueryTreePassManager
optimize_if_transform_strings_to_enumin new analyzer #43999Planner
QueryPlan optimizations
[Analyzer migration] Enable optimize_duplicate_order_by_and_distinct (ORDER BY part) #43229Remove redundant sorting #43905 Remove redundant DISTINCT #44176Backward compatibility
single_join_prefer_left_table. Developer @kitaisreal. Analyzer planner fixes before enable by default #46471Additional tasks
Broken tests
Grouped
00754_distributed_optimize_skip_select_on_unused_shards00754_distributed_optimize_skip_select_on_unused_shards_with_prewhere01071_force_optimize_skip_unused_shards01072_optimize_skip_unused_shards_const_expr_eval01211_optimize_skip_unused_shards_type_mismatch01213_optimize_skip_unused_shards_DISTINCT01247_optimize_distributed_group_by_sharding_key_dist_on_dist01270_optimize_skip_unused_shards_low_cardinality01319_optimize_skip_unused_shards_nesting01527_dist_sharding_key_dictGet_reload01528_allow_nondeterministic_optimize_skip_unused_shards01563_distributed_query_finish01756_optimize_skip_unused_shards_rewrite_in01757_optimize_skip_unused_shards_limit01758_optimize_skip_unused_shards_once01759_optimize_skip_unused_shards_zero_shards01901_in_literal_shard_prune01930_optimize_skip_unused_shards_rewrite_inSupport skip_unused_shards in Analyzer #5328201576_alias_column_rewrite01710_projection_additional_filtersFix some more analyzer tests #5412801700_system_zookeeper_path_in 00562_in_subquery_merge_tree00673_subquery_prepared_set_performance00945_bloom_filter_index00981_in_subquery_with_tuple01583_const_column_in_set_index01585_use_index_for_global_in01585_use_index_for_global_in_with_null01681_bloom_filter_nullable_column01739_index_hint00597_push_down_predicate_long(Fix 00597_push_down_predicate_long for analyzer #49551)01655_plan_optimizations(already fixed),01655_plan_optimizations_optimize_read_in_window_order(Fix 01655_plan_optimizations_optimize_read_in_window_order for analyzer #49565)01951_distributed_push_down_limit,02343_aggregation_pipeline(already fixed),02377_optimize_sorting_by_input_stream_properties_explain(Fix for analyzer: 02377_ optimize_sorting_by_input_stream_properties_e… #49943),02493_inconsistent_hex_and_binary_number02511_complex_literals_as_aggregate_function_parameters(Analyzer: apply _CAST to constants only once #49800) @devcrafter01155_rename_move_materialized_view01268_mv_scalars01947_mv_subquery02139_MV_with_scalar_subquery02174_cte_scalar_cache_mv02479_race_condition_between_insert_and_droppin_mv@yakov-olkhovskiy01083_expressions_in_engine_arguments01214_test_storage_merge_aliases_with_where01560_merge_distributed_join01925_test_storage_merge_aliases02402_merge_engine_with_view02575_merge_prewhere_different_default_kind02763_row_policy_storage_merge_alias@novikd Analyzer: support aliases and distributed JOINs in StorageMerge #5089401142_join_lc_and_nullable_in_key01142_merge_join_lc_and_nullable_in_key01476_right_full_join_switch01477_lc_in_merge_join_left_key02024_join_on_or_longFix test reference files for join using nullable column #48893 and Fix join_use_nulls in analyzer #49359 @vdimir01049_join_low_card_bug_long01353_low_cardinality_join_types01721_join_implicit_cast_long02242_join_rocksdb02267_join_dup_columns_issue36199Fixing join tests with analyzer #49555 @vdimir01062_pm_all_join_with_block_continuation@vdimir Analyzer: compatibility setting to support JOIN USING to level column #5645802481_merge_array_join_sample_byAnalyzer: fix column not found for optimized prewhere with sample by #49784 @vdimir01152_cross_replication01232_extremes01487_distributed_in_not_default_db01615_random_one_shard_insertion@novikd Analyzer: WIP on distributed queries #50097 Analyzer: fix 01487_distributed_in_not_default_db #5058700700_decimal_compare02421_decimal_in_precision_issue_4112502428_decimal_in_floating_point_literal@vdimir Fix IN with decimal in analyzer #4875401086_odbc_roundtripFixing 01086_odbc_roundtrip with analyzer. #5413301540_verbatim_partition_pruning01748_partition_id_pruningFix partition id pruning for analyzer. #5418500593_union_all_assert_columns_removed01455_shard_leaf_max_rows_bytes_to_read01586_columns_pruning@kitaisreal Analyzer remove unused projection columns #5649901651_bugs_from_1588901786_explain_merge_tree01889_key_condition_function_chainsEasy tests fix for analyzer #5621102346_additional_filters_distr00838_unique_index00979_set_index_notAnalyzer: fix 00979_set_index_not.sql #52754 @devcrafter02458_use_structure_from_insertion_table01504_rocksdb02375_rocksdb_with_filtersFix rocksdb with analyzer. #5639101173_transaction_control_queries02345_implicit_transaction@kitaisreal01624_soft_constraintsHypothesis index in the analyzer #4838100940_order_by_read_in_order_query_planAnalyzer ORDER BY read in order query plan add test #56532 @kitaisreal01747_join_view_filter_dictionary@kitaisreal02426_orc_bugAnalyzer special functions projection names fix #56514 @kitaisreal02494_optimize_group_by_function_keys_and_alias_columns02428_parameterized_view02818_parameterized_view_with_cte_multiple_usage@novikd @SmitaRKulkarni Refactor parameterized view & add support for new analyzer #5421100927_asof_joins@vdimir02521_aggregation_by_partitionsProperly process aliases for aggregation-by-partition optimization. #5625402382_join_and_filtering_set@vdimir02341_global_join_cte01890_materialized_distributed_join@vdimir Analyzer: fix block stucture mismatch in matview with engine distributed #5574101268_shard_avgweighted/// Not found column avgWeighted(100_Int8, -1_Int8) in block. There are only columns: avgWeighted(100_UInt8, -1_Int8)@kitaisreal02366_union_decimal_conversionFix incomplete query result for UNION in view() function. #5627400717_merge_and_distributedVirtual columns for merge :Column _table not found in table _table_function.merge02000_join_on_const@vdimir01495_subqueries_in_with_statementInfinit cte substitution?WITH test1 AS (SELECT i + 1, j + 1 FROM test1) SELECT * FROM test1->Too deep subqueries01656_test_query_log_factories_infoWe create some helper functions (or,if,multiif,multiMatchAny), in passes even we don't need it. Alos there is a difference inCRC32case.02404_memory_bound_mergingFix 02404_memory_bound_merging with analyzer. #5641902131_used_row_policies_in_query_logAnalyzer log used row policies #56531 @kitaisreal00223_shard_distributed_aggregation_memory_efficientselect randConstant(), randConstant(), randConstant()->Block structure mismatch01428_nullable_asof_joinAllow ASOF JOIN over nullable right column #49826 @vdimir02302_s3_file_pruning@vdimir Analyzer: filtering by virtual columns for StorageS3 #5666802317_distinct_in_order_optimization_explain(Update sort desc in ReadFromMergeTree after applying PREWHERE info #48669) @devcrafter01231_distributed_aggregation_memory_efficient_mix_levels01761_cast_to_enum_nullable@kitaisreal02554_fix_grouping_sets_predicate_push_downFix 02554_fix_grouping_sets_predicate_push_down with analyzer. #5659500952_input_function02001_shard_num_shard_counttest_parallel_replicas_skip_shards/test.py::test_skip_unavailable_shards02352_grouby_shadows_arg01064_incremental_streaming_from_2_src_with_feedback01584_distributed_buffer_cannot_find_column00725_memory_tracking@kitaisreal01925_join_materialized_columns@vdimir01739_index_hint02880_indexHint__partition_id@KochetovNicolaiIntegrations tests
test_wrong_db_or_table_name/test.py::test_wrong_table_name,test_sql_user_defined_functions_on_cluster/test.py::test_sql_user_defined_functions_on_cluster,test_user_defined_object_persistence/test.py::test_persistence,test_dictionaries_update_and_reload/test.py::test_reload_after_fail_in_cache_dictionary,test_access_for_functions/test.py::test_access_rights_for_function,test_select_access_rightstest_profile_events_s3/test.py::test_profile_eventstest_mysql_database_engine/test.py::test_mysql_ddl_for_mysql_databasetest_executable_table_function/test.py::test_executable_function_input_pythontest_cluster_copiertest_backward_compatibility,test_distributed_backward_compatability/test.py::test_distributed_in_tupledistributed between nodes with old and new analyzertest_user_defined_object_persistence/test.py::test_persistencetest_storage_rabbitmq/test.py::test_rabbitmq_materialized_viewSELECT * FROM test.rabbitmq GROUP BY (key, value)test_mutations_with_merge_tree/test.py::test_mutations_with_merge_background_tasktest_build_sets_from_multiple_threads/test.py::test_setPerformance tests
Separate issue was created for this topic: #62245.
direct_dictionary,set_index@kitaisrealpush_down_limitinjective_functions_inside_uniqagg_functions_min_max_anynorm_distance_float,norm_distancearray_fillbig constants @kitaisreallogical_functions_mediumif_transform_strings_to_enumdistinct_combinatorarithmetic_operations_in_aggr_funcaggregate_functions_of_group_by_keysorder_by_read_in_orderaggregate_with_serialized_methodmaterialized_view_parallelize_output_from_storagesif_transform_strings_to_enumlow_cardinality_queryuniqExactIfparallel_indexrandrewritegroup_by_fixed_keysrewritelogical_functions_small,logical_functions_medium,logical_functions_largerewritehash_table_sizes_statsrewritejoins_in_memoryrewriteOther
02713_array_low_cardinality_string02707_skip_index_with_in02707_complex_query_fails_analyzerRemove 02707_complex_query_fails_analyzer test #4940302680_mysql_ast_logical_err@vdimir Fix 02680_mysql_ast_logical_err in analyzer #4936202324_map_combinator_bug02241_join_rocksdb_bs@vdimir02003_WithMergeableStateAfterAggregationAndLimit_LIMIT_BY_LIMIT_OFFSET01626_cnf_fuzz_longCorrectly stop CNF convert for too many atomics in new analyzer #4940201115_join_with_dictionary@vdimir01009_global_array_join_names@vdimir Analyzer: fix logical error with set in array join #5658700917_multiple_joins_denny_crane@vdimir00725_join_on_bug_1@vdimir00636_partition_key_parts_pruning00261_storage_aliases_and_array_joinRemove passed tests fromanalyzer_tech_debt.txt#5561802701_non_parametric_functionThrow an exception for non-parametric functions in new analyzer #4941901825_type_json_multiple_files01281_group_by_limit_memory_tracking02723_zookeeper_name@vdimir01019_alter_materialized_view_consistent01019_alter_materialized_view_consistentfailed #49261 Analyzer: fix 01019_alter_materialized_view_consistent #5624601244_optimize_distributed_group_by_sharding_key01952_optimize_distributed_group_by_sharding_key01287_max_execution_speedFlaky test01287_max_execution_speedwith analyzer #5605202725_agg_projection_resprect_PK@KochetovNicolai02784_parallel_replicas_automatic_decision_join@Algunenano (broken test added in Automatic decision of nº of parallel replicas #51692) @KochetovNicolai Allow parallel replicas for JOIN with analyzer [part 1]. #58838Later
02354_annoy