Description
Add a count distinct AggregateType based on RoaringBitmap. I have done an initial version, which improves count distinct performance dramatically.
The constraint for this count distinct AggregateType is that the input data type must be bool, tiny int, small int, or int. Other data types need to be mapped to int using a global dict or a hash function.
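To illustrate the idea, here is a minimal C++ sketch. It is not the code from the initial version: `SimpleBitmap` is a hypothetical stand-in for a RoaringBitmap (a real one also switches between array, bitmap, and run containers), and `to_bitmap_key` is an assumed helper showing the hash-based mapping for non-int types.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <functional>
#include <initializer_list>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a RoaringBitmap: each value is split into a
// 16-bit high key and a 16-bit low part; every high key owns one 65536-bit chunk.
class SimpleBitmap {
    using Chunk = std::bitset<65536>;
    std::map<uint16_t, std::unique_ptr<Chunk>> chunks_;

public:
    // Set the bit for one input value; adding the same value twice is a no-op.
    void add(uint32_t value) {
        std::unique_ptr<Chunk>& chunk = chunks_[static_cast<uint16_t>(value >> 16)];
        if (!chunk) chunk.reset(new Chunk());
        chunk->set(value & 0xFFFF);
    }

    // Union another partial aggregate into this one (bitwise OR per chunk).
    void merge(const SimpleBitmap& other) {
        for (const auto& entry : other.chunks_) {
            std::unique_ptr<Chunk>& mine = chunks_[entry.first];
            if (!mine) mine.reset(new Chunk());
            *mine |= *entry.second;
        }
    }

    // The distinct count is the number of set bits.
    uint64_t cardinality() const {
        uint64_t n = 0;
        for (const auto& entry : chunks_) n += entry.second->count();
        return n;
    }
};

// Assumed helper: map a non-int key to 32 bits with a hash. Collisions can
// make the count too low; a global dict would keep it exact.
inline uint32_t to_bitmap_key(const std::string& s) {
    return static_cast<uint32_t>(std::hash<std::string>{}(s));
}

// Usage: two partial aggregates are built independently and then merged.
inline uint64_t count_distinct_example() {
    SimpleBitmap a, b;
    for (uint32_t v : {1u, 2u, 2u, 70000u}) a.add(v);
    for (uint32_t v : {2u, 3u}) b.add(v);
    assert(a.cardinality() == 3);  // the duplicate 2 is counted once
    a.merge(b);
    return a.cardinality();        // distinct values: 1, 2, 3, 70000
}
```

Because merging is just a union of bitmaps, partial results from different rows, segments, or nodes can be combined in any order, which is what makes a bitmap usable as an aggregate value.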
While doing this work, I found some issues with the current aggregate framework:
1 In order to add a new AggregateType, we need to change a lot of files:
be/src/common/bitmap.h | 111 +++++++++++++++++++++++++++++++++++++++++++++++++
be/src/exec/broker_scanner.cpp | 25 +++++++++--
be/src/exec/olap_scanner.cpp | 9 +---
be/src/exec/olap_scanner.h | 2 -
be/src/exprs/aggregate_functions.cpp | 110 +++++++++++++++++++++++++++++++++++++++++++++++-
be/src/exprs/aggregate_functions.h | 15 +++++++
be/src/exprs/anyval_util.cpp | 4 ++
be/src/exprs/anyval_util.h | 5 +++
be/src/exprs/expr.cpp | 3 ++
be/src/exprs/expr_context.cpp | 1 +
be/src/exprs/new_agg_fn_evaluator.cc | 5 ++-
be/src/gutil/port.h | 12 +++---
be/src/olap/CMakeLists.txt | 2 +-
be/src/olap/aggregate_func.cpp | 7 +++-
be/src/olap/aggregate_func.h | 32 +++++++++++++-
be/src/olap/column_reader.cpp | 3 +-
be/src/olap/column_reader.h | 1 +
be/src/olap/column_writer.cpp | 3 +-
be/src/olap/column_writer.h | 1 +
be/src/olap/field.cpp | 19 +++++----
be/src/olap/field.h | 30 +++++++++++--
be/src/olap/field_info.cpp | 11 +++++
be/src/olap/memtable.cpp | 19 ++++++++-
be/src/olap/olap_common.h | 6 ++-
be/src/olap/olap_engine.cpp | 6 +--
be/src/olap/olap_table.cpp | 20 ---------
be/src/olap/olap_table.h | 2 -
be/src/olap/push_handler.cpp | 3 +-
be/src/olap/reader.cpp | 2 +-
be/src/olap/row_block.cpp | 6 ++-
be/src/olap/row_cursor.cpp | 23 +++++++++-
be/src/olap/row_cursor.h | 11 ++++-
be/src/olap/schema.h | 10 ++---
be/src/olap/segment_reader.cpp | 27 +-----------
be/src/olap/segment_reader.h | 14 -------
be/src/olap/types.cpp | 1 +
be/src/olap/types.h | 12 ++++++
be/src/runtime/primitive_type.cpp | 13 +++++-
be/src/runtime/primitive_type.h | 7 +++-
be/src/runtime/raw_value.cpp | 6 ++-
be/src/runtime/raw_value.h | 11 +++--
be/src/runtime/raw_value_ir.cpp | 3 +-
be/src/runtime/result_writer.cpp | 1 -
be/src/runtime/tuple.cpp | 4 +-
be/src/runtime/types.h | 6 ++-
be/src/runtime/vectorized_row_batch.cpp | 6 ++-
be/src/udf/udf.h | 1 +
be/src/util/symbols_util.cpp | 1 +
fe/src/main/cup/sql_parser.cup | 13 +++++-
fe/src/main/java/org/apache/doris/alter/SchemaChangeHandler.java | 5 +++
fe/src/main/java/org/apache/doris/analysis/AggregateInfoBase.java | 1 +
fe/src/main/java/org/apache/doris/analysis/BuiltinAggregateFunction.java | 1 +
fe/src/main/java/org/apache/doris/analysis/CreateTableStmt.java | 10 +++++
fe/src/main/java/org/apache/doris/analysis/Expr.java | 6 +++
fe/src/main/java/org/apache/doris/analysis/FunctionCallExpr.java | 18 +++++++-
fe/src/main/java/org/apache/doris/analysis/LiteralExpr.java | 1 +
fe/src/main/java/org/apache/doris/analysis/SelectStmt.java | 13 ++++++
fe/src/main/java/org/apache/doris/catalog/AggregateType.java | 9 +++-
fe/src/main/java/org/apache/doris/catalog/Column.java | 2 +-
fe/src/main/java/org/apache/doris/catalog/Function.java | 2 +
fe/src/main/java/org/apache/doris/catalog/FunctionSet.java | 33 ++++++++++++++-
fe/src/main/java/org/apache/doris/catalog/PrimitiveType.java | 22 +++++++++-
fe/src/main/java/org/apache/doris/catalog/ScalarFunction.java | 1 +
fe/src/main/java/org/apache/doris/catalog/ScalarType.java | 18 ++++++++
fe/src/main/java/org/apache/doris/catalog/Type.java | 38 +++++++++++++++--
fe/src/main/java/org/apache/doris/common/util/Util.java | 1 +
fe/src/main/java/org/apache/doris/planner/SingleNodePlanner.java | 14 ++++++-
fe/src/main/jflex/sql_scanner.flex | 4 +-
gensrc/thrift/PlanNodes.thrift | 3 +-
gensrc/thrift/Types.thrift | 4 +-
70 files changed, 704 insertions(+), 147 deletions(-)
2 In order to add a new complex AggregateType, we also need to add a new data type.
3 Currently, we don't support variable-length aggregate values very well.
4 Currently, memory management for complex aggregate values isn't simple.
5 The write and read logic for aggregate values is scattered.
6 For an aggregate value, the logic in aggregate_func.h and aggregate_functions.cpp should be unified.
7 We don't have a unified, clear interface for all aggregate values.
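To make items 3 to 7 more concrete, here is one possible shape for such an interface, sketched in C++. This is purely illustrative and not the planned design: `AggregateValue`, `DistinctCountValue`, and all method names are hypothetical and do not exist in Doris, and a `std::set` stands in for the bitmap to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <set>
#include <string>

// Hypothetical interface: one place for update, merge, read, and write logic.
class AggregateValue {
public:
    virtual ~AggregateValue() = default;
    virtual void update(int32_t input) = 0;                  // fold in one input value
    virtual void merge(const AggregateValue& other) = 0;     // combine partial states
    virtual std::string serialize() const = 0;               // variable-length write
    virtual void deserialize(const std::string& bytes) = 0;  // matching read
    virtual int64_t finalize() const = 0;                    // final result
};

// Example implementation; the object owns its own memory, so callers
// never manage the variable-length state directly.
class DistinctCountValue : public AggregateValue {
    std::set<int32_t> seen_;

public:
    void update(int32_t input) override { seen_.insert(input); }

    void merge(const AggregateValue& other) override {
        // Assumption: the caller only merges values of the same concrete type.
        const auto& rhs = static_cast<const DistinctCountValue&>(other);
        seen_.insert(rhs.seen_.begin(), rhs.seen_.end());
    }

    std::string serialize() const override {
        std::string out;
        out.reserve(seen_.size() * sizeof(int32_t));
        for (int32_t v : seen_) {
            out.append(reinterpret_cast<const char*>(&v), sizeof(v));
        }
        return out;
    }

    void deserialize(const std::string& bytes) override {
        assert(bytes.size() % sizeof(int32_t) == 0);
        seen_.clear();
        for (std::size_t i = 0; i < bytes.size(); i += sizeof(int32_t)) {
            int32_t v;
            std::memcpy(&v, bytes.data() + i, sizeof(v));
            seen_.insert(v);
        }
    }

    int64_t finalize() const override { return static_cast<int64_t>(seen_.size()); }
};

// Usage: a partial state is written, read back, and merged into another one.
inline int64_t round_trip_example() {
    DistinctCountValue a, b, restored;
    a.update(1);
    a.update(2);
    b.update(2);
    b.update(3);
    restored.deserialize(b.serialize());
    a.merge(restored);
    return a.finalize();  // distinct values: 1, 2, 3
}
```

With an interface along these lines, adding a new complex aggregate value would mostly mean adding one class, rather than touching the scanner, reader, writer, and type code separately.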
After a brief discussion with @imay, we decided to refactor the aggregate value framework first, so that complex aggregate values can be added easily.
As for the refactoring goals, approach, and tasks, I will write a doc later.