
@kangkaisen (Contributor) commented Aug 8, 2019

For #1486 and #1485.

Changes in this PR:

1 Add a bitmap_union agg type

2 Add a bitmap UDAF

3 Add a bitmap_init function, which converts an int value to a bitmap

4 Remove the aligned_free method from port.h, because it conflicts with RoaringBitmap's portability.h

5 Remove the merge method from aggregate_func, because every agg input is already aggregated intermediate data, so only one of the merge and update methods is needed

6 Remove an unused Field constructor

7 Use Field polymorphism to eliminate if-else branches

8 Why change the AggregateInfo::init API?

Because key columns and non-aggregated value columns have no update semantics, the init method has to fill dst with src.
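As a rough illustration of points 1, 5 and 8, here is a Python sketch (not the PR's C++ code) of how per-column aggregators might behave when rows that share a key are merged. The names `KeyColumn`, `BitmapUnionColumn`, `init`, `update`, `finalize` and `aggregate` are hypothetical and only loosely mirror `AggregateInfo`; a Python `set` stands in for a RoaringBitmap.

```python
from __future__ import annotations

from typing import Any


class KeyColumn:
    """Hypothetical aggregator for key and non-aggregated value columns.

    These columns have no update semantics, so init must copy src into dst.
    """

    def init(self, src: Any) -> Any:
        return src  # dst is filled directly from src

    def update(self, dst: Any, src: Any) -> Any:
        return dst  # no update semantics: keep the value set by init


class BitmapUnionColumn:
    """Hypothetical bitmap_union aggregator; a Python set stands in for a bitmap.

    Every input is already an aggregated (intermediate) bitmap, so one
    update operation (set union) also does the job a separate merge would.
    """

    def init(self, src: set[int]) -> set[int]:
        return set(src)  # copy so later updates do not mutate the input row

    def update(self, dst: set[int], src: set[int]) -> set[int]:
        dst |= src
        return dst

    def finalize(self, dst: set[int]) -> int:
        return len(dst)  # distinct count


def aggregate(rows, columns):
    """Merge rows that share the key in column 0, column by column."""
    groups = {}
    for row in rows:
        key = row[0]
        if key not in groups:
            groups[key] = [col.init(v) for col, v in zip(columns, row)]
        else:
            acc = groups[key]
            for i, (col, v) in enumerate(zip(columns, row)):
                acc[i] = col.update(acc[i], v)
    return groups


if __name__ == "__main__":
    rows = [(10, {100}), (10, {100, 101}), (11, {110})]
    print(aggregate(rows, [KeyColumn(), BitmapUnionColumn()]))
    # {10: [10, {100, 101}], 11: [11, {110}]}
```

Because the bitmap column only ever sees already-aggregated bitmaps, a single union-based `update` covers both the "update" and "merge" cases, which is the reasoning behind point 5.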

How to use:

1 Create a table with bitmap_union:

CREATE TABLE `bitmap_test` (
  `id` int(11) NULL COMMENT "",
  `id2` varchar(100) BITMAP_UNION NULL
) ENGINE=OLAP
AGGREGATE KEY(`id`)
DISTRIBUTED BY HASH(`id`) BUCKETS 10;

2 Load the data via stream load:

seq 10 13 | awk '{OFS="\t"}{print $1, $1 * 10}' | curl --location-trusted -u xxx:xxx  -T - http://hostname:8410/api/test/bitmap_test/_stream_load

3 Query:

select bitmap_union(id2) from bitmap_test;

select id, bitmap_union(id2) from bitmap_test group by id;
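To make the example concrete, the Python sketch below reproduces what the `seq 10 13 | awk ...` pipeline feeds to stream load and what the two queries would count. It assumes, as the later count-distinct rewrite suggests, that `bitmap_union(id2)` corresponds to the number of distinct `id2` values; the function names are illustrative only.

```python
from __future__ import annotations


def generate_rows(start: int = 10, end: int = 13) -> list[str]:
    r"""Mimic `seq start end | awk '{OFS="\t"}{print $1, $1 * 10}'`."""
    return [f"{i}\t{i * 10}" for i in range(start, end + 1)]


def parse(lines: list[str]) -> list[tuple[int, int]]:
    """Split each tab-separated line into (id, id2)."""
    rows = []
    for line in lines:
        id_str, id2_str = line.split("\t")
        rows.append((int(id_str), int(id2_str)))
    return rows


def union_count(rows: list[tuple[int, int]]) -> int:
    """Models: select bitmap_union(id2) from bitmap_test;"""
    return len({id2 for _, id2 in rows})


def union_count_by_id(rows: list[tuple[int, int]]) -> dict[int, int]:
    """Models: select id, bitmap_union(id2) from bitmap_test group by id;"""
    groups: dict[int, set[int]] = {}
    for id_, id2 in rows:
        groups.setdefault(id_, set()).add(id2)
    return {id_: len(vals) for id_, vals in sorted(groups.items())}


if __name__ == "__main__":
    rows = parse(generate_rows())
    print(rows)                     # [(10, 100), (11, 110), (12, 120), (13, 130)]
    print(union_count(rows))        # 4
    print(union_count_by_id(rows))  # {10: 1, 11: 1, 12: 1, 13: 1}
```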

Todo:

  • add document for bitmap_union

  • add test for bitmap_union

  • rewrite count distinct to bitmap_union for bitmap_union agg type column

  • bitmap_union support insert into select

  • bitmap_union support broker load

  • avoid serialization and deserialization between ScanNode and AggNode

  • add a new type for hll_union and bitmap_union

@kangkaisen (Contributor Author)

This update contains the following changes:

1 Rebase the code
2 Add unit tests for bitmap functions
3 Add bitmap documentation
4 Rewrite count distinct to bitmap_union for columns with the bitmap_union agg type
5 Add a version to the bitmap serialization format
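A version field lets a reader detect, and reject or upgrade, data written in another layout. The Python sketch below shows the general idea with a hypothetical layout (one version byte, a little-endian uint32 count, then sorted uint32 values); `FORMAT_VERSION`, `serialize` and `deserialize` are illustrative names, and this is not the actual Doris or RoaringBitmap wire format.

```python
from __future__ import annotations

import struct

FORMAT_VERSION = 1  # hypothetical version number


def serialize(values: set[int]) -> bytes:
    """Encode as [version:u8][count:u32][value:u32]*, little-endian."""
    ordered = sorted(values)
    return struct.pack(f"<BI{len(ordered)}I", FORMAT_VERSION, len(ordered), *ordered)


def deserialize(buf: bytes) -> set[int]:
    """Check the version byte before trusting the rest of the buffer."""
    version = buf[0]
    if version != FORMAT_VERSION:
        raise ValueError(f"unsupported bitmap format version {version}")
    (count,) = struct.unpack_from("<I", buf, 1)
    return set(struct.unpack_from(f"<{count}I", buf, 5))
```

Putting the version first means a future layout change only needs a new version number and a new branch in `deserialize`, while old data stays readable.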

@kangkaisen (Contributor Author)

This update contains the following changes:

1 Release row_cursor arena memory immediately
2 Move bitmap functions from AggregateFunctions to BitmapFunctions
3 Add a BITMAP_COUNT function, rename the bitmap_init function to TO_BITMAP, and rename the bitmap function to BITMAP_UNION_INT
4 Update the doc
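The renamed functions fit together roughly as follows. This Python sketch models only their semantics (`TO_BITMAP` turns one int into a bitmap, `BITMAP_COUNT` returns a bitmap's cardinality, `BITMAP_UNION_INT` gives the same result as `COUNT(DISTINCT ...)` over ints); `frozenset` stands in for a real bitmap, and the helper `bitmap_union` and all argument and return types are assumptions.

```python
from __future__ import annotations

from collections.abc import Iterable


def to_bitmap(value: int) -> frozenset[int]:
    """TO_BITMAP: a bitmap holding a single integer."""
    return frozenset({value})


def bitmap_union(bitmaps: Iterable[frozenset[int]]) -> frozenset[int]:
    """Aggregate step: union of all input bitmaps."""
    result: set[int] = set()
    for b in bitmaps:
        result |= b
    return frozenset(result)


def bitmap_count(bitmap: frozenset[int]) -> int:
    """BITMAP_COUNT: cardinality of a bitmap."""
    return len(bitmap)


def bitmap_union_int(values: Iterable[int]) -> int:
    """BITMAP_UNION_INT: same result as COUNT(DISTINCT values)."""
    return bitmap_count(bitmap_union(to_bitmap(v) for v in values))


if __name__ == "__main__":
    print(bitmap_union_int([1, 2, 2, 3]))  # 3
```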

@kangkaisen (Contributor Author)

After testing with real data, I found that loading into a bitmap column is much slower than loading into a sum column: for about one million rows, stream load with sum takes only 3 seconds, while stream load with bitmap takes 60 seconds.

So I added a special optimization for empty bitmaps and single-integer bitmaps. With this improvement, stream load with bitmap takes only 4 seconds.
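One way to picture this optimization is a bitmap value that only allocates a full bitmap once it holds two or more distinct values. The Python sketch below is a hypothetical model: the `Kind` states `EMPTY`/`SINGLE`/`BITMAP` and the `BitmapValue` methods are assumptions, and a `set` stands in for a RoaringBitmap. It suggests why loading one int per row gets cheaper: each loaded row stays in the `SINGLE` state until rows are merged.

```python
from __future__ import annotations

from enum import Enum


class Kind(Enum):
    EMPTY = 0   # no values; nothing allocated
    SINGLE = 1  # exactly one value, stored inline
    BITMAP = 2  # two or more values, stored in a real bitmap


class BitmapValue:
    """Hypothetical bitmap with fast paths for the empty and single-value cases."""

    def __init__(self) -> None:
        self.kind = Kind.EMPTY
        self.single: int | None = None
        self.bitmap: set[int] | None = None

    def add(self, value: int) -> None:
        if self.kind is Kind.EMPTY:
            self.kind, self.single = Kind.SINGLE, value
        elif self.kind is Kind.SINGLE:
            if value != self.single:
                # Only now pay for a full bitmap.
                self.kind, self.bitmap = Kind.BITMAP, {self.single, value}
                self.single = None
        else:
            self.bitmap.add(value)

    def union(self, other: BitmapValue) -> None:
        """In-place union (the bitmap_union merge step)."""
        for v in other.values():
            self.add(v)

    def values(self) -> set[int]:
        if self.kind is Kind.EMPTY:
            return set()
        if self.kind is Kind.SINGLE:
            return {self.single}
        return set(self.bitmap)

    def cardinality(self) -> int:
        if self.kind is Kind.BITMAP:
            return len(self.bitmap)
        return 0 if self.kind is Kind.EMPTY else 1
```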

@kangkaisen (Contributor Author)

Updated to address @imay's comments.

@kangkaisen (Contributor Author)

Rebased the code and fixed the unit tests.

@imay (Contributor) left a comment:


LGTM

@JackyYangPassion

When I tried this example on version 0.11, I got an error:
ERROR 1064 (HY000): Syntax error at: CREATE TABLE bitmap_test( id int(11) NULL COMMENT "ID", id2 varchar(100) bitmap_count NULL ) ENGINE=OLAP AGGREGATE KEY(id) DISTRIBUTED BY HASH(id) BUCKETS 10 ^ Encountered: IDENTIFIER Expected: COMMA

@kangkaisen (Contributor Author)


Please refer to the bitmap doc:
https://github.com/apache/incubator-doris/blob/master/docs/documentation/cn/sql-reference/sql-functions/aggregate-functions/bitmap.md

@JackyYangPassion commented Jan 6, 2020

I used the MySQL client with the SQL from the bitmap doc, but got an error:
Encountered: IDENTIFIER
CREATE TABLE SQL:
CREATE TABLE pv_bitmap(dt int(11) NULL COMMENT "",page varchar(10) NULL COMMENT "",user_id bitmap BITMAP_UNION NULL COMMENT "" ) ENGINE=OLAP AGGREGATE KEY(dt, page) COMMENT "OLAP" DISTRIBUTED BY HASH(dt) BUCKETS 2;

Is some setting needed in the client?

Server version: 5.1.0 Doris version DORIS-0.11.14-release

@JackyYangPassion

FE log:
Encountered: IDENTIFIER
Expected

at org.apache.doris.qe.StmtExecutor.analyze(StmtExecutor.java:362) ~[palo-fe.jar:?]
at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:213) [palo-fe.jar:?]
at org.apache.doris.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:168) [palo-fe.jar:?]
at org.apache.doris.qe.ConnectProcessor.dispatch(ConnectProcessor.java:259) [palo-fe.jar:?]
at org.apache.doris.qe.ConnectProcessor.processOnce(ConnectProcessor.java:393) [palo-fe.jar:?]
at org.apache.doris.qe.ConnectProcessor.loop(ConnectProcessor.java:403) [palo-fe.jar:?]
at org.apache.doris.qe.ConnectScheduler$LoopHandler.run(ConnectScheduler.java:172) [palo-fe.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_211]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_211]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_211]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_211]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_211]

Caused by: org.apache.doris.common.AnalysisException: Syntax error
at org.apache.doris.analysis.SqlParser.unrecovered_syntax_error(SqlParser.java:1459) ~[palo-fe.jar:?]
at java_cup.runtime.lr_parser.parse(lr_parser.java:616) ~[jflex-1.4.3.jar:?]
at org.apache.doris.qe.StmtExecutor.analyze(StmtExecutor.java:350) ~[palo-fe.jar:?]
... 11 more

@imay (Contributor) commented Jan 6, 2020


You can try:

`user_id` varchar(0) BITMAP_UNION NULL COMMENT ""

@JackyYangPassion


When I create the table with the following SQL, it works:
CREATE TABLE bitmap_test(id int(11) NULL COMMENT "",id2 varchar(100) BITMAP_UNION NULL COMMENT "" ) ENGINE=OLAP AGGREGATE KEY(id) COMMENT "OLAP" DISTRIBUTED BY HASH(id) BUCKETS 10

But inserting a value fails:

insert into bitmap_test values (10,bitmap_hash('a100'));

ERROR 1064 (HY000): No matching function with signature: bitmap_hash(varchar(-1)).

@imay (Contributor) commented Jan 6, 2020

@JackyYangPassion
That version doesn't include the bitmap_hash function. You can try it with our latest master, which requires the env-1.2 Docker image.

@JackyYangPassion commented Jan 6, 2020

Thanks!
With master, the example in https://github.com/apache/incubator-doris/blob/master/docs/documentation/cn/sql-reference/sql-functions/aggregate-functions/bitmap.md works.

swjtu-zhanglei pushed a commit to swjtu-zhanglei/incubator-doris that referenced this pull request Jul 25, 2023