Skip to content

[Bug] Bucket shuffle join executes failed when two tables without any data #5144

@xinghuayu007

Description

@xinghuayu007

Describe the bug
Bucket shuffle join is a algorithm of joining two tables. Left table is distrubuted by a column. Right table sends the data to the left table for joining operation. It reduces the network cost. But when two table is withou any data. Bucket shuffle join will fail.

To Reproduce
Steps to reproduce the behavior:

  1. create two table:

CREATE TABLE dynamic_partition2 (
k1 bigint(20) NULL COMMENT "",
k2 int(11) NULL COMMENT "",
k3 smallint(6) NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(k1, k2, k3)
COMMENT "OLAP"
PARTITION BY RANGE(k1)
(PARTITION p20201219 VALUES [("20201219"), ("20201220")),
PARTITION p20201220 VALUES [("20201220"), ("20201221")),
PARTITION p20201221 VALUES [("20201221"), ("20201222")),
PARTITION p20201222 VALUES [("20201222"), ("20201223")),
PARTITION p20201223 VALUES [("20201223"), ("20201224")),
PARTITION p20201224 VALUES [("20201224"), ("20201225")),
PARTITION p20201225 VALUES [("20201225"), ("20201226")),
PARTITION p20201226 VALUES [("20201226"), ("20201227")),
PARTITION p20201227 VALUES [("20201227"), ("20201228")),
PARTITION p20201228 VALUES [("20201228"), ("20201229")),
PARTITION p20201229 VALUES [("20201229"), ("20201230")),
PARTITION p20201230 VALUES [("20201230"), ("20201231")),
PARTITION p20201231 VALUES [("20201231"), ("20210101")))
DISTRIBUTED BY HASH(k2) BUCKETS 32
PROPERTIES (
"replication_num" = "1",
"colocate_with" = "group1",
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.time_zone" = "Asia/Shanghai",
"dynamic_partition.start" = "-6",
"dynamic_partition.end" = "6",
"dynamic_partition.prefix" = "p",
"dynamic_partition.replication_num" = "1",
"dynamic_partition.buckets" = "32",
"in_memory" = "false",
"storage_format" = "V2"
);

CREATE TABLE dynamic_partition_bucket2 (
k1 bigint(20) NULL COMMENT "",
k2 int(11) NULL COMMENT "",
k3 smallint(6) NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(k1, k2, k3)
COMMENT "OLAP"
PARTITION BY RANGE(k1)
(PARTITION p20201219 VALUES [("20201219"), ("20201220")),
PARTITION p20201220 VALUES [("20201220"), ("20201221")),
PARTITION p20201221 VALUES [("20201221"), ("20201222")),
PARTITION p20201222 VALUES [("20201222"), ("20201223")),
PARTITION p20201223 VALUES [("20201223"), ("20201224")),
PARTITION p20201224 VALUES [("20201224"), ("20201225")),
PARTITION p20201225 VALUES [("20201225"), ("20201226")),
PARTITION p20201226 VALUES [("20201226"), ("20201227")),
PARTITION p20201227 VALUES [("20201227"), ("20201228")),
PARTITION p20201228 VALUES [("20201228"), ("20201229")),
PARTITION p20201229 VALUES [("20201229"), ("20201230")),
PARTITION p20201230 VALUES [("20201230"), ("20201231")),
PARTITION p20201231 VALUES [("20201231"), ("20210101")))
DISTRIBUTED BY HASH(k2, k3) BUCKETS 32
PROPERTIES (
"replication_num" = "1",
"colocate_with" = "group3",
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.time_zone" = "Asia/Shanghai",
"dynamic_partition.start" = "-6",
"dynamic_partition.end" = "6",
"dynamic_partition.prefix" = "p",
"dynamic_partition.replication_num" = "1",
"dynamic_partition.buckets" = "32",
"in_memory" = "false",
"storage_format" = "V2"
);

  1. do not insert any data into two tables

  2. enable bucket shuffle join
    set enable_bucket_shuffle_join=true

  3. execute a sql
    select * from dynamic_partition2 a join dynamic_partition_bucket2 b on a.k2 = b.k2 where a.k1=20201218;

  4. see the error
    2020-12-25 10-14-47屏幕截图

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • OS: [e.g. iOS]
  • Browser [e.g. chrome, safari]
  • Version [e.g. 22]

Smartphone (please complete the following information):

  • Device: [e.g. iPhone6]
  • OS: [e.g. iOS8.1]
  • Browser [e.g. stock browser, safari]
  • Version [e.g. 22]

Additional context
Add any other context about the problem here.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions