[VL] Result mismatch when BHJ with left semi join #8787
Closed
Description
Backend
VL (Velox)
Bug description
We are running into an issue where only a subset of the probe-side rows have a matching key on the build side, yet the Gluten broadcast hash join (BHJ) outputs every probe-side row: the left semi join condition does not filter out any rows. The Velox plan shows that the join condition was pushed down to the table scan as a dynamic filter (HashProbe reports dynamicFiltersProduced = 1 and replacedWithDynamicFilterRows = 135), and that pushed-down filter does not filter out any rows either.
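For reference, a left semi join should return each probe-side row at most once, and only if its join key appears on the build side. The Python sketch below contrasts that expected behavior with the behavior described above. It uses a simplified model, which is an assumption and not Gluten or Velox code, in which the probe trusts a filter pushed down to the scan and skips its own hash-table lookup. If that pushed-down filter passes every row, every probe row reaches the output. The function names and the dict-based row layout are hypothetical.

```python
from typing import Callable, Iterable

Row = dict  # hypothetical row representation: column name -> value


def left_semi_join(probe: Iterable[Row], build: Iterable[Row],
                   probe_key: str, build_key: str) -> list[Row]:
    """Expected semantics: keep each probe row once if its key exists on the build side."""
    build_keys = {row[build_key] for row in build}
    return [row for row in probe if row[probe_key] in build_keys]


def semi_join_trusting_scan_filter(probe: Iterable[Row],
                                   scan_filter: Callable[[Row], bool]) -> list[Row]:
    """Simplified (hypothetical) model of the failure: the probe performs no lookup
    and relies entirely on the filter pushed down to the scan."""
    return [row for row in probe if scan_filter(row)]


probe = [{"id": i, "k": i % 5} for i in range(10)]
build = [{"k": 1}, {"k": 3}]

expected = left_semi_join(probe, build, "k", "k")

# A pushed-down filter that does not filter anything, as observed in the Velox plan.
observed = semi_join_trusting_scan_filter(probe, lambda row: True)

# If the pushed-down filter actually checked key membership, both paths would agree.
build_key_set = {row["k"] for row in build}
working = semi_join_trusting_scan_filter(probe, lambda row: row["k"] in build_key_set)

print(len(expected), len(observed), working == expected)
```

The point of the model is that skipping the probe-side lookup is only safe when the scan-side filter is guaranteed to have removed every non-matching row; otherwise the semi join silently degrades into a pass-through, which matches the 135-in, 135-out numbers in the plan.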
Vanilla Spark plan:
Gluten plan:
Velox plan:
```
-- HashJoin[3][LEFT SEMI (FILTER) n0_14=n2_0] -> n0_0:BIGINT, n0_1:VARCHAR, n0_2:BIGINT, n0_3:BIGINT, n0_4:VARCHAR, n0_5:INTEGER, n0_6:INTEGER, n0_7:BIGINT, n0_8:VARCHAR, n0_9:BIGINT, n0_10:BIGINT, n0_11:BIGINT, n0_12:BIGINT, n0_13:INTEGER, n0_14:BIGINT, n0_15:BIGINT, n0_16:INTEGER, n0_17:VARCHAR, n0_18:VARCHAR, n0_19:VARCHAR, n0_20:VARCHAR, n0_21:VARCHAR, n0_22:VARCHAR
   Output: 135 rows (295.33KB, 1 batches), Cpu time: 129.13us, Wall time: 139.36us, Blocked wall time: 0ns, Peak memory: 68.00KB, Memory allocations: 2, CPU breakdown: B/I/O/F (85.06us/612ns/39.82us/3.63us)
   HashBuild: Input: 2 rows (32B, 1 batches), Output: 0 rows (0B, 0 batches), Cpu time: 13.00us, Wall time: 15.54us, Blocked wall time: 0ns, Peak memory: 68.00KB, Memory allocations: 2, Threads: 1, CPU breakdown: B/I/O/F (10.05us/0ns/1.50us/1.45us)
      distinctKey0 sum: 3, count: 1, min: 3, max: 3
      hashtable.buildWallNanos sum: 84.66us, count: 1, min: 84.66us, max: 84.66us
      hashtable.capacity sum: 3, count: 1, min: 3, max: 3
      hashtable.numDistinct sum: 2, count: 1, min: 2, max: 2
      hashtable.numRehashes sum: 1, count: 1, min: 1, max: 1
      queuedWallNanos sum: 0ns, count: 1, min: 0ns, max: 0ns
      rangeKey0 sum: 3, count: 1, min: 3, max: 3
      runningAddInputWallNanos sum: 0ns, count: 1, min: 0ns, max: 0ns
      runningFinishWallNanos sum: 2.00us, count: 1, min: 2.00us, max: 2.00us
      runningGetOutputWallNanos sum: 2.08us, count: 1, min: 2.08us, max: 2.08us
   HashProbe: Input: 135 rows (295.33KB, 1 batches), Output: 135 rows (295.33KB, 1 batches), Cpu time: 116.13us, Wall time: 123.82us, Blocked wall time: 0ns, Peak memory: 0B, Memory allocations: 0, Threads: 1, CPU breakdown: B/I/O/F (75.01us/612ns/38.32us/2.19us)
      dynamicFiltersProduced sum: 1, count: 1, min: 1, max: 1
      queuedWallNanos sum: 1.00us, count: 1, min: 1.00us, max: 1.00us
      replacedWithDynamicFilterRows sum: 135, count: 1, min: 135, max: 135
      runningAddInputWallNanos sum: 885ns, count: 1, min: 885ns, max: 885ns
      runningFinishWallNanos sum: 3.65us, count: 1, min: 3.65us, max: 3.65us
      runningGetOutputWallNanos sum: 39.83us, count: 1, min: 39.83us, max: 39.83us
```
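The HashProbe metrics in the Velox plan can be checked mechanically. The sketch below is a hypothetical helper, not part of Velox or Gluten, that parses the "name sum: ..., count: ..., min: ..., max: ..." lines copied from the HashProbe section and flags the case where replacedWithDynamicFilterRows equals the probe's input row count. Our reading of that metric, which is an assumption, is that such rows bypass the hash-table lookup, so when it covers every input row the semi join result depends entirely on the filter pushed down to the scan.

```python
import re

# Runtime-stat lines copied from the HashProbe section of the Velox plan.
HASH_PROBE_STATS = """\
dynamicFiltersProduced sum: 1, count: 1, min: 1, max: 1
replacedWithDynamicFilterRows sum: 135, count: 1, min: 135, max: 135
"""
PROBE_INPUT_ROWS = 135   # from "HashProbe: Input: 135 rows"
PROBE_OUTPUT_ROWS = 135  # from "HashProbe: ... Output: 135 rows"

# Captures the stat name and the numeric part of its "sum" value.
STAT_LINE = re.compile(r"^(\w+) sum: ([\d.]+)")


def parse_sums(text: str) -> dict[str, float]:
    """Map each stat name to its 'sum' value; unit suffixes such as 'us' are dropped, not converted."""
    sums = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            sums[match.group(1)] = float(match.group(2))
    return sums


stats = parse_sums(HASH_PROBE_STATS)
replaced = stats.get("replacedWithDynamicFilterRows", 0.0)

if replaced == PROBE_INPUT_ROWS:
    print(f"all {PROBE_INPUT_ROWS} probe rows bypassed the lookup; "
          f"the {PROBE_OUTPUT_ROWS} output rows depend only on the scan-side dynamic filter")
```

The helper only compares sums, so it ignores per-driver breakdowns (count, min, max); that is enough here because every stat reports count: 1.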
Spark version
Spark-3.5.x
Spark configurations
No response
System information
No response
Relevant logs