[VL] Results mismatch when scan low version orc file #6673
Closed
Description
Backend
VL (Velox)
Bug description
SparkSQL:
SELECT if(user_type <> -1, user_id, null) AS a
FROM table
WHERE partition_date = '2024-07-01'
ORDER BY a DESC
LIMIT 10;
Gluten Result:
gluten
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
Vanilla Result
vanilla
dp_765265243
dp_71942892
dp_71942892
dp_71942892
dp_71942892
dp_71942892
dp_3779112707
dp_3778736486
dp_3778655687
dp_3778588244
Physical Plan:
== Physical Plan ==
VeloxColumnarToRowExec
+- TakeOrderedAndProjectExecTransformer (limit=10, orderBy=[a#0 DESC NULLS LAST], output=[a#0])
   +- ^(1) ProjectExecTransformer [if (NOT (user_type#6L = -1)) user_id#1 else null AS a#0]
      +- ^(1) NativeFileScan orc table[user_id#1,user_type#6L,partition_date#18] Batched: true, DataFilters: [], Format: ORC, Location: InMemoryFileIndex(1 paths)[viewfs://******, PartitionFilters: [isnotnull(partition_date#18), (partition_date#18 = 2024-07-01)], PushedFilters: [], ReadSchema: struct<user_id:string,user_type:bigint>
Unfortunately, I can't reproduce this with a new Hive table. I created a new table containing the rows from the original table and submitted the same SQL to Spark. The physical plan is the same as before, but with the new table the Gluten result matches vanilla Spark.
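One observation that may help narrow this down: because the plan sorts with DESC NULLS LAST, a top 10 made up entirely of NULLs means the projection produced no non-null value for any row in the partition. Under SQL three-valued logic that is what would happen if the native scan returned NULL for either user_type or user_id. The sketch below only emulates the query semantics in plain Python; the helper names and sample rows are hypothetical, and it does not show which column (if any) is actually read as NULL from the old ORC file.

```python
from typing import List, Optional, Tuple

Row = Tuple[Optional[int], Optional[str]]  # (user_type, user_id)


def sql_if(user_type: Optional[int], user_id: Optional[str]) -> Optional[str]:
    """Emulate IF(user_type <> -1, user_id, NULL) with SQL NULL semantics."""
    if user_type is None:
        # NULL <> -1 evaluates to NULL, so IF falls through to the else branch.
        return None
    return user_id if user_type != -1 else None


def top_desc_nulls_last(values: List[Optional[str]], limit: int) -> List[Optional[str]]:
    """Emulate ORDER BY a DESC NULLS LAST LIMIT n."""
    non_null = sorted((v for v in values if v is not None), reverse=True)
    nulls = [v for v in values if v is None]
    return (non_null + nulls)[:limit]


def run(rows: List[Row], limit: int = 10) -> List[Optional[str]]:
    return top_desc_nulls_last([sql_if(ut, uid) for ut, uid in rows], limit)


# Hypothetical sample rows; the values are illustrative, not taken from the real table.
rows_as_written: List[Row] = [(1, "dp_1"), (-1, "dp_2"), (2, "dp_3")]
# The same rows as they would look if the scan returned NULL for one column.
rows_user_type_null: List[Row] = [(None, uid) for _, uid in rows_as_written]
rows_user_id_null: List[Row] = [(ut, None) for ut, _ in rows_as_written]

print(run(rows_as_written))      # non-null values sort ahead of NULLs
print(run(rows_user_type_null))  # every value is NULL
print(run(rows_user_id_null))    # every value is NULL
```

Selecting user_id and user_type directly, without the if(...) wrapper, with Gluten enabled and disabled should show which of the two cases applies.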
Spark version
None
Spark configurations
No response
System information
v1.2.0 rc1
Relevant logs
No response