[VL] Results mismatch when scanning a low-version ORC file #6673

@NEUpanning

Backend

VL (Velox)

Bug description

SparkSQL:

SELECT if(user_type <> -1, user_id, null) AS a
FROM table
WHERE partition_date = '2024-07-01'
ORDER BY a DESC
LIMIT 10;

Gluten Result:

gluten
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL

Vanilla Result:

vanilla
dp_765265243
dp_71942892
dp_71942892
dp_71942892
dp_71942892
dp_71942892
dp_3779112707
dp_3778736486
dp_3778655687
dp_3778588244

Physical Plan:

== Physical Plan ==
VeloxColumnarToRowExec
+- TakeOrderedAndProjectExecTransformer (limit=10, orderBy=[a#0 DESC NULLS LAST], output=[a#0])
   +- ^(1) ProjectExecTransformer [if (NOT (user_type#6L = -1)) user_id#1 else null AS a#0]
      +- ^(1) NativeFileScan orc table[user_id#1,user_type#6L,partition_date#18] Batched: true, DataFilters: [], Format: ORC, Location: InMemoryFileIndex(1 paths)[viewfs://******, PartitionFilters: [isnotnull(partition_date#18), (partition_date#18 = 2024-07-01)], PushedFilters: [], ReadSchema: struct<user_id:string,user_type:bigint>

Unfortunately, I can't reproduce it with a new Hive table. I created a new table containing the rows from the original table and submitted the same SQL to Spark. The physical plan is the same as the one above, but on the new table the Gluten result matches vanilla Spark.
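
Since the mismatch does not reproduce on a freshly written table, one thing that may be worth comparing is the ORC file version and writer version recorded in the old and new files. Below is a minimal Python sketch that reads them from the ORC postscript at the end of a file. It assumes the ORC file has been copied to a local path, and that the PostScript protobuf field numbers are as I read them in the ORC specification (1 footerLength, 2 compression, 4 version, 5 metadataLength, 6 writerVersion, 8000 magic). The function names `read_varint`, `parse_postscript`, and `read_orc_postscript` are hypothetical helpers, not part of Gluten, Velox, or any ORC library.

```python
import os

# Assumed PostScript field numbers (from the ORC specification's protobuf schema).
SCALAR_FIELDS = {
    1: "footerLength",
    2: "compression",
    3: "compressionBlockSize",
    5: "metadataLength",
    6: "writerVersion",
}


def read_varint(buf, pos):
    """Decode one protobuf base-128 varint at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_postscript(ps):
    """Parse raw (uncompressed) PostScript bytes into a dict."""
    info = {"version": []}
    pos = 0
    while pos < len(ps):
        key, pos = read_varint(ps, pos)
        field_no, wire_type = key >> 3, key & 7
        if wire_type == 0:  # varint-encoded scalar
            value, pos = read_varint(ps, pos)
            if field_no == 4:  # version written as unpacked repeated field
                info["version"].append(value)
            elif field_no in SCALAR_FIELDS:
                info[SCALAR_FIELDS[field_no]] = value
        elif wire_type == 2:  # length-delimited: packed version or magic string
            length, pos = read_varint(ps, pos)
            payload = ps[pos:pos + length]
            pos += length
            if field_no == 4:
                p = 0
                while p < len(payload):
                    value, p = read_varint(payload, p)
                    info["version"].append(value)
            elif field_no == 8000:
                info["magic"] = payload.decode("ascii")
        else:
            raise ValueError(f"unexpected wire type {wire_type}")
    return info


def read_orc_postscript(path):
    """Read the postscript of a local ORC file.

    The last byte of the file holds the postscript length; the postscript
    itself sits immediately before that byte.
    """
    with open(path, "rb") as f:
        f.seek(-1, os.SEEK_END)
        ps_len = f.read(1)[0]
        f.seek(-(1 + ps_len), os.SEEK_END)
        return parse_postscript(f.read(ps_len))
```

Running `read_orc_postscript` on one file from the original partition and one from the newly created table, then comparing `version` and `writerVersion`, would show whether the two differ at the file-format level. Per the ORC specification, a `version` of `[0, 11]` corresponds to Hive 0.11 and `[0, 12]` to Hive 0.12 and later.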

Spark version

None

Spark configurations

No response

System information

v1.2.0 rc1

Relevant logs

No response

Labels

bug, triage
