Skip to content

CometExecIterator uses incorrect partition index #1113

@viirya

Description

@viirya

Describe the bug

Currently we get the partition index from TaskContext in CometExecIterator. It is actually incorrect. For example, in a query like

CartesianProductExec (25 partitions)
  - CometProject
    - CometScan (5 partitions)
  - CometProject
    - CometScan (5 partitions)

There are 25 partitions in the task for the zipped partitions from both CometScan's 5 partitions. So the partition indexes are 0~24. Each CometProject + CometScan has represented by a CometExecIterator. When executing the native plan, these partition indexes are used. However, it only has 5 partitions. In CometExecIterator, the number of partitions is the number of partitions of the corresponding RDD, and the partition index is the partition index of corresponding partition of the RDD that the iterator tries to run on.

Currently this doesn't exposed as test failure because the partition index is not actually used.

But in the feature branch comet-parquet-exec, the native ParquetExec uses given partition index to access partitioned file in file group. The incorrect partition index causes errors like out of index.

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions