-
Notifications
You must be signed in to change notification settings - Fork 270
Description
Describe the bug
Currently we get the partition index from TaskContext in CometExecIterator. It is actually incorrect. For example, in a query like
CartesianProductExec (25 partitions)
- CometProject
- CometScan (5 partitions)
- CometProject
- CometScan (5 partitions)
There are 25 partitions in the task for the zipped partitions from both CometScan's 5 partitions. So the partition indexes are 0~24. Each CometProject + CometScan has represented by a CometExecIterator. When executing the native plan, these partition indexes are used. However, it only has 5 partitions. In CometExecIterator, the number of partitions is the number of partitions of the corresponding RDD, and the partition index is the partition index of corresponding partition of the RDD that the iterator tries to run on.
Currently this doesn't exposed as test failure because the partition index is not actually used.
But in the feature branch comet-parquet-exec, the native ParquetExec uses given partition index to access partitioned file in file group. The incorrect partition index causes errors like out of index.
Steps to reproduce
No response
Expected behavior
No response
Additional context
No response