[GLUTEN-7600][VL] Remove EmptySchemaWorkaround #7620
zhztheplayer merged 9 commits into apache:main
checkNullTypeRepartition(
  spark.table("lineitem").selectExpr("null as x", "null as y").repartition(),
  1
  0
Query plan changed from
VeloxColumnarToRow
+- ShuffleQueryStage 0
   +- ColumnarExchange SinglePartition, REBALANCE_PARTITIONS_BY_NONE, [plan_id=979], [shuffle_writer_type=hash], [OUTPUT] ArraySeq(x:NullType, y:NullType), [OUTPUT] ArraySeq(x:NullType, y:NullType)
      +- VeloxResizeBatches 1024, 2147483647
         +- RowToVeloxColumnar
            +- *(1) Project [null AS x#296, null AS y#297]
               +- *(1) ColumnarToRow
                  +- FileScan parquet [] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/opt/gluten/backends-velox/target/scala-2.13/test-classes/tpch-da..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<>
to
VeloxColumnarToRow
+- ShuffleQueryStage 0
   +- ColumnarExchange SinglePartition, REBALANCE_PARTITIONS_BY_NONE, [plan_id=957], [shuffle_writer_type=hash], [OUTPUT] ArraySeq(x:NullType, y:NullType), [OUTPUT] ArraySeq(x:NullType, y:NullType)
      +- VeloxResizeBatches 1024, 2147483647
         +- ^(12) ProjectExecTransformer [null AS x#296, null AS y#297]
            +- ^(12) NativeFileScan parquet [] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/opt/gluten/backends-velox/target/scala-2.13/test-classes/tpch-da..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<>
Sig[CheckOverflow](CHECK_OVERFLOW),
Sig[MakeDecimal](MAKE_DECIMAL),
Sig[PromotePrecision](PROMOTE_PRECISION),
Sig[MonotonicallyIncreasingID](MONOTONICALLY_INCREASING_ID),
if ((list == null || list.isEmpty) && childCtx != null) {
  // The computing for this project is not needed.
  // the child may be an input adapter and childCtx is null. In this case we want to
  // make a read node with non-empty base_schema.
  context.registerEmptyRelToOperator(operatorId)
  return childCtx
}
context.registerEmptyRelToOperator(operatorId) looks suitable only when the operator simply outputs all the input it receives, which is not the case here when list.isEmpty.
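To make the concern concrete, here is a minimal Java sketch. It is not Gluten code: `Batch`, `passThrough`, and `projectEmpty` are hypothetical names, and a batch is modelled only by its column names and row count. Under those assumptions, a pass-through operator and a projection with an empty expression list produce different outputs for the same input.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class EmptyProjectSketch {
    // Hypothetical stand-in for a columnar batch: column names plus a row count.
    public static final class Batch {
        public final List<String> columns;
        public final int numRows;

        public Batch(List<String> columns, int numRows) {
            this.columns = columns;
            this.numRows = numRows;
        }
    }

    // A pass-through operator emits exactly what it receives.
    public static Batch passThrough(Batch in) {
        return in;
    }

    // A projection with an empty expression list drops every column
    // but still reports the same number of rows.
    public static Batch projectEmpty(Batch in) {
        return new Batch(Collections.<String>emptyList(), in.numRows);
    }

    public static void main(String[] args) {
        Batch in = new Batch(Arrays.asList("l_orderkey", "l_partkey"), 3);
        System.out.println(passThrough(in).columns.size());  // 2
        System.out.println(projectEmpty(in).columns.size()); // 0
        System.out.println(projectEmpty(in).numRows);        // 3
    }
}
```

In this model, treating the empty projection as a pass-through would leak the child's two columns into the output schema.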
if (child.output.isEmpty) {
  // See: https://github.com/apache/incubator-gluten/issues/7600.
  return Some("Shuffle with empty schema is not supported")
}
Empty-schema batches should pass through shuffle so that reducer-side operators can handle them using the row-count information they carry.
This is a temporary code path that disables shuffle for empty-schema input. We should eventually remove it once that support is added.
cc @marin-ma
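A rough Java sketch of why the row count matters on the reducer side. All names here (`ZeroColumnBatch`, `shuffle`, `countRows`) are hypothetical and only model the idea; the real shuffle writer and reader are not shown. It assumes a single-partition shuffle that forwards batches unchanged and a reducer that evaluates something like count(*).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmptySchemaShuffleSketch {
    // Hypothetical zero-column batch: no column data, only a row count.
    public static final class ZeroColumnBatch {
        public final int numRows;

        public ZeroColumnBatch(int numRows) {
            this.numRows = numRows;
        }
    }

    // Stand-in for a single-partition shuffle: forwards batches unchanged.
    public static List<ZeroColumnBatch> shuffle(List<ZeroColumnBatch> mapOutput) {
        return new ArrayList<>(mapOutput);
    }

    // Reducer-side count(*): needs only the row counts, not any column data.
    public static long countRows(List<ZeroColumnBatch> batches) {
        long total = 0;
        for (ZeroColumnBatch b : batches) {
            total += b.numRows;
        }
        return total;
    }

    public static void main(String[] args) {
        List<ZeroColumnBatch> mapOutput =
            Arrays.asList(new ZeroColumnBatch(1024), new ZeroColumnBatch(176));
        System.out.println(countRows(shuffle(mapOutput))); // 1200
    }
}
```

If the shuffle dropped zero-column batches instead of forwarding them, the reducer in this model would see no rows at all.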
private static BatchType identifyBatchType(ColumnarBatch batch) {
  if (batch.numCols() == 0) {
    // zero-column batch considered as heavy batch
The comment should be updated.
.exclude(
  "SPARK-16633: lead/lag should return the default value if the offset row does not exist")
Part of #7600
Remove the rules EmptySchemaWorkaround.PlanOneRowRelation, EmptySchemaWorkaround.FallbackEmptySchemaRelation, and OffloadHashAggregate. Inline some of the workaround logic into the utility class ColumnarBatches or into operator validation procedures.