[GLUTEN-7600][VL] Remove EmptySchemaWorkaround by zhztheplayer · Pull Request #7620 · apache/gluten

zhztheplayer · 2024-10-21T07:44:41Z

Part of #7600

Remove the rules EmptySchemaWorkaround.PlanOneRowRelation, EmptySchemaWorkaround.FallbackEmptySchemaRelation, and OffloadHashAggregate. Inline the remaining workaround logic into the utility class ColumnarBatches or into the operator validation procedures.

github-actions · 2024-10-21T07:44:58Z

#7600

github-actions · 2024-10-21T07:47:36Z

Run Gluten Clickhouse CI

github-actions · 2024-10-21T07:52:51Z

Run Gluten Clickhouse CI

zhztheplayer · 2024-10-21T07:54:32Z

backends-velox/src/test/scala/org/apache/gluten/execution/MiscOperatorSuite.scala

          checkNullTypeRepartition(
            spark.table("lineitem").selectExpr("null as x", "null as y").repartition(),
-            1
+            0


Query plan changed from

VeloxColumnarToRow
+- ShuffleQueryStage 0
   +- ColumnarExchange SinglePartition, REBALANCE_PARTITIONS_BY_NONE, [plan_id=979], [shuffle_writer_type=hash], [OUTPUT] ArraySeq(x:NullType, y:NullType), [OUTPUT] ArraySeq(x:NullType, y:NullType)
      +- VeloxResizeBatches 1024, 2147483647
         +- RowToVeloxColumnar
            +- *(1) Project [null AS x#296, null AS y#297]
               +- *(1) ColumnarToRow
                  +- FileScan parquet [] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/opt/gluten/backends-velox/target/scala-2.13/test-classes/tpch-da..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<>

to

VeloxColumnarToRow
+- ShuffleQueryStage 0
   +- ColumnarExchange SinglePartition, REBALANCE_PARTITIONS_BY_NONE, [plan_id=957], [shuffle_writer_type=hash], [OUTPUT] ArraySeq(x:NullType, y:NullType), [OUTPUT] ArraySeq(x:NullType, y:NullType)
      +- VeloxResizeBatches 1024, 2147483647
         +- ^(12) ProjectExecTransformer [null AS x#296, null AS y#297]
            +- ^(12) NativeFileScan parquet [] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/opt/gluten/backends-velox/target/scala-2.13/test-classes/tpch-da..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<>

github-actions · 2024-10-21T09:05:24Z

Run Gluten Clickhouse CI

github-actions · 2024-10-22T01:45:49Z

Run Gluten Clickhouse CI

zhztheplayer · 2024-10-22T01:46:28Z

gluten-substrait/src/main/scala/org/apache/gluten/expression/ExpressionMappings.scala

    Sig[CheckOverflow](CHECK_OVERFLOW),
    Sig[MakeDecimal](MAKE_DECIMAL),
    Sig[PromotePrecision](PROMOTE_PRECISION),
-    Sig[MonotonicallyIncreasingID](MONOTONICALLY_INCREASING_ID),


Due to #7628

zhztheplayer · 2024-10-22T02:26:06Z

...-substrait/src/main/scala/org/apache/gluten/execution/BasicPhysicalOperatorTransformer.scala

-    if ((list == null || list.isEmpty) && childCtx != null) {
-      // The computing for this project is not needed.
-      // the child may be an input adapter and childCtx is null. In this case we want to
-      // make a read node with non-empty base_schema.
-      context.registerEmptyRelToOperator(operatorId)
-      return childCtx
-    }


context.registerEmptyRelToOperator(operatorId) appears to be suitable only when the operator simply outputs all the inputs it receives, which is not the case here when list.isEmpty holds.
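To illustrate the concern above, here is a minimal sketch in plain Java. It is a toy model, not Gluten or Substrait code: ToyBatch, passThrough, and project are hypothetical names invented for the example. It only shows why a projection with an empty project list is not equivalent to passing the child's output through unchanged.

```java
import java.util.List;

final class EmptyProjectSketch {
    /** Toy batch: column names plus a row count; no real data is stored. */
    static final class ToyBatch {
        final List<String> columns;
        final int numRows;

        ToyBatch(List<String> columns, int numRows) {
            this.columns = List.copyOf(columns);
            this.numRows = numRows;
        }
    }

    /** Pass-through operator: the output is exactly the input. */
    static ToyBatch passThrough(ToyBatch input) {
        return input;
    }

    /** Projection: keeps only the listed columns and always preserves the row count. */
    static ToyBatch project(ToyBatch input, List<String> projectList) {
        for (String name : projectList) {
            if (!input.columns.contains(name)) {
                throw new IllegalArgumentException("Unknown column: " + name);
            }
        }
        return new ToyBatch(projectList, input.numRows);
    }

    public static void main(String[] args) {
        ToyBatch child = new ToyBatch(List.of("l_orderkey", "l_partkey"), 3);
        ToyBatch viaPassThrough = passThrough(child);
        ToyBatch viaEmptyProject = project(child, List.of());
        System.out.println(viaPassThrough.columns.size());  // 2
        System.out.println(viaEmptyProject.columns.size()); // 0
        System.out.println(viaEmptyProject.numRows);        // 3
    }
}
```

In this toy model the empty projection yields zero columns while keeping the row count, so treating it as a pass-through would expose the child's columns to downstream operators.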

github-actions · 2024-10-22T03:45:36Z

Run Gluten Clickhouse CI

github-actions · 2024-10-22T04:59:54Z

Run Gluten Clickhouse CI

zhztheplayer · 2024-10-22T05:44:05Z

backends-velox/src/main/scala/org/apache/gluten/backendsapi/velox/VeloxValidatorApi.scala

+    if (child.output.isEmpty) {
+      // See: https://github.com/apache/incubator-gluten/issues/7600.
+      return Some("Shuffle with empty schema is not supported")
+    }


Empty-schema batches should pass through shuffle so that reducer-side operators can handle them using the row-count information they carry.

This is a temporary code path that disables shuffle for empty-schema input. We should eventually remove it once shuffle supports such input.

cc @marin-ma

Got it. Thanks!
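The temporary guard discussed in this thread can be sketched in plain Java as follows. This is an assumption-laden illustration rather than the actual VeloxValidatorApi code, which is written in Scala: the method name validateShuffle and the use of a List<String> to stand in for child.output are hypothetical. An empty Optional means the shuffle can be offloaded; a present value carries the fallback reason.

```java
import java.util.List;
import java.util.Optional;

final class ShuffleValidationSketch {
    /**
     * Returns a failure reason when the shuffle input has no columns,
     * mirroring the Option[String] convention used in the diff above.
     */
    static Optional<String> validateShuffle(List<String> childOutput) {
        if (childOutput.isEmpty()) {
            // Temporary: empty-schema input is rejected until shuffle supports it.
            return Optional.of("Shuffle with empty schema is not supported");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(validateShuffle(List.of()).isPresent());         // true
        System.out.println(validateShuffle(List.of("x", "y")).isPresent()); // false
    }
}
```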

zhztheplayer · 2024-10-22T06:03:46Z

gluten-arrow/src/main/java/org/apache/gluten/columnarbatch/ColumnarBatches.java


  private static BatchType identifyBatchType(ColumnarBatch batch) {
    if (batch.numCols() == 0) {
      // zero-column batch considered as heavy batch


This comment should be updated.
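For context on why zero-column batches need special handling at all, here is a small sketch in plain Java. It does not use Spark's ColumnarBatch or Gluten's ColumnarBatches; ToyBatch and countRows are hypothetical names. It only shows that a batch with no columns still carries a meaningful row count, which is all a query such as count(*) needs.

```java
import java.util.List;

final class ZeroColumnBatchSketch {
    /** Toy stand-in for a columnar batch: a column count and a row count only. */
    static final class ToyBatch {
        final int numCols;
        final int numRows;

        ToyBatch(int numCols, int numRows) {
            this.numCols = numCols;
            this.numRows = numRows;
        }

        boolean isZeroColumn() {
            return numCols == 0;
        }
    }

    /** Sums row counts; no column data is needed, so zero-column batches suffice. */
    static long countRows(List<ToyBatch> batches) {
        long total = 0;
        for (ToyBatch batch : batches) {
            total += batch.numRows;
        }
        return total;
    }

    public static void main(String[] args) {
        List<ToyBatch> batches = List.of(new ToyBatch(0, 1024), new ToyBatch(0, 300));
        System.out.println(batches.get(0).isZeroColumn()); // true
        System.out.println(countRows(batches));            // 1324
    }
}
```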

marin-ma

LGTM. Thanks!

zhztheplayer · 2024-10-22T06:29:39Z

gluten-ut/spark34/src/test/scala/org/apache/gluten/utils/velox/VeloxTestSettings.scala

+    .exclude(
+      "SPARK-16633: lead/lag should return the default value if the offset row does not exist")


github-actions · 2024-10-22T06:29:57Z

Run Gluten Clickhouse CI

fixup

ccaebb3

github-actions bot added the VELOX label Oct 21, 2024

fixup

8ffe799

github-actions bot added the CORE works for Gluten Core label Oct 21, 2024

fixup

fb1ae9d

zhztheplayer mentioned this pull request Oct 21, 2024

[VL] Rework the workaround for empty schema batch #7600

Closed

fixup

36e66b3

zhztheplayer mentioned this pull request Oct 22, 2024

[VL] Result mismatch on monotonically_increasing_id #7628

Open

Fix UTs

ea78c14

github-actions bot added the CLICKHOUSE label Oct 22, 2024

fixup

2a0b5e4

fixup

217018d

zhztheplayer marked this pull request as ready for review October 22, 2024 05:47

marin-ma approved these changes Oct 22, 2024

zhztheplayer added 2 commits October 22, 2024 14:11

fixup

5a3ac16

fixup

5264a74

zhztheplayer merged commit 53e8161 into apache:main Oct 22, 2024

zhztheplayer mentioned this pull request Oct 23, 2024

[GLUTEN-7600][VL] Simplify offload rules in RAS #7646

Merged
