[CORE][VL] Support RewriteTransformer Rules and DeltaLake Scan#3646
yma11 merged 1 commit into apache:main
Thanks for opening a pull request! Could you open an issue for this pull request on GitHub Issues? https://github.com/oap-project/gluten/issues Then could you also rename the commit message and pull request title in the following format? See also:
Run Gluten Clickhouse CI
Moving my comment from the old #3376: I think _metadata is not heavily used in 2.2, if at all, but it may be needed to replace the input_file_name UDF. In recent versions such as 2.4 it is used for deletion vectors, which need the _metadata_row_index. I created this repro: It fails because we try to replace every column, and the _metadata fields are not in the mapping:
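As a rough illustration of the failure mode described in this comment (this is not the original repro, and the object, method, and column names are hypothetical), a strict logical-to-physical lookup throws for any column that has no entry in the mapping, while a lenient lookup lets such columns pass through unchanged. Whether a pass-through is the right fix for `_metadata` fields is an assumption here, not something this PR states.

```scala
object MetadataMappingSketch {
  // Hypothetical logical-to-physical column mapping; the names are made up for illustration.
  val logicalToPhysical: Map[String, String] =
    Map("id" -> "col-a1", "name" -> "col-b2")

  // Strict lookup: Map.apply throws NoSuchElementException for a column
  // that is missing from the mapping (for example a _metadata field).
  def strictRewrite(columns: Seq[String]): Seq[String] =
    columns.map(logicalToPhysical)

  // Lenient lookup: columns without an entry are returned unchanged.
  def lenientRewrite(columns: Seq[String]): Seq[String] =
    columns.map(c => logicalToPhysical.getOrElse(c, c))
}
```

With these definitions, `strictRewrite(Seq("id", "_metadata"))` fails on the unmapped column, whereas `lenientRewrite` keeps `_metadata` as-is and only renames `id`.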
@@ -0,0 +1,155 @@
<?xml version="1.0" encoding="UTF-8"?>
Do we need to make a change to include this project in gluten-<backend_type>-bundle-spark?
No. I hope these gluten-lakeformat modules can be used in both backends.
@felipepessoto I have tested this case, and it works correctly. Please check whether the Gluten build you used includes the patch #2563. Support for the metadata column is tracked by #2618, so let this PR focus on the common cases.
TreeNodeTag[String]("io.glutenproject.delta.column.mapping")

private def notAppliedColumnMappingRule(plan: SparkPlan): Boolean = {
  plan.getTagValue(COLUMN_MAPPING_RULE_TAG).isEmpty
What's the logic here? It seems COLUMN_MAPPING_RULE_TAG is not empty at initialization?
First, COLUMN_MAPPING_RULE_TAG is used to prevent a Delta scan from applying this rule multiple times.
At initialization, the original transformer is not tagged yet, so COLUMN_MAPPING_RULE_TAG is empty.
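The tag-as-guard idea in this reply can be sketched without Spark. The `Tag` and `PlanNode` types below are hypothetical stand-ins for Spark's `TreeNodeTag` and `SparkPlan`, and `applyRule` is an assumed shape for the rewrite, not the code in this PR: a node starts with no tag, the rule tags the node it produces, and a second pass sees the tag and leaves the node alone.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Spark's TreeNodeTag.
final case class Tag[T](name: String)

// Hypothetical stand-in for a plan node that can carry tags.
final class PlanNode(val columns: Seq[String]) {
  private val tags = mutable.Map.empty[Tag[_], Any]
  def setTagValue[T](tag: Tag[T], value: T): Unit = tags(tag) = value
  def getTagValue[T](tag: Tag[T]): Option[T] =
    tags.get(tag).map(_.asInstanceOf[T])
}

object ColumnMappingSketch {
  // Illustrative tag name, not the one used in the PR.
  val ColumnMappingRuleTag: Tag[String] = Tag[String]("sketch.delta.column.mapping")

  // True while the node has never been rewritten, i.e. the tag is still absent.
  def notAppliedColumnMappingRule(plan: PlanNode): Boolean =
    plan.getTagValue(ColumnMappingRuleTag).isEmpty

  // Rewrites column names once and tags the result; a tagged node is returned unchanged.
  def applyRule(plan: PlanNode, mapping: Map[String, String]): PlanNode =
    if (notAppliedColumnMappingRule(plan)) {
      val rewritten = new PlanNode(plan.columns.map(c => mapping.getOrElse(c, c)))
      rewritten.setTagValue(ColumnMappingRuleTag, "applied")
      rewritten
    } else {
      plan
    }
}
```

Calling `applyRule` on a fresh node rewrites it, and calling it again on the result returns the same instance, which is the "apply at most once" behavior the tag is meant to give.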
@YannByron LGTM, except that documentation is needed for this support, such as any additional configurations.
@yma11 Thank you for your review. No configuration is needed. Users just put the additional gluten-delta jar on the class path and can then query Delta tables in a Gluten/Velox environment.
Yeah. Then just document what you said in a location like here. You could also add a short introduction about which cases are supported.
…Scan docs for deltalake
80668a3 to bc36f8b
Run Gluten Clickhouse CI
@yma11 The doc is done. PTAL again.
===== Performance report for TPCH SF2000 with Velox backend, for reference only ====
What changes were proposed in this pull request?
RewriteTransformerRules to extend if needed. (Fixes: #2891)
How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)