[GLUTEN-11088][VL] Fall back CSV reader#11190

Merged
philo-he merged 3 commits into apache:main from jinchengchenghh:csv
Jan 19, 2026

Conversation

Contributor

@jinchengchenghh jinchengchenghh commented Nov 25, 2025

Related issue: #11088

@github-actions github-actions bot added the CORE works for Gluten Core label Nov 25, 2025
@github-actions

Run Gluten ClickHouse CI on ARM

@github-actions

Run Gluten ClickHouse CI on ARM

@zhouyuan
Member

Run Gluten ClickHouse CI on x86

@zhouyuan zhouyuan changed the title from "[GLUTEN-11088][VL] Enable CSV suite" to "[GLUTEN-11088][VL] Enable CSV suite in Spark-4.0" Nov 26, 2025
@github-actions

Run Gluten ClickHouse CI on ARM

@jinchengchenghh
Contributor Author

jinchengchenghh commented Nov 26, 2025

The tests passed once, but after a rerun the CSV tests failed with: /arrow/java/dataset/src/main/cpp/jni_util.cc:79: Failed to update reservation while freeing bytes: Java Exception: java.lang.NullPointerException

@jinchengchenghh
Contributor Author

@github-actions

Run Gluten ClickHouse CI on ARM

@jinchengchenghh
Contributor Author

Trigger the flaky test

2025-11-26T21:18:55.4968782Z #
2025-11-26T21:18:55.4981909Z # A fatal error has been detected by the Java Runtime Environment:
2025-11-26T21:18:55.4983467Z #
2025-11-26T21:18:55.4983914Z #  SIGSEGV (0xb) at pc=0x00007f5170655120, pid=82017, tid=107720
2025-11-26T21:18:55.4984658Z #
2025-11-26T21:18:55.4986579Z # JRE version: OpenJDK Runtime Environment 21.9 (17.0.1+12) (build 17.0.1+12-LTS)
2025-11-26T21:18:55.4996055Z # Java VM: OpenJDK 64-Bit Server VM 21.9 (17.0.1+12-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
2025-11-26T21:18:55.4997760Z # Problematic frame:
2025-11-26T21:18:55.4998099Z # C  0x00007f5170655120
2025-11-26T21:18:55.4998416Z #
2025-11-26T21:18:55.5003611Z # Core dump will be written. Default location: Core dumps may be processed with "/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %h %d" (or dumping to /__w/incubator-gluten/incubator-gluten/gluten-ut/spark40/core.82017)
2025-11-26T21:18:55.5005320Z #
2025-11-26T21:18:55.5005948Z # An error report file with more information is saved as:
2025-11-26T21:18:55.5006859Z # /__w/incubator-gluten/incubator-gluten/gluten-ut/spark40/hs_err_pid82017.log

Member

@zhztheplayer zhztheplayer left a comment

It seems the CI is still failing

@jinchengchenghh jinchengchenghh marked this pull request as draft November 28, 2025 13:10
@jinchengchenghh
Contributor Author

Maybe we need to compile Arrow; the flaky test may be caused by a platform difference.

@github-actions github-actions bot added the VELOX label Dec 23, 2025
@jinchengchenghh jinchengchenghh marked this pull request as ready for review December 23, 2025 08:22
@github-actions

Run Gluten Clickhouse CI on x86

@jinchengchenghh jinchengchenghh changed the title from "[GLUTEN-11088][VL] Enable CSV suite in Spark-4.0" to "[GLUTEN-11088][VL] Fallback CSV reader" Dec 23, 2025
@github-actions

Run Gluten Clickhouse CI on x86

@github-actions

github-actions bot commented Jan 5, 2026

Run Gluten Clickhouse CI on x86

@github-actions

github-actions bot commented Jan 6, 2026

Run Gluten Clickhouse CI on x86

    BloomFilterMightContainJointRewriteRule.apply(
      c.session,
      c.caller.isBloomFilterStatFunction()))
injector.injectPreTransform(c => ArrowScanReplaceRule.apply(c.session))
Contributor

Just to confirm: is the CSV format no longer supported, or do we only need a fallback for Spark 4.0 and later versions?

Contributor Author

The CSV format is no longer supported.

@github-actions

github-actions bot commented Jan 7, 2026

Run Gluten Clickhouse CI on x86

@jinchengchenghh
Contributor Author

Could you help approve? Thanks! @zhztheplayer

Member

@philo-he philo-he left a comment

@jinchengchenghh, if the Arrow CSV reader is not required, can we directly use the official Apache Arrow JAR to replace the JAR built locally by developers? cc @zhouyuan

@jinchengchenghh
Contributor Author

jinchengchenghh commented Jan 19, 2026

I remember there are several patches applied to Arrow 15, not only CSV-reader-related changes. For Arrow 18 (Spark 4.0), we use the official release. @philo-he

@philo-he
Member

> I remember there are several patches applied to Arrow 15, not only CSV-reader-related changes. For Arrow 18 (Spark 4.0), we use the official release. @philo-he

@jinchengchenghh, do we need to remove those CSV-reader-specific patches under ep/build-velox/src/? At some point, we may be able to use the official Arrow JAR directly, if a higher Arrow version used by Gluten includes the remaining patches or if the remaining patches relate only to Arrow C++, not Java.

@jinchengchenghh
Contributor Author

This patch only falls back the CSV reader; we do not remove all the CSV-related code from the Java code. When we decide to remove it, we will also remove the patch. I'm not sure whether some customers may be interested in it.

@philo-he philo-he changed the title from "[GLUTEN-11088][VL] Fallback CSV reader" to "[GLUTEN-11088][VL] Fall back CSV reader" Jan 19, 2026
Member

@philo-he philo-he left a comment

Thanks for the clarification.

@philo-he philo-he merged commit e121903 into apache:main Jan 19, 2026
163 of 168 checks passed
@baibaichen
Contributor

@jinchengchenghh would you please also fall back CSV for Spark 4.1?

@jinchengchenghh
Contributor Author

Yes, CSV falls back for all Spark versions in this PR. @baibaichen

@baibaichen
Contributor

> Yes, CSV falls back for all Spark versions in this PR. @baibaichen

Oh, right. We also need to re-enable the CSV-related suites in Spark 4.1.

Labels

CORE works for Gluten Core VELOX

5 participants