[SPARK-53737][SQL][SS] Add Real-time Mode trigger #52473
jerrypeng wants to merge 2 commits into apache:master
viirya
left a comment
Looks good to me. cc @HeartSaVioR
Merged to master. Thanks @jerrypeng
@viirya thank you for merging the PR in!
Thank you so much. If you have more PRs in your plan, shall we have an umbrella JIRA issue to advertise your contribution actively for Apache Spark 4.1.0? Currently, I saw two PRs including this one and put them under SPARK-51166 temporarily to give them better visibility, but an independent umbrella JIRA issue would be perfect for you.
@jerrypeng I remember you said you created https://issues.apache.org/jira/browse/SPARK-53736 as an umbrella JIRA for real-time mode PRs. Could you put SPARK-53737 and SPARK-53784 under it?
Thank you @dongjoon-hyun
Oh, it exists. Thank you, @viirya. I'll move them to SPARK-53736.
### What changes were proposed in this pull request?
Introduce a new trigger type for Real-time Mode (RTM) in Structured Streaming. This new trigger will be how users enable their Structured Streaming query to run in Real-time Mode.
Please note this PR just adds the trigger. Users cannot yet run queries in Real-time Mode. Other functionality will come in later PRs.
### Why are the changes needed?
This serves as the first PR to add Real-time Mode to Structured Streaming.
### Does this PR introduce _any_ user-facing change?
Yes, it adds a new trigger type to Structured Streaming. This change does not affect or change any existing behaviors.
### How was this patch tested?
Unit test added.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes apache#52473 from jerrypeng/SPARK-53737.
Authored-by: Jerry Peng <jerry.peng@databricks.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
* Move GlutenStreamingQuerySuite to correct package
* Add Spark 4.1 new test suites for Gluten
* Enable new and existing Gluten test suites for Spark 4.1 UT
* Update workflow trigger paths to exclude Spark 4.0 and 4.1 shims directories for clickhouse backend
* Add support for Spark 4.1 in build script
* Merge Spark 4.1.0 sql-tests into Gluten Spark 4.1 (three-way merge)
  Three-way merge performed using Git:
  - Base: Spark 4.0.1 (29434ea766b)
  - Left: Spark 4.1.0 (e221b56be7b)
  - Right: Gluten Spark 4.1 backends-velox
  Summary:
  - Auto-merged: 165 files
  - New tests added: 31 files (collations, edge cases, recursion, spatial, etc.)
  - Modified tests: 134 files
  - Deleted tests: 2 files (collations.sql -> split into 4 files, timestamp-ntz.sql)
  Conflicts resolved:
  - inputs/timestamp-ntz.sql: Right deleted + Left modified -> DELETED (per resolution rule)
  New test suites from Spark 4.1.0:
  - Collations (4 files): aliases, basic, padding-trim, string-functions
  - Edge cases (6 files): alias-resolution, extract-value, join-resolution, etc.
  - Advanced features: cte-recursion, generators, kllquantiles, thetasketch, time
  - Name resolution: order-by-alias, session-variable-precedence, runtime-replaceable
  - Spatial functions: st-functions (ANSI and non-ANSI variants)
  - Various resolution edge cases
  Total files after merge: 671 (up from 613)
* Enable additional Spark 4.1 SQL tests by resolving TODOs
* Add new Spark 4.1 test files to VeloxSQLQueryTestSettings
* [Fix] Replace `RuntimeReplaceable` with its `replacement` to fix UT. see apache/spark#50287
* [4.1.0] Exclude "infer shredding with mixed scale" see apache/spark#52406
* [Fix] Implement Kryo serialization for CachedColumnarBatch see apache/spark#50599
* [4.1.0] Exclude GlutenMapStatusEndToEndSuite and configure parallelism see apache/spark#50230
* [4.1.0] Exclude Spark Structured Streaming tests in Gluten see
  - apache/spark#52473
  - apache/spark#52870
  - apache/spark#52891
* [4.1.0] Exclude failing SQL tests on Spark 4.1
* Replace SparkException.require with standard require in ColumnarCachedBatchSerializer to work across different spark versions
* [Fix] Replace `RuntimeReplaceable` with its `replacement` to fix UT. see apache/spark#50287
* Exclude Spark 4.0 and 4.1 paths in clickhouse_be_trigger using `!` prefix
* [Fix] Update GlutenShowNamespacesParserSuite to use GlutenSQLTestsBaseTrait
What changes were proposed in this pull request?
Introduce a new trigger type for Real-time Mode (RTM) in Structured Streaming. Users will set this trigger to run a Structured Streaming query in Real-time Mode.
Note that this PR only adds the trigger. Users cannot yet run queries in Real-time Mode; the remaining functionality will come in later PRs.
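To make the scope concrete, here is a minimal, self-contained Python sketch of the idea (it does not use Spark and is not the code in this PR). It models a new trigger value that can be constructed and validated while the engine still refuses to execute it, and shows that the existing trigger path is untouched. All names here (`ProcessingTimeTrigger`, `RealTimeTrigger`, `batch_duration_ms`, `plan_execution`) are hypothetical and chosen only for illustration; the description above does not show the actual API.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ProcessingTimeTrigger:
    """Illustrative stand-in for an existing micro-batch trigger."""
    interval_ms: int


@dataclass(frozen=True)
class RealTimeTrigger:
    """Hypothetical stand-in for the new Real-time Mode trigger."""
    batch_duration_ms: int  # assumed parameter, not taken from the PR text

    def __post_init__(self) -> None:
        # Reject nonsensical values when the trigger is created.
        if self.batch_duration_ms <= 0:
            raise ValueError("batch_duration_ms must be positive")


Trigger = Union[ProcessingTimeTrigger, RealTimeTrigger]


def plan_execution(trigger: Trigger) -> str:
    """Toy planner: existing triggers behave as before, the new one is rejected."""
    if isinstance(trigger, ProcessingTimeTrigger):
        return f"micro-batch every {trigger.interval_ms} ms"
    if isinstance(trigger, RealTimeTrigger):
        # Mirrors "this PR just adds the trigger": the value exists,
        # but nothing can run with it yet.
        raise NotImplementedError("Real-time Mode execution is not available yet")
    raise TypeError(f"unknown trigger: {trigger!r}")


if __name__ == "__main__":
    print(plan_execution(ProcessingTimeTrigger(interval_ms=1000)))
    rtm = RealTimeTrigger(batch_duration_ms=5000)
    try:
        plan_execution(rtm)
    except NotImplementedError as exc:
        print(f"{rtm!r} is a valid value, but: {exc}")
```

In this sketch, adding the new variant leaves the existing branch untouched, which corresponds to the statement that existing behaviors do not change.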
Why are the changes needed?
This serves as the first PR to add Real-time Mode to Structured Streaming.
Does this PR introduce any user-facing change?
Yes, it adds a new trigger type to Structured Streaming. This change does not affect or change any existing behaviors.
How was this patch tested?
Unit test added.
Was this patch authored or co-authored using generative AI tooling?
No