[Managed Iceberg] unbounded source#33504
Conversation
…erg_streaming_source
…erg_streaming_source
|
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
|
R: @kennknowles Can y'all take a look? I still have to write some tests, but it's at a good spot for a first round of reviews. I ran a bunch of pipelines (w/Legacy DataflowRunner) at different scales and the throughput/scalability looks good. |
|
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment |
…erg_streaming_source
…erg_streaming_source
kennknowles
left a comment
There was a problem hiding this comment.
Overall, I think all the pieces are in the right place. Just a question about why an SDF is the way it is and a couple code-level comments.
This seems like something you want to test a lot of different ways before it gets into a release. Maybe get another set of eyes like @chamikaramj or @Abacn too. But I'm approving and leaving to your judgment.
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/SnapshotRange.java
Outdated
Show resolved
Hide resolved
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/ReadTask.java
Outdated
Show resolved
Hide resolved
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/ReadTaskDescriptor.java
Outdated
Show resolved
Hide resolved
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/IcebergIO.java
Outdated
Show resolved
Hide resolved
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/ReadFromGroupedTasks.java
Outdated
Show resolved
Hide resolved
kennknowles
left a comment
There was a problem hiding this comment.
Wait actually I forgot I want to have the discussion about the high level toggle between incremental scan source and bounded source.
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/IcebergIO.java
Outdated
Show resolved
Hide resolved
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/ReadFromGroupedTasks.java
Outdated
Show resolved
Hide resolved
…erg_streaming_source
|
@chamikaramj this is ready for another review |
…erg_streaming_source
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/IncrementalScanSource.java
Outdated
Show resolved
Hide resolved
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/IncrementalScanSource.java
Outdated
Show resolved
Hide resolved
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/ReadFromTasks.java
Outdated
Show resolved
Hide resolved
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/ReadFromTasks.java
Outdated
Show resolved
Hide resolved
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/ReadFromTasks.java
Outdated
Show resolved
Hide resolved
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/WatchForSnapshots.java
Outdated
Show resolved
Hide resolved
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/IcebergIO.java
Outdated
Show resolved
Hide resolved
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/IcebergIO.java
Show resolved
Hide resolved
sdks/java/managed/src/main/java/org/apache/beam/sdk/managed/Managed.java
Outdated
Show resolved
Hide resolved
…remove window step; add --strea ming=true validation; add IO links to Managed java doc
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #33504 +/- ##
============================================
+ Coverage 56.28% 56.32% +0.03%
Complexity 3286 3286
============================================
Files 1166 1172 +6
Lines 178704 178936 +232
Branches 3398 3398
============================================
+ Hits 100591 100786 +195
- Misses 74860 74897 +37
Partials 3253 3253
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…erg_streaming_source
…Getsize; add resilience to watch snapshots transform
|
LGTM. Thanks! |
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/CreateReadTasksDoFn.java
Show resolved
Hide resolved
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/CreateReadTasksDoFn.java
Outdated
Show resolved
Hide resolved
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/ReadFromTasks.java
Outdated
Show resolved
Hide resolved
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/ReadFromTasks.java
Outdated
Show resolved
Hide resolved
| /** Helper class for source operations. */ | ||
| public class ReadUtils { | ||
| // default is 8MB. keep this low to avoid overwhelming memory | ||
| static final int MAX_FILE_BUFFER_SIZE = 1 << 18; // 256KB |
There was a problem hiding this comment.
might want this to be configurble or pipeline option.
There was a problem hiding this comment.
I'd rather we wait until there's a need to expose it
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/TableCache.java
Outdated
Show resolved
Hide resolved
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/TableCache.java
Outdated
Show resolved
Hide resolved
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/WatchForSnapshots.java
Outdated
Show resolved
Hide resolved
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/TableCache.java
Outdated
Show resolved
Hide resolved
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/TableCache.java
Outdated
Show resolved
Hide resolved
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/TableCache.java
Outdated
Show resolved
Hide resolved
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/ReadUtils.java
Show resolved
Hide resolved
1f47712 to
1c0f7d7
Compare
|
Thanks for the review @scwhittle! If all else is good, I'll merge it when tests pass |
…u98/beam into iceberg_streaming_source
Unbounded (streaming) source for Managed Iceberg.
See design doc for high level overview: https://s.apache.org/beam-iceberg-incremental-source
Fixes #33092