[Managed Iceberg] Support writing to partitioned tables#32102
[Managed Iceberg] Support writing to partitioned tables#32102ahmedabu98 merged 15 commits intoapache:masterfrom
Conversation
|
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
|
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment |
|
assign set of reviewers |
|
Assigning reviewers. If you would like to opt out of this review, comment R: @m-trieu for label java. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/RecordWriterManager.java
Outdated
Show resolved
Hide resolved
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/RecordWriterManager.java
Outdated
Show resolved
Hide resolved
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/RecordWriterManager.java
Show resolved
Hide resolved
| * <p>After closing, the resulting {@link ManifestFile}s can be retrieved using {@link | ||
| * #getManifestFiles()}. | ||
| */ | ||
| class RecordWriterManager { |
There was a problem hiding this comment.
Based on the documentation, this might be accurately named PartitionedRecordWriter or something. That communicates better than "Manager" which could mean almost anything. And it seems it could be Autocloseable which would enable using it in try-with-resources blocks, no?
There was a problem hiding this comment.
I originally had it as PartitionedRecordWriter but thought it may be misleading since the class is also used for unpartitioned writes
Added Autocloseable implementation
There was a problem hiding this comment.
Fair enough. A reason I don't like the "Manager" name is that I don't know what it is doing in the managing, and it really doesn't communicate that this is a thing that does the writing. We have so many things called "manager" and none of them have anything in common.
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/WriteGroupedRowsToFiles.java
Outdated
Show resolved
Hide resolved
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/RecordWriter.java
Show resolved
Hide resolved
kennknowles
left a comment
There was a problem hiding this comment.
LGTM to get it merged for the release.
* support writing partitioned data * trigger integration tests * partitioned record writer to manage writers for different partitions * partitioned record writer * reject rows when we are saturated with record writers * refactor record writer manager * add tests * add more tests * make record writer manager transient * clean up test path * cleanup * cleanup * address comments * revert readability change * add to changes md
Fixes #31943
Adds support for writing to partitioned Iceberg tables.
A record writer manager is introduced to open and close writers as necessary. An Iceberg data writer instance is configured to write to only one partition, so multiple writers are needed to write to multiple partitions.
The behavior remains unchanged when writing to unpartitioned tables.
Also some small but key changes:
<warehouse>/<table>/<data-file><table>/metadata/