Encode paneinfo with PaneInfoCoder.#34824
Conversation
a51cee8 to
62b2c44
Compare
|
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
|
Assigning reviewers. If you would like to opt out of this review, comment R: @liferoad for label python. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
FastPrimitiveCoder ends up falling back to PickleCoder and much bigger encodings. PaneInfoCoder encodes it using 1 byte. |
I don't get why we need to expose I may be missing some important piece of background in #34348, but regarding can we do the following? return key, window.GlobalWindows.windowed_value(value, timestamp, pane_info)In this case, the coder of this output will be |
That logic specifically avoids the window param for performance reasons #4933. See the comments |
Hmmmm, my understanding of that comment is different. I think the optimization in #4933 is that for global windows we don't need to put WindowParam in the DoFn. Anyway, just want to call out here. def reify_timestamps(element, timestamp=DoFn.TimestampParam):
... |
| self._register_coder_internal(bytes, coders.BytesCoder) | ||
| self._register_coder_internal(bool, coders.BooleanCoder) | ||
| self._register_coder_internal(str, coders.StrUtf8Coder) | ||
| self._register_coder_internal(windowed_value.PaneInfo, coders.PaneInfoCoder) |
There was a problem hiding this comment.
This fix has a large blast radius. It changed registered standard coders which is default behavior globally. Shall we do this or just put a coder implementation in new ReShuffle?
There was a problem hiding this comment.
Are you suggesting to define PaneInfoCoder here
and then useCoderRegistry.register_coder?
I don't have strong opinion, is there a disadvantage to doing it here? It seems like an 'internal' coder use case?
There was a problem hiding this comment.
@kennknowles What do you think about this? We should use PaneInfoCoder but it could break the job update.
There was a problem hiding this comment.
is there a disadvantage to doing it here
This introduces a breaking change such that windowed_value.PaneInfo has changed default coder. This is not limited to ReShuffle but also user pipeline.
Since test results shows PaneInfoCoder is much more efficient than FastPrimitiveCoder for PaneInfo, I am fine with the change, just need to know this is another breaking change introduced in 2.65.0
If we keep this change I recommend to also mention this in https://github.com/apache/beam/blob/master/CHANGES.md#breaking-changes-1
There was a problem hiding this comment.
Yes, we should update https://github.com/apache/beam/blob/master/CHANGES.md#breaking-changes-1 at least.
CC @robertwb (the author of $4933) regarding the WindowParam comment here. |
* Encode paneinfo with PaneInfoCoder. * Fix tests. * Fix import. * Add typecoder test. * Implement eq and hash. --------- Co-authored-by: Claude <cvandermerwe@google.com>
Please add a meaningful description for your change here
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>instead.CHANGES.mdwith noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.