Skip to content

Add outputWindowedValue capability to Java SDK#29616

Merged
kennknowles merged 1 commit intoapache:masterfrom
kennknowles:outputWindowedValue
Dec 13, 2023
Merged

Add outputWindowedValue capability to Java SDK#29616
kennknowles merged 1 commit intoapache:masterfrom
kennknowles:outputWindowedValue

Conversation

@kennknowles
Copy link
Copy Markdown
Member

This adds the ability (which we don't expect to use much, and not really at all from users) to output a full windowed value. This allows a proper implementation of Reshuffle in the SDK and could be useful in some other testing scenarios.

Pulling this off #28853 for simpler review while working out final issues there.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

@kennknowles
Copy link
Copy Markdown
Member Author

R: @robertwb

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Dec 5, 2023

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

@kennknowles kennknowles force-pushed the outputWindowedValue branch 5 times, most recently from c25575f to 087382e Compare December 5, 2023 18:40
@kennknowles
Copy link
Copy Markdown
Member Author

R: @johnjcasey since you also already took a look earlier

Copy link
Copy Markdown
Contributor

@johnjcasey johnjcasey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just some small comments.

I'm far from expert on the FnApiDoFnRunner, so you may want another set of eyes there

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like some typos in the docstring

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes. Copy/paste/forget of course :-)

Thanks!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this pr, but this coder lookup pattern might be useful to extract to a utility

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I need to see about this. I am actually the original author of the "infer coder from raw values" logic back in the day. It has been improved and also had the schema stuff added so I don't 100% know how best to generalize.

Also I noticed in building this that window types are not in the registry, because WindowFn already tells the pipeline what coder to use. Probably not a big deal since it never mattered until this PR.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is validity jus that the timestamp is in the window?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I just copy/pasted the comment. Filed #29637 and linked it to each of the TODOs.

Copy link
Copy Markdown
Member Author

@kennknowles kennknowles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the excellent eye for detail.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes. Copy/paste/forget of course :-)

Thanks!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I need to see about this. I am actually the original author of the "infer coder from raw values" logic back in the day. It has been improved and also had the schema stuff added so I don't 100% know how best to generalize.

Also I noticed in building this that window types are not in the registry, because WindowFn already tells the pipeline what coder to use. Probably not a big deal since it never mattered until this PR.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I just copy/pasted the comment. Filed #29637 and linked it to each of the TODOs.

@kennknowles kennknowles force-pushed the outputWindowedValue branch 3 times, most recently from 39023cd to a75e53a Compare December 8, 2023 13:21
@kennknowles
Copy link
Copy Markdown
Member Author

Rebased to get test fixes but actually if we read configuration from master then it was not needed. Now tests will time out sooner and still upload test results I think. We were being blocked by #28957 and there is no known issue with this PR. But it is a big change and I want to merge on green, despite this causing weeks of delay :-(

@kennknowles kennknowles merged commit ef0ee76 into apache:master Dec 13, 2023
@kennknowles kennknowles deleted the outputWindowedValue branch December 13, 2023 20:56
* <p><i>Note:</i> A splittable {@link DoFn} is not allowed to output from {@link StartBundle}
* or {@link FinishBundle} methods.
*/
public abstract void outputWindowedValue(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new abstract method introduces a breaking change in Beam 2.54.0 for classes implemented OutputReceiver


void outputWithTimestamp(T output, Instant timestamp);

void outputWindowedValue(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it is this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants