Skip to content

[BEAM-7806] Revert "[BEAM-7785] synchronize watermark update with bundle processing"#9151

Merged
boyuanzz merged 1 commit intoapache:masterfrom
kennknowles:revert-sync-watermark
Jul 25, 2019
Merged

[BEAM-7806] Revert "[BEAM-7785] synchronize watermark update with bundle processing"#9151
boyuanzz merged 1 commit intoapache:masterfrom
kennknowles:revert-sync-watermark

Conversation

@kennknowles
Copy link
Copy Markdown
Member

This reverts commit 84ef8aa.

After this commit, org.apache.beam.sdk.extensions.sql.meta.provider.pubsub.PubsubJsonIT.testSelectsPayloadContent began failing. Rolling back to restore green/red signal while we deduce whether the problem is with the test or this change.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
--- Build Status
Build Status
Build Status --- --- Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

This reverts commit 84ef8aa.

After this commit,
org.apache.beam.sdk.extensions.sql.meta.provider.pubsub.PubsubJsonIT.testSelectsPayloadContent
began failing. Rolling back to restore green/red signal while we deduce whether
the problem is with the test or this change.
@kennknowles kennknowles requested review from akedin and boyuanzz July 25, 2019 03:15
@kennknowles
Copy link
Copy Markdown
Member Author

CC: @je-ik

As according to https://beam.apache.org/contribute/postcommits-policies-details/index.html#rollback_first I am rolling back first. And then we can figure out what needs to be done.

@kennknowles
Copy link
Copy Markdown
Member Author

run sql postcommit

@kennknowles
Copy link
Copy Markdown
Member Author

Noting from the Jira:

@akedin
Copy link
Copy Markdown
Contributor

akedin commented Jul 25, 2019

+1 to rollback first.

Though it's not obvious how this change would directly cause the test to fail. The failing run seems to be unable to get the result signal within a timeout, is there a chance that it's a flake? (I recognize it's not likely given the scans :) The test itself I think is also relatively flaky and complicated and is probably full of race conditions, so depending on what makes it pass/fail this change can be either fixing something that was wrong and allowed the test to accidentally pass, or the test accidentally exercises some edge case that this change breaks, unclear. However, there is also another test that is @Ignored in the same file at the moment because it fails consistently and I don't know why. Maybe this change triggers a similar failure in this test

Copy link
Copy Markdown
Contributor

@akedin akedin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Contributor

@boyuanzz boyuanzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Ken!

@boyuanzz boyuanzz merged commit 9f933a6 into apache:master Jul 25, 2019
@kennknowles
Copy link
Copy Markdown
Member Author

No, it is not a flake. It went perma-red after the commit for many runs, and then I could reproduce it locally every time.

@kennknowles
Copy link
Copy Markdown
Member Author

But we should definitely consider that it could be the test that is broken. However, I commented on the other PR that the way to get consistency is not through locks but by immutable snapshots. It could even be that it just created a major performance problem that took more than the timeout. TBH I am not sure.

@boyuanzz
Copy link
Copy Markdown
Contributor

Given that post commit back to green now: https://builds.apache.org/job/beam_PostCommit_SQL/, I would think that watermark changes break the test, or at least has side effect on that.

@je-ik
Copy link
Copy Markdown
Contributor

je-ik commented Jul 25, 2019

Yes, the PR accidentally caused deadlock in the test. Synchronizing the watermark update unintentionally limited parallelism of bundle processing to 1. This didn't cause any issues with existing tests or validates runner, but affected the integration test, because it performs blocking IO (and depends on another bundle to be unblocked). I have added test case to direct runner that exhibits this behavior and will fix the original issue without limiting the parallelism.

@kennknowles kennknowles deleted the revert-sync-watermark branch January 4, 2024 20:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants