Fixes breakages of the upgrade feature #29731

Merged
chamikaramj merged 5 commits into apache:master from chamikaramj:transform_upgrade_test_bq_fix
Dec 14, 2023

Conversation

@chamikaramj
Contributor

@chamikaramj chamikaramj commented Dec 12, 2023

Seems like this feature broke due to:

  • Object serialization issues (specifically BQ coder and BigQueryServicesImpl).
  • A new field addition to KafkaIO.

Makes the implementation more robust and adds the Kafka upgrade module to IO precommit.

This fixes #29730


@chamikaramj
Contributor Author

R: @johnjcasey

@github-actions
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

Contributor

It doesn't look like you actually set the default coder here.

Contributor Author

The default coder is set by the source transform if I don't set anything here.

Contributor

How would one know which fields could fail to deserialize?

Contributor Author

I mention the field in the log (and it should also appear in the exception stack trace). I only handle specific cases here. If deserialization fails in any other case, it results in a hard failure during job submission.
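
For readers following this thread, the behavior described in the reply above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the class `LenientFieldReader`, its methods, and the field names `coder` and `bigquery_services` are hypothetical, and plain Java serialization is assumed as the encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical helper; names are illustrative and not taken from the Beam code base.
final class LenientFieldReader {
  private static final Logger LOG = Logger.getLogger(LenientFieldReader.class.getName());

  // Assumption: only these specific fields may fall back to a default value.
  private static final Set<String> FALLBACK_ALLOWED = Set.of("coder", "bigquery_services");

  // Serializes a value with standard Java serialization.
  static byte[] toBytes(Serializable value) {
    try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value);
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new IllegalStateException("Could not serialize value", e);
    }
  }

  // Reads one field. Known fields fall back to the default on failure, and the
  // field name is logged; any other field causes a hard failure.
  static Object readField(String fieldName, byte[] serialized, Object defaultValue) {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(serialized))) {
      return in.readObject();
    } catch (IOException | ClassNotFoundException e) {
      if (FALLBACK_ALLOWED.contains(fieldName)) {
        LOG.warning(
            "Could not deserialize field '" + fieldName + "'; using the default. Cause: " + e);
        return defaultValue;
      }
      throw new IllegalStateException("Could not deserialize field '" + fieldName + "'", e);
    }
  }
}
```

With this shape, a reader of the logs can see exactly which field fell back to its default, while an unreadable field outside the allow-list still stops job submission.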

Contributor

If this is sufficient for correct behavior, why do we deserialize the service bytes at all?

Contributor Author

Unfortunately, this object is set by default for BQ source/sink transforms, and BQ transforms cannot be constructed without it. See below.

So I have to set it here to rebuild the source/sink transform object.

Contributor

The transform's behavior changes depending on whether an error handler is passed at all, so this could break if deserialization fails.

Contributor Author

I think ErrorHandler has to be instantiated with its own schema, but I will do this in a separate PR.
For now, I'm updating the code so that we do a hard fail if the user set this property.
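
As a rough illustration of the interim "hard fail" described in the reply above, a guard along the following lines could reject an upgrade whenever a user-supplied error handler is present. This is only a sketch under assumptions: `ErrorHandlerGuard`, `rejectUserErrorHandler`, and the `error_handler` key are hypothetical names, and the map of serialized fields stands in for whatever representation the real translation code uses.

```java
import java.util.Map;

// Hypothetical guard; the class, method, and field names are illustrative only.
final class ErrorHandlerGuard {
  // Assumed key under which a user-supplied error handler would be stored.
  static final String ERROR_HANDLER_FIELD = "error_handler";

  // Fails fast instead of silently dropping the handler, because dropping it
  // would change the behavior of the rebuilt transform.
  static void rejectUserErrorHandler(Map<String, byte[]> serializedFields) {
    byte[] handlerBytes = serializedFields.get(ERROR_HANDLER_FIELD);
    if (handlerBytes != null && handlerBytes.length > 0) {
      throw new UnsupportedOperationException(
          "Upgrading a transform with a user-provided error handler is not supported yet");
    }
  }
}
```

Failing at this point keeps the failure visible at job submission rather than letting a missing handler change runtime behavior.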

Contributor Author

@chamikaramj chamikaramj left a comment

Thanks. PTAL.

@johnjcasey
Contributor

LGTM for the cherry pick. We should brainstorm how to make error handling more robust going forward.

@chamikaramj
Contributor Author

Thanks.

@chamikaramj chamikaramj force-pushed the transform_upgrade_test_bq_fix branch from 3b2110d to 8ba9ab4 Compare December 13, 2023 23:43
@chamikaramj
Contributor Author

I don't think the PreCommit failures are related to this PR.

@Abacn @jrmccluskey I noticed that there are ongoing efforts to fix the PreCommit test suites (for example, #28957, #29671). Should there be a release blocker for fixing the PreCommit suites, so that the release is performed in a healthy state?

@github-actions github-actions bot removed the build label Dec 14, 2023
@chamikaramj chamikaramj merged commit 4264c2c into apache:master Dec 14, 2023
chamikaramj added a commit to chamikaramj/beam that referenced this pull request Dec 14, 2023
* Fixes breakages of the upgrade feature

* Fix spotless

* Addressing reviewer comments

* Removing unused import

* Reverting the PreCommit update
@Abacn
Contributor

Abacn commented Dec 14, 2023

#29744 should reduce flakiness of Java PreCommit on master

jrmccluskey pushed a commit that referenced this pull request Dec 15, 2023
* Fixes breakages of the upgrade feature

* Fix spotless

* Addressing reviewer comments

* Removing unused import

* Reverting the PreCommit update
Development

Successfully merging this pull request may close these issues.

[Bug]: Transform upgrade feature is broken due to a recent field addition and object serialization issues