Conversation
|
Failing checks are mostly due to java license issues or unrelated flaky tests, none seem to be real issues |
|
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
|
assign set of reviewers |
|
Assigning reviewers: R: @liferoad for label python. Note: If you would like to opt out of this review, comment Available commands:
The PR bot will only process comments in the main thread (not review comments). |
|
ML tests are red because of #36377 |
| from apache_beam.transforms.util import GroupByEncryptedKey | ||
| from apache_beam.transforms.util import Secret | ||
|
|
||
| secret = Secret.parse_secret_option(replace_with_gbek_secret) |
There was a problem hiding this comment.
shall we somehow retry get_secret_bytes?
There was a problem hiding this comment.
This gets called during setup, so it should get automatic retries from the runner -
| pcoll | ||
| | beam.ParDo(_EncryptMessage(self._hmac_key, key_coder, value_coder)) | ||
| | beam.GroupByKey() | ||
| | gbk |
There was a problem hiding this comment.
I guess this breaks the update compatibility given the line number changes.
There was a problem hiding this comment.
That's true - that's why I think we should do #36251 (comment) to permanently avoid this problem.
I can follow up with a PR
liferoad
left a comment
There was a problem hiding this comment.
Please fix the failed workflows.
They're red from #36377 - I'll merge in master once that's in to confirm everything is green |
…mccorm/enforce_gbek
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a --gbek pipeline option to enforce encryption for GroupByKey operations, which is a great step towards enhancing data security at rest. The implementation correctly replaces GroupByKey with GroupByEncryptedKey when the option is present and prevents recursive replacement.
My review includes a couple of suggestions. First, a minor correction to the help text of the new --gbek option to avoid confusion. More importantly, I've proposed a refactoring of the parse_secret_option method in util.py to improve its robustness against malformed inputs and to make it more extensible for future secret types. The current parsing logic has a bug when parameter values contain colons.
The tests are comprehensive and cover the new functionality well.
Add the ability for a python pipeline to specify a
--gbekoption which will cause all data passing through a GBK to get encrypted. This doesn't handle x-lang GBKs yet. Next steps include supporting this in Java, making sure we respect x-lang pipeline options, and making sure that x-lang GBKs in languages other than Python/Java are validated away.Part of #36214
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>instead.CHANGES.mdwith noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.