Fix python examples tests not running in Dataflow #23546
TheNeuralBit merged 8 commits into apache:master
|
Run Python Examples_Direct |
|
Run Python Examples_Dataflow |
|
@tvalentyn could you help approve the workflow runs so I can test my fixes? |
|
Run Python Examples_Flink |
|
Run Python Examples_Spark |
Codecov Report
@@ Coverage Diff @@
## master #23546 +/- ##
==========================================
- Coverage 73.46% 73.09% -0.37%
==========================================
Files 718 729 +11
Lines 95884 98231 +2347
==========================================
+ Hits 70438 71799 +1361
- Misses 24135 25121 +986
Partials 1311 1311
|
Run Python Examples_Spark |
|
@tvalentyn I've fixed some of the examples so that they can run in Dataflow, but a couple of them are still failing (https://ci-beam.apache.org/job/beam_PostCommit_Python_Examples_Dataflow_PR/20/). I'm not sure why their assertions differ. Do you have any insight into fixing them easily, or do you think it is better to sickbay them for the Dataflow suite and file a new issue? |
|
Filing issues and sickbaying sounds good |
|
thank you |
|
Run Python Examples_Dataflow |
|
Assigning reviewers. If you would like to opt out of this review, comment R: @TheNeuralBit for label python. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
TheNeuralBit left a comment
Thank you! I have a few suggestions
try:
  from apache_beam.io.gcp import gcsio
except ImportError:
  gcsio = None
Does this actually protect us? It looks like this would just change the error to a more confusing one: None has no attribute GcsIO, when gcsio is used. I think just letting the ImportError raise would be preferable.
Alternatively we could add a skipIf(gcsio is None), but that might lead to us unintentionally skipping it indefinitely.
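To make the trade-off concrete, here is a minimal sketch of both behaviours. The test class, `confusing_failure_message`, and `run_suite` are hypothetical names used only for illustration; the guarded import is the one from the diff, and `unittest.skipIf` is the standard-library decorator.

```python
import unittest

try:
  from apache_beam.io.gcp import gcsio
except ImportError:
  gcsio = None  # same fallback as in the diff under review


def confusing_failure_message():
  """Returns the error text a caller sees when the fallback module is None."""
  missing_module = None  # stands in for gcsio after a failed import
  try:
    missing_module.GcsIO()
  except AttributeError as e:
    return str(e)


# Hypothetical test class: skipped as a whole when the GCP extras are absent.
@unittest.skipIf(gcsio is None, 'GCP dependencies are not installed')
class GcsDependentTest(unittest.TestCase):
  def test_client_class_is_available(self):
    self.assertTrue(hasattr(gcsio, 'GcsIO'))


def run_suite():
  """Runs the hypothetical test class and returns the unittest result."""
  suite = unittest.defaultTestLoader.loadTestsFromTestCase(GcsDependentTest)
  return unittest.TextTestRunner(verbosity=0).run(suite)
```

Without the decorator, the failure surfaces as an `AttributeError` on `NoneType` rather than as the original `ImportError`. With the decorator, `run_suite()` reports the test as skipped whenever the import fails, which is the silent-skip risk mentioned above.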
logging.info('Creating file: %s', path)
gcs = gcsio.GcsIO()
with gcs.open(path, 'w') as f:
  f.write(str.encode(contents, 'utf-8'))
nit: I think it would be better if these utilities used the Filesystems API, see here for an example:
beam/sdks/python/apache_beam/examples/dataframe/taxiride_it_test.py
Lines 93 to 100 in 45cc085
It would also be good to extract these out into testing.utils rather than copying them.
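As a rough illustration of that suggestion, shared helpers built on the Filesystems API might look like the sketch below. This is not the code from the PR or from the referenced test: the function names and the optional `filesystems` parameter are assumptions made for this example. Only `FileSystems.create`, `FileSystems.match`, and `FileSystems.open` are Beam APIs.

```python
def _default_filesystems():
  # Imported lazily so the helpers can be loaded without Beam installed.
  from apache_beam.io.filesystems import FileSystems
  return FileSystems


def create_file(path, contents, filesystems=None):
  """Writes `contents` (a str) to `path` and returns the path.

  `filesystems` is a hypothetical injection point for tests; by default
  Beam's FileSystems is used, which picks the backend from the path scheme.
  """
  fs = _default_filesystems() if filesystems is None else filesystems
  with fs.create(path) as f:
    f.write(contents.encode('utf-8'))
  return path


def read_files_from_pattern(file_pattern, filesystems=None):
  """Returns the decoded contents of all files matching `file_pattern`."""
  fs = _default_filesystems() if filesystems is None else filesystems
  # match() takes a list of patterns and returns one result per pattern.
  metadata_list = fs.match([file_pattern])[0].metadata_list
  chunks = []
  for metadata in metadata_list:
    with fs.open(metadata.path) as f:
      chunks.append(f.read().decode('utf-8').strip())
  return '\n'.join(chunks)
```

Because the backend is chosen from the path, the same helpers would work for local paths and `gs://` paths, and the test module no longer needs to import `gcsio` directly.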
Thanks, @TheNeuralBit, I applied some of your suggestions
|
Run Python Examples_Dataflow |
|
Run Python Examples_Direct |
|
Reminder, please take a look at this pr: @TheNeuralBit |
|
Run Python PreCommit |
|
Run Python PreCommit |
|
Run Python 3.8 PostCommit |
|
Thanks, @TheNeuralBit! |
* Fix tests for examples not running in Dataflow
* Remove unused test
* Add todos to enable test for Dataflow
* Refactor utilities functions to create and read files
* Fix lint errors
* Fix lint errors and skip tests that require gcsio and is not available
* Refactor read file function and remove gcsio dependency
Resolves #22983