Skip to content

[BEAM-4181] Add ReadFiles transform to TfRecordIO#9088

Merged
pabloem merged 1 commit intoapache:masterfrom
RyanSkraba:BEAM-4181-tfrecord-readfiles
Jul 25, 2019
Merged

[BEAM-4181] Add ReadFiles transform to TfRecordIO#9088
pabloem merged 1 commit intoapache:masterfrom
RyanSkraba:BEAM-4181-tfrecord-readfiles

Conversation

@RyanSkraba
Copy link
Copy Markdown
Contributor

@RyanSkraba RyanSkraba commented Jul 17, 2019

Minor improvement to add a FileIO-based readFiles() to the TensorFlow IO.

https://issues.apache.org/jira/browse/BEAM-4181


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Python Build Status
Build Status
--- Build Status
Build Status
Build Status --- --- Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@RyanSkraba
Copy link
Copy Markdown
Contributor Author

R: @chamikaramj Can you take a look?

@RyanSkraba
Copy link
Copy Markdown
Contributor Author

Run JavaPortabilityApi PreCommit

@iemejia iemejia requested a review from chamikaramj July 17, 2019 20:58
@iemejia iemejia changed the title [BEAM-4181] Add readFiles transform to TfRecordIO. [BEAM-4181] Add ReadFiles transform to TfRecordIO. Jul 18, 2019
@iemejia iemejia changed the title [BEAM-4181] Add ReadFiles transform to TfRecordIO. [BEAM-4181] Add ReadFiles transform to TfRecordIO Jul 18, 2019
@RyanSkraba
Copy link
Copy Markdown
Contributor Author

RyanSkraba commented Jul 23, 2019

Heyo -- just a quick question after thinking about this. Should I refactor the existing Read/read() transform to use this proposed ReadFiles/readFiles() ? I see that some other FileIOs do this.

If yes, how do you feel about the read().withHintMatchesManyFiles() to switch between the original read() and new readFiles() technique? I see TextIO and AvroIO do this, but I'm not sure whether this is "best practices".

It's weird to me to refactor the original read() to require a "hint" to use the new implementation, when the new implement is ... just right there below it! WDYT?

@chamikaramj
Copy link
Copy Markdown
Contributor

R: @pabloem

Sorry about the delay here. Pablo agreed to take over this review so passing to him.

@chamikaramj
Copy link
Copy Markdown
Contributor

Usually, the difference between Read and ReadFiles transforms is that, ReadFiles currently does not support dynamic work rebalancing while Read does (this will change with SDF).

I see that, for TFRecordIO, we do not support dynamic work rebalancing anyways, so refactoring Read transform to use proposed ReadFiles should be fine.

I don't think we need the hint if we use ReadFiles for both implementations.

@pabloem
Copy link
Copy Markdown
Member

pabloem commented Jul 24, 2019

Thanks Cham. It's useful to get your advice here : )

I've looked at the implementation, and it LGTM. @RyanSkraba let me know if you'd like to do any refactoring, or if you'd just like me to merge this.

@RyanSkraba
Copy link
Copy Markdown
Contributor Author

Thanks for the review! Please go ahead and merge it -- before doing any additional internal refactoring (which wouldn't make much difference here), I'd like to take a tour around the existing file-based IOs to identify where they're consistent or different.

@pabloem pabloem merged commit c791104 into apache:master Jul 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants