Report File Lineage on directory#32662

Merged
Abacn merged 4 commits into apache:master from Abacn:dirlineage
Oct 8, 2024

Abacn commented Oct 4, 2024

Supersedes #32642

  • FileBasedSource reports every file it reads if the number of files is <= 100

  • FileBasedSource reports every unique directory (one level up) if the number of files is > 100 but the number of unique directories is <= 100

  • FileBasedSource reports the bucket otherwise

  • ReadAll reports every file if the number of files is <= 100

  • ReadAll reports the bucket otherwise

  • FileBasedSink reports every file it writes if the number of shards is <= 100

  • FileBasedSink (for each destination) reports the single directory it writes to if the number of files is > 100

Unlike the read path, the sink can report a single directory because all shards for a destination are written to the same directory.
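
The fallback rules above can be sketched in a few lines of Python. This is only an illustration of the thresholds described in this PR, not the code the PR adds: the function names, the `MAX_REPORTED` constant, and the use of `urllib.parse` to derive the bucket are all assumptions made for the example, and paths are assumed to look like `scheme://bucket/dir/file`.

```python
from urllib.parse import urlsplit

# Threshold taken from the description above; the constant name is hypothetical.
MAX_REPORTED = 100


def _parent_dir(path):
    """Return the directory one level up, keeping the trailing slash."""
    return path[: path.rfind("/") + 1]


def _bucket(path):
    """Return 'scheme://bucket/' for a path such as 'gs://bucket/dir/file'."""
    parts = urlsplit(path)
    return f"{parts.scheme}://{parts.netloc}/"


def _unique(items):
    """De-duplicate while preserving first-seen order."""
    return list(dict.fromkeys(items))


def source_lineage_targets(files):
    """FileBasedSource-style rule: files, then directories, then buckets."""
    if len(files) <= MAX_REPORTED:
        return list(files)
    dirs = _unique(_parent_dir(f) for f in files)
    if len(dirs) <= MAX_REPORTED:
        return dirs
    return _unique(_bucket(f) for f in files)


def read_all_lineage_targets(files):
    """ReadAll-style rule: files, otherwise buckets (no directory level)."""
    if len(files) <= MAX_REPORTED:
        return list(files)
    return _unique(_bucket(f) for f in files)


def sink_lineage_targets(shard_files):
    """FileBasedSink-style rule for one destination: files, otherwise one directory.

    Assumes every shard of the destination lives in the same directory,
    so the first shard's parent directory stands in for all of them.
    """
    if len(shard_files) <= MAX_REPORTED:
        return list(shard_files)
    return [_parent_dir(shard_files[0])]
```

For instance, under these assumptions a source reading 150 files spread over two directories would report the two directories, while a sink writing 101 shards to one destination would report only that destination's directory.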
