
[BEAM-10937] Tour of Beam Windowing notebook #14962

Merged
aaltay merged 2 commits into apache:master from davidcavazos:tour-of-beam on Jul 2, 2021

davidcavazos commented Jun 7, 2021

Adds the Windowing notebook for the Tour of Beam.

R: @aaltay

FYI: @anguillanneuf

Staged:

Notebook: https://colab.research.google.com/github/davidcavazos/beam/blob/tour-of-beam/examples/notebooks/tour-of-beam/windowing.ipynb
Page entry: http://apache-beam-website-pull-requests.storage.googleapis.com/14962/get-started/tour-of-beam/index.html


Contributor

anguillanneuf commented Jun 17, 2021

Overall looks great. Gardening project, moon phases, the notebook has got personality and I like it a lot :)

Some comments as I was going through the notebook:

  1. s/lets/let's
  2. Suggested edit - redundancy: "In our example, the "processing" is done by PrintElementInfo which simply prints the element with its window information. For windows of three months every month, each element is processed three times, one time per window."
  3. Suggested edit - use a bulleted list: "Sliding windows allow us to do just that. We need to specify the window size in seconds just like with FixedWindows. We also need to specify a window period in seconds, which is how often we want to emit each window."
  4. Suggested edit - redundancy - future tense: "If the next event happens within the next 30 days or less, like 20 days after the previous event, the session window [will extend and cover..] extends and covers that as well. If there are no new events for the next 30 days, the session window [will close..] closes and is emitted."
  5. I had to scroll up to remember what input events look like in the later examples. It would be nice to include a screenshot of the input events to the right of the nice bar charts, then these screenshots can be easily included by other tutorials/talks.
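
The sliding-window behavior quoted in items 2 and 3 can be sketched in plain Python. This is only an illustration of the arithmetic, not Beam's implementation or the notebook's code: `sliding_windows` and its parameters are hypothetical names, times are plain integers standing in for months rather than seconds, and window starts are assumed to be aligned to multiples of the period.

```python
def sliding_windows(timestamp, size, period):
    """Return the (start, end) windows, end exclusive, that contain timestamp.

    Assumes window starts are aligned to multiples of `period` (offset 0).
    """
    # Start of the latest window that begins at or before the timestamp.
    start = timestamp - (timestamp % period)
    windows = []
    # Step back one period at a time while the window still covers the timestamp.
    while start > timestamp - size:
        windows.append((start, start + size))
        start -= period
    return sorted(windows)


# An event in month 4, with three-month windows emitted every month.
windows = sliding_windows(timestamp=4, size=3, period=1)
print(windows)
```

With a size of three and a period of one, the event falls into three windows, which is why each element is processed three times. Setting the period equal to the size yields a single window per element, the fixed-window case.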

Member

aaltay commented Jun 29, 2021

What is the next step on this PR? Could you address the open comments?

Author

davidcavazos commented Jun 29, 2021

> Overall looks great. Gardening project, moon phases, the notebook has got personality and I like it a lot :)
>
> Some comments as I was going through the notebook:
>
> 1. s/lets/let's

Thanks, done.

> 2. Suggested edit - redundancy: "In our example, the "processing" is done by PrintElementInfo which simply prints the element with its window information. For windows of three months every month, each element is processed three times, one time per window."

Changed

> 3. Suggested edit - use a bulleted list: "Sliding windows allow us to do just that. We need to specify the window size in seconds just like with FixedWindows. We also need to specify a window period in seconds, which is how often we want to emit each window."

Changed

> 4. Suggested edit - redundancy - future tense: "If the next event happens within the next 30 days or less, like 20 days after the previous event, the session window [will extend and cover..] extends and covers that as well. If there are no new events for the next 30 days, the session window [will close..] closes and is emitted."

Changed
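
The session behavior described in the quoted suggestion can be sketched in plain Python. This is a rough illustration rather than Beam's implementation or the notebook's code: `session_windows` is a hypothetical helper, timestamps are plain day numbers, and treating the end of a session as exclusive is an assumption.

```python
def session_windows(timestamps, gap):
    """Group timestamps into sessions; a session closes after `gap` with no events."""
    sessions = []
    for t in sorted(timestamps):
        # Extend the current session if this event arrives before it has closed.
        if sessions and t < sessions[-1][1]:
            sessions[-1][1] = t + gap
        else:
            # Otherwise the previous session is done and a new one starts here.
            sessions.append([t, t + gap])
    return [tuple(s) for s in sessions]


# Days on which events happen, with a 30-day gap.
sessions = session_windows([0, 20, 70], gap=30)
print(sessions)
```

The event on day 20 arrives within 30 days of the first one, so the first session extends to cover it. Nothing happens for 30 days after that, so the session closes and the event on day 70 starts a new one.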

> 5. I had to scroll up to remember what input events look like in the later examples. It would be nice to include a screenshot of the input events to the right of the nice bar charts, then these screenshots can be easily included by other tutorials/talks.

I actually ended up inlining the data in each snippet. It makes each sample a little more self-contained and removes some indirection. It also makes it easier for people to modify the inputs and see how that affects the results.

Author

davidcavazos commented Jun 29, 2021

@aaltay sorry for the delay on this one, but comments have been addressed and should be ready for review.

R: @pcoet

FYI: @rosetn

davidcavazos requested a review from aaltay on June 29, 2021 17:45
aaltay merged commit d0ed701 into apache:master on Jul 2, 2021
davidcavazos deleted the tour-of-beam branch on July 2, 2021 21:03