Improve bernoulli rng-bit-generation memory footprint (#5581)
Merged
Conversation
Force-pushed from 5199b04 to c33fbaf
JackCaoG
reviewed
Sep 15, 2023
  xla::One(probability.builder(), probability_shape.element_type());
- xla::XlaOp noise = RngUniform(seed, probability_shape, zero, one);
+ xla::XlaOp noise =
+     RngUniform(seed, probability_shape, zero, one, /*downcast=*/true);
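For context, a rough Python/numpy sketch of what the new `/*downcast=*/true` flag amounts to. The names here are illustrative stand-ins, not the actual XLA API: the idea is simply to generate the uniform noise in a narrower float type, halving the noise buffer.

```python
import numpy as np

def rng_uniform(seed, shape, low, high, downcast=False):
    # Illustrative stand-in for xla::RngUniform (not the real XLA API):
    # generate uniform noise, optionally in a narrower float type so the
    # noise buffer (and the random bits backing it) is half the size.
    rng = np.random.default_rng(seed)
    dtype = np.float16 if downcast else np.float32
    return rng.uniform(low, high, size=shape).astype(dtype)

noise_full = rng_uniform(0, (1024, 1024), 0.0, 1.0)
noise_down = rng_uniform(0, (1024, 1024), 0.0, 1.0, downcast=True)
print(noise_full.nbytes // noise_down.nbytes)  # 2
```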
Collaborator
Is there a reason we only downcast for Bernoulli? Would it be better if we always downcast?
Contributor
Author
Yeah, I think so. For instance, we could do it for multinomial too, since it also uses RngUniform to create a random mask, and I don't think the extra precision matters there. I guess we can extend the application when it becomes an issue or surfaces in an actual use case, as it did for Bernoulli, since that makes the benefit easier to verify and test :)
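The point about the extra precision not mattering for a random mask can be sketched numerically (a numpy illustration under the assumption that downcast means roughly f32 → f16): thresholding at `p` only changes a decision when a sample lies within half-precision rounding distance of `p`, so the mask's statistics are essentially unchanged.

```python
import numpy as np

rng = np.random.default_rng(42)
p = 0.3
noise = rng.uniform(0.0, 1.0, 1_000_000).astype(np.float32)

# Bernoulli-style mask from full-precision vs downcast uniform noise.
mask_f32 = noise < np.float32(p)
mask_f16 = noise.astype(np.float16) < np.float16(p)

# Only samples within ~2^-12 of p can flip, so the two masks agree
# almost everywhere and both empirical rates stay close to p.
print(abs(float(mask_f16.mean()) - p))        # small
print(float((mask_f32 != mask_f16).mean()))   # tiny disagreement rate
```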
JackCaoG
approved these changes
Sep 15, 2023
ManfeiBai
pushed a commit
that referenced
this pull request
Sep 15, 2023
* Allow downcasting RngUniform generation for Bernoulli
will-cromar
pushed a commit
that referenced
this pull request
Sep 15, 2023
zpcore
pushed a commit
that referenced
this pull request
Sep 18, 2023
* Allow downcasting RngUniform generation for Bernoulli
will-cromar
pushed a commit
that referenced
this pull request
Sep 18, 2023
will-cromar
added a commit
that referenced
this pull request
Sep 19, 2023
* Handle dynamo function without input (#5565) (#5577)
* Make cpu tensor on XLA dynamo backend a warning instead of error (#5549) (#5576)
* [author: jluntamazon] Adding more explicit HLO lowering control by exposing LoweringContext… (#5431) (#5580)
  * Adding more explicit HLO lowering control by exposing LoweringContext (and utilities) to python for Neuron
  * fixing linter issues
  * fixing spacing
  * apply comments and fix compilation errors
  * add test for new apis
  * fix linter
  * update test
  * update test
  * modify test
  * reverse back to GetIrValue()
  * update test inputs with random numbers
  * skip unittest because it only fails in CI

  Co-authored-by: aws-kingrj <78175353+aws-kingrj@users.noreply.github.com>
  Co-authored-by: Ubuntu <ubuntu@ip-172-31-3-186.us-west-2.compute.internal>
  Co-authored-by: seanlatias <seanlatias@gmail.com>
* fixing num_local_processes typo (#5573) (#5579)

  Co-authored-by: aws-kingrj <78175353+aws-kingrj@users.noreply.github.com>
* Move where clear pending IR is called to avoid crash (#5552) (#5582)
  * Move where clear pending IR is called to avoid crash
  * fix CI
  * fix CI and add some debugging messages
* Fix release branch and tag patterns for GitHub Actions (#5587) (#5590)
* Improve bernoulli rng-bit-generation memory footprint (#5581) (#5589)
  * Allow downcasting RngUniform generation for Bernoulli

  Co-authored-by: Yeounoh Chung <yeounoh@google.com>
* Enable xla:gpu autocast for bfloat16 if not restricted (#5570) (#5591)
  * Enable autocast for XLA:GPU
  * linter fix
  * XLA autocast test for GPU and TPU
  * linter fix
  * Ensure that xla autocast is properly enabled for GPU and does not crash when torch cuda is not available.
  * linter fix
  * Add tests
  * Support bf16
  * linter fix
  * exclude unsupported test cases
  * increase GPU test timeout to 300

  Co-authored-by: Yeounoh Chung <yeounoh@google.com>
* Cherry-pick: Don't trigger CI build on release tag push (#5595)
  Copy of #5594 on release branch
* formatting

Co-authored-by: JackCaoG <59073027+JackCaoG@users.noreply.github.com>
Co-authored-by: Wonjoo Lee <wonjoo@google.com>
Co-authored-by: aws-kingrj <78175353+aws-kingrj@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-3-186.us-west-2.compute.internal>
Co-authored-by: seanlatias <seanlatias@gmail.com>
Co-authored-by: Manfei <41607353+ManfeiBai@users.noreply.github.com>
Co-authored-by: Yeounoh Chung <yeounoh@google.com>
jeffhataws
added a commit
to jeffhataws/xla
that referenced
this pull request
Dec 17, 2023
Revert "Improve bernoulli rng-bit-generation memory footprint (pytorch#5581) (pytorch#5589)" This reverts commit fa5d132.
sssrijan-amazon
added a commit
to jeffhataws/xla
that referenced
this pull request
Dec 18, 2023
Revert "Improve bernoulli rng-bit-generation memory footprint (pytorch#5581)…
Bernoulli rng-bit generation uses f32 precision (backed by u32 random bits), which is more than needed when the sampled range is small. This change lets us opt in to lower precision to save memory where applicable. We don't compute the dynamic range on the fly, to avoid/minimize extra computation.
Tested with a GPT-2 benchmark: same loss/convergence, with a lower memory profile.
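A back-of-envelope sketch of the expected savings (illustrative arithmetic, not measured from XLA): generating uniform noise for a Bernoulli mask materializes the raw random bits plus the uniform floats, and downcasting u32/f32 to u16/f16 halves both intermediate buffers.

```python
import numpy as np

# Intermediate footprint for an n-element Bernoulli mask:
# raw random bits + uniform noise, before thresholding against p.
n = 2048 * 2048
full_bytes = n * (np.dtype(np.uint32).itemsize + np.dtype(np.float32).itemsize)
down_bytes = n * (np.dtype(np.uint16).itemsize + np.dtype(np.float16).itemsize)
print(full_bytes / down_bytes)  # 2.0
```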