
[perf][OSS] tensor views for bucketing#300

Merged
blefaudeux merged 7 commits into master from oss_tensor_views_test
Jan 11, 2021

Conversation

@blefaudeux
Contributor

@blefaudeux blefaudeux commented Jan 8, 2021

Before submitting

  • Was this discussed/approved via a GitHub issue? (not needed for typos or doc improvements)
  • Did you read the contributor guideline?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Speeds things up by a couple of percent, removes a lot of code, and saves some memory (the buckets are free).
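The "buckets are free" claim rests on the fact that a PyTorch tensor slice is a view: it shares storage with the parent tensor instead of allocating new memory. A minimal illustration of that property (not code from this PR):

```python
import torch

buffer = torch.zeros(1024)
view = buffer[:100]        # a view: no new allocation
view.fill_(1.0)            # writes go through to the parent buffer

assert buffer[:100].sum().item() == 100.0
assert view.data_ptr() == buffer.data_ptr()  # same underlying storage
```

Because views cost nothing, parameters folded into one flat buffer can still be addressed individually while the buffer is broadcast as a single tensor.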

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 8, 2021
@blefaudeux
Contributor Author

@msbaines I'm not sure how that works for torch 1.5 and 1.6; I don't think that we're actually testing them nowadays (since the pip default change).

@blefaudeux
Copy link
Copy Markdown
Contributor Author

> @msbaines I'm not sure how that works for torch 1.5 and 1.6; I don't think that we're actually testing them nowadays (since the pip default change).

I've just tested: this part (tensor views) is fine, but a previous PR (#297) was in fact not pytorch 1.5 compatible (cc @pritamdamania87), and CI did not catch that.

Contributor

@min-xu-ai min-xu-ai left a comment


Really like the simplification and performance/memory gain!

Suggested docstring change for broadcast_buffer_size:

      torch.distributed group (default: group.WORLD)
      broadcast_buffer_size (int):
    -     the size of the buffer used to batch the small parameter tensors (default 128k).
    +     the max size of the buffer used to batch the small parameter tensors, in number of elements (default 16M).
Contributor


talk a bit about the impact on per-GPU memory usage here?

Contributor Author


the beauty of this PR is that the impact on GPU memory usage is now.. zero :) the small parameters are just views of a bigger buffer, so the bucketing takes no extra space (on top of taking no extra CPU cycles).
I just realized that I should resize the buffer to match the last bucketed parameter though; right now the remainder is unused and does increase GPU memory a tiny bit.
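The mechanism described above can be sketched as follows. This is a minimal, hypothetical illustration of the idea, not the actual fairscale implementation: small parameters are folded into one exactly-sized flat buffer, and each parameter's `.data` is rebound to a view of that buffer, so a single broadcast of the buffer moves all bucketed parameters with no extra memory.

```python
import torch

def bucket_params_into_views(params, max_elements=2**24):
    """Fold small parameters into one flat buffer and turn each one
    into a view of that buffer (hypothetical sketch, not fairscale code)."""
    # First pass: pick the parameters that fit, so the buffer is exactly sized
    # and there is no unused remainder taking up memory.
    to_bucket, total = [], 0
    for p in params:
        if total + p.numel() > max_elements:
            break
        to_bucket.append(p)
        total += p.numel()

    buffer = torch.zeros(total)
    offset = 0
    for p in to_bucket:
        n = p.numel()
        flat = buffer[offset:offset + n]
        flat.copy_(p.data.reshape(-1))    # fold the parameter into the buffer
        p.data = flat.view_as(p.data)     # parameter is now a view; its old storage can be freed
        offset += n
    return buffer, to_bucket
```

Note the transient cost: until `p.data` is rebound, both the buffer and the original parameter storage are alive, which matches the slightly larger initial allocation discussed below.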

Contributor Author


hmm, the test jobs do show a slightly larger memory consumption, looking into that

Contributor Author


ok, makes sense given the initial extra memory allocation, before the parameters have been folded into the buffer; I've updated the docstring to reflect that

@blefaudeux blefaudeux merged commit 6219b57 into master Jan 11, 2021
@blefaudeux blefaudeux deleted the oss_tensor_views_test branch January 11, 2021 18:10
@blefaudeux blefaudeux linked an issue Jan 12, 2021 that may be closed by this pull request


Development

Successfully merging this pull request may close these issues.

[ShardedOptimizer] Use views in the buckets / save memory

3 participants