Skip to content

[feat] Add a memory usage regression test to the OSS benchmark#62

Merged
blefaudeux merged 10 commits intomasterfrom
oss_benchmark_memory
Sep 3, 2020
Merged

[feat] Add a memory usage regression test to the OSS benchmark#62
blefaudeux merged 10 commits intomasterfrom
oss_benchmark_memory

Conversation

@blefaudeux
Copy link
Copy Markdown
Contributor

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
  • Did you read the contributor guideline?
  • [-] Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

  • Adds a check on the benchmark to make sure that the memory consumption of OSS does not regress
  • Adds a state consolidation step to the benchmark to make sure that this codepath is tested for regression, and future speedups on that front are visible

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

definitely 🙃

assert (mean - 3.0 * std) < reference_speed, "Regression detected"
print(f"[Regression Test] Mean speed: {mean:.2f} +/- {std:.2f}")
assert (mean - 3.0 * std) < reference_speed, "Speed regression detected"
assert max_memory < 1.05 * reference_memory, "Memory use regression detected"
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a 5% tolerance here, guessing that some CUDA or torch version changes could affect (I don't have any STD for this value, it's the max memory used over the whole run)

@blefaudeux blefaudeux marked this pull request as draft September 3, 2020 18:30
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 3, 2020
@blefaudeux blefaudeux marked this pull request as ready for review September 3, 2020 19:52
@blefaudeux blefaudeux merged commit ee38e1e into master Sep 3, 2020
@blefaudeux blefaudeux deleted the oss_benchmark_memory branch September 3, 2020 20:18
myleott pushed a commit that referenced this pull request Feb 22, 2021
* Do reduce-scatter in a separate CUDA stream

* Add _post_backward_stream to stubs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants