[feat] Simple macro OSS benchmark#47

Merged
blefaudeux merged 13 commits into master from oss_benchmark
Aug 21, 2020

@blefaudeux
Contributor

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
  • Did you read the contributor guideline?
  • [ ] Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Adds some numbers behind #42, for a workload relevant for computer vision.

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Yes that was pretty fun 🙃

@blefaudeux blefaudeux self-assigned this Aug 20, 2020
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 20, 2020
@blefaudeux
Contributor Author

blefaudeux commented Aug 20, 2020

example output:

Benchmark OSS
[0] : Epoch 0 - processed 104.37 img per sec
[0] : Epoch 1 - processed 106.04 img per sec
[0] : Epoch 2 - processed 104.27 img per sec
[0] : Epoch 3 - processed 105.76 img per sec
[0] : Epoch 4 - processed 107.96 img per sec
[0] : Epoch 5 - processed 107.64 img per sec
[0] : Epoch 6 - processed 106.97 img per sec
[0] : Epoch 7 - processed 107.19 img per sec
[0] : Epoch 8 - processed 106.15 img per sec
[0] : Epoch 9 - processed 105.56 img per sec
[1] Peak memory: 8252.5MiB
[0] : Training done. 105.76 img per sec overall
[0] Peak memory: 8247.7MiB
Benchmark vanilla SGD
[0] : Epoch 0 - processed 88.46 img per sec
[0] : Epoch 1 - processed 87.95 img per sec
[0] : Epoch 2 - processed 87.04 img per sec
[0] : Epoch 3 - processed 89.04 img per sec
[0] : Epoch 4 - processed 88.58 img per sec
[0] : Epoch 5 - processed 89.56 img per sec
[0] : Epoch 6 - processed 89.06 img per sec
[0] : Epoch 7 - processed 89.05 img per sec
[0] : Epoch 8 - processed 89.05 img per sec
[0] : Epoch 9 - processed 89.00 img per sec
[1] Peak memory: 8335.7MiB
[0] : Training done. 88.60 img per sec overall
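
To put the two runs side by side, the overall figures can be compared directly. The snippet below is only a rough sketch: the variable names are made up for illustration, and the values are copied by hand from the output above (rank 0 overall throughput and rank 1 peak memory, the only rank reported for both runs).

```python
# Figures copied by hand from the example output above.
# Variable names are illustrative, not part of the benchmark script.
oss_img_per_sec = 105.76   # "[0] : Training done." line, OSS run
sgd_img_per_sec = 88.60    # "[0] : Training done." line, vanilla SGD run
oss_peak_mib = 8252.5      # "[1] Peak memory" line, OSS run
sgd_peak_mib = 8335.7      # "[1] Peak memory" line, vanilla SGD run

# Relative throughput of OSS compared with vanilla SGD.
speedup = oss_img_per_sec / sgd_img_per_sec

# Difference in peak memory on rank 1, in MiB.
memory_saved_mib = sgd_peak_mib - oss_peak_mib

print(f"OSS vs vanilla SGD throughput: {speedup:.2f}x")
print(f"Peak memory difference on rank 1: {memory_saved_mib:.1f} MiB")
```

This is a single run, so the ratio should be read as indicative rather than as a measured guarantee.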

@codecov

codecov bot commented Aug 20, 2020

Codecov Report

Merging #47 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master      #47   +/-   ##
=======================================
  Coverage   94.18%   94.18%           
=======================================
  Files          35       35           
  Lines        2065     2065           
=======================================
  Hits         1945     1945           
  Misses        120      120           
Flag Coverage Δ
#Python 94.18% <ø> (ø)

Impacted Files Coverage Δ
fairscale/optim/adam.py 93.04% <ø> (ø)
fairscale/optim/oss.py 100.00% <ø> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c2d6f4b...a20fa51.

@blefaudeux
Contributor Author

ping @min-xu-ai or @msbaines, this should be simple enough (no change to the core lib, just an outsider benchmark)

@blefaudeux blefaudeux linked an issue Aug 20, 2020 that may be closed by this pull request
Contributor

@msbaines msbaines left a comment

This looks great. Would you mind adding an assertion at the end so that we can use this as a regression test? To create the assertion, run the test 5 times (you can do this directly from CircleCI) and then calculate the mean and standard deviation. Then set the threshold to mean - 3 * standard deviation.

@msbaines
Contributor

Or you could just use the mean/stdev for the 10 epochs and set the threshold that way.
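
The suggested threshold could be computed along these lines. This is a minimal sketch, not the assertion that was actually merged: `regression_threshold` and `check_throughput` are hypothetical helper names, and the samples are the ten OSS per-epoch figures from the example output earlier in this thread. The same function would apply to the per-run means of 5 separate runs.

```python
import statistics

# Per-epoch OSS throughput (img/s), copied from the example output in this PR.
oss_epoch_img_per_sec = [
    104.37, 106.04, 104.27, 105.76, 107.96,
    107.64, 106.97, 107.19, 106.15, 105.56,
]


def regression_threshold(samples, num_stdev=3.0):
    """Return mean - num_stdev * sample standard deviation (hypothetical helper)."""
    return statistics.mean(samples) - num_stdev * statistics.stdev(samples)


def check_throughput(measured, threshold):
    """Raise AssertionError if the measured speed falls below the threshold."""
    assert measured >= threshold, (
        f"throughput regression: {measured:.2f} img/s < {threshold:.2f} img/s"
    )


threshold = regression_threshold(oss_epoch_img_per_sec)

# Compare the overall OSS figure from the same run against the threshold.
check_throughput(105.76, threshold)
```

A threshold derived from one machine only makes sense on comparable hardware, so in practice it would need to be recomputed from runs on the CI machines.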

Contributor

@min-xu-ai min-xu-ai left a comment

LGTM, and I like Mandeep's suggestions.

blefaudeux and others added 3 commits August 21, 2020 15:03
Set the torch seed for tests. xfail mixed precision and memory-efficient mixed-precision state_dict tests due to their states being cast to FP16 and back to FP32 during load_state_dict.

Co-authored-by: Jun Ru Anderson <andersonic@fb.com>
@blefaudeux blefaudeux merged commit 46c3776 into master Aug 21, 2020
@blefaudeux blefaudeux deleted the oss_benchmark branch August 21, 2020 22:21

Development

Successfully merging this pull request may close these issues.

Faster OSS

5 participants