Enable failing diffs on regression #136551
laithsakka wants to merge 20 commits into gh/laithsakka/71/base
Conversation
[ghstack-poisoned]
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136551
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: There is 1 currently active SEV. If your PR is affected, please view it below.
✅ No Failures: As of commit 32b2b90 with merge base d2455b9.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec [ghstack-poisoned]
Regression introduced by setting a low expected value:

```
**REGRESSION** benchmark add_loop_eager_dynamic failed, actual instruction count 5486451976 is higher than expected 1 with noise margin 0.01
if this is an expected regression, please update the expected instruction count in the benchmark.
```

Win introduced by setting a high expected value:

```
**WIN** benchmark add_loop_eager failed, actual instruction count 2758450573 is lower than expected 10000000000 with noise margin 0.01
please update the expected instruction count in the benchmark.
```

I will follow up with diffs that enable the regressions at diff time.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec [ghstack-poisoned]
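The two messages above come from a threshold check with a noise margin. As a rough, hedged sketch (the `check` function and its signature are hypothetical, not the actual PyTorch harness), the classification could look like:

```python
# Hedged sketch (not the real benchmark harness): classify an actual
# instruction count against an expected value with a relative noise margin.
def check(name, actual, expected, noise_margin):
    low = expected * (1 - noise_margin)
    high = expected * (1 + noise_margin)
    if actual > high:
        return (f"REGRESSION benchmark {name} failed, actual instruction count "
                f"{actual} is higher than expected {expected} with noise margin {noise_margin}")
    if actual < low:
        return (f"WIN benchmark {name} failed, actual instruction count "
                f"{actual} is lower than expected {expected} with noise margin {noise_margin}")
    return None  # within the noise margin: the check passes
```

With the numbers from the messages above, an expected value of 1 trips the REGRESSION branch and an expected value of 10000000000 trips the WIN branch.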
You're putting the expected count number in a Python file, which means when you write the auto-updater it will be harder to programmatically update. Can we plan for programmatic updates now, or have you decided you don't want them?
Earlier today we discussed how you should log each PR you block so that we can know/track when a PR is blocked. Should we add that functionality before landing this?
Doesn't this PR have those logs? |
Mhmm, I see. I can change the way we do it to have them in a separate file.
To be super explicit, I don't want to land this /without/ the autoupdater. It's a package deal. |
Test this by running:

```
python check_results.py test_check_result/expected_test.csv test_check_result/result_test.csv
```

Results:
```
WIN: benchmark ('a', ' instruction count') failed, actual result 90 is 18.18% lower than expected 110 ±1.00% please update the expected results.
REGRESSION: benchmark ('b', ' memory') failed, actual result 200 is 100.00% higher than expected 100 ±10.00% if this is an expected regression, please update the expected results.
MISSING REGRESSION TEST: benchmark ('d', ' missing-test') does not have a regression test enabled for it
```
MISSING REGRESSION TEST does not fail, but it's logged.
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec
[ghstack-poisoned]
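The comparison that `check_results.py` performs can be sketched as follows. This is a hedged illustration, not the real script: the CSV column names and file layout here are assumptions chosen to reproduce the three output lines shown above.

```python
# Hypothetical sketch of an expected-vs-result CSV comparison with a
# per-entry noise margin; column names are assumptions, not the real files.
import csv, io

expected_csv = """benchmark,metric,expected,noise_margin
a,instruction count,110,0.01
b,memory,100,0.10
"""
result_csv = """benchmark,metric,value
a,instruction count,90
b,memory,200
d,missing-test,5
"""

def compare(expected_text, result_text):
    expected = {(r["benchmark"], r["metric"]): (float(r["expected"]), float(r["noise_margin"]))
                for r in csv.DictReader(io.StringIO(expected_text))}
    messages = []
    for r in csv.DictReader(io.StringIO(result_text)):
        key = (r["benchmark"], r["metric"])
        value = float(r["value"])
        if key not in expected:
            # Logged but not a failure, matching the behavior described above.
            messages.append(f"MISSING REGRESSION TEST: benchmark {key} does not "
                            f"have a regression test enabled for it")
            continue
        exp, margin = expected[key]
        ratio = abs(exp - value) * 100 / exp
        if value > exp * (1 + margin):
            messages.append(f"REGRESSION: benchmark {key} failed, actual result {value:g} "
                            f"is {ratio:.2f}% higher than expected {exp:g} ±{margin*100:.2f}%")
        elif value < exp * (1 - margin):
            messages.append(f"WIN: benchmark {key} failed, actual result {value:g} "
                            f"is {ratio:.2f}% lower than expected {exp:g} ±{margin*100:.2f}%")
    return messages
```

On the sample data this produces one WIN (90 is 18.18% below 110), one REGRESSION (200 is 100% above 100), and one MISSING REGRESSION TEST line, mirroring the output pasted above.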
```python
if result < low:
    fail = True
    ratio = (float)(entry.expected_value - result) * 100 / entry.expected_value
```
the heck is this, just do float(...) like a normal person lol
lol pardon my c++ background and lack of python your majesty, i will update it
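For readers following the joke: `(float)(x)` works in Python only because `(float)` is simply the `float` builtin wrapped in parentheses, so it ends up as an ordinary function call rather than a C-style cast. A minimal illustration with made-up numbers:

```python
# C-style cast spelling vs. the idiomatic Python call; both compute the
# same percentage because (float) is just the float builtin in parentheses.
expected_value, result = 110, 90

ratio_cast = (float)(expected_value - result) * 100 / expected_value  # works, but unidiomatic
ratio = float(expected_value - result) * 100 / expected_value          # idiomatic

assert ratio_cast == ratio
```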
```python
print(
    f"WIN: benchmark {key} failed, actual result {result} is {ratio:.2f}% lower than "
    f"expected {entry.expected_value} ±{entry.noise_margin*100:.2f}% "
    f"please update the expected results."
)
```
OK, so are you going to write the updater script too?
Not in this diff; I will follow up in a different diff.
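As a rough idea of what that follow-up updater could do (this is purely a hypothetical sketch, not the promised script): parse the WIN/REGRESSION lines from the check output and collect the actual values to write back as the new expected values. The message format is copied from the output shown in this PR; the parsing approach is an assumption.

```python
# Hypothetical auto-updater sketch: extract (benchmark, metric) -> actual
# result from WIN/REGRESSION failure messages so expected values can be
# rewritten programmatically.
import re

_PATTERN = re.compile(
    r"(?:WIN|REGRESSION): benchmark \('(\w+)', ' ([\w\- ]+)'\) "
    r"failed, actual result (\d+)"
)

def parse_updates(log_text):
    """Map (benchmark, metric) to the actual result reported in the log."""
    return {(m.group(1), m.group(2)): int(m.group(3))
            for m in _PATTERN.finditer(log_text)}

log = (
    "WIN: benchmark ('a', ' instruction count') failed, actual result 90 "
    "is 18.18% lower than expected 110 ±1.00% please update the expected results.\n"
    "REGRESSION: benchmark ('b', ' memory') failed, actual result 200 "
    "is 100.00% higher than expected 100 ±10.00%\n"
)
```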
1. Example of failing diff: #136740
2. Test this by running:

```
python check_results.py test_check_result/expected_test.csv test_check_result/result_test.csv
```

Results:

```
WIN: benchmark ('a', ' instruction count') failed, actual result 90 is 18.18% lower than expected 110 ±1.00% please update the expected results.
REGRESSION: benchmark ('b', ' memory') failed, actual result 200 is 100.00% higher than expected 100 ±10.00% if this is an expected regression, please update the expected results.
MISSING REGRESSION TEST: benchmark ('d', ' missing-test') does not have a regression test enabled for it
```

MISSING REGRESSION TEST does not fail, but it's logged.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang amjames rec [ghstack-poisoned]
address comments
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):
example of failing diff
[no land] test fail due to win #136740
Test this by running:

```
python check_results.py test_check_result/expected_test.csv test_check_result/result_test.csv
```

Results:

```
WIN: benchmark ('a', ' instruction count') failed, actual result 90 is 18.18% lower than expected 110 ±1.00% please update the expected results.
REGRESSION: benchmark ('b', ' memory') failed, actual result 200 is 100.00% higher than expected 100 ±10.00% if this is an expected regression, please update the expected results.
MISSING REGRESSION TEST: benchmark ('d', ' missing-test') does not have a regression test enabled for it
```

MISSING REGRESSION TEST does not fail, but it's logged.
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @rec