[WIP] Install and test against mkldnn #6132
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Does this install mkldnn for every configuration? Would it make sense to have one configuration that doesn't have mkldnn, to make sure we don't break that case?

It only installs mkldnn on the conda-enabled configurations, so the pip-only ones will run without it. OTOH, I'm pretty unhappy about the current state of the tests, where you have to build in a specific configuration to exercise certain tests. Enabling a feature shouldn't change the tests you run; it should just add to them. Then we don't have to do all of these funny contortions to turn cudnn on and off to make sure we exercise both paths, for example.
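The "enabling a feature should only add tests" idea can be sketched with a plain `unittest` skip guard. This is a hypothetical illustration, not the actual PyTorch test suite; `HAS_MKLDNN` stands in for whatever runtime probe the build provides.

```python
import unittest

# Hypothetical feature probe; a real suite would detect the backend at
# runtime instead of hard-coding it.
HAS_MKLDNN = False

class TestConv(unittest.TestCase):
    def test_conv_reference(self):
        # The reference-path test always runs, however the build was configured.
        self.assertEqual(2 + 2, 4)

    @unittest.skipUnless(HAS_MKLDNN, "mkldnn not available")
    def test_conv_mkldnn(self):
        # The feature-path test is added on top, never swapped in for the
        # reference test, so enabling mkldnn only grows the test set.
        self.assertEqual(2 + 2, 4)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestConv)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

With the feature off, both tests are collected but the mkldnn one is reported as skipped rather than silently absent, so the test set is the same in every configuration.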
Running CPU perf test again: https://ci.pytorch.org/jenkins/job/pytorch-builds/job/short-perf-test-cpu/2287/

@yf225 It still failed?

Yeah, the third attempt failed too, on a different machine, so it's probably legit: https://ci.pytorch.org/jenkins/job/pytorch-builds/job/short-perf-test-cpu/2291/console

@mingfeima https://github.com/pytorch/examples/tree/perftests/mnist seems to run slower with the mkldnn package. Do you mind taking a look?
@yf225 I tested https://github.com/pytorch/examples/tree/perftests/mnist on my machine, a Xeon(R) Platinum 8180 CPU @ 2.50GHz. However, mnist is a very small workload, so it is not well suited to showing mkldnn's benefit.
And also, the mkldnn integration is just a kickoff; the current performance is roughly 20% of the best number, and we will continue to work on that. A few notes: if you want MKL to use the same number of threads, there is no need to set MKL_NUM_THREADS, since it follows OMP_NUM_THREADS. MKL_NUM_THREADS is only needed when you want a different thread count from OMP_NUM_THREADS. Also, KMP_AFFINITY has a big impact on CPU performance: if you are testing with only 4 cores it doesn't really make a difference, but if you use all the cores on the CPU it matters a great deal.
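The threading advice above can be sketched as a benchmark environment setup. The specific values here are examples only, not recommendations for every machine; MKL picks up OMP_NUM_THREADS by default, so MKL_NUM_THREADS is deliberately left unset.

```shell
#!/bin/sh
# Example threading environment for an OpenMP/MKL CPU benchmark.
export OMP_NUM_THREADS=4
# export MKL_NUM_THREADS=2   # only if MKL should differ from OpenMP
# KMP_AFFINITY pins OpenMP threads to cores; it matters most when the
# benchmark uses every core. This string is one common choice, not the
# only valid one.
export KMP_AFFINITY="granularity=fine,compact,1,0"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS KMP_AFFINITY=$KMP_AFFINITY"
```

Run the benchmark in the same shell after sourcing these exports so the OpenMP runtime sees them.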
@mingfeima Thanks a lot for the advice. We will move over to using Convnet for our benchmark. Currently we are using a Mac mini i7 as our test machine, running at 2.3GHz. The test machines would probably only have 4 cores; are the improvements mostly on CPUs with more cores? Also, we are in the process of revamping our CPU perf test setup, and one issue is that wall-clock runtime for CPU perf tests is not reliable, since we can't guarantee that no other programs are running on the machine at the time. We are thinking about using …
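One common mitigation for the wall-clock noise mentioned above is to time several repetitions and keep the minimum, which is much less sensitive to other processes stealing CPU than the mean. A minimal sketch (the toy workload is a placeholder, not the actual perf-test model):

```python
import time

def bench(fn, repeats=5):
    """Time fn several times and keep the best (minimum) run.

    On a shared machine, interference from other processes only ever makes a
    run slower, so the minimum over repeats approximates the undisturbed time.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

# Toy workload standing in for a model's forward pass.
elapsed = bench(lambda: sum(i * i for i in range(10000)))
```

This doesn't fix a noisy host, but it shrinks the variance enough that regressions of a few percent stop drowning in scheduler jitter.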
@yf225 Ah, it looks like we have a huge difference in how we test CPU performance. :( First of all, Intel has quite a long product line: i5/i7 for desktop and Atom (low power, low performance) for mobile. In AI, Intel is promoting Xeon (high power, high performance), which is mostly used in data centers and by CSPs (cloud service providers). Presumably MKLDNN should provide a speedup for both i5/i7 and Atom, but in practice MKLDNN is designed for Xeon. We highly recommend including Xeon in CPU performance testing, preferably Skylake. As for the second question, it must be guaranteed that NO other process interrupts the performance benchmarking, otherwise the result is useless; this is also true for GPU performance benchmarking. We have a 32-node Skylake cluster, managed by …. Also, use the metric of …
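One way to keep other processes from interrupting a CPU benchmark, as recommended above, is to pin the workload to dedicated cores. A hedged sketch using `taskset` (Linux-only, from util-linux); the `true` command is a stand-in for the real benchmark:

```shell
#!/bin/sh
# Reserve a specific core for the benchmark so other processes on a shared
# machine are less likely to interfere mid-run.
CORES="0"
taskset -c "$CORES" true   # the real benchmark command would go here
echo "benchmark pinned to core $CORES"
```

For full isolation one would also steer other processes away from those cores (e.g. via the kernel's `isolcpus` boot option), which pinning alone does not do.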
Hi @yf225, so do you think we should adjust the CPU timing and then accept this patch?
@ezyang Yes, sorry for getting back to this late. I think we should accept it, since it gives bigger gains to more compute-intensive models. @mingfeima Thanks a lot for the advice! We are moving over to bare metal machines for CPU performance testing, which lets us dedicate CPU cores to the test suite, so no other process will interrupt the performance benchmarking anymore. On the bare metal machines we will measure actual wall-clock time, which gives an accurate estimate of program runtime. We probably have to stick with Atom for our per-PR CPU perf test from a cost perspective, but we plan to run larger perf tests nightly and weekly on Xeon machines in the near future, and the benefits of MKLDNN will definitely show there.
@pytorchbot retest this please

But I guess, see also #7974.
@cpuhrsch Should we merge this one?

At this point, I'm not really sure what would need to happen to merge this PR.
Also took the opportunity to remove obsolete mkl and mkl-devel settings.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>