Fix for AVX2 support in Visual Studio by scottmudge · Pull Request #13525 · tensorflow/tensorflow

scottmudge · 2017-10-06T14:58:10Z

This is a fix for issue #10199. Visual Studio 2015 (and possibly other versions) does not define _mm256_extract_epi8, _mm256_extract_epi16, _mm256_extract_epi32, or _mm256_extract_epi64 in the immintrin.h header or in the associated runtime, so they must be implemented manually.

For wider portability these functions are renamed based on their required extraction indices. These intrinsics should be just as fast as the externally linked versions provided by GCC.
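As a rough illustration of what a manual implementation can look like, here is a minimal sketch that splits the 256-bit register into its two 128-bit lanes and then applies the 128-bit extract intrinsic. The helper names (`extract_epi8_sketch` and so on) and the template-index style are assumptions made for this example; they are not the names or the exact code used in this pull request. The sketch is guarded by `__AVX__` so that it still compiles when AVX is not enabled, and the 64-bit variant is left out because `_mm_extract_epi64` is only available on x64 targets.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__AVX__)
#include <immintrin.h>

// Hypothetical helpers (names are illustrative, not taken from the PR).
// Each one selects the 128-bit lane that holds the element, then uses the
// 128-bit extract intrinsic with the index reduced to that lane.
template <int Index>
inline int extract_epi8_sketch(__m256i v) {
  static_assert(Index >= 0 && Index < 32, "epi8 index must be in [0, 31]");
  const __m128i lane = _mm256_extractf128_si256(v, Index >> 4);  // 16 bytes per lane
  return _mm_extract_epi8(lane, Index & 15);                     // zero-extended result
}

template <int Index>
inline int extract_epi16_sketch(__m256i v) {
  static_assert(Index >= 0 && Index < 16, "epi16 index must be in [0, 15]");
  const __m128i lane = _mm256_extractf128_si256(v, Index >> 3);  // 8 words per lane
  return _mm_extract_epi16(lane, Index & 7);                     // zero-extended result
}

template <int Index>
inline int extract_epi32_sketch(__m256i v) {
  static_assert(Index >= 0 && Index < 8, "epi32 index must be in [0, 7]");
  const __m128i lane = _mm256_extractf128_si256(v, Index >> 2);  // 4 dwords per lane
  return _mm_extract_epi32(lane, Index & 3);
}
#endif  // __AVX__

// Returns true if the helpers read back the expected elements, or if AVX is
// not enabled at compile time (in which case there is nothing to check).
inline bool extract_sketch_self_check() {
#if defined(__AVX__)
  const std::int16_t data[16] = {100, 101, 102, 103, 104, 105, 106, 107,
                                 108, 109, 110, 111, 112, 113, 114, 115};
  const __m256i v =
      _mm256_loadu_si256(reinterpret_cast<const __m256i*>(data));
  return extract_epi16_sketch<0>(v) == 100 &&
         extract_epi16_sketch<9>(v) == 109 &&
         extract_epi8_sketch<18>(v) == 109 &&  // low byte of 16-bit element 9
         extract_epi8_sketch<19>(v) == 0 &&    // high byte of 16-bit element 9
         extract_epi32_sketch<5>(v) == (110 | (111 << 16));
#else
  return true;
#endif
}
```

A caller could run `assert(extract_sketch_self_check());` from `main` after building with `/arch:AVX2` (MSVC) or `-mavx2` (GCC/Clang). Making the index a template parameter keeps it a compile-time constant, which the underlying intrinsics require.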

tensorflow-jenkins · 2017-10-06T14:58:12Z

Can one of the admins verify this patch?

googlebot · 2017-10-06T14:58:14Z

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.

If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
If your company signed a CLA, they designated a Point of Contact who decides which employees are authorized to participate. You may need to contact the Point of Contact for your company and ask to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the project maintainer to go/cla#troubleshoot.
In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again.

scottmudge · 2017-10-06T14:59:36Z

I signed it!

googlebot · 2017-10-06T14:59:44Z

CLAs look good, thanks!

scottmudge · 2017-10-06T16:09:17Z

Also note that AVX and AVX2 are not enabled by default in CMakeLists.txt, even when native arch optimization is enabled. I will open another pull request to fix this in the future, along with a number of other CMake fixes.

frankchn · 2017-10-06T17:26:21Z

Jenkins, test this please.

gunan · 2017-10-06T21:38:38Z

Jenkins, test this please.

scottmudge · 2017-10-06T23:23:52Z

Looks like the CI server is having some unrelated failures? Is that common?

gunan · 2017-10-06T23:27:23Z

Unfortunately more common than we would like.
Retrying tests.
Jenkins, test this please.

mrry · 2017-10-06T23:49:05Z

The change looks good to me, but I'll defer to @benoitsteiner since it's in Eigen code. (I'm not sure how or whether we pull in Eigen code from other repositories, and whether it would be better to make the change upstream first.)

yang0773 · 2017-10-12T23:42:10Z

@scottmudge, thanks for your work on AVX2, great!
I pulled the latest TensorFlow master, which includes your AVX2 commits, and built it successfully with this command:
cmake .. -A x64 -DCMAKE_BUILD_TYPE=Release -DSWIG_EXECUTABLE=C:\D\tools\swigwin-3.0.12\swig.exe -DPYTHON_EXECUTABLE=C:\D\tools\Anaconda3\python.exe -DPYTHON_LIBRARIES=C:\D\tools\Anaconda3\python35.lib -Dtensorflow_ENABLE_GPU=ON -DCUDNN_HOME="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0" -Dtensorflow_WIN_CPU_SIMD_OPTIONS=/arch:AVX

but it still prints this warning at run time:
Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2

What else should I do to enable AVX2 for TensorFlow on Windows? Thanks so much.

Frank

scottmudge · 2017-10-13T00:44:24Z

Hey Frank,

Yes, the CMakeLists.txt file in TensorFlow needs some modifications; it does not properly set the AVX/AVX2 flags.

Find this block in the CMakeLists.txt file in ./tensorflow/contrib/cmake/:

if (tensorflow_OPTIMIZE_FOR_NATIVE_ARCH)
  include(CheckCXXCompilerFlag)
  CHECK_CXX_COMPILER_FLAG("-march=native" COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
  if (COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=native")
  endif()
endif()

And change it to:

if (tensorflow_OPTIMIZE_FOR_NATIVE_ARCH)
  include(CheckCXXCompilerFlag)
  CHECK_CXX_COMPILER_FLAG("-march=native" COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
  if (WIN32)
    if (COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
      set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /arch:AVX /arch:AVX2")
    endif()
  else()
    if (COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
      set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=native")
    endif()
  endif()
endif()

It's a somewhat hacky way to force-enable it, but it'll do the job for now. I need to do another pull request with a better fix.

If you get error C1001 when compiling with GPU support, take a look at this thread:

#9470

I'm not sure whether it happens on the master branch, but I had the problem as of v1.3.1.

yang0773 · 2017-10-15T07:31:14Z

@scottmudge

I tried your hacky workaround, but I still got the “was not compiled to use: AVX2” warning. I noticed this output during the configuration stage on my platform:
-- Performing Test COMPILER_OPT_ARCH_NATIVE_SUPPORTED
-- Performing Test COMPILER_OPT_ARCH_NATIVE_SUPPORTED - Failed
-- Performing Test COMPILER_OPT_WIN_CPU_SIMD_SUPPORTED
-- Performing Test COMPILER_OPT_WIN_CPU_SIMD_SUPPORTED - Success

I got the code from tensorflow:master, with this as the latest commit:
commit 10c871e
Merge: 87ac990 188297f
Author: Shanqing Cai <cais@google.com>
Date: Mon Oct 9 09:34:12 2017 -0400

Finally, I modified one line in CMakeLists.txt to add "/arch:AVX2", and it seems to work. Haha, it's another hacky workaround.

if (tensorflow_WIN_CPU_SIMD_OPTIONS)
  if (WIN32)
    CHECK_CXX_COMPILER_FLAG("${tensorflow_WIN_CPU_SIMD_OPTIONS}" COMPILER_OPT_WIN_CPU_SIMD_SUPPORTED)
    if(COMPILER_OPT_WIN_CPU_SIMD_SUPPORTED)
      set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${tensorflow_WIN_CPU_SIMD_OPTIONS} /arch:AVX2")
    else()
      message(FATAL_ERROR "${tensorflow_WIN_CPU_SIMD_OPTIONS} not supported")
    endif()
  endif()
endif()

Thanks so much for your help.

iNomaD · 2017-10-30T14:57:55Z

@yang0773
Can you please share Python wheels for Windows with AVX2 support?
I've been trying to build TensorFlow for a couple of days, but I still get the stupid fatal error C1002...

scottmudge · 2017-10-30T15:53:33Z

@iNomaD Here is one I compiled, v1.3.1 w/ AVX2, GPU (up to CUDA 6.1), x64 for Windows:

https://github.com/scottmudge/tensorflow/releases/download/v1.3.1_mod/tensorflow_gpu-1.3.1-cp36-cp36m-win_amd64.whl

caikehe · 2017-11-24T03:03:29Z

Actually, TensorFlow on Windows can be built like this:

cmake .. -A x64 -DCMAKE_BUILD_TYPE=Release ^
-DSWIG_EXECUTABLE=C:\local\swigwin-3.0.10\swig.exe ^
-DPYTHON_EXECUTABLE=C:\local\Anaconda3-4.1.1-Windows-x86_64\python.exe ^
-DPYTHON_LIBRARIES=C:\local\Anaconda3-4.1.1-Windows-x86_64\libs\python35.lib ^
-Dtensorflow_ENABLE_GPU=ON ^
-DCUDNN_HOME="C:\local\cudnn-8.0-v5.1\cuda" ^
-Dtensorflow_WIN_CPU_SIMD_OPTIONS=/arch:AVX2

With the last option set to /arch:AVX2, the log message "Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2" disappears.

cuevas1208 · 2017-11-29T14:29:08Z

Can you build it with AVX2 and AVX at the same time?

gunan · 2017-11-29T18:28:09Z

https://docs.microsoft.com/en-us/cpp/preprocessor/predefined-macros

In Visual C++, when you set /arch:AVX2 both AVX and AVX2 are used.
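One way to confirm this in a given build is to look at the predefined macros: according to the page linked above, MSVC defines `__AVX__` for `/arch:AVX` and higher, and `__AVX2__` for `/arch:AVX2` and higher, so an AVX2 build sees both. The following is a minimal sketch; the function names are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reports the widest of the two SIMD levels discussed
// here that the compiler was told to target, based on predefined macros.
inline std::string compiled_simd_level() {
#if defined(__AVX2__)
  return "AVX2";
#elif defined(__AVX__)
  return "AVX";
#else
  return "baseline";
#endif
}

// Hypothetical helper: returns false only if the compiler targets AVX2
// without also defining __AVX__, which is not expected to happen.
inline bool avx2_build_also_has_avx() {
#if defined(__AVX2__) && !defined(__AVX__)
  return false;
#else
  return true;
#endif
}
```

Printing `compiled_simd_level()` from a small test program built with the same flags as the main project shows which level the compiler actually targeted.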

Summary: Fix missing functions for MSVC 2015. Inspired by tensorflow/tensorflow#13525. Closes pytorch/pytorch#9023. Reviewed By: soumith. Differential Revision: D8694046. Pulled By: ezyang. fbshipit-source-id: 92cb7b9efd76d97a264c12a1521be550176f58d5


scottmudge added 2 commits October 6, 2017 10:49

Fixed AVX2 support for Visual Studio 2015.

16fac9e

Fixed for portability.

9398637

googlebot added the cla: no label Oct 6, 2017

googlebot added cla: yes and removed cla: no labels Oct 6, 2017

scottmudge mentioned this pull request Oct 6, 2017

Built tensorflow CPU mode with SIMD_OPTIONS but when when opening session it warns it wasn't compiled to use SSE #10199

Closed

frankchn requested a review from benoitsteiner October 6, 2017 17:26

frankchn assigned benoitsteiner Oct 6, 2017

frankchn added the awaiting review Pull request awaiting review label Oct 6, 2017

gunan requested a review from mrry October 6, 2017 21:38

benoitsteiner approved these changes Oct 6, 2017

mrry approved these changes Oct 7, 2017

gunan added awaiting testing (then merge) and removed awaiting review Pull request awaiting review labels Oct 7, 2017

caisq merged commit 159dfb5 into tensorflow:master Oct 9, 2017

aluo-x mentioned this pull request Nov 5, 2017

Anyone please post pre-compiled wheels for windows? yaroslavvb/tensorflow-community-wheels#13

Open

peterjc123 mentioned this pull request Jun 29, 2018

Fix CUDA 8 for Windows pytorch/pytorch#9023

Closed

12 participants
