Fix for AVX2 support in Visual Studio #13525

Merged
caisq merged 2 commits into tensorflow:master from scottmudge:master
Oct 9, 2017

@scottmudge
Contributor

This is a fix for issue #10199. Visual Studio 2015 (and possibly other versions) does not define _mm256_extract_epi8, _mm256_extract_epi16, _mm256_extract_epi32, or _mm256_extract_epi64 in the immintrin.h header or in the associated runtime, so they must be implemented manually.

For wider portability, these functions are renamed according to the extraction index each one requires. The manual implementations should be just as fast as the externally linked versions provided by GCC.
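
For readers unfamiliar with the technique, a minimal sketch of index-specific extraction might look like the following. It is not the code from this pull request: the helper name `lane32` and the array-based interface are assumptions made for illustration, and the scalar branch exists only so the sketch also builds when AVX is not enabled.

```cpp
#include <cstdint>

#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper (name and interface are illustrative, not from the PR).
// Returns the 32-bit element at compile-time position Index (0..7) of eight
// packed 32-bit integers, without relying on _mm256_extract_epi32.
template <int Index>
inline std::int32_t lane32(const std::int32_t (&src)[8]) {
  static_assert(Index >= 0 && Index < 8, "lane index out of range");
#if defined(__AVX__)
  // Load all 256 bits, select the 128-bit half that holds the element (AVX),
  // then pull the element out of that half (SSE4.1).
  const __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(src));
  const __m128i half = _mm256_extractf128_si256(v, Index / 4);
  return _mm_extract_epi32(half, Index % 4);
#else
  // Scalar fallback so the sketch compiles without AVX enabled.
  return src[Index];
#endif
}
```

Making the index a template parameter keeps it a compile-time constant, which the underlying intrinsics require for their immediate operands.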

@tensorflow-jenkins
Collaborator

Can one of the admins verify this patch?

@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If your company signed a CLA, they designated a Point of Contact who decides which employees are authorized to participate. You may need to contact the Point of Contact for your company and ask to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the project maintainer to go/cla#troubleshoot.
  • In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again.

@scottmudge
Contributor Author

I signed it!

@googlebot

CLAs look good, thanks!

@scottmudge
Contributor Author

scottmudge commented Oct 6, 2017

Also note that AVX and AVX2 are not enabled by default in CMakeLists.txt, even when native arch optimization is enabled. I will open another pull request to fix this in the future, along with a number of other CMake fixes.

@frankchn
Contributor

frankchn commented Oct 6, 2017

Jenkins, test this please.

@frankchn requested a review from benoitsteiner October 6, 2017 17:26
@frankchn added the "awaiting review" label Oct 6, 2017
@gunan
Contributor

gunan commented Oct 6, 2017

Jenkins, test this please.

@gunan requested a review from mrry October 6, 2017 21:38
@scottmudge
Contributor Author

Looks like the CI server is having some unrelated failures? Is that common?

@gunan
Contributor

gunan commented Oct 6, 2017

Unfortunately more common than we would like.
Retrying tests.
Jenkins, test this please.

@mrry
Contributor

mrry commented Oct 6, 2017

The change looks good to me, but I'll defer to @benoitsteiner since it's in Eigen code. (I'm not sure how or whether we pull in Eigen code from other repositories, and whether it would be better to make the change upstream first.)

@gunan added the "awaiting testing (then merge)" label and removed the "awaiting review" label Oct 7, 2017
@caisq merged commit 159dfb5 into tensorflow:master Oct 9, 2017
@yang0773

@scottmudge, thanks for your work on AVX2, great!
I pulled the latest TensorFlow master, which includes your AVX2 commits, and it compiled successfully with this command:
cmake .. -A x64 -DCMAKE_BUILD_TYPE=Release -DSWIG_EXECUTABLE=C:\D\tools\swigwin-3.0.12\swig.exe -DPYTHON_EXECUTABLE=C:\D\tools\Anaconda3\python.exe -DPYTHON_LIBRARIES=C:\D\tools\Anaconda3\python35.lib -Dtensorflow_ENABLE_GPU=ON -DCUDNN_HOME="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0" -Dtensorflow_WIN_CPU_SIMD_OPTIONS=/arch:AVX

but at runtime it still prints:
Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2

What else should I do to enable AVX2 for TensorFlow on Windows? Thanks so much.

Frank

@scottmudge
Contributor Author

Hey Frank,

Yes, the CMakeLists.txt file in TensorFlow needs some modifications; it does not properly set the AVX/AVX2 flags.

Find this block in the CMakeLists.txt file in ./tensorflow/contrib/cmake/:

if (tensorflow_OPTIMIZE_FOR_NATIVE_ARCH)
  include(CheckCXXCompilerFlag)
  CHECK_CXX_COMPILER_FLAG("-march=native" COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
  if (COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=native")
  endif()
endif()

And change it to:

if (tensorflow_OPTIMIZE_FOR_NATIVE_ARCH)
  include(CheckCXXCompilerFlag)
  CHECK_CXX_COMPILER_FLAG("-march=native" COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
  if (WIN32)
    if (COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
      set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /arch:AVX /arch:AVX2")
    endif()
  else()
    if (COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
      set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=native")
    endif()
  endif()
endif()

It's a somewhat hacky way to force-enable it, but it'll do the job for now. I need to open another pull request with a better fix.

If you get error C1001 when compiling with GPU support, take a look at this thread:

#9470

I'm not sure whether it happens on the master branch, but I had the problem as of v1.3.1.

@yang0773

@scottmudge

I tried your workaround, but it still reported "was not compiled to use: AVX2". I noticed this output during the configuration stage on my platform:
-- Performing Test COMPILER_OPT_ARCH_NATIVE_SUPPORTED
-- Performing Test COMPILER_OPT_ARCH_NATIVE_SUPPORTED - Failed
-- Performing Test COMPILER_OPT_WIN_CPU_SIMD_SUPPORTED
-- Performing Test COMPILER_OPT_WIN_CPU_SIMD_SUPPORTED - Success

I got the code from tensorflow:master; the latest commit is as follows:
commit 10c871e
Merge: 87ac990 188297f
Author: Shanqing Cai <cais@google.com>
Date: Mon Oct 9 09:34:12 2017 -0400

Finally, I modified one line in CMakeLists.txt to add "/arch:AVX2", and it seems to work. Haha, it's another hacky way.

if (tensorflow_WIN_CPU_SIMD_OPTIONS)
  if (WIN32)
    CHECK_CXX_COMPILER_FLAG("${tensorflow_WIN_CPU_SIMD_OPTIONS}" COMPILER_OPT_WIN_CPU_SIMD_SUPPORTED)
    if(COMPILER_OPT_WIN_CPU_SIMD_SUPPORTED)
      set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${tensorflow_WIN_CPU_SIMD_OPTIONS} /arch:AVX2")
    else()
      message(FATAL_ERROR "${tensorflow_WIN_CPU_SIMD_OPTIONS} not supported")
    endif()
  endif()
endif()

Thanks so much for your help.

@iNomaD

iNomaD commented Oct 30, 2017

@yang0773
Can you please share Python wheels for Windows with AVX2 support?
I've been trying to build TensorFlow for a couple of days, but I still get the stupid fatal error C1002...

@scottmudge
Contributor Author

@iNomaD Here is one I compiled: v1.3.1 with AVX2 and GPU support (up to CUDA compute capability 6.1), x64 for Windows:

https://github.com/scottmudge/tensorflow/releases/download/v1.3.1_mod/tensorflow_gpu-1.3.1-cp36-cp36m-win_amd64.whl

@caikehe

caikehe commented Nov 24, 2017

Actually, TensorFlow on Windows can be built like this:

cmake .. -A x64 -DCMAKE_BUILD_TYPE=Release ^
-DSWIG_EXECUTABLE=C:\local\swigwin-3.0.10\swig.exe ^
-DPYTHON_EXECUTABLE=C:\local\Anaconda3-4.1.1-Windows-x86_64\python.exe ^
-DPYTHON_LIBRARIES=C:\local\Anaconda3-4.1.1-Windows-x86_64\libs\python35.lib ^
-Dtensorflow_ENABLE_GPU=ON ^
-DCUDNN_HOME="C:\local\cudnn-8.0-v5.1\cuda" ^
-Dtensorflow_WIN_CPU_SIMD_OPTIONS=/arch:AVX2

With the last option set to /arch:AVX2, the log message "Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2" disappears.

@cuevas1208

cuevas1208 commented Nov 29, 2017

Can you build it with AVX2 and AVX at the same time?

@gunan
Contributor

gunan commented Nov 29, 2017

https://docs.microsoft.com/en-us/cpp/preprocessor/predefined-macros

In Visual C++, when you set /arch:AVX2 both AVX and AVX2 are used.
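
The predefined macros on the linked page offer a quick way to check this from inside a build. Below is a small sketch, assuming the `__AVX__` and `__AVX2__` macros documented for MSVC (GCC and Clang define the same names under -mavx and -mavx2). The names `kCompiledWithAvx`, `kCompiledWithAvx2`, and `compiled_simd_level` are hypothetical and exist only for this illustration.

```cpp
#include <string>

// True when the compiler was told to target AVX (e.g. /arch:AVX or /arch:AVX2).
#if defined(__AVX__)
constexpr bool kCompiledWithAvx = true;
#else
constexpr bool kCompiledWithAvx = false;
#endif

// True when the compiler was told to target AVX2 (e.g. /arch:AVX2).
#if defined(__AVX2__)
constexpr bool kCompiledWithAvx2 = true;
#else
constexpr bool kCompiledWithAvx2 = false;
#endif

// Hypothetical helper: names the highest AVX level this translation unit
// was compiled for, based only on the macros above.
inline std::string compiled_simd_level() {
  if (kCompiledWithAvx2) return "AVX2";
  if (kCompiledWithAvx) return "AVX";
  return "baseline";
}
```

In a build configured with /arch:AVX2, both constants should come out true, which is consistent with the statement above that AVX2 builds also use AVX. Note that this reports compile-time settings only, not what the CPU supports at runtime.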

facebook-github-bot pushed a commit to pytorch/pytorch that referenced this pull request Jun 29, 2018
Summary:
Fix missing functions for MSVC 2015
Inspired by tensorflow/tensorflow#13525
Closes #9023

Reviewed By: soumith

Differential Revision: D8694046

Pulled By: ezyang

fbshipit-source-id: 92cb7b9efd76d97a264c12a1521be550176f58d5
zdevito pushed a commit to zdevito/ATen that referenced this pull request Jun 29, 2018
zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 13, 2018
goodlux pushed a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018