Skip to content

Fix detection of NCCL_INCLUDE_DIR#2877

Merged
soumith merged 4 commits intopytorch:masterfrom
ducksoup:nccl-detection
Sep 29, 2017
Merged

Fix detection of NCCL_INCLUDE_DIR#2877
soumith merged 4 commits intopytorch:masterfrom
ducksoup:nccl-detection

Conversation

@ducksoup
Copy link
Contributor

NCCL_INCLUDE_DIR is now correctly detected when libnccl* is installed in /usr/lib/x86_64-linux-gnu/, /usr/lib/powerpc64le-linux-gnu/ or /usr/lib/aarch64-linux-gnu/.

The new code allows the headers to reside in any include folder which is a sub-folder of any parent of NCCL_LIB_DIR.

@soumith
Copy link
Collaborator

soumith commented Sep 27, 2017

the problem is that this is currently not allowed by gloo's cmake detection, which is why I enforced that NCCL_INCLUDE_DIR="$NCCL_LIB_DIR/../include" or $NCCL_ROOT_DIR/include.

Out of all the post-fixes you posted, the only one allowed in the NCCL detection for gloo to have a subdir is ${NCCL_ROOT_DIR}/lib/x86_64-linux-gnu
See: https://github.com/facebookincubator/gloo/blob/master/cmake/Modules/Findnccl.cmake#L22

I think this can be easily fixed by sending a PR to gloo first that allows it to specify NCCL_INCLUDE_DIR and NCCL_LIB_DIR separately as well, instead of just NCCL_ROOT_DIR

cc: @pietern

@ducksoup
Copy link
Contributor Author

I guess there are three separate issues here:

  1. nccl.py is not able to correctly detect nccl when the lib is in any of the /usr/lib/${ARCH}-linux-gnu folders and the header in /usr/local (this is the default when installing nccl 2 from the deb package in Ubuntu).
  2. gloo's current cmake only handles x86_64-linux-gnu, but fails to locate nccl in powerpc64le-linux-gnu and aarch64-linux-gnu
  3. gloo's current cmake doesn't have the option form manually-specified NCCL_INCLUDE_DIR and NCCL_LIB_DIR

I'm not entirely sure how to solve (3) as I've zero experience with cmake, but I could modify this PR to only solve (1) and create a PR for gloo to solve (2).

@pietern
Copy link
Contributor

pietern commented Sep 27, 2017

@soumith @ducksoup Easy enough, let me change that. It only checks the x86_64-linux-gnu directory because that's the tree that's packaged in the tarball. Prefixing NCCL_INCLUDE_DIR and NCCL_LIB_DIR looks like a good solution. It won't matter anymore where you store it.

@pietern
Copy link
Contributor

pietern commented Sep 27, 2017

@ducksoup Can you verify that the linked PR solved the issue for you?

facebook-github-bot pushed a commit to pytorch/gloo that referenced this pull request Sep 27, 2017
Summary:
It is flexible to have both XYZ_INCLUDE_DIR and XYZ_LIB_DIR for each dependency instead of just XYZ_ROOT_DIR. The gloo build should work with whatever directory structure is used for dependencies.

This change also slightly changes the CMake variable naming to remove some cognitive overhead regarding upcase/downcase variables and multiple messages in the CMake output.

Also see pytorch/pytorch#2877
Closes #91

Differential Revision: D5925849

Pulled By: pietern

fbshipit-source-id: 66bf7699f736f51e197366c61a9d946bdf1ccec4
@ngimel
Copy link
Collaborator

ngimel commented Sep 27, 2017

Can we get this merged? In the current state there's no way to use deb-installed nccl, because after #2853 nccl.h is supposed to be in ../include/ relative to where library is, and that's not the case for deb-installed nccl (nccl.h is in /usr/include, libnccl is in /usr/lib/(ARCH)-linux-gnu)

if os.getenv('NCCL_INCLUDE_DIR') is not None:
warnings.warn("Ignoring environment variable NCCL_INCLUDE_DIR because "
"NCCL_INCLUDE_DIR is implicitly assumed as "
"$NCCL_ROOT_DIR/include or $NCCL_LIB_DIR/../include")

@ducksoup
Copy link
Contributor Author

@pietern Confirmed, issue is solved with the linked PR.

@ducksoup
Copy link
Contributor Author

@soumith @ngimel PR updated to follow a similar logic to the new one in gloo. This builds successfully with nccl 2 and CUDA 9 on Ubuntu 16.04, potentially solving #2755

@soumith soumith merged commit e67c2bc into pytorch:master Sep 29, 2017
@soumith
Copy link
Collaborator

soumith commented Sep 29, 2017

thanks @ducksoup !

xuhdev added a commit to xuhdev/pytorch that referenced this pull request Jul 23, 2019
…ipt.

---

How does the current code subsume all detections in the deleted `nccl.py`?

- The dependency of `USE_NCCL` on the OS and `USE_CUDA` is handled as dependency options in `CMakeLists.txt`.

- The main NCCL detection happens in [FindNCCL.cmake](https://github.com/pytorch/pytorch/blob/8377d4b32c12206a0f9401e81a5e5796c8fc01a8/cmake/Modules/FindNCCL.cmake), which is called by [nccl.cmake](https://github.com/pytorch/pytorch/blob/8377d4b32c12206a0f9401e81a5e5796c8fc01a8/cmake/External/nccl.cmake). When `USE_SYSTEM_NCCL` is false, the previous Python code defer the detection to `find_package(NCCL)`. The change in `nccl.cmake` retains this.

- `USE_STATIC_NCCL` in the previous Python code simply changes the name of the detected library. This is done in `IF (USE_STATIC_NCCL)`.

- Now we only need to look at how the lines below line 20 in `nccl.cmake` are subsumed. These lines list paths to header and library directories that NCCL headers and libraries may reside in and try to search these directories for the key header and library files in turn. These are done by `find_path` for headers and `find_library` for the library files in `FindNCCL.cmake`.
  * The call of [find_path](https://cmake.org/cmake/help/v3.8/command/find_path.html) (Search for `NO_DEFAULT_PATH` in the link) by default searches for headers in `<prefix>/include` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. Like the Python code, this commit sets `CMAKE_PREFIX_PATH` to search for `<prefix>` in `NCCL_ROOT_DIR` and home to CUDA.  `CMAKE_SYSTEM_PREFIX_PATH` includes the standard directories such as `/usr/local` and `/usr`. `NCCL_INCLUDE_DIR` is also specifically handled.

  * Similarly, the call of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) (Search for `NO_DEFAULT_PATH` in the link) by default searches for libraries in directories including `<prefix>/lib` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. But it also handles the edge cases intended to be solved in the Python code more properly:
     - It only searches for `<prefix>/lib64` (and `<prefix>/lib32`) if it is appropriate on the system.
     - It only searches for `<prefix>/lib/<arch>` for the right `<arch>`, unlike the Python code searches for `lib/<arch>` in a generic way (e.g., the Python code searches for `/usr/lib/x86_64-linux-gnu` but in reality systems have `/usr/lib/x86_64-some-customized-name-linux-gnu`, see https://unix.stackexchange.com/a/226180/38242 ).

---

Regarding for relevant issues:

- pytorch#12063 and pytorch#2877: These are properly handled, as explained in the updated comment.
- pytorch#2941 does not changes NCCL detection specifically for Windows (it changed CUDA detection).
- b7e258f A versioned library detection is added, but the order is reversed: The unversioned library becomes preferred. This is because normally unversioned libraries are linked to versioned libraries and preferred by users, and local installation by users are often unversioned. Like the document of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) suggests:

> When using this to specify names with and without a version suffix, we recommend specifying the unversioned name first so that locally-built packages can be found before those provided by distributions.
facebook-github-bot pushed a commit that referenced this pull request Jul 23, 2019
…ipt. (#22930)

Summary:
 ---

How does the current code subsume all detections in the deleted `nccl.py`?

- The dependency of `USE_NCCL` on the OS and `USE_CUDA` is handled as dependency options in `CMakeLists.txt`.

- The main NCCL detection happens in [FindNCCL.cmake](https://github.com/pytorch/pytorch/blob/8377d4b32c12206a0f9401e81a5e5796c8fc01a8/cmake/Modules/FindNCCL.cmake), which is called by [nccl.cmake](https://github.com/pytorch/pytorch/blob/8377d4b32c12206a0f9401e81a5e5796c8fc01a8/cmake/External/nccl.cmake). When `USE_SYSTEM_NCCL` is false, the previous Python code defer the detection to `find_package(NCCL)`. The change in `nccl.cmake` retains this.

- `USE_STATIC_NCCL` in the previous Python code simply changes the name of the detected library. This is done in `IF (USE_STATIC_NCCL)`.

- Now we only need to look at how the lines below line 20 in `nccl.cmake` are subsumed. These lines list paths to header and library directories that NCCL headers and libraries may reside in and try to search these directories for the key header and library files in turn. These are done by `find_path` for headers and `find_library` for the library files in `FindNCCL.cmake`.
  * The call of [find_path](https://cmake.org/cmake/help/v3.8/command/find_path.html) (Search for `NO_DEFAULT_PATH` in the link) by default searches for headers in `<prefix>/include` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. Like the Python code, this commit sets `CMAKE_PREFIX_PATH` to search for `<prefix>` in `NCCL_ROOT_DIR` and home to CUDA.  `CMAKE_SYSTEM_PREFIX_PATH` includes the standard directories such as `/usr/local` and `/usr`. `NCCL_INCLUDE_DIR` is also specifically handled.

  * Similarly, the call of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) (Search for `NO_DEFAULT_PATH` in the link) by default searches for libraries in directories including `<prefix>/lib` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. But it also handles the edge cases intended to be solved in the Python code more properly:
     - It only searches for `<prefix>/lib64` (and `<prefix>/lib32`) if it is appropriate on the system.
     - It only searches for `<prefix>/lib/<arch>` for the right `<arch>`, unlike the Python code searches for `lib/<arch>` in a generic way (e.g., the Python code searches for `/usr/lib/x86_64-linux-gnu` but in reality systems have `/usr/lib/x86_64-some-customized-name-linux-gnu`, see https://unix.stackexchange.com/a/226180/38242 ).

 ---

Regarding for relevant issues:

- #12063 and #2877: These are properly handled, as explained in the updated comment.
- #2941 does not changes NCCL detection specifically for Windows (it changed CUDA detection).
- b7e258f A versioned library detection is added, but the order is reversed: The unversioned library becomes preferred. This is because normally unversioned libraries are linked to versioned libraries and preferred by users, and local installation by users are often unversioned. Like the document of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) suggests:

> When using this to specify names with and without a version suffix, we recommend specifying the unversioned name first so that locally-built packages can be found before those provided by distributions.
Pull Request resolved: #22930

Differential Revision: D16440275

Pulled By: ezyang

fbshipit-source-id: 11fe80743d4fe89b1ed6f96d5d996496e8ec01aa
zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 23, 2019
…ipt. (#22930)

Summary:
 ---

How does the current code subsume all detections in the deleted `nccl.py`?

- The dependency of `USE_NCCL` on the OS and `USE_CUDA` is handled as dependency options in `CMakeLists.txt`.

- The main NCCL detection happens in [FindNCCL.cmake](https://github.com/pytorch/pytorch/blob/8377d4b32c12206a0f9401e81a5e5796c8fc01a8/cmake/Modules/FindNCCL.cmake), which is called by [nccl.cmake](https://github.com/pytorch/pytorch/blob/8377d4b32c12206a0f9401e81a5e5796c8fc01a8/cmake/External/nccl.cmake). When `USE_SYSTEM_NCCL` is false, the previous Python code defer the detection to `find_package(NCCL)`. The change in `nccl.cmake` retains this.

- `USE_STATIC_NCCL` in the previous Python code simply changes the name of the detected library. This is done in `IF (USE_STATIC_NCCL)`.

- Now we only need to look at how the lines below line 20 in `nccl.cmake` are subsumed. These lines list paths to header and library directories that NCCL headers and libraries may reside in and try to search these directories for the key header and library files in turn. These are done by `find_path` for headers and `find_library` for the library files in `FindNCCL.cmake`.
  * The call of [find_path](https://cmake.org/cmake/help/v3.8/command/find_path.html) (Search for `NO_DEFAULT_PATH` in the link) by default searches for headers in `<prefix>/include` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. Like the Python code, this commit sets `CMAKE_PREFIX_PATH` to search for `<prefix>` in `NCCL_ROOT_DIR` and home to CUDA.  `CMAKE_SYSTEM_PREFIX_PATH` includes the standard directories such as `/usr/local` and `/usr`. `NCCL_INCLUDE_DIR` is also specifically handled.

  * Similarly, the call of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) (Search for `NO_DEFAULT_PATH` in the link) by default searches for libraries in directories including `<prefix>/lib` for each `<prefix>` in `CMAKE_PREFIX_PATH` and `CMAKE_SYSTEM_PREFIX_PATH`. But it also handles the edge cases intended to be solved in the Python code more properly:
     - It only searches for `<prefix>/lib64` (and `<prefix>/lib32`) if it is appropriate on the system.
     - It only searches for `<prefix>/lib/<arch>` for the right `<arch>`, unlike the Python code searches for `lib/<arch>` in a generic way (e.g., the Python code searches for `/usr/lib/x86_64-linux-gnu` but in reality systems have `/usr/lib/x86_64-some-customized-name-linux-gnu`, see https://unix.stackexchange.com/a/226180/38242 ).

 ---

Regarding for relevant issues:

- pytorch/pytorch#12063 and pytorch/pytorch#2877: These are properly handled, as explained in the updated comment.
- pytorch/pytorch#2941 does not changes NCCL detection specifically for Windows (it changed CUDA detection).
- b7e258f81ef61d19b884194cdbcd6c7089636d46 A versioned library detection is added, but the order is reversed: The unversioned library becomes preferred. This is because normally unversioned libraries are linked to versioned libraries and preferred by users, and local installation by users are often unversioned. Like the document of [find_library](https://cmake.org/cmake/help/v3.8/command/find_library.html) suggests:

> When using this to specify names with and without a version suffix, we recommend specifying the unversioned name first so that locally-built packages can be found before those provided by distributions.
Pull Request resolved: pytorch/pytorch#22930

Differential Revision: D16440275

Pulled By: ezyang

fbshipit-source-id: 11fe80743d4fe89b1ed6f96d5d996496e8ec01aa
cosmicEarth pushed a commit to cosmicEarth/gloo that referenced this pull request Apr 9, 2024
Summary:
It is flexible to have both XYZ_INCLUDE_DIR and XYZ_LIB_DIR for each dependency instead of just XYZ_ROOT_DIR. The gloo build should work with whatever directory structure is used for dependencies.

This change also slightly changes the CMake variable naming to remove some cognitive overhead regarding upcase/downcase variables and multiple messages in the CMake output.

Also see pytorch/pytorch#2877
Closes pytorch/gloo#91

Differential Revision: D5925849

Pulled By: pietern

fbshipit-source-id: 66bf7699f736f51e197366c61a9d946bdf1ccec4
rraminen pushed a commit to rraminen/pytorch that referenced this pull request Jan 20, 2026
…A 3.5 (Strix Point/Halo) (pytorch#2877)

Reverting prefered hipBLASLt backend for RDNA 3 and RDNA 3.5 (Strix
Point/Halo) due to several internal tickets opened for perf regression
compared to torch builds before merging
ROCm#2741
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants