Skip to content

Build the cupy.cuda.cub module by default#2584

Merged
kmaehashi merged 24 commits intocupy:masterfrom
leofang:cub_header2
Jul 4, 2020
Merged

Build the cupy.cuda.cub module by default#2584
kmaehashi merged 24 commits intocupy:masterfrom
leofang:cub_header2

Conversation

@leofang
Copy link
Copy Markdown
Member

@leofang leofang commented Oct 29, 2019

Close #3078. Close #3075. Close #3108. Close #3507.

This PR includes all the CUB v1.8.0 headers (see notes below) so that the cupy.cuda.cub module can be built by default. This also avoids the need of documenting how to build it.

Note that the CUB_PATH env variable is no longer needed.

TODO:

Note:

  1. The CUB project is open source under the BSD license, so redistributing its source code is allowed if the license file is included.
  2. Since CUB is a header-only library, many of its files are actually needed. Below is the list of CUB headers included when compiling cupy/cuda/cupy_cub.cu (obtained via cpp -MM -I(...) cupy/cuda/cupy_cub.cu):
cub/agent/agent_reduce.cuh
cub/agent/agent_reduce_by_key.cuh
cub/agent/agent_scan.cuh
cub/agent/single_pass_scan_operators.cuh
cub/block/block_discontinuity.cuh
cub/block/block_exchange.cuh
cub/block/block_load.cuh
cub/block/block_raking_layout.cuh
cub/block/block_reduce.cuh
cub/block/block_scan.cuh
cub/block/block_store.cuh
cub/block/specializations/block_reduce_raking.cuh
cub/block/specializations/block_reduce_raking_commutative_only.cuh
cub/block/specializations/block_reduce_warp_reductions.cuh
cub/block/specializations/block_scan_raking.cuh
cub/block/specializations/block_scan_warp_scans.cuh
cub/device/device_reduce.cuh
cub/device/device_segmented_reduce.cuh
cub/device/dispatch/dispatch_reduce.cuh
cub/device/dispatch/dispatch_reduce_by_key.cuh
cub/device/dispatch/dispatch_scan.cuh
cub/grid/grid_even_share.cuh
cub/grid/grid_mapping.cuh
cub/grid/grid_queue.cuh
cub/iterator/arg_index_input_iterator.cuh
cub/iterator/cache_modified_input_iterator.cuh
cub/iterator/constant_input_iterator.cuh
cub/thread/thread_load.cuh
cub/thread/thread_operators.cuh
cub/thread/thread_reduce.cuh
cub/thread/thread_scan.cuh
cub/thread/thread_store.cuh
cub/util_arch.cuh
cub/util_debug.cuh
cub/util_device.cuh
cub/util_macro.cuh
cub/util_namespace.cuh
cub/util_ptx.cuh
cub/util_type.cuh
cub/warp/specializations/warp_reduce_shfl.cuh
cub/warp/specializations/warp_reduce_smem.cuh
cub/warp/specializations/warp_scan_shfl.cuh
cub/warp/specializations/warp_scan_smem.cuh
cub/warp/warp_reduce.cuh
cub/warp/warp_scan.cuh

Since so many of them are needed, we might as well copy the entire cub folder for future extensibility and easier maintainability.
3. CUB v1.8.0 is quite stable (no new release since Feb 2018), suitable to be a dependency.

@leofang
Copy link
Copy Markdown
Member Author

leofang commented Oct 29, 2019

Since so many of them are needed, we might as well copy the entire cub folder for future extensibility and easier maintainability.

For example, #2562 needs to include more CUB headers in addition to those listed above.

@leofang leofang changed the title Build the cupy.cuda.cub module by default [WIP] Build the cupy.cuda.cub module by default Oct 30, 2019
@leofang leofang changed the title [WIP] Build the cupy.cuda.cub module by default Build the cupy.cuda.cub module by default Nov 4, 2019
@leofang
Copy link
Copy Markdown
Member Author

leofang commented Nov 5, 2019

Removed the WIP tag from title since the test is separated from this PR.

@leofang
Copy link
Copy Markdown
Member Author

leofang commented Nov 6, 2019

I found that Thrust comes with recent CUDA Toolkits, and CUB is part of Thrust in /usr/local/cuda/include/thrust/system/cuda/detail/cub/. So one could also build the cupy.cuda.cub module by detecting if the CUB header is present there. But, I don't know about the version requirement for CUB for the cub module. I have neither the resource nor time to test them all...

@kmaehashi
Copy link
Copy Markdown
Member

+1 to build CUB by default, and to disable CUB support for backward compatibility.

How about bundling CUB as submodule instead of copying all code tree into CuPy? It makes the version bundled with CuPy clear, and it is also easer to bump version of CUB when new version is released.

Using CUB under thrust directory seems not recommended by NVIDIA: https://devtalk.nvidia.com/default/topic/1035921/unable-to-include-cub/

@leofang
Copy link
Copy Markdown
Member Author

leofang commented Nov 8, 2019

How about bundling CUB as submodule instead of copying all code tree into CuPy? It makes the version bundled with CuPy clear, and it is also easer to bump version of CUB when new version is released.

That's a great idea I actually considered. I just wasn't sure whether the team likes a submodule or not, because it'd need some changes in both the CI and programers' daily workflows. Given that Thrust headers were copied over, I thought I may give it a try.

A submodule is also good in that I won't have to contribute over 40k lines in a PR...Sounds a bit crazy 😜

Using CUB under thrust directory seems not recommended by NVIDIA: https://devtalk.nvidia.com/default/topic/1035921/unable-to-include-cub/

Good finding, thanks a lot @kmaehashi! I wish NVIDIA folks document this kind of things. The devtalk forum is a horrible place to search and refer to...

@leofang
Copy link
Copy Markdown
Member Author

leofang commented Nov 8, 2019

Done. PTAL. @kmaehashi, You may need to change your CI infrastructure to checkout the repo recursively in order to test this PR.

When this PR is (about to be) merged, it's also better to make an announcement to Slack/mailing list/whatever channel you have, so that internal and external contributors are aware of this interruptive change. Git submodule can be a bit painful to work with for inexperienced developers (including me...)

@jakirkham
Copy link
Copy Markdown
Member

cc @anaruse (for awareness)

@leofang
Copy link
Copy Markdown
Member Author

leofang commented Nov 8, 2019

Thanks, @jakirkham. Also pinging @grlee77 @y1r, although this PR is still on an early stage.

@leofang
Copy link
Copy Markdown
Member Author

leofang commented Nov 8, 2019

@kmaehashi This might be a future concern, but I could also try to address it here if you can provide some instructions: How to not build the cub module if the build target is ROCm/HIP?

@kmaehashi
Copy link
Copy Markdown
Member

Could you squash all commits into one, as the commit 6237353 is huge?

You may need to change your CI infrastructure to checkout the repo recursively in order to test this PR.

Sure I will.

@kmaehashi
Copy link
Copy Markdown
Member

kmaehashi commented Nov 11, 2019

When this PR is (about to be) merged, it's also better to make an announcement to Slack/mailing list/whatever channel you have, so that internal and external contributors are aware of this interruptive change. Git submodule can be a bit painful to work with for inexperienced developers (including me...)

This is optional comment, but if you have time, could you update the build script to print some nice message (submodule is not checked out) when cub directory is empty?

@leofang
Copy link
Copy Markdown
Member Author

leofang commented Nov 11, 2019

@kmaehashi By the way, how about #2584 (comment)?

Could you squash all commits into one, as the commit 6237353 is huge?

Done.

This is optional comment, but if you have time, could you update the build script to print some nice message (submodule is not checked out) when cub directory is empty?

Done. PTAL.

@leofang
Copy link
Copy Markdown
Member Author

leofang commented Nov 11, 2019

https://ci.appveyor.com/project/cupy/cupy/builds/28761775#L164 shows the error message works 😆

@leofang
Copy link
Copy Markdown
Member Author

leofang commented Nov 12, 2019

Now the CUB module is imported everywhere, but whether it's used or not solely depends on the flag cupy.cuda.cub_enabled.

@kmaehashi
Copy link
Copy Markdown
Member

@kmaehashi By the way, how about #2584 (comment)?

You have nothing to do about it, only cuda and cusolver are used during ROCm build. (c.f. #2631)

@grlee77
Copy link
Copy Markdown
Member

grlee77 commented Nov 12, 2019

Thanks for working on this @leofang. I agree with the submodule approach

I haven't really worked with submodules on the CI services, but maybe something like the following would get Appveyor to checkout the submodule: appveyor/ci#899 (comment) (there are others potential solutions in that thread as well).

@kmaehashi
Copy link
Copy Markdown
Member

Hi @kmaehashi, I resolved the conflicts with master and addressed what's discussed in #3445. But I think this PR will conflict with #3461, so let me know which one you'll merge first and I can act accordingly. Thanks.

Sorry for the late reply, I'm going to merge this PR first.
pfnCI, test this please.

@pfn-ci-bot
Copy link
Copy Markdown
Collaborator

Successfully created a job for commit 3f0a3dd:

@chainer-ci
Copy link
Copy Markdown
Member

Jenkins CI test (for commit 3f0a3dd, target branch master) failed with status FAILURE.

@kmaehashi
Copy link
Copy Markdown
Member

kmaehashi commented Jul 2, 2020

blocked by chainer/chainer-test#583 done

@leofang
Copy link
Copy Markdown
Member Author

leofang commented Jul 2, 2020

Note: in cupy-py3-cub the CUB tests from #2598 and #3428 are skipped because CUB_DISABLED is unset in chainer/chainer-test#582. They will be re-enabled when #3461 is tested.

@kmaehashi
Copy link
Copy Markdown
Member

pfnCI, test this please.

@pfn-ci-bot
Copy link
Copy Markdown
Collaborator

Successfully created a job for commit 3f0a3dd:

@chainer-ci
Copy link
Copy Markdown
Member

Jenkins CI test (for commit 3f0a3dd, target branch master) failed with status FAILURE.

@leofang
Copy link
Copy Markdown
Member Author

leofang commented Jul 3, 2020

Is chainer-doc on CUDA 8.0?

@kmaehashi
Copy link
Copy Markdown
Member

pfnCI, test this please.

@pfn-ci-bot
Copy link
Copy Markdown
Collaborator

Successfully created a job for commit 3f0a3dd:

@kmaehashi
Copy link
Copy Markdown
Member

For some reason chainer-doc is not skipped for v8 tests, I'm going to resolve shortly.

@kmaehashi
Copy link
Copy Markdown
Member

chainer/chainer-test#585 should fix

@chainer-ci
Copy link
Copy Markdown
Member

Jenkins CI test (for commit 3f0a3dd, target branch master) failed with status FAILURE.

@kmaehashi
Copy link
Copy Markdown
Member

pfnCI, test this please.

@pfn-ci-bot
Copy link
Copy Markdown
Collaborator

Successfully created a job for commit 3f0a3dd:

@chainer-ci
Copy link
Copy Markdown
Member

Jenkins CI test (for commit 3f0a3dd, target branch master) succeeded!

@kmaehashi kmaehashi merged commit 50bac85 into cupy:master Jul 4, 2020
@kmaehashi
Copy link
Copy Markdown
Member

LGTM!

@leofang leofang deleted the cub_header2 branch July 4, 2020 12:09
@leofang
Copy link
Copy Markdown
Member Author

leofang commented Jul 4, 2020

Thank you, @kmaehashi! Glad to see it's in!

Quick question: Do you plan to backport this? 😛

@jakirkham
Copy link
Copy Markdown
Member

Awesome! Thank you all for the hard work here! 😀

@chainer-ci
Copy link
Copy Markdown
Member

Jenkins CI test (for commit 3f0a3dd, target branch master) failed with status FAILURE.

@kmaehashi kmaehashi added cat:enhancement Improvements to existing features no-compat Disrupts backward compatibility labels Jul 8, 2020
@kmaehashi
Copy link
Copy Markdown
Member

Quick question: Do you plan to backport this? 😛

This changed the minimum g++ requirement, thus no backport 😉

@leofang
Copy link
Copy Markdown
Member Author

leofang commented Jul 8, 2020

Ah, I see. I didn't consider that. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cat:enhancement Improvements to existing features no-compat Disrupts backward compatibility

Projects

None yet

9 participants