[REVIEW] Add Fused L2 Expanded KNN kernel #339
rapids-bot[bot] merged 29 commits into rapidsai:branch-22.02 from
Conversation
…n higher dimensions than L2 unexpanded version
…encountered, then the current atomicCAS based implementation
…ed. make function namings consistent
…licit fusedL2KNN function call
@cjnolet I've now reverted ball cover tests to use
…n, make customAtomicMax float only by removing the template as it is float specific function
cjnolet
left a comment
Changes look great overall. Mostly minor/mechanical things, but we still need explicit gtests for these like we've done w/ other knn primitives that don't just proxy to FAISS (such as ball cover and haversine knn).
  return result;
}

template <typename value_t>
Thanks for reverting this! Though we no longer need to invoke the fused knn directly, I still see benefit in keeping the additional helper function, since it makes it simpler to change the bfknn call across all the gtests in the future.
compute_bfknn(handle, d_train_inputs.data(), d_train_inputs.data(), n, d, k,
              metric, d_ref_D.data(), d_ref_I.data());
raft::spatial::knn::detail::brute_force_knn_impl<uint32_t, int64_t>(
Now that we have a knn that's not just proxying down to faiss, we should be gtesting it accordingly, similar to what's being done w/ the haversine and ball cover gtests. It's also important going forward because RAFT is beginning to be used by more projects, so the impact of breaking tests extends beyond just cuml.
My suggestion is to test l2_unexpanded_knn and l2_expanded_knn directly in the gtests and then we can test the brute_force_knn more generally.
Certainly, rigorous tests within RAFT are needed for these; I'll add gtests for them separately. Note that instead of l2_unexpanded_knn and l2_expanded_knn we now have a single entry point, the fusedL2Knn function, covering both.
For these kernels I relied on the cuML C++ knn tests & pytests so far, which, as you rightly mention, is no longer sufficient.
This PR is still WIP; I'm working on adding tests and some fixes needed due to API changes.
A new test, tests/spatial/fused_l2_knn.cu, has been added which covers both the L2 expanded/unexpanded cases and compares the output against a faiss bfknn call.
I've also polished the fp32 atomicMax device function, which I believe is faster than the atomicCAS based version and also takes care of NaNs.
Apologies for the delay in updating this PR with the unit test
…o separate function which is now part of fused_l2_knn.cuh
…s not used to mimic faiss, fix issues in deviceMax atomic to filter NaNs
Can one of the admins verify this patch?

add to allowlist
ChuckHastings
left a comment
Shouldn't affect cugraph
…ith same distance value exists and faiss picks one vs fusedL2KNN another, so we verify both vec index as well as distance val
@cjnolet can this PR get merged?
@cjnolet I see this PR is marked for v22.02, so will you be auto-merging it, or is there additional action required from my side?
@mdoijade, I scraped through the PRs a couple weeks ago and aligned them to expected releases. Looking back through my review, I'm really happy with the new tests, but it looks like there are still a couple of (very minor) things to address.
@cjnolet I believe I have now addressed all the points in this PR; the build failure is coming from
@gpucibot merge |
-- adds fused L2 expanded kNN kernel; this is faster by at least 20-25% on higher dimensions (D >= 128) than the L2 unexpanded version.
-- also on smaller dimensions (D <= 32) L2 expanded is always faster by 10-15%.
-- slight improvement in updateSortedWarpQ device function by reducing redundant instructions.
-- Fix incorrect output for the NN > 32 case when taking the prod-cons knn merge path; this was caught in an HDBSCAN pytest.

Authors:
- Mahesh Doijade (https://github.com/mdoijade)
- Corey J. Nolet (https://github.com/cjnolet)

Approvers:
- Chuck Hastings (https://github.com/ChuckHastings)
- Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#339