[REVIEW] Add Fused L2 Expanded KNN kernel #339
rapids-bot[bot] merged 29 commits into rapidsai:branch-22.02 from
Conversation
…n higher dimensions than L2 unexpanded version
…encountered, then the current atomicCAS based implementation
…ed. make function namings consistent
…licit fusedL2KNN function call
@cjnolet I've now reverted ball cover tests to use
…n, make customAtomicMax float only by removing the template as it is float specific function
cjnolet
left a comment
Changes look great overall. Mostly minor/mechanical things, but we still need explicit gtests for these like we've done w/ other knn primitives that don't just proxy to FAISS (such as ball cover and haversine knn).
  return result;
}

template <typename value_t>
Thanks for reverting this! Though we no longer need to invoke the fused knn directly, I still see benefit in keeping the additional helper function, since it makes it simpler to change the bfknn call across all the gtests in the future.
compute_bfknn(handle, d_train_inputs.data(), d_train_inputs.data(), n, d, k,
              metric, d_ref_D.data(), d_ref_I.data());
raft::spatial::knn::detail::brute_force_knn_impl<uint32_t, int64_t>(
Now that we have a knn that's not just proxying down to faiss, we should be gtesting it accordingly, similar to what's being done w/ the haversine and ball cover gtests. It's also important going forward because RAFT is beginning to be used by more projects, so the impact of breaking tests extends beyond just cuml.
My suggestion is to test l2_unexpanded_knn and l2_expanded_knn directly in the gtests and then we can test the brute_force_knn more generally.
Certainly, rigorous tests within RAFT are needed for these; I'll add gtests for them separately. Note that instead of l2_unexpanded_knn and l2_expanded_knn we now have a single entry point, the fusedL2Knn function, covering both.
For these kernels I relied on the cuML C++ knn tests & pytests so far, which, as you rightly mention, is no longer sufficient.
This PR is still WIP; I'm working on adding tests and some fixes needed due to API changes.
A new test, tests/spatial/fused_l2_knn.cu, has been added which covers both the L2 expanded/unexpanded cases and compares the output against a faiss bfknn call.
I've also polished the fp32 atomicMax device function, which I believe is faster than the atomicCAS based version and also takes care of NaNs.
Apologies for the delay in updating this PR with the unit test
…o separate function which is now part of fused_l2_knn.cuh
…s not used to mimic faiss, fix issues in deviceMax atomic to filter NaNs
Can one of the admins verify this patch?

add to allowlist
ChuckHastings
left a comment
Shouldn't affect cugraph
…ith same distance value exists and faiss picks one vs fusedL2KNN another, so we verify both vec index as well as distance val
@cjnolet can this PR get merged?
@cjnolet I see this PR is marked for v22.02, so will you be auto-merging it, or is there additional action required from my side?
@mdoijade, I scraped through the PRs a couple weeks ago and aligned them to expected releases. Looking back through my review, I'm really happy with the new tests, but it looks like there are still a couple of (very minor) things to address.
@cjnolet I believe I have now addressed all the points in this PR; the build failure is coming from
@gpucibot merge |
-- adds fused L2 expanded kNN kernel; this is faster by at least 20-25% on higher dimensions (D >= 128) than the L2 unexpanded version.
-- also on smaller dimensions (D <= 32) L2 expanded is always faster by 10-15%.
-- slight improvement in updateSortedWarpQ device function by reducing redundant instructions.
-- Fix incorrect output for the NN > 32 case when taking the prod-cons knn merge path; this was caught in an HDBSCAN pytest.

Authors:
- Mahesh Doijade (https://github.com/mdoijade)
- Corey J. Nolet (https://github.com/cjnolet)

Approvers:
- Chuck Hastings (https://github.com/ChuckHastings)
- Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#339