-
Notifications
You must be signed in to change notification settings - Fork 242
Fix bugs in ParallelLeiden #1328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- code inconsistent with the comment - missed initialization Also removed the warning, based on a comparison vs GVE Leiden
|
Cc: @CxVercility @fabratu |
|
Hi there, |
|
Good catch and thanks for the PR. Based on your test results, the error is minor after all. I also looked into the code in the past, but overlooked this rather simple mistakes sigh There is still one error on the pipeline for MSVC. The described error points towards an ill-sized vector in ParallelLeiden. Based on past experience this also might be kind of a false positive. Do you have time to investigate? Otherwise I can take care of this. Just out of curiosity - what network (or networks) did you use for your comparison? It makes sense, that NetworKit Leiden is able to produce higher modularity scores, since GVE-Leiden uses a more simplistic heuristic for detecting and fixing disconnected communities. But this also might be due to non-deterministic effects. I would assume though, that the performance of GVE should be higher due to some tricks (one of them being the more simplistic cut-logic). |
Not super familiar with windows. Would appreciate help. I used the LiveJournal graph for this test. The rest of the time goes into converting between python <-> C++ vectors. Do you have any thoughts on how to optimize this? Perhaps we can discuss on a separate issue. |
|
@adsharma Alright, I will look into the Windows error. Might take 1-2 days. I also found your other pull request. The problem / use case you are trying to solve here is to have a parallel Leiden algorithm with Python front-end, correct? And the reported running time in this PR is from your proposed Python interface for GVE-Leiden against the Python interface of ParallelLeiden in NetworKit? If so, that would explain the running time parity. In the GVE-Leiden paper, the authors show a measurable performance gain with respect to ParallelLeiden in C++. Some of it is likely not reproducible if we integrate GVE into NetworKit due to existing design decisions. We currently have a student project to merge the ideas of GVE-Leiden (+ other implementations) and ParallelLeiden, thereby also fixing the previous implementation. Since the fixing part seems rather simple, we can merge this PR after I looked in the Windows error. Further improvements concerning the C++ part can then be handled by the student project. I would also recommend to discuss the overhead stemming from conversion time in another PR. Due to the number of algorithms available in NetworKit, there would a nice impact if the conversion process would be faster. Thanks again for your contribution :) |
|
Yes - let's take the discussion about further optimizations to #1331 and keep this one limited to a narrow bug fix. Thanks for looking into Windows CI check. |
|
@fabratu looks like windows CI is also passing. Please take another look. |
Also removed the warning, based on a comparison vs GVE Leiden
Related to: #1244