Add spectral bipartition community finding and greedy bipartition using node swaps#8347
Conversation
move kernighan_lin.py and tests to bipartitions.py and tests/test_bipartitions.py merge spectral with kernighan_lin and call it bipartitions merge new functions into bipartitions, tests and docs Adds modularity measure for communities. Sat Sep 8 19:24:56 2012 +0200 Co-authored-by: Moritz Emanuel Beber <m.beber@jacobs-university.de> Co-authored-by: Jeffrey Finkelstein <jeffrey.finkelstein@gmail.com>
| ] | ||
|
|
||
| for i in range(max_iter): | ||
| costs = list(_kernighan_lin_sweep(edges, side)) |
There was a problem hiding this comment.
Is there any value on having the lin_sweep being an iterator if we just convert it to list? we could make the method return the list. We could even move the the min_cost check inside and signal empty list of swaps as the condition to break.
There was a problem hiding this comment.
Are there advantages to returning a list over yielding and putting the list constructor in the main function? Similarly, would moving min inside the helper function be beneficial in some way?
dschult
left a comment
There was a problem hiding this comment.
I'm doing a review of this because it's not my code even though I refactored for this PR. And it's been long enough since my refactor for me to have "regained my innocence" and look with ~fresh eyes. I'll make these changes soon.
|
I will review too, but I'm holding off until after release since this modifies a submodule name... I don't think that's a big deal (or at least it's not a blocker for me), but I wouldn't want to introduce a technically backward-incompatible change during an RC! |
…-fold For K_L_sweep: improve names replace heap equality check with comparing sides. reduce inputs to 1 on helper function. For K-L_bisection: func remove indexing, use "seed" only when partition is not provided, split nodes in A, B early and set side based on that. For greedy_node_swap: rename C to C_best_so_far and related remove one extra deepcopy from inside the loop
rossbar
left a comment
There was a problem hiding this comment.
I took the liberty of pushing up some minor docstring/test nits but otherwise this LGTM!
I approve as-is but did leave a comment re: whether we want to do keyword-only args for the new spectral bipartition. I have no strong feeling about it and it's not a blocker for me - I leave it to @dschult !
|
|
||
|
|
||
| @nx.utils.not_implemented_for("multigraph") | ||
| def greedy_node_swap_bipartition(G, C_init=None, max_iter=10): |
There was a problem hiding this comment.
Should we enforce keyword-only args here? My sense is yes, though then it might make sense to have a slightly more descriptive kwarg name for C_init, like init_communities or init_partition. OTOH - the C_ var name matches better with the local variable names inside the function. WDYT? Alternatively it'd be fine to leave as-is!
| def greedy_node_swap_bipartition(G, C_init=None, max_iter=10): | |
| def greedy_node_swap_bipartition(G, *, C_init=None, max_iter=10): |
There was a problem hiding this comment.
Yes -- I think keyword-only args here would be good.
And you are right that C_init should be renamed.
Maybe something like initial_guess or initial_bipartition but they're kind of long.
We could go with guess, or start_from or swap_from. :)
There was a problem hiding this comment.
Personally I like precise, even if it's a bit long, so initial_bipartition is the one that stands out to me :). No blockers though, I'd be happy with whatever argument name (including the current one)!
There was a problem hiding this comment.
I have added kw-only args. And renamed C_init as init_split in this latest push. The rhyming and nice word "split" for "bipartition" were attractive. I also changed some doc wording. And I changed the varnames that used to use C to denote communities so they now use split, e.g. Cmax is now max_split and Cnext is next_split. See what you think.
Basically remove using `C` in varnames and use `split` instead.
rossbar
left a comment
There was a problem hiding this comment.
I have added kw-only args. And renamed C_init as init_split in this latest push. The rhyming and nice word "split" for "bipartition" were attractive. I also changed some doc wording. And I changed the varnames that used to use C to denote communities so they now use split, e.g. Cmax is now max_split and Cnext is next_split. See what you think.
These names all make sense to me, and everything else looks good! Thanks @dschult and @amcandio !
Add spectral community finding and greedy bipartition using node swaps. These functions are part of PR #1733 which is a reworking on PR #764 which is a collection of work from Ben Edward's GSoC project in 2011. Many other functions from that project have made their way into NetworkX. And now these bipartition functions join them. A bipartition is a splitting of a network into two communities. The idea here is to choose communities that maximize "modularity" (a measure of the strength of communities in the network based on counting edges within a community and edges between communities).
This PR adds
spectral_modularity_bipartitionandgreedy_node_swap_bipartitionfor modularity based bipartition.Adds tests and connections to docs.
It also moves some files:
kernighan_lin.pyand tests tobipartitions.pyandtests/test_bipartitions.py.spectral_modularity_bipartition()andgreedy_node_swap_bipartition()withkernighan_lin()inbipartitions.py.From the original text:
"Adds modularity measure for communities."
Sat Sep 8 19:24:56 2012 +0200