-
Notifications
You must be signed in to change notification settings - Fork 430
feat: Add Maximum Bipartite Matching Algorithm #740
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, first off, thanks for the PR :) I left some comments but other than that, it looks good to me 👍 Maybe check if a rebase is necessary? (In particular instead of the merge, I am not sure which is preferred by petgraph)
I'm not a maintainer, so I don't have the final say for a merge, but will try to make someone else have a look at it too !
|
It's interesting to see benchmarks against the existing |
Here are the benchmarks from my machine. The generic implementation is the existing one and the specific one is the one from this PR. The numbers indicate a variable which indicates 3/2 times the node count (1000 -> 666 nodes in the graph): As can be seen, the difference seems to be noticeable on bigger graphs but negligible on smaller graphs. Note that I also added a new 30 benchmark to test if the generic implementation outperforms the specific one on small graphs. However, the difference increases when instead of Ford-Fulkerson, Dinics from this PR is used (I was unable to find out, why the other results change so drastically as well, since the only thing I change is the Max-Flow algorithm used in the maximum bipartite matching algorithm - I reran both benchmarks multiple times with the same results): |
The results are similar to the truth, but also impressive! Dinic's algorithm is theoretically indeed faster that current max-flow implementation without shortest path searching, and this benchmark confirms this. But to be sure, I would like to take a look at the benchmark code. Thank you for the good work done, I would like to include the benchmark code in the repository after the PR merge. |
|
Regarding:
I tried out some things, but was also unable to find a good approach. I thought about copying the entire graph and deleting all edges, but that added new problems, where the newly created graph was of type G which one would have to impose more trait bounds on. So maybe, @starovoid, you have an idea, and otherwise we just leave it at that. |
I see. What do you mean exactly by take a look at the benchmark code? The code of the benchmark is included in this PR and the linked Dinics PR in benches/matching.rs. I just copied the benchmark and used the generic maximum matching algorithm instead of the bipartite specific one: #[bench]
fn maximum_bipartite_matching_1000_generic_implementation(bench: &mut Bencher) {
let (g, partition_a, partition_b) = generate_bipartite(1_000);
let partition_a_ids: Vec<_> = partition_a
.iter()
.map(|&id| NodeIndexable::from_index(&g, id as usize))
.collect();
let partition_b_ids: Vec<_> = partition_b
.iter()
.map(|&id| NodeIndexable::from_index(&g, id as usize))
.collect();
bench.iter(|| maximum_matching(&g));
}Note that I also did not remove the partition a and b collection code since I was not sure if it counts towards the benchmark time. Probably not because it is not part of the bench closure, but I wanted to make sure |
The test from #739 on my PC showed the opposite result in the max-flow task, so the first step is to dive into First attemt Second attempt If the If it turns out to solve all the issues quickly, then I will include it in version |
Okay, interesting. I did not actually run those benchmarks yet, good catch. I will then briefly look into those as well. |
@starovoid about the Dinic's implementation and why the benchmarks in it were slower than Ford-Fulkersons, I added some commits and comments in #739 . Also, updated its description. |
2062b6b to
374679a
Compare
@RaoulLuque see Analysis > Special Cases from Dinics' Wikipedia article: |
|
added benchmarks for current |
|
Thanks for the great work 🌟 I'll have a look latest in the course of next week (does not mean someone else shouldn't / can't have a look in the meantime ^^) :) In the meantime you could check if the comments on the other PR about |
|
Hey, just wanted to let you know that I did not forget about reviewing, but currently I have exam phase at Uni and it was a bit more busy than I expected lately so I was not able to review your PR until now :o The phase will end on 24th however, and after that I should have enough time to review the PR so we can get the improvements and your work merged :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, got to the review a bit earlier than expected I guess :) Again, thanks a lot for your great work 🌞 Had only a few nitpicks 👍
(Probably you know this, but if you agree with suggested changes, you can just approve them here in GitHub :) )
| /// This function produces an undirected bipartite graph with two partitions: | ||
| /// * `partition_1`: nodes with indices divisible by 6 (i.e., `0, 6, 12, ...`); | ||
| /// * `partition_2`: nodes with odd indices (i.e., `1, 3, 5, ...`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the docstring for the helper function is a great idea ! 🫶 However, technically I think the resulting graph has node_count_upper_bound many nodes. More precisely, the even indices not divisible by 6 will just be nodes without any edges adjacent to them. Not 100% sure, but this should be what happens when one calls from_edges. So maybe specify that as well. That is, that the resulting graph will have some node without any edges 👍
I would however suggest an alternative approach to generate the graph (with only the vertices with indices divisible by 6 or odd), by only actually adding the vertices that are needed (with indices divisible by 6 or odd) and returning a HashMap with your graph which maps the indices to the NodeIndexes in the graph.
(I am suggesting the latter since we are using these graphs to benchmark and thus they probably shouldn't contain unnecessary nodes since these might bias towards one algorithm or another if one algorithm does not handle "stale" nodes well).
| /// An edge is created between a node `a` in `partition_1` and a node `b` in `partition_b` | ||
| /// if `a` index is lower than `j` index. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /// An edge is created between a node `a` in `partition_1` and a node `b` in `partition_b` | |
| /// if `a` index is lower than `j` index. | |
| /// An edge is created between a node with index `i` in `partition_1` and a node with index `j` in `partition_2` | |
| /// if `i` is smaller than or equal to `j`. |
|
|
||
| fn source_and_target_from_partitions<G>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| fn source_and_target_from_partitions<G>( | |
| /// Returns (endpoint_a, endpoint_b) such that endpoint_a is in `partition_a` and endpoint_b in `partition_b`. | |
| /// | |
| /// Panics if the source node is in neither partition. | |
| fn source_and_target_from_partitions<G>( |
See also #740 and #847 # Add Dinic's Maximum Flow Algorithm Computes the maximum flow of a weighted directed graph. Implementation inspired by existing [Ford Fulkerson](https://github.com/petgraph/petgraph/blob/9f778ec11b218ef6d1dfaaa51467b24c8c939fca/src/algo/maximum_flow/ford_fulkerson.rs) in Petgraph. Also, creates a `maximum_flow` submodule inside `algo` module to group flow network algorithms. References: * https://en.wikipedia.org/wiki/Dinic%27s_algorithm * https://cp-algorithms.com/graph/dinic.html I also added some more benches that build graphs where Dinics strongly outperforms Ford Fulkerson, as the graph built in the existent bench in the library favored Ford Fulkerson method. * Dinics ``` > cargo bench --bench dinics running 5 tests test dinics_bench ... bench: 52,870.19 ns/iter (+/- 5,405.82) test dinics_bench_dense_middle ... bench: 680,920.00 ns/iter (+/- 120,617.64) test dinics_bench_dense_middle_varying_weights ... bench: 1,527,980.00 ns/iter (+/- 281,240.50) test dinics_bench_many_edges ... bench: 335,548.12 ns/iter (+/- 59,469.62) test dinics_bench_wide ... bench: 163,606.25 ns/iter (+/- 4,729.17) test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured; 0 filtered out; finished in 18.47s ``` * Ford Fulkerson ``` > cargo bench --bench ford_fulkerson running 5 tests test ford_fulkerson_bench ... bench: 15,360.68 ns/iter (+/- 3,189.80) test ford_fulkerson_bench_dense_middle ... bench: 1,129,970.00 ns/iter (+/- 194,473.25) test ford_fulkerson_bench_dense_middle_varying_weights ... bench: 14,432,700.00 ns/iter (+/- 1,127,474.00) test ford_fulkerson_bench_many_edges ... bench: 153,336.25 ns/iter (+/- 20,860.00) test ford_fulkerson_bench_wide ... bench: 3,733,340.00 ns/iter (+/- 957,471.00) test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured; 0 filtered out; finished in 15.93s``` ``` --------- Co-authored-by: Raoul Luqué <125205120+RaoulLuque@users.noreply.github.com> Co-authored-by: Paul Buehne <64769435+cactusdualcore@users.noreply.github.com> Co-authored-by: cactusdualcore <cactusdualcore@noreply.codeberg.org>
See also petgraph#740 and petgraph#847 # Add Dinic's Maximum Flow Algorithm Computes the maximum flow of a weighted directed graph. Implementation inspired by existing [Ford Fulkerson](https://github.com/petgraph/petgraph/blob/9f778ec11b218ef6d1dfaaa51467b24c8c939fca/src/algo/maximum_flow/ford_fulkerson.rs) in Petgraph. Also, creates a `maximum_flow` submodule inside `algo` module to group flow network algorithms. References: * https://en.wikipedia.org/wiki/Dinic%27s_algorithm * https://cp-algorithms.com/graph/dinic.html I also added some more benches that build graphs where Dinics strongly outperforms Ford Fulkerson, as the graph built in the existent bench in the library favored Ford Fulkerson method. * Dinics ``` > cargo bench --bench dinics running 5 tests test dinics_bench ... bench: 52,870.19 ns/iter (+/- 5,405.82) test dinics_bench_dense_middle ... bench: 680,920.00 ns/iter (+/- 120,617.64) test dinics_bench_dense_middle_varying_weights ... bench: 1,527,980.00 ns/iter (+/- 281,240.50) test dinics_bench_many_edges ... bench: 335,548.12 ns/iter (+/- 59,469.62) test dinics_bench_wide ... bench: 163,606.25 ns/iter (+/- 4,729.17) test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured; 0 filtered out; finished in 18.47s ``` * Ford Fulkerson ``` > cargo bench --bench ford_fulkerson running 5 tests test ford_fulkerson_bench ... bench: 15,360.68 ns/iter (+/- 3,189.80) test ford_fulkerson_bench_dense_middle ... bench: 1,129,970.00 ns/iter (+/- 194,473.25) test ford_fulkerson_bench_dense_middle_varying_weights ... bench: 14,432,700.00 ns/iter (+/- 1,127,474.00) test ford_fulkerson_bench_many_edges ... bench: 153,336.25 ns/iter (+/- 20,860.00) test ford_fulkerson_bench_wide ... bench: 3,733,340.00 ns/iter (+/- 957,471.00) test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured; 0 filtered out; finished in 15.93s``` ``` --------- Co-authored-by: Raoul Luqué <125205120+RaoulLuque@users.noreply.github.com> Co-authored-by: Paul Buehne <64769435+cactusdualcore@users.noreply.github.com> Co-authored-by: cactusdualcore <cactusdualcore@noreply.codeberg.org>
See also petgraph#740 and petgraph#847 # Add Dinic's Maximum Flow Algorithm Computes the maximum flow of a weighted directed graph. Implementation inspired by existing [Ford Fulkerson](https://github.com/petgraph/petgraph/blob/9f778ec11b218ef6d1dfaaa51467b24c8c939fca/src/algo/maximum_flow/ford_fulkerson.rs) in Petgraph. Also, creates a `maximum_flow` submodule inside `algo` module to group flow network algorithms. References: * https://en.wikipedia.org/wiki/Dinic%27s_algorithm * https://cp-algorithms.com/graph/dinic.html I also added some more benches that build graphs where Dinics strongly outperforms Ford Fulkerson, as the graph built in the existent bench in the library favored Ford Fulkerson method. * Dinics ``` > cargo bench --bench dinics running 5 tests test dinics_bench ... bench: 52,870.19 ns/iter (+/- 5,405.82) test dinics_bench_dense_middle ... bench: 680,920.00 ns/iter (+/- 120,617.64) test dinics_bench_dense_middle_varying_weights ... bench: 1,527,980.00 ns/iter (+/- 281,240.50) test dinics_bench_many_edges ... bench: 335,548.12 ns/iter (+/- 59,469.62) test dinics_bench_wide ... bench: 163,606.25 ns/iter (+/- 4,729.17) test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured; 0 filtered out; finished in 18.47s ``` * Ford Fulkerson ``` > cargo bench --bench ford_fulkerson running 5 tests test ford_fulkerson_bench ... bench: 15,360.68 ns/iter (+/- 3,189.80) test ford_fulkerson_bench_dense_middle ... bench: 1,129,970.00 ns/iter (+/- 194,473.25) test ford_fulkerson_bench_dense_middle_varying_weights ... bench: 14,432,700.00 ns/iter (+/- 1,127,474.00) test ford_fulkerson_bench_many_edges ... bench: 153,336.25 ns/iter (+/- 20,860.00) test ford_fulkerson_bench_wide ... bench: 3,733,340.00 ns/iter (+/- 957,471.00) test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured; 0 filtered out; finished in 15.93s``` ``` --------- Co-authored-by: Raoul Luqué <125205120+RaoulLuque@users.noreply.github.com> Co-authored-by: Paul Buehne <64769435+cactusdualcore@users.noreply.github.com> Co-authored-by: cactusdualcore <cactusdualcore@noreply.codeberg.org>

Add Maximum Bipartite Matching Algorithm
Solves the Maximum Matching Problem for a Bipartite graph by reducing it to a Maximum Flow problem and using existent algorithm for Maximum Flow.
References:
Creating the Max Flow Instance
In the algorithm I manually copy all nodes and edges from original graph to create a network flow instance.
I believe there are faster ways to do this and optimize the algorithm, but I don't know how to do so, so I'm accepting some help on this.
Further Optimization
This algorithm can be further optimized when Dinic's Algorithm PR is accepted, because Dinic's is much faster than Ford Fulkerson method for finding maximum flows in unit networks.
I executed the bench tests with Ford Fulkerson and with the Dinic's algorithm, and this were the results:
using ford_fulkerson to find max flow:
using dinics to find max flow:
Current implementation benchmarks for the same bipartite instances: