
Count number of connected components more efficiently than length(connected_components(g)) #407

Merged
Krastanov merged 17 commits into JuliaGraphs:master from thchr:connected_components
Feb 24, 2026
Conversation

@thchr
Contributor

@thchr thchr commented Nov 14, 2024

This adds a new function count_connected_components, which returns the same value as length(connected_components(g)) but substantially faster by avoiding unnecessary allocations. In particular, connected_components materializes component vectors that are not actually necessary for determining the number of components.
Similar reasoning also lets one optimize is_connected a bit, so I did that as well.

While I was there, I also improved connected_components! slightly: previously, it allocated a new queue for every new "starting vertex" in the search; but the queue is always empty at that point, so there is no need to instantiate a new vector.
To let users who might want to call connected_components! many times in a row reduce allocations further (I am one such user), I also made it possible to pass this queue as an optimization.

Finally, connected_components! is very useful and makes sense to export, so I've done that here.
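As a sketch of the usage this PR enables (function names and the optional search_queue argument are taken from the PR discussion; the exact buffered signatures may differ slightly in the merged version):

```julia
using Graphs

# A graph with three components: {1,2,3}, {4,5,6}, {7,8}
g = Graph(Edge.([1 => 2, 2 => 3, 3 => 1, 4 => 5, 5 => 6, 6 => 4, 7 => 8]))

# Same value as length(connected_components(g)), but with fewer allocations,
# since no per-component vectors are materialized:
count_connected_components(g)

# Reusing buffers across many calls (the optimization discussed in this PR):
label = zeros(eltype(g), nv(g))
search_queue = Vector{eltype(g)}()
connected_components!(label, g, search_queue)
# label[v] now holds a component marker for vertex v
```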

Cc @gdalle, if you have time to review.

@thchr
Contributor Author

thchr commented Nov 14, 2024

For the doctest example of g = Graph(Edge.([1=>2, 2=>3, 3=>1, 4=>5, 5=>6, 6=>4, 7=>8])), count_connected_components is about twice as fast as length∘connected_components (179 ns vs. 290 ns). Using the buffers, it is faster still (105 ns).
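For reference, a sketch of how such a micro-benchmark can be run (timings are machine-dependent; the numbers in the comments are the ones quoted above, not guaranteed outputs):

```julia
using Graphs, BenchmarkTools

g = Graph(Edge.([1 => 2, 2 => 3, 3 => 1, 4 => 5, 5 => 6, 6 => 4, 7 => 8]))

# Interpolating $g avoids benchmarking the cost of the global-variable lookup
@btime length(connected_components($g))   # ~290 ns in the comparison above
@btime count_connected_components($g)     # ~179 ns in the comparison above
```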

@codecov

codecov bot commented Nov 14, 2024

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.29%. Comparing base (64eb48b) to head (3551cc7).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #407   +/-   ##
=======================================
  Coverage   97.28%   97.29%           
=======================================
  Files         123      123           
  Lines        7406     7421   +15     
=======================================
+ Hits         7205     7220   +15     
  Misses        201      201           

☔ View full report in Codecov by Sentry.

Comment thread src/connectivity.jl
# licensing details.
"""
connected_components!(label, g)
connected_components!(label, g, [search_queue])
Member


I am all for performance improvements. But I am a bit skeptical about whether it is worth making the interface more complicated.

Almost all graph algorithms need some kind of work buffer, so we could do something like this in all algorithms, but in the end it should be the job for Julia's allocator to verify if there is some suitable piece of memory lying around. We can help it by using sizehint! with a suitable heuristic.
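For illustration, the sizehint! idea might look like this (a sketch only; using nv(g) as the capacity is just one possible heuristic, since a BFS queue never holds more than nv(g) vertices):

```julia
using Graphs

g = path_graph(100)

# Pre-reserve capacity so that push! on the search queue rarely reallocates;
# the allocator can then often reuse this memory across searches.
search_queue = Vector{Int}()
sizehint!(search_queue, nv(g))
```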

Contributor Author


I agree that this will usually not be relevant; in my case it is, though, and that is the main reason I made the changes. I also agree that there is a trade-off between performance improvements and complications of the API. On the other hand, I think passing such work buffers as optional arguments is a good resolution of that trade-off: most users can safely ignore the complication, and it shouldn't complicate their lives much.

As you say, there are potentially many algorithms in Graphs.jl that could take a work buffer; in light of that, maybe this could be more palatable if we settled on a unified name for these kinds of optional buffers, which would reduce the complication by standardizing across methods.
Maybe just work_buffer (and, if there are multiple, work_buffer1, work_buffer2, etc.)?

Member

@gdalle gdalle Nov 21, 2024


If we do this then all functions should take exactly one work_buffer (possibly a tuple) and have an appropriate function to initialize the buffer. I think it is a major change which should be discussed separately.

Member


So I think if this is really important for your use case you can either

  • Create a version that uses a buffer in the Experimental submodule. Currently we don't guarantee semantic versioning there - this allows us to remove things in the future without breaking the API.
  • Or as this code is very simple you might just copy it to your own repository.

But just to clarify - your problem is not that you are building graphs by adding edges until they are connected? Because if that is the issue, there is a much better algorithm.

Contributor Author

@thchr thchr Jan 14, 2025


Create a version that uses a buffer in the Experimental submodule. Currently we don't guarantee semantic versioning there - this allows us to remove things in the future without breaking the API.

I won't be able to find the time for factoring this out into the Experimental submodule, unfortunately.
I'm happy to, e.g., add an admonition to the docstring indicating that the work-buffer arguments are unstable API and subject to breakage, though. Factoring this into a submodule, piecing it back together, adding multiple docstrings across modules, and eventually loading this behind a Graphs.Experimental call is more fiddling than I'm up for.

Or as this code is very simple you might just copy it to your own repository.

Indeed, I have and will just continue to do that, yep.

But just to clarify - your problem is not that you are building graphs by adding edges until they are connected? Because if that is the issue, there is a much better algorithm.

No, I'm not doing that; appreciate the check though.


As a general side note - and please know that I appreciate these reviews very much (!) and your efforts on what is no doubt spare time (!!) - but I wonder if the level of scrutiny and optimization that many PRs here go through is optimal: I understand the intent and the aim of making stable API and of getting good, maintainable code. But I think there's a risk of trading off too much towards these goals, at the cost of vibrancy and community engagement. From my experience here, there's room for leaning more towards "is this better than what we previously had" over "could this be even better".

PRs like this usually happen on time "stolen away" from our day-jobs, and the odds of returning to PRs for edits, however small, go down very quickly with time; similarly, the expectation that a PR will be a multi-iteration process reduces the likelihood it will be made.

Member


I appreciate the feedback, and I lean in the same direction. The JuliaGraphs community calls have pretty much dried up recently and we don't have much of a community to start with, so it's in our best interest to make engagement rewarding instead of tiresome.
I'm expecting an answer this month on funding I applied for, which could serve to revitalize this ecosystem; hopefully I'll have good news to share soon.

Contributor

@rafaqz rafaqz Apr 5, 2025


but in the end it should be the job for Julia's allocator to verify if there is some suitable piece of memory lying around.

After hitting problems similar to the OP's, I feel the need to point out that this statement is a bit problematic as a design philosophy in Julia.

Julia's garbage collector is slow, and in practice it is on us as developers not to overwork it constantly. Base Julia lets us reuse workspaces in sort, in LinearAlgebra solves, and in the many in-place ! methods.

For me, the shortest-paths algorithms here are barely usable at production scale (millions of calls) due to the allocation and garbage-collector overheads. I can just write fast custom methods for everything our project needs, but the end point of us all doing that is useful code duplicated and scattered through field-specific tools, rather than centralized and usable by the ecosystem.

But I think there needs to be a broad policy to keep things coherent, like: mutated workspaces are either the first argument of a ! method if they are also the return value (as I would do with Graphs.enumerate_paths!), or passed in as keywords like in Base.sort.

But given that kind of framework PRs adding workspace keywords and ! mutating versions shouldn't be too controversial.
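The Base.sort precedent mentioned above refers to its scratch workspace keyword (documented for sort! in recent Julia versions); a sketch of the pattern:

```julia
using Random

v = rand(10_000)
scratch = similar(v)  # one workspace, allocated once

# Many repeated sorts reuse the same scratch vector instead of allocating
# fresh working memory on every call.
for _ in 1:100
    rand!(v)           # refill with new data
    sort!(v; scratch)  # scratch keyword per Base's sort! docs (Julia ≥ 1.9)
end
```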

@simonschoelly
Member

For the doctest example of g = Graph(Edge.([1=>2, 2=>3, 3=>1, 4=>5, 5=>6, 6=>4, 7=>8])), count_connected_components is about twice as fast as length∘connected_components (179 ns vs. 290 ns). Using the buffers, it is faster still (105 ns).

We should not do benchmarks on such small graphs unless the algorithm has a huge complexity and is slow even on very small graphs. Otherwise the benchmark is way too noisy and also does not really reflect the situations where this library is used.

@thchr
Contributor Author

thchr commented Nov 21, 2024

We should not do benchmarks on such small graphs unless the algorithm has a huge complexity and is slow even on very small graphs. Otherwise the benchmark is way too noisy and also does not really reflect the situations where this library is used.

What are some good go-to defaults for testing? This is a thing I'm running up against frequently, I feel: I am not sure which graphs to test against, and anything beyond small toy examples are not easily accessible via convenience constructors in Graphs.

As context, in my situation the graphs are rarely larger than 50-100 vertices; my challenge is that I need to consider a huge number of permutations of such graphs, so performance in the small-graph case is relevant to me.

@gdalle
Member

gdalle commented Nov 21, 2024

What are some good go-to defaults for testing? This is a thing I'm running up against frequently, I feel: I am not sure which graphs to test against, and anything beyond small toy examples are not easily accessible via convenience constructors in Graphs.

I have opened this issue to discuss further:

@Krastanov
Member

@thchr , it has been a while since the discussion on this PR has died out.

  • What is your stance on this PR -- do you still care for it? If you have gone on to solve your problem in a different way, could you share what you are using instead of Graphs.jl?
  • Could you confirm that the only outstanding issue is the discussion about whether it is ok to introduce the use of a scratch space in the public API?

If that is indeed the only outstanding issue, I would like to keep it as it is and move forward with rebasing/testing/merging this. It would be nice to clean up the backlog of PRs.

@github-actions
Contributor

github-actions bot commented Feb 21, 2026

Benchmark Results (Julia v1)

Time benchmarks
master 3551cc7... master / 3551cc7...
centrality/digraphs/betweenness_centrality 16.9 ± 0.41 ms 16.2 ± 0.42 ms 1.04 ± 0.037
centrality/digraphs/closeness_centrality 12 ± 0.24 ms 11.4 ± 0.3 ms 1.05 ± 0.035
centrality/digraphs/degree_centrality 1.94 ± 0.15 μs 1.94 ± 0.15 μs 1 ± 0.11
centrality/digraphs/katz_centrality 0.857 ± 0.046 ms 0.854 ± 0.052 ms 1 ± 0.081
centrality/digraphs/pagerank 0.0363 ± 0.00057 ms 0.0361 ± 0.00057 ms 1 ± 0.022
centrality/graphs/betweenness_centrality 28.8 ± 1.5 ms 28.5 ± 1.3 ms 1.01 ± 0.069
centrality/graphs/closeness_centrality 21.8 ± 0.51 ms 21.7 ± 0.38 ms 1 ± 0.029
centrality/graphs/degree_centrality 1.5 ± 0.17 μs 1.49 ± 0.17 μs 1.01 ± 0.16
centrality/graphs/katz_centrality 1.02 ± 0.041 ms 1.02 ± 0.041 ms 1 ± 0.058
connectivity/digraphs/strongly_connected_components 0.043 ± 0.001 ms 0.0425 ± 0.0011 ms 1.01 ± 0.035
connectivity/graphs/connected_components 24.1 ± 0.74 μs 24.3 ± 0.69 μs 0.991 ± 0.042
core/edges/digraphs 7.15 ± 0.01 μs 7.01 ± 0.01 μs 1.02 ± 0.002
core/edges/graphs 15.7 ± 0.16 μs 17 ± 0.08 μs 0.919 ± 0.01
core/has_edge/digraphs 5.15 ± 0.35 μs 5.16 ± 0.36 μs 0.998 ± 0.097
core/has_edge/graphs 5.47 ± 0.35 μs 5.54 ± 0.38 μs 0.987 ± 0.093
core/nv/digraphs 0.361 ± 0.01 μs 0.37 ± 0.02 μs 0.976 ± 0.059
core/nv/graphs 0.391 ± 0.01 μs 0.381 ± 0.02 μs 1.03 ± 0.06
edges/fille 7.88 ± 1.1 μs 8.59 ± 0.93 μs 0.917 ± 0.17
edges/fillp 4.57 ± 3.7 μs 5.79 ± 4 μs 0.789 ± 0.84
edges/tsume 2.56 ± 0.1 μs 2.53 ± 0.029 μs 1.01 ± 0.041
edges/tsump 2.5 ± 0.15 μs 2.58 ± 0.041 μs 0.965 ± 0.06
insertions/SG(n,e) Generation 24.2 ± 3.4 ms 24.6 ± 3.3 ms 0.984 ± 0.19
parallel/egonet/twohop 0.326 ± 0.04 s 0.293 ± 0.0074 s 1.11 ± 0.14
parallel/egonet/vertexfunction 2.23 ± 0.16 ms 2.21 ± 0.17 ms 1.01 ± 0.1
serial/egonet/twohop 0.29 ± 0.0068 s 0.311 ± 0.0071 s 0.931 ± 0.031
serial/egonet/vertexfunction 2.31 ± 0.26 ms 2.44 ± 0.35 ms 0.945 ± 0.17
traversals/digraphs/bfs_tree 0.0492 ± 0.002 ms 0.051 ± 0.0087 ms 0.965 ± 0.17
traversals/digraphs/dfs_tree 0.0642 ± 0.0024 ms 0.0641 ± 0.0087 ms 1 ± 0.14
traversals/graphs/bfs_tree 0.0531 ± 0.0018 ms 0.0546 ± 0.0019 ms 0.973 ± 0.048
traversals/graphs/dfs_tree 0.0654 ± 0.0021 ms 0.0656 ± 0.0027 ms 0.997 ± 0.052
time_to_load 0.531 ± 0.00093 s 0.543 ± 0.0015 s 0.978 ± 0.0031
Memory benchmarks
master 3551cc7... master / 3551cc7...
centrality/digraphs/betweenness_centrality 0.29 M allocs: 24 MB 0.29 M allocs: 24 MB 1
centrality/digraphs/closeness_centrality 18.6 k allocs: 14.5 MB 18.6 k allocs: 14.5 MB 1
centrality/digraphs/degree_centrality 8 allocs: 5.01 kB 8 allocs: 5.01 kB 1
centrality/digraphs/katz_centrality 2.63 k allocs: 2.83 MB 2.63 k allocs: 2.83 MB 1
centrality/digraphs/pagerank 21 allocs: 14.9 kB 21 allocs: 14.9 kB 1
centrality/graphs/betweenness_centrality 0.545 M allocs: 0.0313 GB 0.545 M allocs: 0.0313 GB 1
centrality/graphs/closeness_centrality 19.3 k allocs: 14 MB 19.3 k allocs: 14 MB 1
centrality/graphs/degree_centrality 10 allocs: 5.43 kB 10 allocs: 5.43 kB 1
centrality/graphs/katz_centrality 2.96 k allocs: 3.1 MB 2.96 k allocs: 3.1 MB 1
connectivity/digraphs/strongly_connected_components 1.05 k allocs: 0.075 MB 1.05 k allocs: 0.075 MB 1
connectivity/graphs/connected_components 0.061 k allocs: 22.5 kB 0.061 k allocs: 22.5 kB 1
core/edges/digraphs 3 allocs: 0.0938 kB 3 allocs: 0.0938 kB 1
core/edges/graphs 3 allocs: 0.0938 kB 3 allocs: 0.0938 kB 1
core/has_edge/digraphs 20 allocs: 12.6 kB 20 allocs: 12.6 kB 1
core/has_edge/graphs 28 allocs: 13.8 kB 28 allocs: 13.8 kB 1
core/nv/digraphs 3 allocs: 0.0938 kB 3 allocs: 0.0938 kB 1
core/nv/graphs 3 allocs: 0.0938 kB 3 allocs: 0.0938 kB 1
edges/fille 3 allocs: 0.153 MB 3 allocs: 0.153 MB 1
edges/fillp 3 allocs: 0.153 MB 3 allocs: 0.153 MB 1
edges/tsume 0 allocs: 0 B 0 allocs: 0 B
edges/tsump 0 allocs: 0 B 0 allocs: 0 B
insertions/SG(n,e) Generation 0.0465 M allocs: 10.9 MB 0.0465 M allocs: 10.9 MB 1
parallel/egonet/twohop 10 allocs: 0.0768 MB 10 allocs: 0.0768 MB 1
parallel/egonet/vertexfunction 10 allocs: 0.0768 MB 10 allocs: 0.0768 MB 1
serial/egonet/twohop 3 allocs: 0.0764 MB 3 allocs: 0.0764 MB 1
serial/egonet/vertexfunction 3 allocs: 0.0764 MB 3 allocs: 0.0764 MB 1
traversals/digraphs/bfs_tree 2.34 k allocs: 0.113 MB 2.34 k allocs: 0.113 MB 1
traversals/digraphs/dfs_tree 2.44 k allocs: 0.118 MB 2.44 k allocs: 0.118 MB 1
traversals/graphs/bfs_tree 2.52 k allocs: 0.121 MB 2.52 k allocs: 0.121 MB 1
traversals/graphs/dfs_tree 2.63 k allocs: 0.127 MB 2.63 k allocs: 0.127 MB 1
time_to_load 0.149 k allocs: 11.1 kB 0.145 k allocs: 11 kB 1.02

@thchr
Contributor Author

thchr commented Feb 24, 2026

@thchr , it has been a while since the discussion on this PR has died out.

  • What is your stance on this PR -- do you still care for it? If you have gone on to solve your problem in a different way, could you share what you are using instead of Graphs.jl?
  • Could you confirm that the only outstanding issue is the discussion about whether it is ok to introduce the use of a scratch space in the public API?

If that is indeed the only outstanding issue, I would like to keep it as it is and move forward with rebasing/testing/merging this. It would be nice to clean up the backlog of PRs.

Hi @Krastanov - thanks for having a look at this, and for your efforts here more broadly!

I still think it's a useful addition: I use it myself, but have so far simply duplicated the definitions under new names (here). I have since optimized the methods a bit more (e.g., here), but that version is more complicated and probably not worth porting and restarting review over.

The outstanding issue was indeed whether or not it was OK to introduce scratch space. The bigger question around that got transferred into issue #410, but no consensus was reached afaik. I think the idea of having a scratch/buffer style like what I proposed here #410 (comment) is sound, but it's not implemented here (for lack of a clear consensus in #410).

We could merge as-is, simply stating that the use of the scratch space buffers is experimental and subject to change?

@Krastanov
Member

Thank you for getting back to this after a long wait! Let's indeed merge it. I asked my instance of Claude to add a quick changelog update and to edit the docstring:

[screenshot of the suggested CHANGELOG and docstring edits]

Could you check whether the incoming edits fit your preferences? If they do, I will mark this as approved and we can merge.

By the way, you probably have noticed that we are short on volunteers here. Would you be interested in being added to the github org for Graphs.jl to be notified about the occasional PRs, in case you have the bandwidth to review (once or twice a year would already be a big help)?

Krastanov and others added 2 commits February 24, 2026 09:06
Update CHANGELOG.md with entries for count_connected_components,
connected_components! export, and is_connected optimization.
Mark search_queue argument as experimental in docstrings, noting
that future versions will provide a more universal work buffer approach.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@thchr
Contributor Author

thchr commented Feb 24, 2026

Thanks for the Claude-edit; I need to play with that myself - very handy..!

I've made a minor revision to the experimental warning language: it's good to go from my side now.

@thchr
Contributor Author

thchr commented Feb 24, 2026

By the way, you probably have noticed that we are short on volunteers here. Would you be interested in being added to the github org for Graphs.jl to be notified about the occasional PRs, in case you have the bandwidth to review (once or twice a year would already be a big help)?

Oh, forgot this: yeah, I'd be happy to lend a hand when I have the bandwidth and know-how.

@Krastanov Krastanov merged commit 89dabeb into JuliaGraphs:master Feb 24, 2026
16 checks passed
@Krastanov
Member

Thank you! I will add you to the github org. A lot of the volunteers are also on the julia slack in the #graphs channel. Funnily, I was just meeting with Mikkel while you were updating the PR this morning -- just learnt you are working together. Funny how small the world is.
