LightGraphs.jl icon indicating copy to clipboard operation
LightGraphs.jl copied to clipboard

Inconsistency about weights

Open greimel opened this issue 5 years ago • 21 comments

I am confused about how the LightGraphs ecosystem handles weighted graphs. I wonder why adjacency_matrix == weights for SimpleWeightedGraphs, while for MetaGraphs

  • the adjacency_matrix is a matrix of {0,1}, even in the presence of weights
  • the weight equals the default weight for non-connected nodes (I would have expected it to be 0)

Is there a reason for this inconsistency?

I am not a mathematician, but an economist. My standard reference for graphs is Social and Economic Networks by Matthew O. Jackson. And he makes no difference between adjacency matrix and weight matrix.

I am happy to contribute to get rid of this inconsistency, if that means to bring MetaGraphs closer to the behavior of SimpleWeightedGraphs.

Here is some code to show you what I mean

using LightGraphs, SimpleWeightedGraphs, MetaGraphs, GraphDataFrameBridge, DataFrames

begin
	w_g = SimpleWeightedGraph(3)
	add_edge!(w_g, 1, 2, 0.5)
	add_edge!(w_g, 2, 3, 0.8)
	add_edge!(w_g, 1, 3, 2.0)
end

adjacency_matrix(w_g)

# 3×3 SparseMatrixCSC{Float64, Int64} with 6 stored entries:
#  ⋅   0.5  2.0
# 0.5   ⋅   0.8
# 2.0  0.8   ⋅ 

weights(w_g) == adjacency_matrix(w_g)

# true

begin
	df = DataFrame(from = [1, 2, 1], to = [2, 3, 3], weight = [0.5, 0.8, 2.0])
	m_g = MetaGraph(df, :from, :to, weight = :weight)
end

adjacency_matrix(m_g)

# 3×3 SparseMatrixCSC{Int64, Int64} with 6 stored entries:
#  ⋅  1  1
#  1  ⋅  1
#  1  1  ⋅

collect(weights(m_g))

# 3×3 Matrix{Float64}:
#  1.0  0.5  2.0
#  0.5  1.0  0.8
#  2.0  0.8  1.0

greimel avatar Jan 28 '21 10:01 greimel

I would prefer if the adjacency matrix would not be the same as weights for all graphs - sometimes one is only interested in structural information so in that case a 0/1 matrix is enough.

That the default weights return a matrix full of ones is a bit of a hack for performance - as for algorithms that take a weight parameter one usually just queries the entries of that matrix where there is an edge. There are certainly other ways to do that that would yield the same performance though.

simonschoelly avatar Jan 28 '21 11:01 simonschoelly

What about allowing MetaGraphs.graph isa SimpleWeightedGraph and then defaulting to the SimpleWeightedGraphs behavior? (I guess this would be very confusing)

So what about having an alternative type like MetaWeightedGraph or WeightedMetaGraph that behaves more like SimpleWeightedGraphs?

greimel avatar Jan 28 '21 11:01 greimel

OK, let's see if this helps clear things up:

SimpleGraphs are unweighted. That is, each edge has a unit weight of 1. You. cannot change this - it is a feature/limitation of the graph type.

MetaGraphs support edge weights. The way you do it is that you create and set a value for a weight field for each edge, and then tell the graph that that field represents edge weights.

Adjacency matrices simply tell you whether an edge exists between two given vertices. It has nothing to do with weights at all. Adjacency matrices do not have to be integers: you can tell adjacency_matrix to give you booleans, for example, so that true means the edge exists, and false means there is no edge. The default for SimpleGraphs is int. The function signature is

adjacency_matrix(g[, T=Int; dir=:out])

So if you want a matrix of bool, you can use adjacency_matrix(g, Bool).

Note that an adjacency matrix will ALWAYS be a sparse matrix, even if weights are defined (as in MetaGraphs) in another data structure (like a dict).

Does this help?

Edited to respond to @greimel - in LightGraphs, we NEVER reference fields of structs - there are always accessors. That is, you should NEVER use MetaGraphs.graph anywhere in your code. It may (eventually will) break on you without warning.

For MetaGraphs, the weights are set (and used) as follows:

julia> g = MetaGraph(3)
{3, 0} undirected Int64 metagraph with Float64 weights defined by :weight (default weight 1.0)

julia> add_edge!(g, 1, 2)
true

julia> add_edge!(g, 2, 3)
true

julia> set_prop!(g, 1, 2, :weight, 5.0)
true

julia> set_prop!(g, 2, 3, :weight, 10.0)
true

julia> diameter(g)
15.0

sbromberger avatar Jan 28 '21 12:01 sbromberger

Thank you for your responses!

My question was not how to use LightGraphs, but why it works the way it does. Why doesn't adjacency_matrix(::SimpleWeightedGraph) produce a matrix of {0,1} but the weights?

I think this is inconsistent with

Adjacency matrices simply tell you whether an edge exists between two given vertices. It has nothing to do with weights at all.

greimel avatar Jan 28 '21 12:01 greimel

For a consistent behavior in the ecosystem, these things would have to change

  1. adjacency_matrix(::SimpleWeightedGraph) returns a matrix of 0 and 1
  2. weights(::MetaGraph) are 0 for unconnected nodes

greimel avatar Jan 28 '21 12:01 greimel

I think you are misunderstanding. adjacency_matrix(::SimpleWeightedGraph) DOES NOT return a matrix of Ints.

Moreover, adjacency_matrix(::SimpleWeightedGraph, Bool) gives an error.

see my code snipped from the OP

using LightGraphs, SimpleWeightedGraphs

begin
	w_g = SimpleWeightedGraph(3)
	add_edge!(w_g, 1, 2, 0.5)
	add_edge!(w_g, 2, 3, 0.8)
	add_edge!(w_g, 1, 3, 2.0)
end

adjacency_matrix(w_g)

# 3×3 SparseMatrixCSC{Float64, Int64} with 6 stored entries:
#  ⋅   0.5  2.0
# 0.5   ⋅   0.8
# 2.0  0.8   ⋅ 

greimel avatar Jan 28 '21 12:01 greimel

Ah, you're talking about SimpleWeightedGraphs. The reason adjacency_matrix returns the weights there is twofold:

  1. it's super efficient.
  2. it's probably incorrect.

That is, we really should be doing a adjacency_matrix(g) .> 0 there for consistency.

sbromberger avatar Jan 28 '21 12:01 sbromberger

Very good! I am happy to prepare a PR.

What about point two of

For a consistent behavior in the ecosystem, these things would have to change

  1. adjacency_matrix(::SimpleWeightedGraph) returns a matrix of 0 and 1
  2. weights(::MetaGraph) are 0 for unconnected nodes

Do you agree that weights of a MetaGraph should be zero for unconnected nodes?

greimel avatar Jan 28 '21 12:01 greimel

Do you agree that weights of a MetaGraph should be zero for unconnected nodes?

I'm not convinced yet.

Also - please include benchmarks in your PR: that is, you can do (!iszero).(adjacency_matrix(g)), adjacency_matrix(g) .> 0, etc. Time these against the default behavior so we know how much of a performance loss we're going to get.

sbromberger avatar Jan 28 '21 12:01 sbromberger

I take back what I said earlier about efficiency. What we should be doing for SWG is adjacency_matrix(g::SimpleWeightedGraph{T}, ::Datatype{T}) where T = copy(g.weights) (or the transform; I forget). I don't recall offhand what we're actually doing and I don't have time to look at it for a while - but if you open up the PR over in SWG, we can discuss when I'm back in office.

sbromberger avatar Jan 28 '21 12:01 sbromberger

Do you agree that weights of a MetaGraph should be zero for unconnected nodes?

I'm not convinced yet.

Ok, so let me give another try.

Point 1: Simplicity - weights(graph) can completely characterize the graph

In much of the economics literature on networks, a graph is represented as matrix G. (We call G the adjacency matrix, but that's not so important). We define two nodes as connected if G[i,j] > 0.

In LightGraphs this matrix G doesn't exist yet. It is adjacency_matrix(G) .* weights(G) which is a bit annoying. I don't think it would hurt anyone if the weights are 0 for unconnected nodes.

Point 2: Consistency

Also, there still remains an inconsistency. If you want to return the default weight for unconnected nodes, then there should be a default weight for a ::SimpleWeightedGraph, so that weights(g) = weights .* (weights .> 0) .+ default_weight .* (weights .== 0). This matters if you do wg = SimpleWeightedGraph(g::SimpleGraph, 5), then weights(wg) should be equal to 5, even for unconnected nodes, since 5 is the default weight.

greimel avatar Jan 28 '21 13:01 greimel

I think there may be a misunderstanding about what "default weight" means: it means: IF the edge exists, AND no weights have been defined for the edge, THEN we use this value. It's not "Every entry in the adjacency matrix has this value UNLESS an edge is defined". The latter definition would result in dense matrices for all adjacency matrices, which is not desirable at all.

Edited: but it's possible I'm misunderstanding, since I'm rushing to get out the door and haven't had coffee yet. Sorry. This discussion should probably wait until I'm back in the office (week after next).

sbromberger avatar Jan 28 '21 13:01 sbromberger

I guess we agree then that the current behavior is a bug. Here's the other part of the code snippet from the OP.

using LightGraphs, MetaGraphs, GraphDataFrameBridge, DataFrames
begin
	df = DataFrame(from = [1, 2, 1], to = [2, 3, 3], weight = [0.5, 0.8, 2.0])
	m_g = MetaGraph(df, :from, :to, weight = :weight)
end

adjacency_matrix(m_g)

# 3×3 SparseMatrixCSC{Int64, Int64} with 6 stored entries:
#  ⋅  1  1
#  1  ⋅  1
#  1  1  ⋅

collect(weights(m_g))

# 3×3 Matrix{Float64}:
#  1.0  0.5  2.0
#  0.5  1.0  0.8
#  2.0  0.8  1.0

The weights are 1.0 for unconnected nodes.

greimel avatar Jan 28 '21 13:01 greimel

That sure looks like a bug. But I don't think I've ever used that constructor. You're familiar with GraphDataFrameBridge, right?

Let's pick this up week after next if you don't mind :) I'm seriously late this morning.

sbromberger avatar Jan 28 '21 13:01 sbromberger

This is a really important discussion to have, though. Thanks for bringing it up.

sbromberger avatar Jan 28 '21 13:01 sbromberger

I think I understand Seth's point to be distinguishing the weights object from the weighted adjacency matrix. If you do collect(weights(mg)) .* adjacency_matrix(mg) you get the weighted adjacency matrix. A nonbreaking solution could be to introduce a weighted_adjacency_matrix(mg) function.

jpfairbanks avatar Jan 28 '21 13:01 jpfairbanks

I could live with that. But it would keep the inconsistencies between MetaGraphs and SimpleWeightedGraphs, which is not very nice.

Either of the two changes I am proposing is breaking, unfortunately. But I would consider the current behavior as bugs in both cases.

One could

  1. introduce weighted_adjacency_matrix
  2. deprecate weights in favor weighted_adjacency_matrix
  3. if necessary, introduce fast_weights for the current behavior

Changing the behavior of adjacency_matrix in SWG is probably more difficult to handle nicely.

greimel avatar Jan 28 '21 13:01 greimel

I'm tempted to make weights private (that is, not part of any API contract). Would that solve the issue?

sbromberger avatar Feb 15 '21 18:02 sbromberger

Closing this out as it appears that the discussion has run its course.

sbromberger avatar Jul 06 '21 12:07 sbromberger

I'd appreciate if you could reopen this. I don't think the issue is solved. (It's just that I was busy with other things.)

If you create the same graph, once as a SimpleWeightedGraph and once as a MetaGraph with weights, then all kinds of network statistics give different results. I think this is a bug that should be fixed.

Once I am bitten by this issue again, I'll make that PR (that I announced months ago, sorry!).

greimel avatar Jul 06 '21 12:07 greimel

Happy to reopen. I still think weights needs to be private.

sbromberger avatar Jul 06 '21 12:07 sbromberger