Diameters of Trees and Cycle Deletion

In the past two posts, we introduced two models of random trees. The Uniform Spanning Tree chooses uniformly at random from the set of spanning trees for a given underlying graph. The Minimum Spanning Tree assigns IID weights to each edge in the underlying graph, then chooses the spanning tree with minimum induced total weight. We are interested to know whether these are in fact the same distribution, and if they are not, what properties can be used to distinguish them asymptotically.

While investigating my current research problem, I was interested in the diameter of large random trees under various models. Specifically, I am considering what happens if you take a standard Erdos-Renyi process on n vertices, where edges appear at constant rate between pairs of vertices chosen uniformly at random, and add an extra mechanism to prevent the components becoming too large. For this particular model, our mechanism consists of removing any cycles as they are formed. Thus all the components remain trees as time advances, so it is not unreasonable to think that there might be some sort of equilibrium distribution.

Now, by definition, any tree formed by the Erdos-Renyi process is a uniform tree. Why? Well, the probability of a configuration is determined entirely by the number of edges present, so once we condition that a particular set of vertices are the support of a tree, all possible tree structures are equally likely. Note that this relies on sampling at a single fixed time. If we know the full history of the process, then it is no longer uniform. For example, define a k-star to be a tree on k vertices where one ‘centre’ vertex has degree k-1. The probability that a uniform tree on k vertices is a k-star is \frac{k}{k^{k-2}}=k^{-(k-3)}. But a star can only be formed by successively adding single vertices to an existing star. That is, we cannot join a 3-tree and a 4-tree with a edge to get a 7-star. So it is certainly not immediately clear that once we’ve incorporated the cycle deletion mechanism, the resulting trees will be uniform once we condition on their size.

In fact, the process of component sizes is not itself Markovian. For a concrete example, observe first that there is, up to isomorphism, only one tree on any of {0,1,2,3} vertices, so the first possible counterexample will be splitting up a tree on four vertices. Note that cycle deletion always removes at least three edges (ie a triangle), so the two possibilities for breaking a 4-tree are:

(4) -> (2,1,1) and (4) -> (1,1,1,1)

I claim that the probabilities of each of these are different in the two cases: a) (4) is formed from (2,2) and b) (4) is formed from (3,1). This is precisely a counterexample to the Markov property.

In the case where (4) is formed from (2,2), the 4-tree is certainly a path of length 4. Therefore, with probability 1/3, the next edge added creates a 4-cycle, which is deleted to leave components (1,1,1,1). In the case where (4) is formed from (3,1), then with probability 2/3 it is a path of length 4 and with probability 1/3 it is a 4-star (a ‘T’ shape). In this second case, no edge can be added to make a 4-cycle, so after cycle deletion the only possibility is (2,1,1). Thus the probability of getting (1,1,1,1) is 2/9 in this case, confirming that the process is non-Markovian. However, we might remark that we are unlikely to have O(n) vertices involved in fragmentations until at least the formation of the giant component in the underlying E-R process, so it is possible that the cycle deletion process is ‘almost Markov’ for any property we might actually be interested in.

When we delete a cycle, how many vertices do we lose? Well, for a large tree on n vertices, the edge added which creates the cycle is chosen uniformly at random from the pairs of vertices which are not currently joined by an edge. Assuming that n is w(1), that is we are thinking about a limit of fairly large trees, then the number of edges present is much smaller than the number of possible edges. So we might as well assume we are choosing uniformly from the possible edges, rather than just the possible edges which aren’t already present.

If we choose to add an edge between vertices x and y in the tree, then a cycle is formed and immediately deleted. So the number of edges lost is precisely the length of the path between x and y in the original tree. We are interested to know the asymptotics for this length when x and y are chosen at random. The largest path in a graph is called the diameter, and in practice if we are just interested in orders of magnitude, we might as well assume diameter and expected path length are the same.

So we want to know the asymptotic diameter of a UST on n vertices for large n. This is generally taken to be n^{1/2}. Here’s a quick but very informal argument that did genuinely originate on the back of a napkin. I’m using the LERW definition. Let’s start at vertex x and perform LERW, and record how long the resultant path is as time t advances. This is a Markov chain: call the path length at time t X_t.

Then if X_t=k, with probability 1-\frac k n we get X_{t+1}=k+1, and for each j in {0,…,k-1}, with probability 1/n we have X_{t+1}=j, as this corresponds to hitting a vertex we have already visited. So

\mathbb{E}\Big[X_{t+1}|X_t=k\Big]=\frac{nk-k^2/2}{n}.

Note that this drift is positive for k<< \sqrt n and negative for k>>\sqrt n, so we would expect n^{-1/2} to be the correct scaling if we wanted to find an equilibrium distribution. And the expected hitting time of vertex y is n, by a geometric distribution argument, so in fact we would expect this Markov chain to be well into the equilibrium window with the n^{-1/2} scaling by the time this occurs. As a result, we expect the length of the x to y path to have magnitude n^{1/2}, and assume that the diameter is similar.

So this will be helpful for calculations in the cycle deletion model, provided that the trees look like uniform trees. But does that even matter? Do all sensible models of random trees have diameter going like n^{1/2}? Well, a recent paper of Addario-Berry, Broutin and Reed shows that this is not the case for the minimum spanning tree. They demonstrate that the diameter in this case is n^{1/3}. I found this initially surprising, so tried a small example to see if that shed any light on the situation.

The underlying claim is that MSTs are more likely to be ‘star-like’ than USTs, a term I am not going to define. Let’s consider n=4. Specifically, consider the 4-star with centre labelled as 1. There are six possible edges in K_4 and we want to see how many of the 6! weight allocations lead to this star. If the three edges into vertex 1 have weights 1, 2 and 3 then we certainly get the star, but we can also get this star if the edges have weights 1, 2 and 4, and the edge with weight 3 lies between the edges with weights 1 and 2. So the total number of possibilities for this is 3! x 3! + 3! x 2! = 48. Whereas to get a 4-path, you can assign weights 1, 2 and 3 to the edges of the path, or weights 1, 2 and 4 provided the 4 is not in the middle, and then you have the 3 joining up the triangle formed by 1 and 2. So the number of possibilities for this is 3! x 3! + 4 x 2! = 44.

To summarise in a highly informal way, in a star-like tree, you can ‘hide’ some fairly low-scoring weights on edges that aren’t in the tree, so long as they join up very low-scoring edges that are in the tree. Obviously, this is a long way from getting any formal results on asymptotics, but it does at least show that we need to be careful about diameters if we don’t know exactly what mechanism is generating the tree!

Minimum Spanning Trees

In my last post, I discussed the Uniform Spanning Tree. To summarise very briefly, given a connected graph on n vertices, a tree is a subgraph, that is a subset of the edges, which is connected, but which contains no cycles. It turns out this requires the tree to have n-1 edges.

We are interested in natural mechanisms for generating randomly chosen spanning trees of a given graph. One way we can always do this is to choose uniformly at random from the set of possible trees. This UST is in some sense canonical, but it is worth knowing about some other measures on trees that might be of interest.

A family of natural problems in operations research concerns an arbitrary complex network, with some weight or cost associated to each connection. The question is how to perform some operation on the network so as to minimise the resulting cost. Perhaps the most famous such problem is that of the Travelling Salesman. The story is that a salesman needs to visit n locations and wants to do the trip as efficiently as possible. This might be thought of as some sort of financial or time cost, but proably the easiest way to set it up is to imagine he is trying to minimise the distance he has to travel. It is not hard to see why this problem might genuinely arise in plenty of real-world situations, where a organisation or agent is trying to be as efficient as possible.

It might be the case that it is not possible to travel between every pair of locations, but we needn’t assume that for now. So if he knows the distance between any pair of cities, he wants to know which of the possible routes gives the shortest overall distance. The problem is that there are n! routes, and this grows roughly like n^n, which is faster than exponential, so for as few as 20 cities it has turned into a comparison which is too large to compute.

There are various algorithms which reduce the number of routes that must be checked, and some approximation methods. But if you want the exact answer, it is not currently possible to calculate this in polynomial time.

Minimal Spanning Trees and Uniqueness

For the travelling salesman, we were looking for the minimal cost spanning path. In the case of the complete graph, this is the same as the minimal cost non-repeating path of length n-1. Such paths are a subset of the set of spanning trees on the underlying graph. So what if we look instead for the minimal cost spanning tree? This exists as after all, there are only finitely many spanning trees.

So far, this has been deterministic, but we were looking for a random spanning tree. We can achieve this by choosing the weights at random. Anything other than assigning the weights as an IID sequence seems likely to be complicated, but there isn’t a canonical choice of the distribution of the weights. Our first question will be whether the distribution of the weights affects the distribution of the induced MST. In fact it will turn out that so long as the distribution is continuous, it has no effect on the distribution of the MST. The continuous condition might seem odd, but it is present only to ensure that the weights almost certainly end up generating a unique MST.

It turns out that there is a straightforward greedy algorithm to find the MST once the weights are known. We will examine some consequences of this algorithm in the random setting. First we check uniqueness. The condition required for uniqueness is that the weights be distinct. Note that this is slightly weaker than the statement that all of sums of (n-1)-tuples be distinct, which immediately implies a unique MST.

We now prove this condition. Suppose we have distinct weights, and an associated MST. If the underlying graph is a tree, then the result is clear. Otherwise, add some extra edge e, with weight w(e). By the definition of a tree, this generates exctly one cycle. Consider the other edges, say e_1,\ldots,e_k in this cycle. If any of w(e_i)>w(e) then we can replace e_i with e to get a spanning tree with smaller weight, a contradiction of the claim that we started with an MST. So by distinctness of weights, we conclude that w(e)>w(e_i) for all i.

Conversely, suppose we remove some edge e which IS in the MST. We end up with exactly two connected components. Consider all the edges in the underlying graph between the two components, and suppose that one of these f satisfies w(f)<w(e). Then if we add in edge f, which is by construction not in the original MST, we end up with a smaller total weight than we started with, a further contradiction.

We can summarise this in a neat form. Given an edge e between x and y, consider the set of all edges in the underlying graph with weight LESS THAN w(e). Then if x and y are in different components, the edge e must be in the MST. Since we have an explicit description of which edges are present, it follows that the MST is unique. The problem is that working out the component structure of the graph with higher weights removed is computationally rather intensive. We want a slightly faster algorithm.

Kruskal’s Algorithm

Several rather similar algorithms were developed roughly simultaneously. Prim’s algorithm is a slight generalisation of what we will discuss. Anyway, for now we consider Kruskal’s algorithm which has the advantage that it can be described without really needing to draw a diagram.

We start by ordering the weights. Without loss of generality, we might as well relabel the edges so that

w(e_1)< w(e_2)<\ldots< w(e_{|E|}).

Now, by the condition derived in the argument for uniqueness, we must have e_1 and e_2 in any MST. Now consider e_3. Unless doing so would create a cycle, add e_3. Then, unless doing so would create a cycle, add e_4. Continue. It is clear that the result of this procedure is acyclic. To check it is actually a spanning tree, we show that it is also connected. Suppose not, and two of the components are A and B. Let e be the edge between A and B with minimal weight. According to the algorithm, we should have included e in our MST because at no point would adding it possibly have created a cycle. So we have proved that this greedy algorithm does indeed give the (unique) MST.

A useful consequence of this is that we know the two edges with overall minimum weight are definitely in the MST. In the search for a random measure on spanning trees, what is most important is that we didn’t use the actual values of the weights in this construction, only the order. In other words, we might as well have assumed the weights were a random permutation from S_{|E|}. This now answers our original question about how the random weight MST depends in distribution on the underlying edge weight distribution. So long as with probability one the weights are distinct (which holds if the distribution is continuous), then the distribution of the resulting spanning tree is constant.

It’s not too hard to show this isn’t the same as UST: n=4 suffices as a counterexample. But the difference in asymptotic behaviour of properties such as the diameter is of interest, and will be explored in the next post.