Mixing Times 6 – Aldous-Broder Algorithm and Cover Times

In several previous posts, I’ve discussed the Uniform Spanning Tree. The definition is straightforward: we choose uniformly at random from the set of trees which span a fixed underlying graph. But for a dense underlying graph, there are a very large number spanning trees. Cayley’s formula says that the complete graph K_n has n^{n-2} spanning trees, so to select from this list is impractical.

We seek a better algorithm. In a post about a year ago, I presented the result that the path between two fixed points x and y in the UST is distributed as the path generated by Loop-Erased Random Walk, for which we start at x and delete cycles as they appear. An initial problem might be that this only gives us a single path, which might be enough in some contexts, but in general we will want to specify the whole tree. Wilson’s Algorithm is an unsurprising but useful extension to this equivalence which does just that. You start by constructing the LERW between two vertices, then you add the LERW which connects some other vertex to the path you already have. Then you take a further vertex not currently explored and start LERW there, continuing until you hit the tree that you already have. Iterate this process, which must terminate after at most n steps when there are no vertices which to start from. The tree thus obtained is the UST. The tricky part is proving that the method for selecting which unused vertices to start from has no effect on the distribution of paths between two fixed points.

I want to consider a different algorithm, discovered roughly simultaneously by Aldous and Broder. Start a random walk on the underlying graph at some particular vertex. Every time we traverse an edge which takes us to a vertex we haven’t yet explored, add this edge to the tree. For now I don’t want to give a proof that this algorithm works, but rather to talk about how fast it works, because it ties in nicely with something from the Mixing Times book we’ve been reading recently. It is clear that the algorithm terminates at the first time the random walk has visited every vertex. This is a stopping time, called the cover time of the Markov chain. If we are working with an underlying complete, then we notice that this is annoying, because it means that the cover time will increase like n.log n. That is, it will take an increasingly long time to gather the final few vertices into the tree. Perhaps some combination of Aldous-Broder initially then Wilson’s method for the final o(n) vertices might be preferable?

I want to discuss how to treat this cover time. Often we have information about the hitting times of states from other states \mathbb{E}_x T_y. A relationship between S, the hitting time, defined to be the maximum of the previous display over x and y, and the expected cover time would be useful, especially for a highly symmetric graph like the complete graph where the expected hitting times are all the same.

Matthews’ Method relates these two for an irreducible finite Markov chain on n states. It says:

t_{cov}\leq t_{hit}\left(1+\frac12+\ldots+\frac 1 n\right).

We first remark that this agrees with what we should get for the random walk on the complete graph. There, the hitting time of x from y is a geometric random variable with success probability 1/n, hence expectation is n. The cover time is the standard coupon collector problem, giving expectation n log n, and the sum of reciprocals factor is asymptotically a good approximation.

The intuition is that if we continue until we hit state 1, then reset and continue until we hit state 2, and so on, by the time we hit state n after (n-1) iterations, this is a very poor overestimate of the cover time, because we are actually likely to have hit most states many times. What we want to do really is say that after we’ve hit state 1, we continue until we hit state 2, unless we’ve already done so, in which case we choose a different state to aim for, one which we haven’t already visited. But this becomes complicated because we then need to know the precise conditional probabilities of visiting any site on the way between two other states, which will depend rather strongly on the exact structure of the chain.

Peres et al give a coupling proof in Chapter 11 of their book which I think can be made a bit shorter, at least informally. The key step is that we still consider hitting the sites in order, only now in a random order.

That is, we choose a permutation \sigma\in S_n uniformly at random, and we let T_k be the first time that states \sigma(1),\ldots,\sigma(k) have all been visited. This is a random time that is measurable in the product space, and for each \sigma it is a stopping time.

The key observation is that \mathbb{P}(T_{k+1}=T_k)=1-\frac{1}{k+1}. This holds conditional on any path of the Markov chain because the requirement for the event is that \sigma(k+1) is visited after \{\sigma(1),\ldots,\sigma(k)\}. The statement therefore holds as stated as well as just pathwise. Then, by the SMP, conditional on \{T_{k+1}>T_k\}, we have

T_{k+1}-T_k \leq_{st} t_{hit}.

Note that by the definition of t_{hit}, this bound on the hitting time T_{k+1} is unaffected by concerns about where the chain actually is at T_k (since it is not necessarily at \sigma(k)).

So, removing the conditioning, we have:

\mathbb{E}\Big[T_{k+1}-T_k\Big]\leq\frac{1}{k+1}t_{hit},

and so the telescoping sum gives us Matthews’ result.

One example is the cover time of random walk on the n x n torus, which turns out to be

O(n^2(\log n)^2).

If anyone remembers that Microsoft screensaver from many years ago which started with a black screen and a snake leaving a trail of white pixels as it negotiated the screen, this will be familiar. The last few black bits take a frustratingly long while to disappear. Obviously that isn’t quite a random walk, but it perhaps diminishes the surprise that it should take this long to find the cover time.

There are a couple of interesting things I wanted to say about electrical networks for Markov chains and analytic methods for mixing times, but the moment may have passed, so this is probably the last post about Mixing Times. Plans are in motion for a similar reading group next term, possible on Random Matrices.

Random Walks and Spanning Trees

Introduction

In this post, I’m going to talk about probability distributions on graphs. In particular, consider paths on a graph, and how to assign a distribution to these in a natural way. One way is to consider a standard random walk, treating the vertices as states of a discrete Markov chain. But, in many situations, this isn’t really enough. We might want paths, that is, walks which are self-avoiding. But even in regular lattices like \mathbb{Z}^d, it is hard to say a great deal about the set of self-avoiding walks (SAWs) of length n. We would prefer a distribution which has a natural product form, or which at least we can sample from without large combinatorial calculations.

A spanning tree is a connected graph without cycles. The set of edges is maximal, in the sense that adding any further edge creates a cycle, and also minimal, as removing one will disconnect the graph. Counting the number of spanning trees on labelled vertices is harder than one might suspect, and is possibly worth a post by itself. However, in general the uniform distribution on spanning trees is a useful object to consider. Any spanning tree contains a unique path between two vertices of the graph, and so a Uniform Spanning Tree (UST) induces a distribution on paths.

An alternative construction is to take a random walk and remove the cycles. This is not well-defined unless you specify a canonical order in which to remove them, and the obvious option is to remove the cycles in the order in which they appear. That is, every time you end up at a vertex which you have already visited, you remove all the edges traversed since you were last at v. This gives a Loop-Erased Random Walk (LERW), and from this another measure on paths between two vertices (subject to connectivity conditions which guarantee that the LERW almost surely hits the target vertex eventually).

At an informal level, the difference between these measures is significant. Inducing a measure down from a natural but potentially unwieldy uniform distribution is theoretically natural, but hard to work with. On the other hand a computer could sample from LERW, just by performing a random walk and storing the history suitably, which immediately makes it useful. So, the following theorem is elegant in its own right, but in particular as a bridge between these two frameworks.

Theorem: The measures on paths from to in a finite graph G induced by UST and generated by LERW are the same.

Proof: Some books give proofs which involve coupling a LERW construction with a sequence of STs directly, driven by an underlying random walk on the graph. The difficulty in this approach is that to prove that the uniform distribution on spanning trees is stationary often requires the random walk to be in equilibrium, a criterion which causes difficulties when it come to starting at x later in the proof. Essentially, the difficulty in many LERW constructions is that the edges being removed are incident to the wrong vertex in the random walk, and so RW equilibrium is required to show that the spanning tree transitions are reversible. Instead, we proceed by an argument motivated by the fact that LERWs appear ‘backwards’ in UST constructions. Continue reading