Lovasz Local Lemma

At our training and selection camp in Tonbridge in May, I gave a session about the use of probabilistic methods to solve olympiad combinatorics questions. Such an approach will normally be relevant to a problem where it is required to show the existence of a structure satisfying a particular property. We might consider constructing a structure randomly, and then try to show that the probability that our construction has the required property is non-zero.

There are several possible approaches to this, but often for the property we seek, we can describe a family of ‘bad events’ A_1,\ldots,A_n, and then we have the property if none of the bad events hold. That is, we are looking for general conditions under which we can show that \mathbb{P}(\Cap_{i=1}^n A_i^c)>0.

We have two cases where this is easy to do.

1) If all the A_is are independent, then so long as all \mathbb{P}(A_i)<1, we have our result.

2) If the probability of the bad events have sum less than 1, then we can use a union bound

\mathbb{P}(\cup A_i)\le \sum_{i=1}^n \mathbb{P}(A_i) <1,

to conclude what we want.

In Tonbridge we also discussed first-moment methods, where we show that the expected number of bad events is less than 1, meaning there must be some elements of the probability space where the number of bad events is zero. In this article, we’re looking in a different direction. We’ll try to interpolate between situation 1) and 2), to make some concrete comment on the situation where the probabilities of the bad events are small, but not small enough to use the union bound, and where the events are not quite independent, but are in some sense almost independent.

The first thing we need to do is think about what it means for a family of events to be independent. Suppose we toss a fair coin twice, and consider the events:

A:= {first coin is H}, B:={second coin is T}, C:={exactly one H}.

So, it is easy to see that each pair of events is independent, but if we know that A and B hold, then also C holds (and indeed this stays true under cyclic re-ordering). So C is not independent of the family of events {A,B}. Rather than give a formal definition, I’ll say instead that an event B is said to be independent of the family of events \{A_1,\ldots,A_5\} if it is independent of

A_3

A_1\cap A_2

A_3\cap A_4^c\cap A_5^c,

and so on. I hope it’s clear from this what I mean. Slogan: no knowledge about which of the A_i do or don’t hold gives information about B.

Now we return to our original setup. We want that each A_i is independent of lots of the rest, and so we choose for each i\in[n] a dependency set D_i\subset [n] of indices, such that A_i is independent of the family of events \{A_j: j\in [n]\backslash D_i\}. It is tempting to interpret this immediately as saying that A_i depends on each event with index in D_i. This will normally be fine in practice, but we should emphasise that there is a lot of freedom to choose D_i, and the genuinely important condition is that A_i is independent of the family given by the rest of the indices.

[*Health warning*: The language I am using seems sensible for anyone learning about this for the first time, but is very slightly non-standard. Instead of dependency sets, the classical notation is to consider a dependency digraph on [n], with an edge i->j whenever j\in D_i.]

The symmetric version of the Lovasz Local Lemma says: suppose \mathbb{P}(A_i)\le p and we can choose D_i as described so that |D_i|\le d for each I. Then, if epd\le 1, we may conclude \mathbb{P}(\Cap_{i=1}^n A_i^c)>0.

We’ll come back to the proof, which is best seen in a slightly more general formulation. First, let’s get to grips with the notation, by reassuring ourselves that this really does lie between the union bound and the independence case.

If the events are independent, then clearly we may take D_i=\{i\} for each i, that is d=1, so we may draw the desired conclusion so long as p\le 1/e, which is fine, but a lot less convincing than p<1. Similarly, for the union bound, we have said nothing about the dependency relationships, and so we have to take D_i=[n] for each i. So we may draw the desired conclusion provided p\le 1/ne, which is obviously again a factor of e less than what we would have had with a union bound itself.

Now we’ll see how this might be useful when applied, for example, to a probabilistic construction for the lower bound on the Ramsey number R(k). From Erdos’s argument, we know R(k)\ge (1+o(1)) 2^{k/2}, and we will earn an extra factor of k\sqrt{2}/e. An extra factor of k/e can also be obtained by an alteration argument, but this extra factor of \sqrt{2} makes this LLL method one of the strongest available for this problem.

Recall that for a lower bound, we are trying to find examples of 2-edge-colourings of a large complete graph K_n, such that there is no monochromatic copy of a K_k. We consider the uniform independent random edge-colouring. It makes sense that a bad event A_S should be the presence of a monochromatic complete graph induced on a set S\subset [n], of size k. Since there are two potential colours for the monochromatic induced K_k, we have \mathbb{P}(A_S)=2^{1-\binom{k}{2}}. Then we take the dependency set D_S of A_S to include all those k-sets which share an edge with S, that is |S\cap T|\ge 2. We think about which vertices might contribute to the shared edge, and which make up the remainder to conclude |D_S|\le \binom{k}{2}\binom{n-2}{k-2}.

So now, whenever e\cdot 2^{1-\binom{k}{2}}\binom{k}{2}\binom{n-2}{k-2}\le 1, as a consequence of LLL we can conclude that with positive probability the random colouring gives a suitable example, that is R(k)>n. After fiddling around with Stirling’s formula in a fashion that isn’t hugely interesting, we can conclude R(k)\ge (1+o(1)) \frac{k\sqrt{2}}{2} 2^{k/2}.

The prompt for this article was a discussion during our Malaysian training camp of methods applicable to IMO 2014 Q6. If you want to know just how applicable LLL is, I suggest you have a think yourself. It’s not a direct application – so some thought is involved. Maybe as an appetiser, here are some more elementary exercises, which I’ve borrowed from examples on Po-Shen Loh’s olympiad handouts, and Wikipedia, though I doubt the latter is the original source:

1) 11n points on a circle are coloured with n colours, such that each colour is used exactly 11 times. Show that it’s possible to choose one point of each colour such that no pair are adjacent.

2) A finite graph is given, and all degrees are between 50 and 100. Find some finite C such that you can always colour the vertices of such a graph so that the neighbourhood of any vertex includes at least 20 colours.

Finally, we discuss the more general statement of LLL, and explain how the proof works.

General Lovasz Local Lemma: Suppose there exist x_i\in [0,1) such that \mathbb{P}(A_i)\le x_i \prod_{j\in D_i\backslash\{i\}} (1-x_j) (*). Then \mathbb{P}(\Cap A_i^c)\ge \prod (1-x_i)>0.

Deducing the symmetric form from the general form is not hard. Condition (*) is motivated by the proof. We want to be able to say that no matter which other events and their unions, complements etc we condition on, we still have an upper bound for the probability of A_i. This bound will be x_i. In particular, we want to show that the probability of bad event A_i does not get too high, even if we know that lots of other bad events have not occurred.

The proof proceeds by showing \mathbb{P}(A_i | \Cap_{j\in S}A_j^c)\le x_i for all i, by induction on |S|. For the inductive step, you split S=S_1\cup S_2 where S_1=S\cap D_i, S_2=S\cap D_i^c. If S_1=\varnothing, you are done straight away, by the assumption (*) and independence of A_i and the events not indexed by D_i. Otherwise, you can use the inductive hypothesis on S_2, and repeated Bayes’ theorem to show what you want in a series of steps that have a lot of notation, but aren’t hugely difficult.

Sharpness of Phase Transition in Percolation

On the flight to Romania a couple of weeks ago, I read this very nice paper by Duminil-Copin and Tassion in which, as a corollary to a more general result in a companion paper, they provide a short proof of a result about percolation on \mathbb{Z}^d. Mitch talked about this paper at our final Junior Probability Seminar of this term yesterday, so it seemed appropriate to write up something about this nice argument. I must confess I know relatively little about these problems, and in particular know nothing about how the original authors, Aizenmann + Barsky (1987), and Menshikov (1986) approached this problem, but experts have said that this is substantially easier to digest.

Rather than reference the first paper repeatedly, I remark now that everything which follows comes from there.

We consider conventional bond percolation on the edges of \mathbb{Z}^d, for d\ge 2, and are interested in whether the the origin percolates with positive probability. That is, that zero is contained in an infinite component. As usual we define p_c=\sup\{p: \mathbb{P}_p(0\leftrightarrow\infty)=0\} to be the critical probability above which percolation happens with positive probability. Defining \theta(p)=\mathbb{P}_p(0\leftrightarrow\infty), we do not know whether \theta(p_c)=0 for some values of d, notably d=3.

If the origin is connected to infinity, it is by definition connected to the boundary of every \Lambda_n:=[-n,n]^d. The number of distinct paths from the origin to \partial \Lambda_n is bounded by the number of self-avoiding walks on the lattice of length n starting from 0, which grows at most as fast as (2d-1)^n. In particular, we know that p_c\ge \frac{1}{2d-1}, but also, for any p<\frac{1}{2d-1}, the probability \mathbb{P}_p[0\leftrightarrow \partial\Lambda_n] decays exponentially in n. We would expect this in fact to hold for all p<p_c, and this is something that the authors prove, called Item 1. They also show that the percolation probability grows at least linearly beyond p_c, specifically (called Item 2)

\theta(p)\ge \frac{p-p_c}{p(1-p_c)}.

The proof here proceeds by considering the function of subsets S which contain the origin:

\varphi_p(S):= p\sum_{(x,y)\in\Delta S} \mathbb{P}_p[0\stackrel{S}{\leftrightarrow} x],\quad S\subset \mathbb{Z}^d.

In words, this gives the expected number of edges across the boundary which are connected to the origin by a path within S. So this gives a measure of how likely we are to escape S, and in particular, an upper bound on the probability that an open path exists from 0 to outside S. The authors then define the alternative critical probability

\tilde p_c := \sup_\{p\text{ s.t. }\exists\text{ finite }0\in S\text{ with }\varphi_p(S)<1\}.

They will show that \tilde p_c satisfies the statements of both Item 1 and Item 2. Item 2 for \tilde p_c implies \tilde p_c\le p_c, and Item 1 for \tilde p_c implies p_c\le \tilde p_c, so this is exactly what we need.

They show Item 1 first. We consider this set S for which \varphi_p(S)<1, and take some box \Lambda_L which strictly contains S. Now we consider the probability of escaping from a box of size kL. The reason why considering this definition of S works really nicely is that it makes it possible to split this event of escaping from \Lambda_{kL} into an event involving subjects of various disjoint sets of edges being open, so we can use independence.

We decompose the path from 0 to \partial\Lambda_{kL} based on the first time it leaves S. We are mindful that there might be lots of paths from from 0 to this boundary. The way we are bounding means it doesn’t matter if we have lots of suitable paths, but they should all spend a maximal number of steps in S, in the sense that whenever the path re-enters S, say to vertex z, there is no open path from 0 to z contained in S. Let the vertex on \partial S we leave from for the first time be x. Then, for all vertices y later in the path, 0\stackrel{S}{\not\leftrightarrow}y.

So under any suitable path, now take y to be the vertex directly following x, hence (x,y)\in\Delta S. If we take \mathcal{C} to be the set of vertices z for which 0\stackrel{S}{\leftrightarrow}z, we can split the expression based on S to obtain:

\mathbb{P}_p[0\leftrightarrow \partial \Lambda_{kL}]\le p \sum_{(x,y)\in\Delta S}\sum_{C\subset S} \mathbb{P}_p[0\stackrel{S}{\leftrightarrow} x,\mathcal{C}=C] \mathbb{P}_p[y\stackrel{\mathbb{Z}^d\backslash C}{\leftrightarrow}\partial\Lambda_{kL}].

Splitting based on C gives us independence between all of these sets of edges, but then we immediately forget about it. Irrespective of choice of y (recall, y\in S\subset \Lambda_L), this final probability is definitely bounded by \mathbb{P}_p[0\leftrightarrow \partial \Lambda_{(k-1)L}], while the p and the first term can be summed over C to give \varphi_p(S). They obtain:

\mathbb{P}_p[0\leftrightarrow \partial \Lambda_{kL}] \le \varphi_p(S)\mathbb{P}_p[y\leftrightarrow \partial \Lambda_{(k-1)L}] \le \varphi_p(S)^{k-1},

where the final relation holds by induction, and clearly gives exponential decay as required.

For Item 2 we use Russo’s formula. Here we have a slightly simpler example than the most general version, since the event under consideration A_n=\{0\leftrightarrow \partial\Lambda_n\} is increasing with respect to adding edges. It is also a function of a finite number of edges. Then we can consider \frac{d}{dp}\mathbb{P}_p[A] under the coupling which adds each edge independently as a Poisson process with (locally) rate \frac{1}{1-t}. (We take this to be the rate to avoid having to reparameterise exponentially between time and probability. Here t=p.)

Just for ease, we only consider the right-derivative at p. Then with \mathbb{P} as the law of the coupled process:

\frac{d}{dp}\mathbb{P}_p[A] \approx \frac{1}{\epsilon} \mathbb{P}[A\text{ holds at }p+\epsilon,\text{ but not at }p]

= \frac{1}{\epsilon} \sum_{e\in E}\mathbb{P}[e\text{ added between }p,p+\epsilon\text{ and }e\text{ completes event }A]

+ \frac{1}{\epsilon}\mathbb{P}[\text{two or more edges relevant to }A\text{ added}].

Since the number of edges whose states determine A is finite, this second term vanishes as \epsilon\rightarrow 0. So

=\frac{1}{\epsilon}\sum \frac{\epsilon}{1-p} \mathbb{P}(A \text{ doesn't hold at p, and edge }e\text{ pivotal at p}).

Taking the limit in \epsilon in this example gives

\frac{d}{dp}\mathbb{P}_p[0\leftrightarrow \partial\Lambda_n] = \frac{1}{1-p} \sum_{e\in\Lambda_n} \mathbb{P}_p[e\text{ pivotal, }0\not\leftrightarrow \partial \Lambda_n].

The argument then proceeds in a similar way to Item 1, decomposing \{0\not\leftrightarrow \partial \Lambda_n\} by conditioning on the set of vertices \mathcal{S} from which it is not possible to get to \partial \Lambda_n. In particular, this set is an excellent candidate to view as S, since on this event it contains 0 by definition. Once we have specified \mathcal{S} we know which edges might be pivotal, namely those across the boundary of \mathcal{S}. Crucially, the event \{\mathcal{S}=S\} only depends on those edges between the boundary of S and \partial \Lambda_n, so is independent of the event \{0\stackrel{S}{\leftrightarrow}x\} whenever x\in\partial \mathcal{S}. So applying this version of Russo gives

\frac{d}{dp}\mathbb{P}_p[0\leftrightarrow \partial\Lambda_n] = \frac{1}{1-p}\sum_{0\in S\subset \Lambda_n} \sum_{(x,y)\in\Delta S} \mathbb{P}_p[0\stackrel{S}{\leftrightarrow} x]\mathbb{P}_p[\mathcal{S}=S].

It is clear where \varphi_p(S) might turn up within the sum (after removing a factor of p), so for a bound we can take \inf_S \varphi_p(S) outside the sum, and arrive at

\frac{d}{dp}\mathbb{P}_p[0\leftrightarrow \partial\Lambda_n] \ge \frac{1}{p(1-p)}\inf_{0\in S\subset \Lambda_n} (1-\mathbb{P}_p[0\leftrightarrow \partial \Lambda_n].

It wasn’t immediately clear to me immediately that this implied the given form of Item 2 (though it certainly is consistent). I think perhaps I was trying to be too complicated and thinking about Gronwall’s lemma when in fact everything really follows from bounding \inf \varphi_p(S) below by 1 (since we have assumed p>\tilde p_c here), then integrating the differential inequality

\frac{d}{dp}\left[ \frac{p}{1-p}f(p) \right] = \frac{p}{1-p}f'(p)+\frac{1}{(1-p)^2}f(p) \ge \frac{1}{(1-p)^2}.

I include this not because it’s an interesting part of the argument (I don’t think it is really) but because I struggled to digest it first time round.

What is interesting is how well this choice to consider \varphi_p(S) works out. In both parts of the argument, sets which work well for splitting the crossing probabilities into disjoint edge events mesh nicely with considering this function after conditioning on sub- or super-criticality with respect to \tilde p_c.

An obvious remark

Aside

An obvious remark:

If a sequence of independent random variables X_n converge almost surely to some limit X, this limit must be a constant (almost surely).

I’ve been thinking about the Central Limit Theorem about related Large Deviations results this afternoon, and wasted almost an hour worrying about situations which were effectively well-disguised special cases of the above.

Why is it true? Well, suppose each X_n is \mathcal{F}_n-measurable. But by independence, we might as well take \mathcal{F}_n=\sigma(X_n). Then the limit variable X is independent of \mathcal{F}_n for all n, and thus independent of \cup F_n=\mathcal{F}\supset \sigma(X). If X is independent of itself, it must be almost surely constant.

Independence and Association

Back when we did GCSE probability, we gave a definition of independent events as:

A and B are said to be independent if \mathbb{P}(A)\mathbb{P}(B)=\mathbb{P}(A\cap B).

We might also apply Bayes’ definition of conditional probability to say

\mathbb{P}(A|B)=\mathbb{P}(A)\quad\iff\quad A,B\text{ independent}\quad\iff\quad\mathbb{P}(B|A)=\mathbb{P}(B)

provided all the terms exist. (Eg the definition of \mathbb{P}(B|A) is at the very least non-obvious if the probability of A is 0.) In my opinion, this is a more naturally intuitive definition. For example, I think that when you toss two coins, the fact that the probability of the second coin being a tail is unaffected by whether the first is heads is more naturally ‘obvious’ than the fact that the joint probability of the two events is 1/4.

But, before getting too into anything philosophical, it is worth thinking about an equivalent situation for non-independent events. We remark that by an identical argument to above:

\mathbb{P}(A|B)\geq\mathbb{P}(A)\quad\iff\quad \mathbb{P}(A\cap B)\geq\mathbb{P}(A)\mathbb{P}(B)\quad\iff\quad\mathbb{P}(B|A)\geq\mathbb{P}(B)

Informally, this says that if we know A occurs, it increases the likelihood of B occuring. If we were talking about two random variables, we might say that they were positively correlated. But of course, by considering RVs 1_A,1_B, the result above is precisely the statement that the indicator functions have positive correlation.

Aim: To find a sufficient condition for positive correlation of random variables in a product measure.

Consider the following. Suppose A is an event which is positively correlated with the appearance of each edge. We might suspect that two such events A and B would be positively correlated. Instead, we consider a more concrete description. Recall that an event A is a subset of \Omega=\{0,1\}^E. Given w\in\Omega,e\in E, we say w^e\in\Omega defined by taking w and setting edge e to be open (note it may be open already). Now, we say event A is increasing, if

\forall w\in\Omega,\forall e\in E: w\in A\Rightarrow w^e\in A.

Note that this certainly implies the property previously mentioned, but the converse is not necessarily true.

Anyway, our revised aim will be to show that increasing events A and B are positively correlated for product measure.

For now, we approach the problem from the other direction, namely we attempt to find which measures on \{0,1\}^E have the property that A and B are positively correlated for all increasing A, B. Note that as before, we can think of this as \mathbb{E}1_A1_B\geq\mathbb{E}1_A\mathbb{E}1_B, and again here it is useful to rephrase our framework in terms of random variables. There is a natural (product) partial ordering of \Omega=\{0,1\}^E, and from this there is an easy notion of increasing random variables. Recall a random variable is defined as a measurable map \Omega\rightarrow\mathbb{R} so no further work is required.

X is increasing if w\geq w'\Rightarrow X(w)\geq X(w').

So we clarify our aim, which is to find a condition on the measure \mu such that \mu(XY)\geq \mu(X)\mu(Y) for all increasing X, Y. When this occurs, we say \mu is positively associated. Note that this is equivalent to \mu(A\cap B)\geq \mu(A)\mu(B) for all increasing events A, B. Why? We can build up X and Y from increasing indicator functions like \{X\geq x\} in a usual monotone class argument.

On the way, we need a partial ordering on the set of probability measures. Obviously, if \mu(A)\leq \nu(A) for all events A, then in fact \mu=\nu! So instead we say \mu\leq_{st}\nu if \mu(A)\leq \nu(A) for all increasing A. This is called the stochastic ordering, and there is a technical result of Strassen, proving the intuitively obvious claim that if \mu_1\leq \mu_2, then we can couple the measures in a natural way. Formally:

Theorem: \mu_1\leq\mu_2 \iff \exists a probability measure \nu on \Omega^2 such that the marginals are \mu_1,\mu_2 and

\nu(\{(w_1,w_2):w_1\leq w_2\})=1.

Our main result will be the FKG inequality which asserts that when \mu satisfies the following FKG lattice property

\mu(w_1\vee w_2)\mu(w_1\wedge w_2)\geq \mu(w_1)\mu(w_2),\quad\forall w_1,w_2\in\Omega

then \mu is positively associated. We will prove the case |E|<\infty.

We proceed by showing that \mu_1\leq\mu_2\propto Y\mu_1, rescaled, for Y an increasing RV. [Note that we are now suppressing the ‘st’ subscript, as context makes the use clear.]

To show this, we prove the more general Holley’s Theorem:

This states that if two positive probability measures satisfy a related lattice condition:

\mu_2(w_1\vee w_2)\mu_1(w_1\wedge w_2)\geq \mu_1(w_1)\mu_2(w_2)\quad\forall w_1,w_2\in\Omega

then we have the stochastic domination result: \mu_1\leq \mu_2.

Note that the lattice condition states, very informally, that adding edges results in a greater relative increase with respect to the measure \mu_2, which has a natural similarity to the definition of stochastic domination.

We prove this, perhaps unexpectedly, by resorting to a Markov chain. We note that there is a Markov chain on \Omega with equilibrium distribution given by \mu_1. This is simple: the non-zero transition rates are those given by the addition or removal of a single edge. Assume that edges are added at unit rate, and that edges are removed with rate: G(w^e,w_e)=\frac{\mu_1(w_e)}{\mu_1(w^e)}.

Similarly, we can construct a Markov chain on state space \Omega^2, where non-zero transitions are given by the addition of an edge to both states in the pair, the removal of an edge from both states in the pair, and the removal of an edge from only the first edge in the pair. Note that, as before, we may be ‘adding’ an edge which is already present. Assuming we start in this set, this choice means that we are restricting the sample space to \{(\pi,w):\pi\leq w\}. We need the transition rate of the third type of transition to have the form: \frac{\mu_1(\pi_e)}{\mu_1(\pi^e)}-\frac{\mu_2(w_e)}{\mu_2(w^e)}. So the lattice condition precisely confirms that this is non-negative, and thus we have a well-constructed Markov chain. The marginals have equilibrium distributions \mu_1,\mu_2 by construction, and by the general theory of Markov chains, there is an equilibrium distribution, and this leaves us in precisely the right position to apply Strassen to conclude the result.#

Summary of consequences: We have demonstrated that product measure is positively associated, as it certainly satisfies the FKG condition. Recall that this is what we had suspected intuitively for reasons given at the start of this account. Next time, I will talk about the most natural companion result, the BK inequality, and the stronger Reimer’s Inequality.

References: Both the motivation and the material is derived from Prof. Grimmett’s Part III course, Percolation and Related Topics, which was one of the mathematical highlights of the year. This account of the subject is a paraphrase of his lecture notes, which were themselves based on his book Probability on Graphs. Mistakes, naturally, are mine. Background on the course, and an online source of the book can be found on the course website here.