DGFF 1 – The discrete Gaussian free field from scratch

I’ve moved to Haifa in northern Israel to start a post-doc in the probability group at the Technion, and now that my thesis is finished I want to start blogging again. The past couple of weeks have been occupied with finding an apartment and learning about the Discrete Gaussian Free Field. All questions about the apartment are solved, but fortunately lots remain open about the DGFF, so I thought I’d write some background about this object and methods which have been used to study it.

Background – Random walk bridge

When we think of a random walk, we usually think of the index as time, normally going forwards. So for a random walk bridge, we might assume Z_0=0, and then condition on Z_N=0, thinking of this as a demand that the process has returned to zero at the future time. In some applications, this is the ideal intuition, but in others, it is more useful to think of the random walk bridge

(0=Z_0,Z_1,\ldots,Z_{N-1},Z_N=0),

as a random height function indexed by [0,N], where the probability of a given path decomposes naturally into a product depending on the N increments, up to a normalising constant.

Naturally, we are interested in the asymptotic behaviour of such a random walk bridge when N\rightarrow\infty. So long as the step distribution has finite variance, a conditioned version of Donsker’s theorem shows that the rescaled random walk bridge converges in distribution to Brownian bridge. Note that Brownian bridge

(B^{\mathrm{br}}_t, t\in[0,1])

can be constructed either by conditioning a standard Brownian motion B to return to zero at time one (modulo some technicalities – this event has zero probability), or by applying an appropriate (random) linear shift

B^{\mathrm{br}}(t):= B(t) - tB(1). (*)

It is not too hard to calculate the distribution of B^{\mathrm{br}}(t) for each t\in[0,1], and with a bit more work, one can calculate the joint distribution of (B^{\mathrm{br}}(s),B^{\mathrm{br}}(t)). In particular, the joint distribution is multivariate Gaussian, and so everything depends on the covariance ‘matrix’ (which here is indexed by [0,1]).

So if we return to a random walk bridge what should the step distribution be? Simple symmetric RW is a natural choice, as then lots of the quantities we might want to consider boil down to combinatorial calculations. Cleverness and Stirling’s formula can often get us useful asymptotics. But there are lots of inconveniences, not least the requirement to be careful about parity (N has to be even for a start unless you make the walk lazy, in which case the combinatorics becomes harder), and even if these can be overcome in a given calculation, it would be better not to have this.

The claim is that the random walk with Gaussian increments is by far the easiest to analyse asymptotically. As a further heuristic, think about the statement of the central limit theorem in the case where the underlying distribution is normal: it’s true but obvious. [Indeed, it’s my favourite piece of advice to anyone taking second year probability exams to check that your proposed statement of CLT does actually work for N(\mu,\sigma^2)…] More concretely, if a RW has Gaussian increments, then the path (Z_1,\ldots,Z_N) is a multivariate normal, or a Gaussian process with finite index set. In particular, covariances define the distribution. It remains a Gaussian process after conditioning on Z_N=0, and the linear tilting argument at (*) remains true here, and can indeed be applied to turn any boundary conditions into any other boundary conditions.

The discrete Gaussian free field

We know how to generalise the domain of a random walk to higher dimensions. But what generalising the index to higher dimension? So now there is definitely no arrow of time, and the notion of a random height function above \mathbb{Z}^2 (or a subset of it) is helpful, for which a scaling limit might be a random surface rather than Brownian motion.

Because we can’t well-order \mathbb{Z}^d, it’s harder to define any such random object on the entire lattice immediately, so we start with compact connected subsets, with zero boundary conditions, as in the one-dimensional case of random walk bridge. Formally, let D be a finite subset of \mathbb{Z}^d, and the boundary \partial D those elements of D^c which are adjacent to an element of D, and let \bar D:= D\cup \partial D.

Then, the discrete Gaussian free field on D is a random real vector h^D=(h^D_x: x\in \bar D), with probability density proportional to

\mathbf{1}\{h^D_x=0, x\in\partial D\}\exp\left ( - \frac{1}{4d} \sum_{x\sim y}(h^D_x - h^D_y)^2 \right), (1)

where we write x\sim y if that x,y are adjacent in \bar D. We won’t at any stage worry much about the partition function which normalises this pdf. Note also that \frac{1}{4d} is just a convenient choice of constant, which corresponds to one of the canonical choices for the discrete Laplacian. Adjusting this constant is the same as uniformly rescaling the values taken by the field.

The immediate interpretation of (1) is that the values taken by the field at vertices which are close to each other are positively correlated. Furthermore, the form of the density is Gaussian. Concretely, if the values of h^D are fixed everywhere except one vertex x\in D, then the conditional distribution of h^D_x is Gaussian. Later, or in subsequent posts, we will heavily develop this idea. Alternatively, we could if we really wanted describe the model in terms of independent Gaussians describing the ‘increment’ along each edge in D (which we should direct), subject to a very large number of conditions, namely that the sum of increments along any directed cycle is zero. This latter description might be more useful if you wanted to define a DGFF on a more sparse graph, but won’t be useful in what follows.

Note that we can rearrange the Laplacian in (1) in terms of the transition kernel p( ) of the simple random walk of D to obtain

\exp\left( -\frac12 (h^D)^T (\mathbf{P}-\mathbf{1})h^D \right),

where P_{x,y}=p(y-x) is the transition matrix of SRW on D. In particular, this means that the free field is Gaussian, and we can extract the covariances via

\mathrm{Cov}(h^D_x,h^D_y) = \left[ (\mathbf{1}-\mathbf{P})^{-1}\right]_{x,y}

= \left[\sum_{n\ge 0} \mathbf{P}^n\right]_{x,y} = \sum_{n\ge 0} \mathbb{P}_x\left[X_n=y,\tau_{\partial D}>n\right],

where, under \mathbb{P}_x, (X_0,X_1,\ldots) is simple random walk started from x.

This final quantity records the expected number of visits to y before leaving the domain D, for a random walk started at x, and is called the Green’s function.

In summary, the DGFF on D is the centred Gaussian random vector indexed by \bar D with covariance given by the Green’s function G_D(x,y).

How many of these equivalences carries over to more general D-indexed random fields is discussed in the survey paper by Velenik. But it’s worth emphasising that having the covariance given by the Green’s function as in the definition we’ve just given is a very nice property, as there are lots of pre-existing tools for calculating these. By contrast, it’s hard to think of a natural model for an integer-valued surface of this kind, as an analogue to SRW.

[Though definitely not impossible. The nicest example I’ve heard of is for height functions of large uniform domino tilings within their ‘arctic circle’, which have GFF asymptotics. See this paper by Kenyon.]

A continuous limit?

We motivated the discussion of random walk bridge by the limit object, namely Brownian bridge. Part of the reason why the DGFF is more interesting than Gaussian random walk bridge, is that the limit object, the (continuum) Gaussian free field is hard to define classically in two dimensions.

We might suppose that the DGFF in V_N, the square box of width N has some scaling limit as N\rightarrow\infty. However, for fixed x,y\in [0,1]^2, (and taking integer parts component-wise), well-known asymptotics for SRW in a large square lattice (more on this soon hopefully) assert that

\mathrm{Cov}(h^{V_N}_{\lfloor Nx \rfloor},h^{V_N}_{\lfloor Ny\rfloor}) \sim \log |x-y|, (2)

and so any scaling limit will rescale only the square domain, not the height (since there is no N on the RHS of (2)). However, then the variance of the proposed limit is infinite everywhere.

So the GFF does not exist as a random height function on [0,1]^2, with the consequence that a) more care is needed over its abstract definition; b) the DGFF in 2D on a large square is an interesting object, since it does exist in this sense.

What makes it ‘free’?

This seemed like a natural question to ask, but I’ve received various answers. Some sources seem to suggest that having zero boundary condition is free. Other sources refer to the Hamiltonian (that is the term inside the exponential function at (1) ) as free since it depends only on the increments between values. If the Hamiltonian also depends on the heights themselves, for example via the addition of a \sum_{x} \Psi(h^D_x) term, then for suitable choice of function \Psi, this is interpreted as a model where the particles have mass. The physical interpretation of these more general Gibbs measures is discussed widely, and I’m not very comfortable with it all at the moment, but aim to come back to it later, when hopefully I will be more comfortable.

Isolated Vertices and the Second-Moment Method

It’s back-to-school day or week for much of the UK. I’m sure for many, this brings resolutions of better work habits, and while I could always use some of those too as I try to finish my thesis, I also want to start blogging again when possible.

Right now, I’m in Haifa for the Mostly Markov Mixing summer school and workshop hosted at the Technion. The talks have been interesting so far, and the environment stimulating both for the discussion of new problems, and getting work done on existing research.

20150904_104109I wrote much of this post a while ago, after some of the UK olympiad students asked me to tell them about second-moment methods and I politely declined. I was reminded of this by an interesting problem introduced by Elchanan Mossel in the first lecture of his series. I’ll start with this.

Suppose we consider a symmetric inhomogeneous random graph with two types. That is, we divide the vertices into two equally-sized classes, and we connect vertices in the same class with probability p, and vertices in different classes with probability q, all independently. The question is: if we can see the resulting graph structure, with what accuracy can we recover the class division? [Note: this setup with symmetry between the types can be called the block model.]

This is a hard problem, and got us all thinking a great deal. In the most relevant regime, even the most sophisticated techniques do not allow us to identify the partition perfectly with high probability as the number of vertices goes to infinity. For now, though, I only want to use the most uninteresting regime as a starting point for the rest of this post. EM asked: what happens if we take sparse scaling, that is p,q=\Theta(1/N), where N is the number of vertices? Do we have a chance to identify the classes correctly?

Well in this case the answer is easy, because it is ‘no’, and the reason is that in sparse random graphs, there is a positive proportion of isolated vertices. In particular, there is a positive proportion of isolated vertices of each type. And so, when we see an isolated vertex, for large N we can specify accurately the distribution of each type given this extra information, but in doing so, we admit that we certainly cannot partition the isolated vertices into their classes. So for the rest of the talk, EM focused on the regime where the random graph is connected with high probability.

There are two ideas with worthwhile and simple content here. Firstly, the fact that such a random graph has a positive proportion of isolated vertices. I am finishing off a result in a model where the base structure is the inhomogeneous random graph, and have just now proved a short lemma showing exactly this in a slightly more complicated context. It’s a good example of the second-moment method. Secondly, implicit in the final statement of the previous paragraph is that the absence of isolated vertices and connectivity are roughly equivalent in a random graph. This is something I’ve talked about briefly before, and in the interests of keeping the resolution and actually finishing this post, I won’t talk about it here.

Earlier this year, I gave a short talk by request to the UK olympiad students about first-moment methods. In particular, they were hoping that they might occasionally want to apply such approaches to the sort of combinatorics problems you encounter in olympiads. Typically, the best examples in this setting involve demonstrating the existence of a set without certain bad properties, by showing that the probability that all bad properties hold simultaneously is strictly less than one.

The students asked why I wasn’t talking about second-moment methods. Firstly, the models (such as this one) which are the best examples are less familiar to the students, and are really rooted in the randomness. They can’t easily be turned into a combinatorics problem. Secondly, looking for lower bounds in probability suggests we are aiming for convergence in probability, rather than convergence in expectation, and this is a distinction that is unlikely to be appreciated before some undergraduate probability courses have been taken.

Anyway, we want to show that the proportion of isolated vertices in G(n,c/n) is bounded below in probability as n\rightarrow\infty. All the content of the argument is seen in this classical Erdos-Renyi setting. Any inhomogeneous example will merely demand extra notation.

So, first we deal with the expected number of isolated vertices. The event that a given vertex v is isolated demands that the n-1 edges potentially incident to v do not appear. The probability of this is (1-\frac{c}{n})^{n-1} \rightarrow e^{-c}. Thus the expected number of such vertices is ne^{-c}. We now want to show that the fluctuations (standard deviation if you will) of this quantity are small relative to its mean. To bound the variance, we look at pairs of vertices, v and w. Note that the events {v isolated} and {w isolated} are not independent, since knowing that v is isolated tells us that the edge from v to w is not present, which slightly increases the chance of w being isolated. (For a very concrete example, think of the case n=2.)

However, in a large graph, the events are almost independent. That’s a statement which we feel is true, but we need to quantify, so we ask for the probability that {v and w isolated}. This happens precisely if none of the edges incident to either v or w are present. There are 2n-3 such edges, and so the probability of this is (1-\frac{c}{n})^{2n-3}\rightarrow e^{-2c}. So now we can write

\mathbb{E}\left[ \mathbf{1}(1\text{ isolated})+\ldots+\mathbf{1}(n\text{ isolated})\right]^2= \sum_{k=1}^n \mathbb{E}\mathbf{1}(k\text{ isolated})^2 + 2\sum_{j\ne k} \mathbb{E}\mathbf{1}(j,k\text{ isolated}).

An indicator function takes the values 0 and 1, and so in the first sum we can remove the square sign to leave us with the expected number of isolated vertices. Secondly, the vertices are exchangeable and so we can replace each summand with the value we have already established for v and w. We can now calculate the variance of the number of isolated vertices, and we see that it is o(n^2). With a little bit more care about the limits in n, we can check it is actually O(n). In particular, the variance of the proportion of isolated vertices is tends to zero.

For explicit lower bounds in probability on the proportion of isolated vertices we could appeal to Chebyshev’s inequality. However, since the variance vanishes, we have convergence in distribution to a constant, and thus convergence in probability.

Finally, a word on the end of EM’s talk. Having said that the sparse phase is not interesting because it is impossible, we might ask about the dense phase, where p and q are fixed. Just a for concrete example, suppose the probability of connection within the class is ½, and between classes is 1/3. Thus between any pair of vertices in the same class we expect to see roughly N/8+N/18= 13N/72 paths of length 2. The summands correspond to the middle vertex of the path in the same class, and in the opposite class respectively. However, between any pair of vertices in different classes, we expect to see roughly N/6 paths of length 2 for similar reasons. Both of these quantities will be highly concentrated on their means: consider the second-moment as the existence of each possible path is independent. Indeed, the chance that we see a proportion of paths closer to N/6 when it should be 13N/72 is a large deviations event, and so has exponential decay. As a result, the chance that we get the relative positions of any pair of vertices wrong with this method vanishes for large N.

In fact, the condition that (for p<q),

N \mathbb{P}(\text{Bin}(N-1,p)\ge \text{Bin}(N-1,q)) \rightarrow 0

should be enough to identify the partition with high probability, and indeed this is proved by several authors including EM. Note that the dense regime comfortable satisfies this condition, since it holds even without the factor of N. (The sparse regime, completely fails as the probability is roughly constant.) Even closer to the connectivity threshold remains interesting!