tightness | Eventually Almost Everywhere

I am aiming to write a short post about each lecture in my ongoing course on Random Graphs. Details and logistics for the course can be found here.

By this point of the course, we’ve studied several aspects of the Erdos-Renyi random graph, especially in the sparse setting $G(n,\frac{\lambda}{n})$ . We’ve also taken a lengthy detour to revise Galton-Watson trees, with a particular focus on the case of Poisson offspring distribution.

This is deliberate. Note that a given vertex v of $G(n,\frac{\lambda}{n})$ has some number of neighbours distributed as $\mathrm{Bin}(n-1,\frac{\lambda}{n})\stackrel{d}\approx\mathrm{Po}(\lambda)$ , and the same approximation remains valid as we explore the graph (for example in a breadth-first fashion), at least until we have seen a large number of vertices, and barring some ultra-pathological event such as a vertex having degree n/3.
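To get a feel for how good the Poisson approximation of a single vertex degree is, here is a minimal sketch that computes the total variation distance between $\mathrm{Bin}(n-1,\frac{\lambda}{n})$ and $\mathrm{Po}(\lambda)$ numerically. The function names and the choice $\lambda=2$ are mine, purely for illustration; the probabilities are computed in log-space to avoid overflow for large n.

```python
from math import exp, lgamma, log

def binomial_pmf(k, trials, p):
    """P(Bin(trials, p) = k) for 0 < p < 1, computed via logarithms."""
    log_choose = lgamma(trials + 1) - lgamma(k + 1) - lgamma(trials - k + 1)
    return exp(log_choose + k * log(p) + (trials - k) * log(1 - p))

def poisson_pmf(k, lam):
    """P(Po(lam) = k), computed via logarithms."""
    return exp(-lam + k * log(lam) - lgamma(k + 1))

def tv_distance(n, lam):
    """Total variation distance between Bin(n-1, lam/n) and Po(lam)."""
    trials, p = n - 1, lam / n
    # Sum |difference| over the binomial's support {0, ..., n-1} ...
    diff = sum(abs(binomial_pmf(k, trials, p) - poisson_pmf(k, lam))
               for k in range(trials + 1))
    # ... and add the Poisson mass above n-1, where the binomial has none.
    poisson_tail = 1.0 - sum(poisson_pmf(k, lam) for k in range(trials + 1))
    return 0.5 * (diff + max(poisson_tail, 0.0))

for n in (10, 100, 1000):
    print(n, tv_distance(n, lam=2.0))
```

The printed distances should shrink as n grows, which is the one-vertex version of the approximation; it says nothing yet about the joint law of the degrees seen during an exploration.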

In any case, we are motivated by the notion that the local structure of $G(n,\frac{\lambda}{n})$ is well-approximated by the Galton-Watson tree with $\mathrm{Po}(\lambda)$ offspring, and in this lecture and the next we try to make this notion precise, and discuss some consequences when we can show that this form of convergence occurs.

Deterministic graphs

Throughout, we will be interested in rooted graphs, since by definition we have to choose a root vertex whose local neighbourhood is to be studied. Usually, we will study a sequence of rooted graphs $(G_n,\rho_n)$ , where the vertex set of $G_n$ is [n], or certainly increasing in n (as in the first example).

For some rooted graph $(G,\rho)$ , we say such a sequence $(G_n,\rho_n)$ converges to $(G,\rho)$ locally if for all radii $r\ge 1$ , we have $B_r^{G_n}(\rho_n)\simeq B_r^G(\rho)$ for all sufficiently large n, where the threshold is allowed to depend on r. In words, the neighbourhood around $\rho_n$ in $G_n$ is the same up to radius r as the neighbourhood around $\rho$ in $G$ , so long as n is large enough (for given r).

This is best illustrated by an example, such as $T_n$ , the binary tree to depth n.

If we take $\rho_n$ to be the usual root, then the trees are nested, and converge locally to the infinite binary tree $T_\infty$ . Slightly less obviously, if we take $\rho_n$ to be one of the leaves, then the trees are still nested (up to labelling, i.e. in the sense of isomorphisms of rooted trees), and converge locally to the canopy tree, defined by a copy of $\mathbb{Z}_{\ge 0}$ with nearest-neighbour edges, and where each vertex $n\ge 1$ is connected to the root of a disjoint copy of $T_{n-1}$ .
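The claim that the neighbourhoods stabilise can be checked by brute force for small cases. The sketch below is my own illustration, not part of the lecture: it labels $T_n$ in heap order (an arbitrary choice), extracts the ball of radius r around a chosen vertex, and compares balls via a canonical form for rooted trees (sorted nested tuples), which agrees for two rooted trees exactly when they are isomorphic.

```python
def binary_tree(depth):
    """Adjacency lists of T_depth with heap labels: root 1, children 2v and 2v+1."""
    size = 2 ** (depth + 1) - 1
    adj = {v: [] for v in range(1, size + 1)}
    for v in range(2, size + 1):
        adj[v].append(v // 2)
        adj[v // 2].append(v)
    return adj

def ball_shape(adj, root, radius):
    """Canonical form of the rooted ball B_radius(root), assuming adj is a tree."""
    def shape(v, parent, remaining):
        if remaining == 0:
            return ()  # vertices at distance exactly `radius` are leaves of the ball
        children = [shape(w, v, remaining - 1) for w in adj[v] if w != parent]
        return tuple(sorted(children))
    return shape(root, None, radius)

r = 3
for n in (3, 4, 5, 6):
    tree = binary_tree(n)
    leaf = 2 ** n  # leftmost leaf in heap labelling
    same_as_next = ball_shape(tree, leaf, r) == ball_shape(binary_tree(n + 1), 2 ** (n + 1), r)
    print(n, same_as_next)
```

For $n\ge r$ the ball around a leaf no longer changes with n, and it is a different rooted tree from the ball around the usual root, matching the two different local limits.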

Things get more interesting when the root is chosen randomly, for example uniformly at random, as this encodes more global information about the graphs $G_n$ . If the $G_n$ are vertex-transitive and we only care about rooted graphs up to isomorphism, then it doesn’t matter how we choose the root.

Otherwise, when the root $\rho_n$ is random, we say that $G_n$ converges in the local weak sense to $(G,\rho)$ if, for all $r\ge 1$ and for all rooted graphs $(H,\rho_H)$ ,

$\mathbb{P}\left( B^{G_n}_r(\rho_n)\simeq (H,\rho_H) \right) \longrightarrow \mathbb{P}\left( B_r^G(\rho)\simeq (H,\rho_H)\right),$

as $n\rightarrow\infty$ .

Alternatively, one can phrase this as a result about convergence of rooted-graph-valued distributions.

A simple non-transitive example is $G_n\simeq P_n$ , the path of length n. Then, the r-neighbourhood of a vertex is isomorphic to $P_{2r}$ , unless that vertex is within graph-distance (r-1) of one of the leaves of $G_n$ . As $n\rightarrow\infty$ , the proportion of such vertices vanishes, and so, $\mathbb{P}\left( B^{P_n}_r(\rho_n)\simeq P_{2r}\right)\rightarrow 1$ , from which we conclude the unsurprising result that $P_{n}$ converges in the local weak sense to $\mathbb{Z}$ . (Which is vertex-transitive, so it doesn’t matter where we select the root.)

The binary trees offer a slightly richer perspective. Let $\mathcal{L}_n$ be the set of leaves of $T_n$ . We claim that when $\rho_n$ is chosen uniformly from the vertices of $T_n$ , then $d_{T_n}(\rho_n,\mathcal{L}_n)$ converges in distribution. Indeed, $\mathbb{P}\left( d_{T_n}(\rho_n,\mathcal{L}_n)=k\right) = \frac{2^{n-k}}{2^{n+1}-1}$ , whenever $n\ge k$ , and so the given distance converges in distribution to the Geometric distribution with parameter 1/2 supported on {0,1,2,…}.
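As a sanity check on the formula, the following sketch computes the exact law of the distance to the leaves with rational arithmetic and compares it with the Geometric(1/2) masses $2^{-(k+1)}$ . The helper names are hypothetical, chosen just for this illustration.

```python
from fractions import Fraction

def leaf_distance_pmf(n):
    """Exact law of d(rho_n, leaves) for rho_n uniform on T_n.

    Vertices at distance k from the leaves sit at level n-k, which has 2^(n-k) vertices.
    """
    total = 2 ** (n + 1) - 1
    return [Fraction(2 ** (n - k), total) for k in range(n + 1)]

def geometric_pmf(k):
    """Geometric(1/2) on {0, 1, 2, ...}: mass 2^-(k+1) at k."""
    return Fraction(1, 2 ** (k + 1))

for n in (3, 10, 20):
    pmf = leaf_distance_pmf(n)
    gap = max(abs(pmf[k] - geometric_pmf(k)) for k in range(n + 1))
    print(n, float(gap))
```

The largest discrepancy occurs at k=0 and decays like $2^{-(n+2)}$ , so the convergence in distribution is very fast.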

This induces a random local weak limit, namely the canopy tree, rooted at one of the vertices of the copy of $\mathbb{Z}_{\ge 0}$ , with the index of this vertex distributed as Geometric(1/2).

This post continues the theme of revising topics in the analytic toolkit relevant to proving convergence of stochastic processes. Of particular interest is the question of how to prove that families of Markov chains might have a process scaling limit converging to a solution of some stochastic differential equation, in a generalisation of Donsker’s theorem for Brownian motion. In this post, however, we address more general aspects of convergence of stochastic processes, with particular reference to Skorohod space.

Topological Background

I’ve discussed Skorohod space in a previous post. For now, we focus attention on cadlag functions on a compact time interval, D[0,T]. Some of what follows can be extended to the infinite-time setting easily, and some requires more work. Although we can define a metric on the space of cadlag functions in lots of ways, it is more useful to think topologically, or at least with a more vague sense of metric. We say two cadlag functions are close to one another if there is a reparameterisation of the time-axis (an increasing bijection from [0,T] to itself) that is uniformly close to the identity function, and when applied to one of the cadlag functions, brings it uniformly close to the other cadlag function. Heuristically, two cadlag functions are close if their large jumps are close to one another and of similar size, and if they are uniformly close elsewhere. It is worth remembering that a cadlag function on even an unbounded interval can have only countably many jumps, and only finitely many with magnitude greater than some threshold on any compact interval.

For much of the theory one would like to use, it is useful for the spaces under investigation to be separable. Recall a topological space is separable if there exists a countable dense subset. Note in particular that D[0,T] is not separable under the uniform metric, since we can define $f_x(\cdot)=\mathbf{1}_{(\cdot \ge x)}$ for each $x\in[0,T]$ , then $||f_x-f_y||_\infty=1$ whenever $x\ne y$ . In particular, we have an uncountable collection of disjoint open sets given by the balls $\mathcal{B}(f_x,\frac12)$ . Any dense subset must contain a point of each ball, so no countable subset can be dense, and the space is not separable. Similarly, $C[0,\infty)$ is not separable under the uniform metric. A counterexample might be given by considering functions which take the values {0,1} on the integers. Thus we have a map from $\{0,1\}^{\mathbb{N}}\rightarrow C[0,\infty)$ , where the uniform distance between any two distinct image points is at least one, hence the open balls of radius 1/2 around each image point give the same contradiction as before. However, the Stone-Weierstrass theorem shows that C[0,T] is separable, as we can approximate any such function uniformly well by a polynomial, and thus uniformly well by a polynomial with rational coefficients.
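The contrast between the uniform metric and the time-change notion of closeness can be made concrete for two of the step functions $f_x, f_y$ . The sketch below is an illustration under my own choices (T=1, x=1/2, y=51/100, a piecewise-linear time change, and a finite grid of test times); it uses exact rational arithmetic so that the comparison at the jump is not spoiled by rounding.

```python
from fractions import Fraction

def step(x):
    """The cadlag indicator f_x(t) = 1 if t >= x, else 0."""
    return lambda t: 1 if t >= x else 0

def time_change(x, y, T):
    """Piecewise-linear increasing bijection of [0, T] with 0 -> 0, x -> y, T -> T.

    Assumes 0 < x < T and 0 < y < T.
    """
    def lam(t):
        if t <= x:
            return t * y / x
        return y + (t - x) * (T - y) / (T - x)
    return lam

T = Fraction(1)
x, y = Fraction(1, 2), Fraction(51, 100)
grid = [Fraction(k, 1000) for k in range(1001)]
f_x, f_y, lam = step(x), step(y), time_change(x, y, T)

uniform_gap = max(abs(f_x(t) - f_y(t)) for t in grid)        # no time change
warped_gap = max(abs(f_x(t) - f_y(lam(t))) for t in grid)    # after time change
time_distortion = max(abs(lam(t) - t) for t in grid)         # cost of the time change
print(uniform_gap, warped_gap, float(time_distortion))
```

Without the time change the two functions are at uniform distance 1, but moving the jump of one onto the jump of the other costs only |x-y| in time distortion, which is why the $f_x$ do not obstruct separability in the Skorohod topology.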

In any case, it can be shown that D[0,T] is separable with respect to the natural choice of metric. It can also be shown that there is a metric which gives the same open sets (hence is a topologically equivalent metric) under which D[0,T] is complete, and hence a Polish space.

Compactness in C[0,T] and D[0,T]

We are interested in tightness of measures on D[0,T], so first we need to address compactness for sets of deterministic functions in D[0,T]. First, we consider C[0,T]. Here, the conditions for a set of functions to be compact are given by the celebrated Arzela-Ascoli theorem. We are really interested in compactness as a property of size, so we consider instead relative compactness. A set is relatively compact (sometimes pre-compact) if its closure is compact. For the existence of subsequential limits, this is identical to compactness, only now we allow the possibility of the limit point lying outside the set.

We note that the function $C[0,T]\rightarrow \mathbb{R}$ given by $f\mapsto ||f||_\infty$ is continuous, and hence uniform boundedness is certainly a required condition for compactness in C[0,T]. Arzela-Ascoli states that uniform boundedness plus equicontinuity is sufficient for a set of such functions to be relatively compact. Equicontinuity should be thought of as uniform continuity that holds uniformly across all the functions in the set, rather than just across the arguments of one particular function.

For identical reasons, we need uniform boundedness for relative compactness in D[0,T], but obviously uniform continuity won’t work as a criterion for discontinuous functions! We seek some analogue of the modulus of continuity that ignores jumps. We define

$\omega'_\delta(f):=\inf_{\{t_i\}} \max_i \sup_{s,t\in[t_{i-1},t_i)} |f(s)-f(t)|,$

where the infimum is taken over all meshes $0=t_0<t_1<\ldots<t_r=T$ with $t_i-t_{i-1}>\delta$ for every i. Note that as $\delta\downarrow 0$ , we can, if we want, place the $t_i$ so that large jumps of the function f take place at the boundaries between adjacent parts of the mesh. In particular, for a given cadlag function, it can be shown fairly easily that $\omega'_\delta(f)\downarrow 0$ as $\delta\rightarrow 0$ . Then, unsurprisingly, in a similar fashion to the Arzela-Ascoli theorem, it follows that a set of functions $A\subset D[0,T]$ is relatively compact if it is uniformly bounded, and

$\lim_{\delta\rightarrow 0} \sup_{f\in A}\omega'_\delta(f)=0.$

Note that this ‘modulus of continuity’ needs to decay uniformly across the set of functions, but that we do not need to choose the mesh at level $\delta$ uniformly across all functions. This would obviously not work, as then the functions $\mathbf{1}_{(\cdot\ge x_n)}$ for a sequence of distinct points $x_n\rightarrow x$ would fail the criterion for relative compactness, but they clearly converge in Skorohod space!
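To experiment with $\omega'_\delta$ , here is a rough sketch for piecewise-constant paths sampled on an equally spaced grid. It is an approximation under stated assumptions rather than an exact computation: mesh points are restricted to grid points (so the result is an upper bound on the true infimum), and $\delta$ is expressed as a number of grid steps, `min_gap`, which is my own parameterisation. The minimisation over meshes is done by dynamic programming.

```python
def omega_prime(values, min_gap):
    """Grid approximation of omega'_delta for a piecewise-constant path.

    values[k] is the value of f on [t_k, t_{k+1}) for an equally spaced grid
    t_0 = 0 < ... < t_m = T. Mesh points are restricted to the grid, and
    consecutive mesh points must be more than min_gap grid steps apart
    (so delta = min_gap * T / m). Returns inf if no such mesh exists.
    """
    m = len(values)
    inf = float("inf")
    best = [inf] * (m + 1)  # best[b]: smallest achievable max-oscillation on [0, t_b)
    best[0] = 0.0
    for b in range(1, m + 1):
        hi = lo = values[b - 1]
        for a in range(b - 1, -1, -1):
            # hi and lo track the range of f on the candidate last cell [t_a, t_b)
            hi, lo = max(hi, values[a]), min(lo, values[a])
            if b - a > min_gap and best[a] < inf:
                best[b] = min(best[b], max(best[a], hi - lo))
    return best[m]

single_jump = [0] * 50 + [1] * 50           # indicator of t >= 1/2 on [0, 1]
blip = [0] * 50 + [1] * 2 + [0] * 48        # two unit jumps only 2 grid steps apart
print(omega_prime(single_jump, 10), omega_prime(blip, 10), omega_prime(blip, 1))
```

A single jump is invisible to $\omega'_\delta$ because a mesh point can sit on it, and the mesh is chosen separately for each function, so moving the jump around does not matter. Two jumps closer together than $\delta$ cannot both be absorbed, and the modulus then sees the full jump size.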

Tightness in C[0,T] and D[0,T]

Naturally, we are mainly interested in (probability) measures on D[0,T], and in particular conditions for tightness on this space. Recall a family of measures $\Pi$ is tight if for any $\epsilon>0$ , there exists some compact set A such that

$\pi(A)>1-\epsilon,\quad \forall \pi\in\Pi.$

So, for measures $(\mu_n)$ on D[0,T], the sequence is tight precisely if for any $\epsilon>0$ , there exist $M<\infty$ , $\delta>0$ and some N such that for all n>N, both

$\mu_n(||f||_\infty >M)\le \epsilon,\quad \mu_n(\omega'_\delta(f)>\epsilon)\le \epsilon$

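hold. In C[0,T], where the standard modulus of continuity $\omega_\delta$ plays the role of $\omega'_\delta$ , the second condition controls variation sufficiently strongly that we can replace the first condition with

$\mu_n(|f(0)|>M)\le \epsilon.$

In D[0,T] this shortcut needs more care, since $\omega'_\delta$ ignores jumps by design: the functions $n\mathbf{1}_{(\cdot\ge T/2)}$ all vanish at 0 and have $\omega'_\delta=0$ for $\delta<T/2$ , yet they are not uniformly bounded.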

Often we might be taking some sort of scaling limit of these processes in D[0,T], where the jumps become so small in the limit that we expect the limit process to be continuous, perhaps an SDE or diffusion. If we can replace $\omega'_\delta$ by $\omega_\delta$ , the standard modulus of continuity, then we have the additional property that any weak limit lies in C[0,T].
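As a toy version of such a scaling limit, one can look at simple random walk rescaled diffusively, which is the setting of Donsker’s theorem. The sketch below estimates by Monte Carlo how often the standard modulus $\omega_\delta$ of the rescaled walk exceeds $\epsilon$ . Everything here is an illustrative assumption on my part: the modulus is only evaluated at the grid times k/n, and the values of $\delta$ , $\epsilon$ , the seed and the number of trials are arbitrary.

```python
import random

def modulus(path, window):
    """Standard modulus on a grid: max |path[j] - path[i]| over 0 < j - i <= window."""
    last = len(path) - 1
    return max(abs(path[j] - path[i])
               for i in range(last)
               for j in range(i + 1, min(i + window, last) + 1))

def rescaled_walk(n, rng):
    """Simple random walk S_k / sqrt(n), k = 0..n, viewed on the time grid k/n."""
    path, s = [0.0], 0
    for _ in range(n):
        s += rng.choice((-1, 1))
        path.append(s / n ** 0.5)
    return path

def exceed_frequency(n, delta, eps, trials, rng):
    """Monte Carlo estimate of P(omega_delta > eps) for the rescaled walk."""
    window = max(1, int(delta * n))  # number of grid steps in a time window of length delta
    hits = sum(modulus(rescaled_walk(n, rng), window) > eps for _ in range(trials))
    return hits / trials

rng = random.Random(0)
for n in (100, 400):
    print(n, exceed_frequency(n, delta=0.01, eps=0.5, trials=200, rng=rng))
```

Seeing small frequencies that stay small as n grows is only suggestive, of course; the tightness condition requires a bound that holds for all large n, which is what the usual maximal inequalities provide.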

In general, to prove convergence of some stochastic processes, we will want to show that the processes are tight, by demonstrating the properties above, or something equivalent. Then Prohorov’s theorem (which I tend to think of as a probabilistic functional version of Bolzano-Weierstrass) asserts that the family of processes has a weak subsequential limit. Typically, one then shows that any weak subsequential limit must have the law of some particular random process. Normally this is achieved by showing some martingale property (eg for an SDE) in the limit, often by using the Skorohod representation theorem to use almost sure subsequential convergence rather than merely weak convergence. Then one argues that there is a unique process with this property and a given initial distribution. So since all weak subsequential limits are this given process, in fact the whole family has a weak limit.



A blog about probability and olympiads by Dominic Yeo


Lecture 6 – Local limits

Tightness in Skorohod Space
