Weak Convergence and the Portmanteau Lemma

Much of the theory of Large Deviations splits into separate treatment of open and closed sets in the rescaled domains. Typically we seek upper bounds for the rate function on closed sets, and lower bounds for the rate function on open sets. When things are going well, these turn out to be same, and so we can get on with some applications and pretty much forget about the topology underlying the construction. Many sources made a comment along the lines of “this is natural, by analogy with weak convergence”.

Weak convergence is a topic I learned about in Part III Advanced Probability. I fear it may have been one of those things that leaked out of my brain shortly after the end of the exam season… Anyway, this feels like a good time to write down what it is all about a bit more clearly. (I’ve slightly cheated, and chosen definitions and bits of the portmanteau lemma which look maximally similar to the Large Deviation material, which I’m planning on writing a few posts about over the next week.)

The motivation is that we want to extend the notion of convergence in distribution of random variables to general measures. There are several ways to define convergence in distribution, so accordingly there are several ways to generalise it. Much of what follows will be showing that these are equivalent.

We work in a metric space (X,d) and have a sequence (\mu_n) and \mu of (Borel) probability measures. We say that (\mu_n) converges weakly to \mu, or \mu_n\Rightarrow\mu if:

\mu_n(f)\rightarrow\mu(f), \quad\forall f\in\mathcal{C}_b(X).

So the test functions required for result are the class of bounded, continuous functions on X. We shall see presently that it suffices to check a smaller class, eg bounded Lipschitz functions. Indeed the key result, which is often called the portmanteau lemma, gives a set of alternative conditions for weak convergence. We will prove the equivalence cyclically.

Portmanteau Lemma

The following are equivalent.

a) \mu_n\Rightarrow \mu.

b) \mu_n(f)\rightarrow\mu(f) for all bounded Lipschitz functions f.

c) \limsup_n \mu_n(F)\leq \mu(F) for all closed sets F. Note that we demanded that all the measures be Borel, so there is no danger of \mu(F) not being defined.

d) \liminf_n \mu_n(F)\geq \mu(G) for all open sets G.

e) \lim_n \mu_n(A)=\mu(A) whenever \mu(\partial A)=0. Such an A is called a continuity set.

Remarks

a) All of these statements are well-defined if X is a general topological space. I can’t think of any particular examples where we want to use measures on a non-metrizable space (eg C[0,1] with topology induced by pointwise convergence), but there seem to be a few references (such as the one cited here) implying that the results continue to hold in this case provided X is locally compact Hausdorff. This seems like an interesting thing to think about, but perhaps not right now.

b1) This doesn’t strike me as hugely surprising. I want to say that any bounded continuous function can be uniformly approximated almost everywhere by bounded Lipschitz functions. Even if that isn’t true, I am still not surprised.

b2) In fact this condition could be replaced by several alternatives. In the proof that follows, we only use one type of function, so any subset of \mathcal{C}_b(X) that contains the ones we use will be sufficient to determine weak convergence.

c) and d) Why should the sign be this way round? The canonical example to have in mind is some sequence of point masses \delta_{x_n} where x_n\rightarrow x in some non-trivial way. Then there is some open set eg X\{x} such that \mu_n(X\backslash x)=1 but \mu(X\backslash x)=0. Informally, we might say that in the limit, some positive mass could ‘leak out’ into the boundary of an open set.

e) is then not surprising, as the condition of being a continuity set precisely prohibits the above situation from happening.

Proof

a) to b) is genuinely trivial. For b) to c), find some set F’ containing F such that \mu(F')-\mu(F)=\epsilon. Then find a Lipschitz function f which is 0 outside F’ and 1 on F. We obtain

\limsup_n \mu_n(F)\leq \limsup \mu_n(f)=\mu(f)\leq \mu(F').

But \epsilon was arbitrary, so the result follows as it tends to zero. c) and d) are equivalent after taking F^c=G. If we assume c) and d) and apply them to A^\circ, \bar{A}, then e) follows.

e) to a) is a little trickier. Given a bounded continuous function f, assume WLOG that it has domain [0,1]. At most countably many events \{f=a\} have positive mass under each of \mu, (\mu_n). So given M>0, we can choose a sequence

-1=a_0<a_1<\ldots<a_M=2, such that |a_{k+1}-a_k|<\frac{1}{M},

and \mu(f=a_k)=\mu_n(f=a_k)=0 for all k,n. Now it is clear what to do. \{f\in[a_k,a_{k+1}]\} is a continuity set, so we can apply e), then patch everything together. There are slightly too many Ms and \epsilons to do this sensibly in WordPress, so I will leave it at that.

I will conclude by writing down a combination of c) and d) that will look very familiar soon.

\mu(A^\circ)\leq \liminf_n \mu_n(A)\leq \limsup_n\mu_n(A)\leq \mu(\bar{B}).

References

Apart from the Part III Advanced Probability course, this article was prompted by various books on Large Deviations, including those by Frank den Hollander and Ellis / Dupuis. I’ve developed the proof above from the hints given in the appendix of these very comprehensible notes by Rassoul-Agha and Seppalainen.

Advanced Probability Revision Summary

Aside

My plan to write a post a day about a likely piece of Advanced Probability bookwork has been more useful than even I anticipated. It’s surprising how often you come to write something down and realise you don’t understand half the steps. Anyway, here’s a list, in approximately chronological order through the course, of what I’ve covered:

Recurrence and Transience of BM

In this post, we consider Brownian motion as a Markov process, and consider the recurrence and transience properties in several dimensions. As motivation, observe from Question 5 of this exam paper that it is a very much non-trivial operation to show that Brownian motion in two-dimensions almost surely has zero Lebesgue measure. We would expect this to be true by default, as we visualise BM as a curve. So it is interesting to see how much we can deduce without significant technical analysis. We will make use of Ito’s formula. This material comes from the Part III course Advanced Probability, which doesn’t explicitly mention Ito’s result, and instead proves the result required separately, making use of the nature of solutions to the diffusion equation. In this context we assume that for f\in C_b^{1,2}:

M_t:=f(t,B_t)-f(0,B_0)-\int_0^t(\frac{\partial}{\partial t}+\frac12\Delta)f(s,B_s)ds

is a martingale. Of course, precisely, from Ito’s formula, this can be expressed as the stochastic integral of a bounded function with respect to Brownian motion, which is therefore a (continuous local, but bounded) martingale.

d=1: In one dimension, BM is point-recurrent. This means that almost surely, BM returns to zero infinitely many times. This is easiest shown by using the time-reversal equivalence to deduce that \lim\sup B_t=-\lim\inf B_t=\infty.

d=2: BM is two dimensions is point-transient. That means that the probability of returning to a given point is <1. In fact it is 0, as one might suspect from the fact that BM is space-invariant and, intuitively at least, has measure 0. However, it is neighbourhood-recurrent, meaning that it almost surely returns to a ball around a given point infinitely often. We discuss small balls around 0, but obviously the conclusions apply equally well elsewhere.

The aim is to choose a function so that the expression in Ito’s formula as above is as simple as possible. Taking f a function of space alone and harmonic causes the integral term to vanish. In this case, f(y)=\log|y| will suffice. Obviously we have to restrict attention to \epsilon\leq |y|\leq R. We stop M at T_\epsilon\wedge T_R, that is the first time that the BM hits the boundary of the annulus on which f is defined, and apply OST, since \log|B_t| is bounded here and the stopping time is a.s. finite. We obviously have to assume the BM starts in this annulus, but then we obtain:

\mathbb{E}_x\log|B_{T_\epsilon\wedge T_R}|=\log|x|

and so we can consider the two possibilities for B_{T_\epsilon\wedge T_R} to deduce:

\mathbb{P}_x(T_\epsilon<T_R)=\frac{\log R-\log|x|}{\log R-\log\epsilon}

Now let \epsilon\downarrow 0 to see that \mathbb{P}_x(B_t=0,\text{ some }t>0)=0. Now apply the (weak) Markov property at a small fixed time a, to deduce, with a mild abuse of notation:

\mathbb{P}_0(B_t=0,\text{ some }t>a)=\int \mathbb{P}_x(B_t=0,t>0)\mathbb{P}_0(B_a=dx)=0

as the first term in the integral we have shown to be 0 B_a-a.e. Then let a\downarrow 0 to obtain the result about point-transience.

For neighbourhood recurrence, instead let R\uparrow\infty, so \mathbb{P}_x(T_\epsilon<\infty)=1. As before, can integrate over law of B_n to obtain

\mathbb{P}_0(|B_t|\leq \epsilon,\text{ some }t\geq n)=1

which is precisely what we require for transience.

d=>3: BM is transient. That is, |B_t|\rightarrow\infty a.s. Note that for d>3, the first three components have the same distribution as BM in three dimensions, and so it suffices to consider the case d=3.

Here, the correct choice of harmonic function is f(y)=\frac{1}{|y|}, so conclude as before that

\mathbb{P}_x(T_\epsilon<T_R)=\frac{|x|^{-1}-R^{-1}}{\epsilon^{-1}-R^{-1}}

From this, we can take a limit to see that

\mathbb{P}_x(T_\epsilon<\infty)\leq \frac{\epsilon}{|x|}

We deploy a neat trick to lift this result to a global statement about transience. Define the events that the modulus never returns to n after hitting n^3

A_n:=\{|B_t|>n\quad \forall t\geq T_{n^3}\}

Calculate

\mathbb{P}_0(A_n^c)\stackrel{\text{SMP}}{=}\mathbb{E}_0[\mathbb{P}_{B_{T_{n^3}}}(T_n<\infty)]=\mathbb{E}_0[\frac{1}{n^2}]=\frac{1}{n^2}

Applying Borel-Cantelli 1, A_n eventually holds almost surely, which certainly implies the desired result.

Strong Markov Property for BM

The Strong Markov Property is the most important result to demonstrate for any Markov process, such as Brownian Motion. It is also probably the most widely requested item of bookwork on the Part III Advanced Probability exam. I feel it is therefore worth practising writing as quickly as possible.

Theorem (SMP): Take (B_t) a standard (\mathcal{F}_t)-BM, and T an a.s. finite stopping time. Then (B_{T+t}-B_T,t\geq 0) is a standard BM independent of \mathcal{F}_T.

Proof: We write B_t^{(T)}=B_{T+t}-B_T for ease of notation. We will show that for any A\in\mathcal{F}_T and F bounded, measurable:

\mathbb{E}[1_AF(B_{T+t_1}-B_T,\ldots,B_{T+t_n}-B_T)]=\mathbb{P}(A)\mathbb{E}F(B_{t_1},\ldots,B_{t_n})

This will suffice to establish independence, and taking A=\Omega\in\mathcal{F}_t shows that B_t^T is a standard BM since (Levy), BM is uniquely characterised by its finite joint distributions.

To prove the result, we approximate discretely, and apply the Markov property.

\mathbb{E}[1_AF(B_{t_1}^{(T)},\ldots)]=\lim_{m\rightarrow\infty}\sum_{k=1}^\infty \mathbb{E}[1_{A\cap\{T\in((k-1)2^{-m},k2^{-m}]\}}F(B_{t_1}^{(k2^{-m})},\ldots)]

by bounded convergence, using continuity of F, right-continuity of B, and that T<\infty a.s. (so that 1_A=\sum 1_{A\cap \{T\in(-,-]\}})

\stackrel{\text{WMP}}{=}\lim_{m\rightarrow\infty}\sum_{k=1}^\infty \mathbb{P}[A\cap\{T\in((k-1)2^{-m},k2^{-m}]\}]\mathbb{E}F(B_{t_1},\ldots,B_{t_n})

\stackrel{\text{DOM}}{=}\mathbb{P}(A)\mathbb{E}F(B_{t_1},\ldots,B_{t_n})

which is exactly what we required.

Remarks: 1) We only used right-continuity of the process, and characterisation by joint marginals, so the proof works equally well for Levy processes.

2) We can in fact show that it is independent of \mathcal{F}_T^+, by considering T+\frac{1}{n} which is still a stopping time, then taking a limit in this as well in the above proof. For details of a similar result, see my post on Blumenthal’s 0-1 Law.