Branching Processes and Dwass’s Theorem

This is something I had to think about when writing my Part III essay, and it turns out to be relevant to some of the literature I’ve been reading this week. The main result is hugely helpful for reducing a potentially complicated combinatorial object to a finite sum of i.i.d. random variables, which in general we do know quite a lot about. I was very pleased with the proof I came up with while writing the essay, even if in the end it turned out to have appeared elsewhere before. (Citation at end)

Galton-Watson processes

A Galton-Watson process is a stochastic process describing a simple model for evolution of a population. At each stage of the evolution, a new generation is created as every member of the current generation produces some number of `offspring’ with identical and independent (both across all generations and within generations) distributions. Such processes were introduced by Galton and Watson to examine the evolution of surnames through history.

More precisely, we specify an offspring distribution, a probability distribution supported on \mathbb{N}_0. Then define a sequence of random variables (Z_n,n\in\mathbb{N}) by:

Z_{n+1}=Y_1^n+\ldots+Y_{Z_n}^n,

where (Y_k^n,k\geq 1,n\geq 0) is a family of i.i.d. random variables with the offspring distribution Y. We say Z_n is the size of the nth generation. From now on, assume Z_0=1 and then we call (Z_n,n\geq 0) a Galton-Watson process. We also define the total population size to be

X:=Z_0+Z_1+Z_2+\ldots,

noting that this might be infinite. We refer to the situation where X<\infty finite as extinction, and can show that extinction occurs almost surely when \mathbb{E}Y\leq 1, excepting the trivial case Y=\delta_1. The strict inequality parts are as you would expect. We say the process is critical if \mathbb{E}Y=1, and this is less obvious to visualise, but works equally well in the proof, which is usually driven using generating functions.

Total Population Size and Dwass’s Theorem

Of particular interest is X, the total population size, and its distribution. The following result gives us a precise and useful result linking the probability of the population having size n and the distribution of the sum of n RVs with the relevant offspring distribution. Among the consequences are that we can conclude immediately, by CLT and Cramer’s Large Deviations Theorem, that the total population size distribution has power-law decay in the critical case, and exponential decay otherwise.

Theorem (Dwass (1)): For a general branching process with a single time-0 ancestor and offspring distribution $Y$ and total population size $X$:

\mathbb{P}(X=k)=\frac{1}{k}\mathbb{P}(Y^1+\ldots+ Y^k=k-1),\quad k\geq 1

where Y^1,\ldots,Y^k are independent copies of Y.

We now give a proof via a combinatorial argument. The approach is similar to that given in (2). Much of the literature gives a proof using generating functions.

Proof: For motivation, consider the following. It is natural to consider a branching process as a tree, with the time-0 ancestor as the root. Suppose the event \{X=k\} in holds, which means that the tree has k vertices. Now consider the numbers of offspring of each vertex in the tree. Since every vertex except the root has exactly one parent, and there are no vertices outside the tree, we must have Y^1+\ldots+Y^k=k-1 where Y^1,\ldots,Y^k are the offspring numbers in some order. However, observe that this is not sufficient. For example, if Y^1 is the number of offspring of the root, and k\geq 2, then we must have Y^1\geq 1. Continue reading

How to take a Passport Photograph – Part 2

Recall that by the end of the last post, we had given a complete strategy for the (slightly degenerate) stochastic control problem that modelled how to select a photo from a photobooth. The strategy was determined by an infinite sequence, defined inductively by:

a(1)=1,\quad a(n+1)=a(n)\left[1-\frac{a(n)}{2}\right].

At the very end of the post, I gave a selection of highly non-rigorous justifications for why

a(n)\approx \frac{2}{n} for large n.

After trying a bit more carefully, I think I can show the following:

a(n)=\Theta(\frac{2}{n}).

More precisely, eventually

a(n)\in [\frac{2}{n}-g(n),\frac{2}{n}+h(n)],

where g(n)=h(n)=2n^{-3/2}, so in particular g(n),h(n)=o(\frac{1}{n}).

To set this up, just to avoid having ‘2’s everywhere, we rescale the a(n)s by 1/2, so that the recursive definition is now:

a(n+1)=f(a(n)):=a(n)\left[1-a(n)\right].

I will show that for suitably large n, (though in practice this won’t be very large at all)

a(n)\in[\frac{1}{n}-n^{-3/2},\frac{1}{n}+n^{-3/2}]

\Rightarrow\quad a(n+1)\in[\frac{1}{n+1}-(n+1)^{-3/2},\frac{1}{n+1}+(n+1)^{-3/2}].

It will then suffice to show that the left hand side of this deduction holds for some n.

To show that

f(\frac{1}{n}-g(n))\geq \frac{1}{n+1}-g(n+1),

it suffices to verify that:

\frac{1}{n^2(n+1)}\leq g(n+1)-(1-\frac{2}{n})g(n)-g(n)^2.

When g(n)=n^{-3/2}, we can bound using the difference of two squares:

(\frac{1}{(n+1)^{3/2}}-\frac{n-2}{n^{5/2}})=(\frac{1}{(n+1)^3}-\frac{(n-2)^2}{n^5})/(\frac{1}{(n+1)^{3/2}}+\frac{n-2}{n^{5/2}})

\geq \frac{\frac{n^4-5n^3}{(n+1)^3n^5}}{2n^{3/2}} \stackrel{n\geq 10}{\geq}\frac{n^4}{4n^{11/2}}=\frac{1}{4n^{5/2}}

so for n\geq 16

\geq\frac{1}{n^3}\geq \frac{1}{n^2(n+1)}.

A similar calculation, which is relatively straightforward to perform and a bit of a hassle to write up into WordPress, gives the corresponding upper bound. It remains to check that

a(16)\in [\frac{1}{16}-\frac{1}{64},\frac{1}{16}+\frac{1}{64}],

and a direct computation (I used MATLAB) confirms this.

How to take a Passport Photograph – Part 1

As seems to be case whenever you start somewhere new, I’ve needed an almost infinite supply of passport-sized photographs recently. The university, my college, my department and of course the Chinese immigration authorities all wanted a record of my beautiful features. Anyway, as a result of all of this interest, I was in the do-it-yourself photo booth WHSmiths in Wandsworth getting some more the other day. The first attempt looked fine, but the machine offered me the possibility of trying again, up to twice if I wanted. This seemed like a win-win situation, so I said yes, not realising that the one I already had would not be kept ‘in the bag’. The second attempt looked somewhat startled, a pose that runs in my family, but not wanting to risk the possibility of a disastrous third attempt (and the financial penalty of having to do the whole operation again) I confirmed that I was happy and made do with the result. Naturally, the question that struck me: what is the optimal strategy for such a situation? (Assuming that, unlike me, you knew the rules from the beginning)

Mathematical model and choices

Let’s formulate this mathematically. Suppose there are n possible trials, corresponding to iid random variables X_1,\ldots,X_n. (Note that this assumes that your ‘performance’ does not improve or otherwise change during the process. Perhaps not a reasonable assumption in some contexts?) After trials X_1,\ldots,X_k have been observed, you have to choose whether to accept the value X_k as your ‘final answer’, or whether to continue.

The first key decision is: what distribution should the X_is have? Since in the original problem there isn’t a natural metric for quality, let’s assume that the X_is represent some well-defined quantitative utility, distributed as a uniform [0,1] random variable. Perhaps a normal random variable might be a more realistic model, but I can solve it in this case, so let’s stick to this for now. In addition, for the sake of making the eventual answer more simple, let’s say that 0 is the best quality and 1 is the worst. That is, we are looking for a strategy that stops the process at a time T so as to minimise \mathbb{E}X_T.

Finding an optimal strategy

The key observation is the following. In words, if we reject X_1, we can forget about its value as that is independent of \{X_2,\ldots,X_n\} which is now all that remains to base future judgments on. We return to the original problem with one fewer trial. In more mathematical notation, conditional on \{T\neq 1\}, T is independent of X_1.

The following argument assumes that an optimal strategy exists, which is not ideal, but can easily be justified. For now though, we proceed relatively informally by induction on n.

Let S be the stopping time for the optimal strategy on X_2,\ldots,X_n which we assume exists by induction. It is ‘obvious’ that the optimal strategy T for X_1,\ldots,X_n should be the following:

  • T=1 iff X_1<a(n), where this is a deterministic quality with dependence only on n.
  • Conditional on T\neq 1, take T=S.

From this alone, we can calculate \mathbb{E}X_T.

\mathbb{E}X_T=\mathbb{E}X_S1_{T=S}+\mathbb{E}X_T1_{T\neq S}

= (1-\mathbb{P}(T=1))\mathbb{E}X_S+\mathbb{E}X_1 1_{X_1<a(n)}

= (1-a(n))\mathbb{E}X_S +\frac12 a(n)^2.

This is minimised precisely when a(n)=\mathbb{E}X_s. We conclude that the optimal strategy, as of course we might well expect, is to take T=1 precisely if X_1 is less than the expected result of applying the optimal strategy to the remaining random variables.

By extension, we have a(n+1)=\mathbb{E}X_T, and so

a(n+1)=a(n)\left[1-\frac{a(n)}{2}\right]. (*)

The first few values are:

a(1)=1,\quad a(2)=\frac12,\quad a(3)=\frac38,\quad a(4)=\frac{39}{128},\quad\ldots

Behaviour of a(n)

The first question is: as n grows large, does a(n)\rightarrow 0? Well this isn’t too hard: the recursive definition (*) confirms that the sequence $a(1),a(2),\ldots$ is (strictly) decreasing, and so has a limit, which must be a fixed point of the equation (*). The only such fixed point is 0.

The second question is: what is the asymptotic behaviour of a(n) for large n? A quick run on MATLAB, or examination of the equation (*) suggests that

a(n)\approx \frac{2}{n}

should describe the behaviour well for large n. My basic attempts to verify this were initially unsuccessful, but I felt fairly sure that this should be true in some metric sense because of the following highly non-rigorous but nonetheless convincing idea.

Claim: a_n=\frac{2}{n}+O(n^{-3}) satisfies (*).

Why? Well, then:

\frac{2}{n+1}-\left[(\frac{2}{n}+O(n^{-3}))-\frac12(\frac{2}{n}+O(n^{-3}))^2\right]

=\frac{2}{n+1}-\frac{2}{n}+\frac{2}{n^2}+O(n^{-3})

=\frac{2}{n^2(n+1)}+O(n^{-3})=O(n^{-3}).

This proves the claim, but none of the = signs are really especially meaningful here. Perhaps there is a really slick way to tie this up that I’ve missed? In any case, I will save my own slightly involved method for a new post.

Markovian Excursions

In the previous post, I talked about the excursions of a Brownian motion. Today I’m thinking about how to extend these ideas to more general Markov chains. First we want to rule out some situations. In particular, we aren’t hugely interested in discrete time Markov chains. The machinery is fairly well established for excursions, whether or not the chain is transient. Furthermore, if the state space is discrete, as for a Poisson process for example, the discussion is not hugely interesting either. Remember that the technical challenges in the constructions of local time arise because of the Blumenthal 0-1 law property that Brownian motion visits 0 infinitely often in the small window after the start time. We therefore want the process to be regular at the point of the state space under discussion. This is precisely the condition described above for BM about 0.

Why is it harder in general?

The informal notion of a local time should transfer to a more general Markov chain, but there are some problems. Firstly, to define something in terms of an integral is not general enough. The state space E needs some topological structure, but any meaningful definition must be in terms of functions from E into the reals. There were also all sorts of special properties of Brownian motion, including the canonical time-space rescaling that came in handy in that particular case. It turns out to be easiest to consider the excursion measure on a general Markov chain through its Laplace transform.

Definition and Probabilistic interpretations

The resolvent is the Laplace transform of the transition probability P_t(x,A), viewed as an operator on functions f:E\rightarrow \mathbb{R}.

R_\lambda f(x):=\mathbb{E}_x\left[\int_0^\infty e^{-\lambda t}f(X_t)dt\right]=\int_0^\infty e^{-\lambda t}P_tf(x)dt.

We can interpret this in terms of the original process in a couple of ways which may be useful later. The motivation is that we would like to specify a Poisson process of excursions, for which we need to know the rate. We hope that the rate will in fact be constant, so it will in fact to suffice to work out the properties of the expected number of excursions (or whatever) up to various random times, in particular those given by exponential RVs.

So, we take \zeta\sim\text{Exp}(\lambda) independent of everything else, and assume that we ‘kill the chain’ at time \zeta. Then, by shuffling expectations in and out of the integral and separating independent bits, we get:

R_\lambda f(x)=\mathbb{E}_x\int_0^\zeta f(X_s)ds = \frac{1}{\lambda}\mathbb{E}_xf(X_\zeta).

As in the Brownian local time description

R_\lambda 1_A(x)=\mathbb{E}(\text{time spent in }A\text{ before death at time }\zeta_\lambda).

Markovian property

We want to show that excursions are Markov, once we’ve sorted out what an ‘excursion’ actually means in this context. We do know how to deal with the Markovian property once we are already on an excursion. It is relatively straightforward to define an extension of the standard transition probability operator, to include a condition that the chain should not hit point a during the transition. That is

_aP_t(x,A):= \mathbb{P}_x(X_t\in A\cap H_a>t).

This will suffice to define the behaviour once an excursion has started. The more complicated bit will be the entrance law n_t(A), being the probability of arriving at A after time t of an excursion. To summarise, as with BM, all the technical difficulties with an excursion happen at the beginning, ie bouncing around the start point, but once it is ‘up-and-running’, its path is Markovian and controlled by _aP_t.

Marking

The link between the resolvent and the excursions, is provided as in the Brownian case, by supplying a PPP of marks at uniform rate \lambda to real time. This induces a mark process on excursions, weighted by an (exponential) function of excursion length. We make no distinction between an excursion including one mark or many marks. Then the measure on marked excursion is, in a mild abuse of notation:

n_\lambda=(1-e^{-\lambda\zeta(f)})\cdot n.

We compare with the Laplace transform n_\lambda(dx)=\int_0^\infty e^{-\lambda t}dtn_t(dx) using a probabilistic argument.

We can apply the measure to a function in the usual way: \lambda n_\lambda(1_A) is the measure of those excursions for which the first mark occurs in A. So by taking A=E, we get

\lambda n_\lambda(1)=\text{ Excursion measure }=\int_U n(df)(1-e^{-\lambda\zeta(f)}).

We have therefore linked the exponential mark process on excursion measure with the Laplace transform of the entrance law. So in particular:

\frac{\lambda n_\lambda(A)}{\lambda n_\lambda(1)}=\mathbb{P}(\text{first mark when in }A)=\int_0^\infty \lambda e^{-\lambda t}P_t(0,A)dt=\lambda R_\lambda 1_A(0).

The resolvent is relatively easy to calculate explicitly, and so we can find the Laplace transform n_\lambda(A). From this it is generally possible to invert the transform to find the entrance law n_t.

References

A Guided Tour Through Excursions – L. C. G. Rogers.

This pair of posts is very much a paraphrase of chapters 3 and 4 of this excellent text. The original can be found here (possibly not open access?)

Brownian Excursions and Local Time

I’ve been spending a fair bit of time this week reading and thinking about the limits of various combinatorial objects, in particular letting the number of vertices tend to \infty in models of random graphs with various constraints. Perhaps predictably, like so many continuous stochastic objects, yet again the limiting ‘things’ turn out to be closely linked to Brownian Motion. As a result, I’ve ended up reading a bit about the notion of local time, and thought it was sufficiently elegant even by itself to justify a quick post.

Local Time

In general, we might be interested in calculating a stochastic integral like

\int_0^t f(B_s)ds.

Note that, except in some highly non-interesting cases, this is a random variable. Our high school understanding of Riemannian integration encourages thinking of this as a ‘pathwise’ integral along the path evolving in time. But of course, that’s orthogonal to the approach we start thinking about when we are introduced to the Lebesgue integral. There we think about potential values of the integrand, and weight their contribution by the (Lebesgue) measure of the subset of the domain in which they appear.

Can we do the same for the stochastic integral? That is, can we find a measure which records how long the Brownian Motion spends at a point x? This measure will not be deterministic – effectively the stochastic behaviour of BM will be encoded through the measure rather than the argument of the function.

The answer is yes, and the measure in question is referred to as local time. More formally, we want

\int_0^t f(B_s)ds=\int_\mathbb{R}f(x)L(t,x)dx. (*)

where the local time L(t,x) is a random process, increasing for fixed x. Informally, one could take

\partial_t L(t,x) \propto 1(B_t=x)

but clearly in practice that won’t do at all for a definition, and so instead we use (*). In the usual way, if we want (*) to hold for all reasonably nice functions f, it suffices to check it for the indicator functions of Borel sets. L(t,.) is therefore often referred to as occupation density, while L(.,A) is local time.

Local Time as natural index for Excursions

An excursion, for example of Brownian Motion, is a segment of the path that has zero value only at its endpoints. Alternatively, it is a maximal open interval of time such that the path is away from 0. We want to specify the measure on these excursions. Here are some obvious difficulties.

By Blumenthal’s 0-1 law, BM started from zero hits zero infinitely often in any time interval [0,e], so in the same way that there is no first positive rational, there is no first excursion. We could pick the excursion occurring in progress at a fixed time t, but this is little better. Firstly, the resultant measure is size-biased by the length of the excursion, and more importantly, the proximity of t to the origin may be significant unless we know of some memorylessness type of property to excursions.

Local time allows us to solve these problems. We restrict attention to L_t:=L(t,0), the occupation density of 0. Let’s think about some advantages of indexing excursions by local time rather than by the start time:

  • The key observation is that local time remains constant on excursions. That is, if we are avoiding 0, the local time at 0 cannot grow because the BM spends no time there!
  • If we use start time, then we have a countably infinite number of small excursions accumulating close to 0, ie with very small start time. However, local time increases rapidly when there are lots of small excursions. Remember, lots of small excursions means that the BM hits 0 lots of times. So local time grows quickly through the annoying bits, and effectively provides a size-biasing for excursions that allows us to ignore the effects of the ‘Blumenthal excursions’ near time 0.
  • When indexed by time, excursions might be Markovian, in the sense that subsequent excursions (and in particular their lengths) are independent of past excursions.This is certainly not the case if you index by start time! If an excursion starts at time t and has length u, then the ‘next’ excursions, in as much as that makes sense, must surely start at time t+u.

We know there are only countably many excursions, hence there are only countably many local times which pertain to an excursion. This motivates considering the set of excursions as a Poisson Point Process on local time. Once you’ve had this idea, everything follows quite nicely. Working out the distribution of the constant rate (which is a measure on the set of excursions) remains, but essentially we now have a sensible framework for tracking the process of excursions, and from this we can reconstruct the original Brownian Motion.