Uncountable Infinity and Cantor’s Diagonal Argument

In my last post, I talked about why infinity shouldn’t seem terrifying, and some of the interesting aspects you can consider without recourse to philosophy or excessive technicalities. Today, I’m going to explore the fact that there are different kinds of infinity. For this, we’ll use what is in my opinion one of the coolest proofs of all time, originally due to Cantor in the 19th century.

Background

Last time we talked about sets which have infinite size. In particular, a set S is called countable, if its elements can be paired up with the set of positive integers. This is equivalent to saying that you can find a way to list the elements of S, in such a way that every element comes up eventually. It might initially seem as if every set should be countable, but consider the reals between 0 and 1. If we were to put them in a list, what should the first element in the list be; and what should the second be; and so on? Is it obvious that we can make this work? But if we can’t make it work, how can possibly prove that it is impossible, because there are so many potential ways to list the elements?

The Rationals

As a warm up, consider the rationals: \mathbb{Q}=\{\frac{p}{q}:p,q\text{ integers with no common factor}\}. Is this set countable? For simplicity, let’s just think about the rationals between 0 and 1. What is the most sensible way to list them? What about starting with 0 and 1, then 1/2, then the ones with 3 as denominator, ie 1/3 and 2/3. Can we carry on with strategy? First we need to check exactly what we are doing: we are ordering the rationals by increasing size of the denominator, then within that, by increasing size of the numerator. What about 2/4? That isn’t a rational as we’ve described them, so we can just ignore it.

With this algorithm is that every such rational will turn up eventually. We’ve therefore shown that this set of rationals is countable. It’s hard to give a precise ‘formula’ for how the list works. To work out where 2/11 appears, you need to know about the prime factors of the integers less than 11, and keep track of lots of things. It turns out that 2/11 appears 35th in the list, but the easiest way to deduce this is to check up to that point in the list. The important fact is that it must appear at some point, and only once.

The Reals

Theorem (Cantor): The set of real numbers between 0 and 1 is not countable.

Proof: This will be a proof by contradiction. That means, we will assume that the set is countable, then derive a false statement. From this, we can deduce that the set cannot be countable.

Suppose it is countable. This means we can write all the reals in a list. Write them in binary, so it probably looks something like this:

0.00001011001...
0.11001011100...
0.01011101101...

and so on. Continue reading

Approaching Infinity

Last summer, I worked at and gave some lectures at the National Maths Summer School. The students submitted feedback forms, and a surprisingly large number mentioned that they would have liked to have a session about ‘infinity’. I was reminded of this by a post on an interesting blog that I’d seen linked to by, of all people, Stephen Fry. It is easy to forget, a full three years after a first university course on analysis, that the infinity had once seemed so confusing.

The problem is as much one of presentation as of mathematical content. The impression often given is that mathematical statements concerning infinity are not properly defined, or can’t be understood in a ‘real world’ setting. Unqualified and often rather misleading explanations are absolutely rife. And even some well-qualified scientists have put forward theories that are questionable at best. First we talk about some of the usual problems, and why they might not be so significant after all.

  • No-one can imagine what infinity is: I’m not sure whether this is true – I personally feel I have a reasonable idea. But even this doesn’t matter. Arguments like this often reference the fact that there are 10^{80} atoms in the universe (or something similar) and how this doesn’t even compare to infinity. This is true, but it doesn’t affect our ability to understand and make deductions about a concept. I can’t imagine what 5-dimensional space looks like, but with five co-ordinates (x,y,z,w,v) I can describe it in mathematical terms that are entirely reasonable. This allows me to start working out properties of the object even if I can’t visualise it.
  • Infinity is about philosophy: This might well stem from its appearance in popular culture (‘to infinity and beyond’) and the metaphysical (‘the Father of an infinite majesty’ etc). I would suggest that if you are worried about coming to a philosophical understanding of infinity, first you should question whether you have a true philosophical understanding of seven. I can picture seven oranges in my mind, but does that alone really explain all the seven-ness of seven? In any case, we can learn some simple rules to deal with seven (like 3+4=7) in a concrete way, and though the rules aren’t as ‘obvious’, we can do the same for infinity.
  • Infinity is not a number: Again, this is in some sense true (see below). But it doesn’t make any difference if you use it correctly. At various points in time 0 has been considered ‘not a number’, as have negative numbers. If you build up the world of complex numbers by defining the square root of -1, is this a number? As with many words, infinity means different things in different contexts. This is actually often really about the following: Continue reading

Remarkable fact about Brownian Motion #3: It is nowhere differentiable

A good problem to consider at the start of an introduction to analysis is whether continuous functions need to be differentiable on a large subset of the domain. Clearly from the definition, differentiability at a point is a stronger condition than continuity – consider the modulus function. Our intuition about what a continuous function looks like in general might suggest that a continuous can only be non-differentiable at ‘a few’ points, perhaps a set of measure 0?

But in fact there exist functions which are continuous but nowhere differentiable. The canonical example is due to Weierstrass: you create something like a fractal with a saw-tooth function as a base. Essentially, having a non-zero derivative at a point means the function is monotonic on an interval around that point, and this construction prevents this happening for any point. Eliminating the possibility that the derivative is 0 is reasonably straightforward. Informally, if you look at the function, it looks like the derivative ought to be +K or -K at every point (if it exists) where K is a constant determined by the precise details of the construction.

Having found one example, it then seems likely that the majority of continuous functions ought to be nowhere-differentiable, since you could take a limit of spiky functions in lots of ways. Proving that Brownian Motion almost surely has this property quantifies this statement.  The Wiener measure associated with BM is the most natural measure on C[0,1] and so this will indeed show that almost all continuous functions have this property. The result was first shown by Paley, Wiener and Zigmund in the 30s, and the proof here is based on that of Dvoretsky, Erdos and Kakutani (1961) as paraphrased by Peres in some excellent notes on Brownian sample paths. Note that it is reasonably straightforward to show by the Strong Markov Property and Blumenthal’s 0-1 Law that BM is not differentiable at a given point (see previous post), but it is not possible to lift this to the whole line simultaneously because of uncountability.

Theorem: Brownian Motion is nowhere-differentiable almost surely.

Proof: We restrict our attention to right-differentiability. Heuristically, being differentiable at a point gives strong restrictions on behaviour on a neighbourhood of the point, but no information about the size of that neighbourhood. The aim is to use the triangle inequality to lift the condition about the point to any form of regularity condition on the whole domain, and hope that BM doesn’t have that condition. Continue reading

Remarkable fact about Brownian Motion #2: Blumenthal’s 0-1 Law and its Consequences

Brownian motion is a martingale, so it is known that it has the Markovian property. That is, (B_{t+s}-B_s,t\geq 0) is a Brownian motion independent of \mathcal{F}_s. But in fact we can show that this is independent of \mathcal{F}_s^+=\cap_{t>s}\mathcal{F}_t, the sigma algebra that (informally) deals with events which are determined by the process up to time s and in the infinitesimal period after time s.

Is this surprising? Perhaps. This larger sigma algebra contains, for example, the existence and value of the right-derivative of the process at time s. But the events determined by the local behaviour after time s are clearly also determined by the whole process after time s, so the independence property looks likely to give a 0-1 type law, like in the proof of Kolmogorov’s 0-1 law.

Theorem: (B_{t+s}-B_s,t\geq 0) is independent of \mathcal{F}_s^+.

Proof: Take a sequence of times s<t_1<\ldots<t_k and A\in\mathcal{F}_s^+. It will suffice to show that the joint law of the new process at these times is independent of event A. So take F a bounded continuous function. The plan is to approximate B_s from above, as then we will definitely have independence from \mathcal{F}_s, and hope that we have enough machinery to carry through the statements through the limit down to s. So take s_n\downarrow s, and then: \mathbb{E}[F(B_{t_1+s}-B_s,\ldots,B_{t_k+s}-B_s)1_A]=\lim_n \mathbb{E}[F(B_{t_1+s_n}-B_{s_n},\ldots,B_{t_k+s_n}-B_{s_n})1_A] as continuity gives a.s. pointwise convergence, and we can lift to expectations by Dominated Convergence. The function in the limit on the right separates by independence as \mathbb{E}[F(_{t_1+s_n}-B_{s_n},\ldots,B_{t_k+s_n}-B_{s_n})]\mathbb{P}(A). Applying the previous argument in reserve gives that the limit of this is \mathbb{E}[F(B_{t_1+s}-B_s,\ldots,B_{t_k+s}-B_s)]\mathbb{P}(A) as desired.

In particular, this gives Blumenthal’s 0-1 Law, which states that \mathcal{F}_0^+ is trivial. This is apparent by setting s=0 in the above result, because then the process under discussion is the original process, and so \mathcal{F}_0^+ is independent of \mathcal{F}\supset \mathcal{F}_0^+.

A consequence is the following. For a BM in one dimension, let \tau=\inf\{t>0:B_t>0\}, \sigma=\inf\{t>0:B_t<0\}. By the fact that BM is almost surely non-constant, for any sample path, at least one of these is 0, and by symmetry \mathbb{P}(\tau=0)=\mathbb{P}(\sigma=0) so these are greater than or equal to 1/2. But it is easy to see that the event \{\tau=0\}\in\mathcal{F}_0^+, and so by the triviality of the sigma-field, this probability must be 1. With continuity, this means that every interval (0,\epsilon) contains a zero of the Brownian motion almost surely. Patching together on rational intervals (so we can use countable additivity) gives that BM is almost surely monotonic on no interval. A similar argument can be used to show that BM is almost surely not differentiable at t=0. For example, the existence and (conditional on existence) value of the derivative at t=0 is a trivial event, so by symmetry, either the derivative is a.s. =0, or a.s. doesn’t exist. Ruling out the former option can be done in a few ways. Predictably, in fact it is almost surely differentiable nowhere, but that is probably something to save for another post.

Hewitt-Savage Theorem

This final instalment in my exploration of exchangeability gives a stronger version of Kolmogorov’s 0-1 law, and suggests some applications. It is easy to see that the tail sigma-field is a subset of the exchangeable sigma-field. For, if A is a tail event, then it is independent of the first n random variables in the underlying sequence, so in particular, is invariant under permutations of initial segments of the sequence.

Kolmogorov’s 0-1 Law: (X_n) a sequence of independent (not necessarily iid) random variables in some probability space. Define the tail sigma-field \tau=\cap_n \sigma(X_{n+1},X_{n+2},\ldots). Then \tau is trivial; that is, \forall A\in\tau\; P(A)\in\{0,1\}.

Proof: Set \tau_n=\sigma(X_{n+1},X_{n+2},\ldots), F_n=\sigma(X_1,\ldots,X_n). Then F_n is independent of \tau_m whenever $m\geq n$. So F_n is independent of \tau for all n, hence so is \cup_n F_n, which generates the entire sigma-field F_\infty, so this is independent of \tau also. Since A\in \tau\Rightarrow A\in F_\infty trivially, the independence criterion gives P(A)=P(A\cap A)=P(A)P(A), and hence P(A)\in\{0,1\}.

Hewitt-Savage 0-1 Law: (X_n) a sequence of iid random variables. Then the sigma field of exchangeable events \mathcal{E} is trivial.

Proof: Take A\in\mathcal{E}, and approximate by A_n\in F_n, P(A\triangle A_n)\rightarrow 0 which is possible, since \cup F_n generates the whole sigma-field. Write A_n=\{(X_1,\ldots,X_n)\in B_n\} for later ease of notation. To exploit exchangeability, set \tilde{A}_n=\{X_{n+1},\ldots,X_{2n}\in B_n\}, as the permutation of RVs that sends A_n\mapsto \tilde{A}_n leaves A invariant. So P(\tilde{A}_n\triangle A)=P(A_n\triangle A)\rightarrow 0\Rightarrow P(A_n\cap \tilde{A}_n)\rightarrow P(A). But because (X) is iid (Note, this is where we use identical distributions), P(A_n\cap \tilde{A}_n)=P(A_n)P(\tilde{A}_n)=P(A_n)^2\rightarrow P(A)^2. Hence P(A)\in\{0,1\}.

Application: Given a stochastic process with iid increments, the event that a state is visited infinitely often is in the tail space of the process, however it is not in the tail space of the increments, so Kolmogorov does not apply. It is however an exchangeable event, and so occurs with probability 0 or 1.

References (for this and the related two previous posts):

Kingman – Uses of Exchangeability (1978)

Breiman – Probability, Chapter 3

Zitkovic – Theory of Probability Lecture Notes

 

An Exchangeable Law of Large Numbers

In the proof of De Finetti’s Theorem in my last post, I got to a section where I needed to show a particular convergence property of a sequence of exchangeable random variables. For independent identically distributed RVs, we have Kolmogorov’s 0-1 law, and in particular a strong law of large numbers. Does a version of this result hold for exchangeable sequences? As these represent only a mild generalisation of iid sequences, we might hope so. The following argument demonstrates that this is true, as well as providing a natural general proof of De Finetti.

Define \mathcal{E}_n=\sigma(\{f(X_1,\ldots,X_n): f\text{ symmetric, Borel}\}), the smallest sigma-field wrt which the first n RVs are exchangeable. Note that \mathcal{E}_1\supset\mathcal{E}_2\supset\ldots\supset \mathcal{E}=\cap_n\mathcal{E}_n, the exchangeable sigma-field.

So now take g(X) symmetric in the first n variables. By exchangeability E[\frac{1}{n}\sum_1^n f(X_j)g(X)]=E[f(X_1)g(X)]. Now set g=1_A, for A\in\mathcal{E}_n, and so because the LHS integrand is \mathcal{E}_n-meas. we have Z_n=\frac{1}{n}\sum_1^n f(X_j)=E[f(X_1)|\mathcal{E}_n]. So Z is a backwards martingale.

We have a convergence theorem for backwards martingales, which tells us that \lim_n n^{-1}\sum^n f(X_j) exists, and in fact = E[f(X_1)|\mathcal{E}] almost surely. Setting f(X)=1(X\leq x) gives that \lim_n\frac{\#\{X_i\leq x: i\leq n\}}{n}=F(x):=P(X_1\leq x|\mathcal{E}). We now perform a similar procedure for functions defined on the first k RVs, in an attempt to demonstrate independence.

For f:\mathbb{R}^k\rightarrow\mathbb{R}, we seek a backwards martingale, so we take sums over the n^{(k)} ways to choose k of the first n RVs. So \frac{1}{n(n-1)\ldots(n-k+1)}\sum_{I\subset[n]} f(X_{i_1},\ldots,X_{i_k}) is a backwards martingale, and hence E[f(X_1,\ldots,X_k)|\mathcal{E}]=\lim_n \frac{1}{n(n-1)\ldots(n-k+1)}\sum f(-). As before, set f(y_1,\ldots,y_k)=1(y_1\leq x_1)\ldots 1(y_k\leq x_k). Crucially, we can replace the falling factorial term with n^{-k} as we are only considering the limit, then exchange summation as everything is positive and nice to get: E[f(X_1,\ldots,X_k)|\mathcal{E}]=\lim(\frac{1}{n}\sum 1(X_1\leq x_1))\ldots(\frac{1}{n}\sum 1(X_k\leq x_k)) thus demonstrating independence of (X_n) conditional on \mathcal{E}.

So what have we done? Well, we’ve certainly proven de Finetti in the most general case, and we have in addition demonstrated the existence of a Strong Law of Large Numbers for exchangeable sequences, where the limit variable is \mathcal{E}-measurable.