I am aiming to write a short post about each lecture in my ongoing course on Random Graphs. Details and logistics for the course can be found here.
In this third lecture, we made our first foray into the scaling regime for G(n,p) which will be the main focus of the course, namely the sparse regime when . The goal for today was to give a self-contained proof of the result that in the subcritical setting
, there is no giant component, that is, a component supported on a positive proportion of the vertices, with high probability as
.
More formally, we showed that the proportion of vertices contained within the largest component of vanishes in probability:
The argument for this result involves an exploration process of a component of the graph. This notion will be developed more formally in future lectures, aiming for good approximation rather than bounding arguments.
But for now, the key observation is that when we ‘explore’ the component of a uniformly chosen vertex outwards from v, at all times the number of ‘children’ of v which haven’t already been considered is ‘at most’
. Since, for example, if we already know that eleven vertices, including the current one w are in C(v), then the distribution of the number of new vertices to be added to consideration because they are directly connected to w has conditional distribution
.
Firstly, we want to formalise the notion that this is ‘less than’ , and also that, so long as we don’t replace 11 by a linear function of n, that
.
Couplings to compare distributions
A coupling of two random variables (or distributions) X and Y is a realisation on the same probability space with correct marginals, that is
We saw earlier in the course that we could couple G(n,p) and G(n,q) by simulating both from the same family of uniform random variables, and it’s helpful to think of this in general: ‘constructing the distributions from the same source of randomness’.
Couplings are a useful notion to digest at this point, as they embody a general trend in discrete probability theory. Wherever possible, we try to do as we can with the random objects, before starting any calculations. Think about the connectivity property of G(n,p) as discussed in the previous lecture. This can be expressed directly as a function of p in terms of a large sum, but showing it is an increasing function of p is essentially impossible by computation, whereas this is very straightforward using the coupling.
We will now review how to use couplings to compare distributions.
For a real-valued random variable X, with distribution function , we always have the option to couple with a uniform U(0,1) random variable. That is, when
, we have
, where the inverse of the distribution function is defined (in the non-obvious case of atoms) as
Note that when the value taken by U increases, so does the value taken by . This coupling can be used simultaneously on two random variables X and Y, as
, to generate a coupling of X and Y.
The total variation distance between two probability measures is
,
with supremum taken over all events in the joint support S of . This is particularly clear in the case of discrete measures, as then
(Think of the difference in heights between the bars, when you plot simultaneously as a bar graph…)
The total variation distances records how well we can couple two distributions, if we want them to be equal as often as possible. It is therefore a bad measure of distributions with different support. For example, the distributions and
are distance 1 apart (the maximum) for all values of n. Similarly, the uniform distribution on [0,1] and the uniform distribution on
are also distance 1 apart.
When there is more overlap, the following result is useful.
Proposition: Any coupling of
satisfies
, and there exists a coupling such that equality is achieved. Continue reading
