Finitely many solutions

This post is aimed at secondary-school students pitched roughly at the level of the British Mathematical Olympiad. It is ostensibly about a certain class of number theory problems, but the main underlying mathematical principle is broader than this. The post is based on an in-person session I have given to students and at teachers’ conferences a few times over the past five years, but I’ve chosen to write it up at this moment because of the connection to Problem 4 on the most recent BMO1 paper, which we discuss later in the post.

Prologue

Alice and Bob make the following statements:

  • Alice: all prime numbers are odd.
  • Bob: we can’t say anything about whether prime numbers are odd or even.

These are different types of statement, one objective, one more subjective; but both are wrong. Alice is wrong because 2 is a prime, and 2 is of course even. Bob is wrong because while Alice is wrong, and the alternative statement “all prime numbers are even” is even more wrong1, there are plenty of possible statements that are true and useful, including

  • Cleo: all prime numbers are odd except 2.

and weaker versions (that may be more relevant in other contexts) like “all primes greater than 10^{2024} are odd”.

The characters now turn their attention to another family of integers, the square numbers \{1,4,9,16,25\ldots\}, and the sixth powers \{1,64, 729, 4096, 15625,\ldots\}

  • Alice: no square numbers are prime.
  • Bob: I agree.
  • Alice: most square numbers are not sixth powers.
  • Bob: hang on, but infinitely many square numbers are sixth powers?

The difficulty in the second statement is what does most mean in this context? The square of an integer n is a sixth power if and only if n is a cube, and so Alice’s statement is equivalent to the statement “most positive integers are not cubes”, which is intuitively reasonable, but would require more clarification than the analogous statement “most primes are odd”. Is it not relevant, for example, that 1/4 of the first eight positive integers are in fact cubes?

It is not impossible to formalise this notion of most in a reasonable way to permit the two statements above2.

But the bulk of this post will discuss situations that extend the first of these conversations, ie when it’s not true to say “all element of X have property Y”, but where it’s nonetheless possible to make a useful intermediate statement.

Warmup problems

I hope most readers know how to solve an equation such as

3x+4 = 2(x+7),

and indeed learning how to solve such equations forms a important milestone in mathematical education as the first example of various principles of algebra. When such equations are still novel, teachers would be advised to avoid examples such as

2x+4 = 2(x+7)\quad\text{or}\quad 2x+14 = 2(x+7),

for which the answers are, respectively, ‘no solutions’ and ‘every x satisfies this’. While they are perfectly valid equations, it means something slightly different to solve them, since the answer set has different structure and, more importantly, the method is different since it doesn’t consist of a sequence of a manipulations leading directly to the required conclusion x= \texttt{answer}.

However, I mention this in passing to clarify that even in the simplest possible family of equations (probably encountered somewhere between ages 9 and 13) we are mindful that not all equations have unique solutions.

I’ve adapted a problem from the American competition AIME:

Find all integers n\ge 1 such that \frac{(n+2)!-(n+1)!}{n!} is a prime.

One could start by writing \frac{(n+2)!-(n+1)!}{n!}=p if you wish, but in fact we only need to work with the left hand side, so it’s better to avoid this if possible. The key to this problem is to avoid the prime condition initially, and just manipulate the LHS until it simplifies as

\frac{(n+2)!-(n+1)!}{n!} = (n+1)(n+2)-(n+1)=(n+1)^2, (*)

which is never a prime, since (n+1)^2 is a square number and so, as already discussed, never a prime.

So at a meta-level, comparing with the other problem in this section, this one also begins with reversible algebraic manipulations, but does not end with a direct reduction to n = [...], and indeed the final step is not reversible at all.

Finally, we remark that the original version of the AIME problem was

Prove that for all integers n\ge 9, \frac{(n+2)!-(n+1)!}{n!} is a perfect square.

The same algebraic reduction as (*) resolves the original version without an extra step concerning primes.

To be, or not to be, a perfect square

The rest of this article will focus entirely on problems of the form: find all positive integers n such that […] is a perfect square.

As setup, we’ll always consider a function f(n), and the sequence f(1),f(2),f(3),… and the question, comment on the set of n is f(n) a square? In particular, this is a more open-ended question than for which n is f(n) a square, which requires a more exact answer. We’ll aim for one of the following possible answers:

  • f(n) is always a square;
  • f(n) is a square for infinitely many n;
  • f(n) is a square for only finitely many n;
  • f(n) is never a square.

Clearly this is not an exhaustive list of possible descriptions3 of the set under discussion, but it will do for now. The difference between the second and the third bullet point can be thought of as: “are there arbitrarily large values of n for which f(n) is a square?”

As some examples:

  • \frac{(n+2)!-(n+1)!}{n!} is always a square, because (n+1)^2 is always a square.
  • (n+2)^2-1 is never a square, because the only squares that differ by 1 are {0,1} which aren’t attainable when n\ge 1.
  • n^2+3 is a square for only finitely many n, because the only squares that differ by 3 are {1,4}.

Exercises to try yourself

  • n^8, 3n+1, 3n-1, \lfloor \sqrt{n}\rfloor, 2^n, 2^n+1, 2^n-1, 3n^3, n^2+5n+10,
  • or, now with p ranging over the set of prime numbers: 3p^3.

Some discussion of some of these follows later to avoid spoilers.

BMO1 2023, Problem 4

This problem does not have exactly the same structure as the previous exercise. A solution will follow shortly, but before starting, [spoilers] I want to emphasise that the main part of the solution involves framing this problem exactly as in the exercise above.

  • We will establish that there are only finitely many n such that n\times 2^n+1 is a square, using a classical number theory argument.
  • In doing so, we will eliminate all integers n that are at least some threshold value K from consideration (which we think of as “n large”, even though in this case it isn’t so large).
  • The actual set of solutions comes from check the remaining “small” values of n.
  • The point to emphasise is that while this third bullet is definitely an important component of a successful solution (and does feel closest to actually addressing the “find all” aspect of the problem statement), it is not really the key step, since checking a small number of possibilities is fundamentally much more straightforward than an argument to eliminate the other infinitely-many integers.

This overall structure is very common when solving Diophantine equations involving integers, even if the exact statement isn’t “find all n such that […] is a perfect square”.

Anyway, to the solution of this problem. As in many problems of this difficulty in this number theory, it’s really useful to aim to factorise and analyse the factors, remembering that they are integers! In this case, writing n\times 2^n+1=k^2 doesn’t suggest an easy factorisation of the LHS, but rewriting as n2^n = k^2-1 does now have a nice factorisation, using difference of two squares on the RHS:

n2^n = (k-1)(k+1).

Here the two factors are (k-1) and (k+1). We know nothing about k, so we know nothing intrinsic about these factors, except that they differ by 2. But now when we consider that their product is n2^n, we can start to make some serious deductions4.

  • (k-1) and (k+1) differ by two, and so have the same parity5. Then n2^n is even, and so in fact both factors are even.
  • But more than this, one factor will be a multiple of 4, while the other will not be. So all the powers of 2 within n2^n (which may include additional factors of 2 coming from n) apart from one, must be included in one of the factors.
  • The idea is that 2^n is generally much larger than n, and so this will force one of the factors to be much larger than the other factor, which is not permitted. (Recall that the factors differ by 2.)

So formally, we might write that k+1\ge 2^{n-1} and k-1\le 2n. These are inequalities not equalities because some of the factors of n could be in (k+1) not (k-1) and/or a priori, we could have k-1=2^{n-1} and k+1=2n etc. This leads to

2n+2\ge 2^{n-1} (*)

and we now have exactly the language to describe the truth of this inequality! It is only true for finitely many positive integers n. There are a number of ways to prove this, but it is important that we do provide a lower bound. One possibility is to check that (*) holds for n=5, and then prove (*) for all n\ge 5 by an inductive argument.

At this point, we have shown that n\times 2^n+1 is not a perfect square for n\ge 5, and so at this point it really is easiest just to check n=1,2,3,4 by hand, which reveals that n=2,3 work.

To link this to the background, note that this final procedure of “finding all the solutions” really had nothing in common structure-wise with “solving 3x+4 = 2(x+7)“, for which substituting in various values of x would never constitute part of a good solution.

Comments on exercises

  • 3n+1 is a square for infinitely many n, while 3n-1 is never a square, for reasons that are often framed in the language of quadratic residues.
  • 2^n is a square for infinitely many n (specifically, when n is even), while 2^n+1 is a square only when n=3, which can be seen via a similar argument to the argument above for the BMO problem. Meanwhile 2^n-1 is never a square, which can also be checked using quadratic residues.
  • 3n^3 is a square for infinitely many n (specifically, when n=3k^2), while 3p^3 is a square for only one prime p, since only one prime is divisible by 3.
  • n^2+5n+10 is only a square for finitely many n, because when n is large, we have (n+2)^2< n^2+5n+10<(n+3)^2. This style of argument is sometimes called square-sandwiching, although it can of course be deployed in contexts other than squares. Problems that turn on this method have appeared in BMO surprisingly often in the past 15 years. For this particular example, note that one does have to check a handful of values to confirm it is for finitely many but at least one n (rather than for no n).

Footnotes

  1. of course, writing even more wrong is in itself making a nuanced comment about the parity of the prime numbers. ↩︎
  2. given a set A\subseteq \mathbb{N}, a natural formalisation of the notion that ‘most integers are not in A’ would be that \lim_{n\to\infty} \frac{|A\cap\{1,\ldots,n\}|}{n}=0. In other words, the proportion of elements of A amongst the first n integers vanishes as n\to\infty. This is certainly true when A is the set of cubes.
    There are many sets for which this limit does not exist, including quite natural sets like the integers whose decimal representation starts with a 1, and so a full treatment of this notion of asymptotic density would require more than a footnote. ↩︎
  3. in particular, we might wish to draw a distinction between f(n) is a square for infinitely many n and the stronger statement f(n) is a square for all-but-finitely many n, and in some applications that difference is crucial. ↩︎
  4. A common error is to make some reasonable-sounding assumption about how the factors of n\times 2^n are split between (k-1) and (k+1), for example by assuming that k-1=n and k+1=2^n because this is visually a natural way to pair them up. ↩︎
  5. parity refers to whether a number is odd or even. ↩︎

Borel-Cantelli Lemmas

I’m currently lecturing my first course at King’s, which builds measure theory from the ground up to the construction of the Lebesgue integral, along with some more probabilistic topics. In this second week, we have been discussing various matters related to the construction of sigma-algebras. It took me slightly by surprise that this ends up being the natural moment during an undergraduate’s progression through probability to introduce the Borel-Cantelli lemmas. It’s not hard to imagine a pedagogical schedule in which they appear earlier, but equally, one benefits from seeing them with a bit more mathematical experience, even if they are only very loosely related to the construction of the Lebesgue integral!

Anyway, I noticed that I have not written about these results on the blog before. There are a couple of neat applications that lie slightly beyond the remit (and time allowance) of the lecture course, so they are presented at the end.

Context – events holding eventually and infinitely often

Recall that (in the notation of probability spaces) a sigma-algebra \mathcal F provides a set of events A\subseteq \Omega for which it is meaningful and consistent to define a probability measure. This is extremely uncontroversial in the setting where the set of outcomes \Omega is finite, but thornier in the setting where \Omega is uncountable, for example [0,1] or \mathbb{R}. The key axiom for sigma-algebras involves countable unions. If (A_n)_{n\ge 1} is a sequence of events in \mathcal F, we must also have \bigcup_{n\ge 1}A_n\in\mathcal F.

Since the axioms also demand that \mathcal F is closed under taking complements, it only takes a few steps to show that countable intersections are also valid, ie \bigcap_{n\ge 1}A_n\in\mathcal F also. It is uncontroversial to note that

\bigcap_{n\ge 1}A_n \subseteq \bigcup_{n\ge 1}A_n.

The main initial motivation for considering the events \{A_n\text{ infinitely often}\},\{A_n\text{ eventually}\} are that they lie between the intersection and the union in terms of containment.

  • ‘Infinitely often’ is relatively uncontroversial, just meaning ‘for infinitely many values of n’.
  • ‘Eventually’ is less clear on an initial reading. Perhaps ‘eventually always’ would be better? [And closer to the title of this blog!] The meaning is ‘always, except for finitely many n’, which implies that there is some threshold N such that it holds for all n>N.
  • In set notation, infinitely often (or i.o.) means that \bigcup_{k\ge n}A_k must hold for every n, and so the event \{A_n\text{ i.o.}\}=\bigcap_n\bigcup_{k\ge n}A_k.
  • Similarly, eventually means that there must be an n for which \bigcap_{k\ge n}A_k holds, so the event \{A_n\text{ eventually}\}=\bigcup_n\bigcap_{k\ge n}A_k.

We end up with the richer containment sequence

\bigcap_{n\ge 1}A_n\subseteq \{A_n\text{ eventually}\}\subseteq \{A_n\text{ i.o.}\}\subseteq \bigcup_{n\ge 1}A_n.

These individual containment relations may not be obvious a priori in this form. Probably the most clear is the central containment, that if an A_n holds for all except finitely many n, then it holds for infinitely many n. The right-most says that if it holds infinitely often, then it holds at least once. For the left-most containment, note that \bigcap_{n\ge 1}A_n is literally the same as the n=1 case of \bigcap_{k\ge n}A_k, so taking a union over other values of n only increases the set.

It’s also worth thinking about the complements. Note that \left(\bigcap_{n\ge 1}A_n\right)^c = \bigcup_{n\ge 1} A_n^c and \left(\bigcup_{n\ge 1}A_n\right)^c=\bigcap_{n\ge 1}A_n^c. Versions of these apply to eventually/i.o. also. In particular

\{A_n\text{ i.o.}\}^c = \{A_n\text{ for only finitely many }n\} = \{A_n^c\text{ eventually}\}.

and \{A_n\text{ eventually}\}^c = \{A_n^c\text{ for infinitely many }n\}.

So in any calculation or application, there is flexibility to study (A_n^c) is this is more convenient. As we will see, it makes sense to focus on whichever of (A_n),(A_n^c) has vanishing probabilities as n\to\infty.

Comment: It is standard to write \limsup_{n}A_n= \bigcap_{n\ge 1}\bigcup_{k\ge n}A_k, and \liminf_n A_n=\bigcup_{n\ge 1}\bigcap_{k\ge n}A_k. I am happy to admit I get these confused with non-vanishing probability, and prefer to stick to eventually/i.o. or the full union/intersection statements for safety!

Borel-Cantelli Lemmas

The Borel-Cantelli lemmas establish some conditions under which the event \{A_n\text{ i.o.}\} almost surely holds, or almost surely doesn’t hold. We’ll state them now:

BC1: If \sum_{n\ge 1}\mathbb{P}(A_n)<\infty, then \mathbb{P}(A_n\text{ i.o.})=0.

BC2: If \sum_{n\ge 1}\mathbb{P}(A_n)=\infty and the events (A_n) are independent, then \mathbb{P}(A_n\text{ i.o.})=1.

Proof of BC1: Since \{A_n\text{ i.o.}\}\subseteq \bigcup_{k\ge n}A_k for every $n$, we have

\mathbb{P}(A_n\text{ i.o.}) \le \sum_{k\ge n}\mathbb{P}(A_k)\;\stackrel{n\to\infty}\longrightarrow\; 0,

since \sum \mathbb{P}(A_n)<\infty.

Comment: we could also introduce the random variable N which counts how many events A_n occur, ie N=\sum_{n\ge 1}\mathbf{1}_{A_n}. Then \{A_n\text{ i.o.}\}=\{N=\infty\}, and the given condition is equivalent to \mathbb{E}[N]<\infty, which certainly implies \mathbb{P}(N=\infty)=0. Indeed, it’s reasonable to think of BC1 as a first-moment result, with BC2 as (sort of) a second-moment result, and this is reflected in the comparative complexity of the two proofs.

Proof of BC2: As discussed, it is equivalent to show that \mathbb{P}(A_n^c\text{ eventually})=0. To do this, we study the intersections of the complements, whose probabilities are tractable using independence:

\mathbb{P}\left( \bigcup_{k\ge n}A_k^c\right) = \prod_{k\ge n}\mathbb{P}(A_k^c)=\prod_{k\ge n}(1-\mathbb{P}(A_k))

\le \prod_{k\ge n}\exp(-\mathbb{P}(A_k))=\exp(-\sum_{k\ge n}\mathbb{P}(A_k)).

We have assumed that \sum_{k\ge n}\mathbb{P}(A_k)=\infty for each n, and so we obtain

\mathbb{P}\left(\bigcup_{k\ge n}A_k^c\right)=0,\quad\forall n\ge 1.

But since (\bigcap_{k\ge n}A_k^c)_{n\ge 1} is an increasing sequence of events, so taking a union over n is particularly natural. Precisely, by continuity of measure, we have

\mathbb{P}\left(\bigcup_{n\ge 1}\bigcap_{k\ge n}A_k^c\right) = \lim_{n\to\infty}\mathbb{P}\left( \bigcap_{k\ge n}A_k^c\right)=0.

Comment: the independence condition is certainly necessary, as otherwise one can have situations where A_1\supseteq A_2\supseteq\cdots. For example, with U\sim \mathrm{Unif}[0,1], consider A_n:=\{U\le \frac1n\}, for which \mathbb{P}(A_n)=\frac1n, whose sum diverges, while clearly \mathbb{P}(A_n\text{ i.o.})=\mathbb{P}(U=0)=0.

Classic examples: the famous ‘infinite monkey theorem’, relatively well-known outside the realm of theoretical probability, asserts that a monkey typing randomly at a keyboard for long enough will eventually type the complete works of William Shakespeare. Other standard examples involve independent (A_n) with probabilities decaying at various speeds. The latter do have some overlap with the slightly less standard examples presented below.

Example – generating random permutations

Consider a sequence of random permutations (\sigma_n)_{n\ge 1} where \sigma_n is a permutation of [n]. We construct these iteratively as follows: \sigma_1 is the identity (ie the only permutation of one element!); then for each n\ge 2, define \sigma_n from \sigma_{n-1} by inserting element n at a uniformly-chosen position in \sigma_{n-1}.

We ask the question whether this gives a well-defined limiting permutation \sigma_\infty on \mathbb{N}. A necessary condition for this is that the first element \sigma_n(1) is eventually constant. But in fact the events \{\sigma_n(1)=n\} are independent, and each occurs with probability 1/n, and so

\mathbb{P}(\text{first element changes i.o.})\ge \mathbb{P}(\sigma_n(1)=n\text{ for infinitely many }n)=1,

by BC2.

On reflection, each \sigma_n has the uniform distribution on \Sigma_n, permutations of [n]. So our result is consistent with the idea that one cannot construct a uniform random permutation on an infinite set.

Now, alternatively, suppose the new element is added at place \max(n-X_n,1) where (X_2,X_3,\ldots) are IID \mathrm{Geometric}(q) random variables. (The truncation is just to prevent pathologies like trying to add the new entry in ‘place -6’.) Then

\mathbb{P}(\sigma_n(1)=n)=(1-q)^{n-2},

and since \sum_n (1-q)^{n-2}<\infty, BC1 shows that the first element (\sigma_1(1),\sigma_2(1),\sigma_3(1),\ldots) changes only finitely often. A similar argument applies for the second element, and the third element, and so on. Consequently, almost surely we have a well-defined pointwise limit permutation \sigma_\infty. This is known as the Mallows Permutation on the integers, and is one of the more popular models for a random permutation on an infinite set, with lots of interesting properties. I supervised an undergraduate research project on this model over Summer 2022, and will write more when time allows.

Example – well-separated random integers

The final example is a toy version of a construction I’ve been working on recently regarding subsets of the vertices of a graph which ‘trap’ random walk on that graph.

Suppose we are trying to colour some positive integers randomly, so that the coloured integers are infinite, but also asymptotically sparse. We also do not want pairs of coloured integers to be close together. Let’s say that two integers m,n are close if |m-n|\le \sqrt{\max(m,n)}.

Now, suppose we colour each integer n green independently with probability \frac1n. Then, whenever we have two integers m,n which are close, and coloured green, we re-colour them both red.

We would like to prove that infinitely many integers are coloured green. Let C_n:={n\text{ coloured with some colour}}. Using BC2, we know that almost surely, infinitely many are coloured with some colour, and it seems implausible that many would be close. However, the events G_n={n\text{ coloured green}} are not independent, so one cannot apply BC2 directly to these. However, we can study instead R_n={n\text{ coloured red}}. We can bound these probabilities as

\mathbb{P}(R_n)\le\frac{1}{n}\frac{2\sqrt{n}}{n-\sqrt{n}} = \frac{2+o(1)}{n^{3/2}},

and apply BC1 to show \mathbb{P}(R_n\text{ i.o.})=0. Then, since \{G_n\text{ i.o.}\}\supseteq \{C_n\text{ i.o.}\}\setminus{R_n\text{ i.o.}}, we obtain \mathbb{P}(G_n\text{ i.o.})=1, as required.