BMO1 2023

The first round of the British Mathematical Olympiad was marked in early December. Belatedly, here are some thoughts of the problems. These aren’t supposed to be official solutions, and some of them are not in fact solutions at all. Students reading this in future years are advised, as always, that such commentaries are normally more valuable after attempting and digesting the problem yourself first.

Note that the copyright to the problems is retained by BMO, and I’m reproducing here with permission. The paper can be found in its original format here.

Problem 1

I only have a few brief comments on this problem.

When you are asked to count some class of objects in a problem, it is common to make progress by rephrasing or recharacterising the objects you are trying to count. Often the recharacterised object may be more naturally suited to enumeration. As a slogan: do some thinking or apply some logic first, before starting counting.

However, when counting anything at all, make sure you read the question very carefully! There are some types of problem where misreading the problem will become obvious later. For example:

  • In a ‘prove that X is true’ question, you might find examples which fit your misreading, but where X is clearly not true!
  • In a geometry question, an accurate figure may be impossible if you assume the wrong pair of angles are equal, or you may end up proving something absurd, eg all triangles are equilateral.

But counting problems do not have this quality! Unlike a proof-based question, it always works as a problem to say count the number of […]. Whether it’s a good problem is another matter. Indeed, whether it is possible to get a nice answer may be hard to say. There are many natural combinatorial objects indexed by n which can’t be enumerated in closed-form (meaning as a single expression involving a combination of standard functions).

It would be a real shame to drop significant marks on this approachable P1 by instead answering (or wasting time struggling to answer) a different problem that carries no credit.

Problem 2

In the IMO programme, we discuss how the required conclusion of a problem might not be the most natural insight about the setup. Instead, the required conclusion might instead follow quickly from a much more natural fact which is not discussed at all in the problem statement. Especially in the context of geometry, this is often the crux of a good problem. Since we aren’t allowed to ask “say something interesting about this setup”, a well-structured problem might require the solver to investigate the setup and discover something interesting without prompting.

In this case, the investigation involves exploring the given recursion (for a_n in terms of a_{n-1},a_{n-2}) and noticing that it can be converted into a recursion for the increments a_n-a_{n-1} in terms of a_{n-1}-a_{n-2}. What makes this clever is that it works nicely but differently in the two cases. Note that because the conclusion involves ‘consecutive integers’ (ie where the increment is equal to 1), this is a clue that this is a good line of attack.

Specifically, we get a_n-a_{n-1}=a_{n-1}-a_{n-2} or a_n-a_{n-1}=2(a_{n-1}-a_{n-2}).

There has been no reference to a_{2023},a_{2024} or to the fact that the sequence is integer-valued, but now one can bring that back into focus. If a_{2024}-a_{2023}=1, this means that a_1-a_0=2^{-k} for some k=0,1,\ldots,2023. But a_0,a_1 are integers, and so the only option is a_1-a_0=1.

Problem 3

This is an excellent BMO1 easy geometry problem, and should be near the top of the list for students in future years who are practising this subject while preparing.

This is a good exercise in avoiding mindless angle-chasing. In general, most geometry problems at this difficulty level can be solved just by identifying pairs of equal angles, possibly with an occasional use of something like 180-\angle A or 90-2\angle A if there’s a clear geometrical meaning to those quantities.

One goal might be to prove carefully that triangle AXZ is isosceles with \angle A =\angle X. This, combined with ABX being isosceles, shows that ABXZ is a kite, and so BZ is perpendicular to AX, which is the same line as AC. (*)

And with so many isosceles triangles given in the construction, as well as other pairs of angles coming from the cyclic quadrilaterals, this equality of angles in isosceles AXZ can be handled as shown below, or in many other orders.

The equal red angles and equal green angles immediately show that \angle ZAX=\angle AXZ, since both are ‘180 minus red minus green’. Observe that because we’ve already handled a reduction to the problem at (*), this diagram didn’t actually need to have line BZ included, even though it’s part of the original conclusion!

A remark about diagrams: if you match angles efficiently eg as in the figure above, you will discover that triangle AXZ is isosceles even if it doesn’t look particularly isosceles on your figure. Similarly for CXY if you use that in your proof. However, if you draw an accurate figure using compasses, then depending on where the points end up, it may be very clear immediately from the diagram that these triangles are isosceles. This has good points and bad points:

  • Good point: from an accurate figure, it’s more likely that you’ll have the thought process along the lines of “Oh, so AXBZ must be a kite, and that explains why those lines are perpendicular, so I should prove the kite result.”
  • Bad point: it can be easy accidentally to assume one of these intermediate results before proving it. If you assume that CXY is isosceles, or something equivalent (like BCXY a kite, or X,Y are reflections in BC) this is likely to be viewed as a big hole in the proof, not much worse than assuming directly that AXZ is isosceles.

The configuration is intriguing at a broader level. If you extend BX to meet the circumcircle, you now have a cyclic hexagon where the diagonals meet at X and form three kites.

Problem 4

I submitted this problem, and a separate post is now available.

Problem 5

The wording of this problem follows a standard pattern, and the wording of a successful solution should also follow a standard template. If spending a few hours preparing for BMO1 or similar level competitions, it would be worth devoting an hour to problems of this sort, because they come up all the time.

A previous post has discussed the yellow and white houses from the MOG paper in 2018, which is similar in structure, but this P5 has more unusual constraints. Nonetheless, the principles of that post apply. A solution attempt must include a statement of the conjectured smallest number K of faults plus

  • a construction X showing that it is possible to have K faults
  • an argument explaining why K-1 is not possible.

Note that in general, an argument saying “start with X – you can’t remove any faults” will not be valid. After all, how do you know you haven’t missed some much better construction that is totally unrelated to X?

Whether it’s in a live competition or for practice, time spent investigating what K should be is essential. If you commit too early to a value of K that is too large, you certainly won’t be able to prove that K-1 is not possible!

Returning to the details of this question, the fact that the total number of dots is a multiple of 4 but not a multiple of 3 gives a clue that the optimal construction might be based on groups of 4. Probably many students will have found a repeating pattern of four colours that gives 250 faults in total. This is a plausible guess for the minimum.

There is one challenge in proving this, namely that the pattern RBBR has no faults within this group of four. However, it must create a fault at either end when you consider the next group of four on either side. So the key here is to index the faults by the left of the two affected dots (or, properly speaking, we should say the anticlockwise of the two affected dots, since they are on a circle). Then, one can justify, either by a reduction argument or by simply listing all 2^4=16 possibilities, that each group of four indexes at least one fault, which proves that the minimum is 250 as required.

Problem 6

This year, more participants than usual scored close to full marks on BMO1, but I don’t think it is fair to say that this particular problem was too easy. The harder part of the problem (proving that it is necessarily true for n odd) is relatively short. That doesn’t mean that it’s obvious, and there are many ways to go astray. I personally spent 20 minutes on the n odd case pursuing ideas that are hard to articulate but certainly did not work. It just means that it’s relatively unlikely that a contestant had the main idea but ran out of time to write it.

It’s crucial to establish the correct answer by trying various constructions. It’s possible that the octagon n=8 was more instructive than the hexagon n=6. It’s crucial to notice that if you start with a regular k-gon (for any k>2), you can ‘decorate’ each side with congruent ‘shallow isosceles triangles’ to get a 2k-gon that satisfies the length condition but doesn’t have equal angles.

One advantage of taking the isosceles triangle to be very shallow is that we can claim that when \theta is small enough, the 2k-gon is convex, and that the obtuse angle in the triangle is greater than the total obtuse angle at the vertices of the k-gon without formally justifying this with algebra.

Of course this approach does not make sense for constructing an n-gon when n is odd. But this is absolutely not a proof that there is no such construction for n odd. However, in the context of an olympiad problem at this level, it is reasonable evidence that the statement might be true for n odd.

The condition given about the lengths is a local condition, meaning that each condition only affects nearby vertices. When thinking about proving this result, one needs to consider whether we will find a local conclusion (eg two consecutive angles are equal), or a global conclusion (eg every angle is less than or equal to the ‘average angle’). This distinction is less clear when n is small, of course.

I won’t spoil the end of the problem, except to say that I wasted my 20 minutes thinking about global effects, whereas actually the main route involves a neat observation of the local situation.

BMO1 2019

The first round of the British Mathematical Olympiad was sat yesterday. The paper can be found here, and video solutions here. Copyright for the questions is held by BMOS. They are reproduced here with permission.

I hope any students who sat the paper enjoyed at least some of the questions, and found it challenging! The following brief commentaries on the first three problems are not official solutions, and are not always full solutions at all, but contain significant steps of solutions, so would be best saved until after you have attempted the problems, if you are planning to do so.

Question 1

Show that there are at least three prime numbers p less than 200 for which p+2, p+6, p+8, p+12 are all prime. Show also that there is only one prime number q for which q+2, q+6, q+8, q+12, and q+14 are all prime.

It is a major open problem of considerable interest in mathematics to study primes p for which p+2 is also a prime. You are warmly encouraged to have a look at the video of this talk by oucolleague Vicky Neale which discusses this topic in an accessible fashion.

Fortunately, the statement of the second half of this BMO1 problem is not open, but the wider context is a sign that we should be looking for a simple reason why the numbers {q, q+2, q+6, q+8, q+12, q+14} cannot all be prime very often. A candidate for a simple reason is that one of them must be divisible by eg 6 (since there are six numbers), but of course this is not true, eg when q=3, or any other odd value. However, it is the case that one of these numbers must be divisible by 5. You could show this in a few ways:

  • studying what the last digit could be (a number is divisible by 5 precisely when it’s final digit is 0 or 5);
  • studying the remainder of each of the numbers under division by 5;
  • these are in fact pretty much the same thing, only the first option cycles with period ten, whereas the second cycles with period five.

In particular, the only chance that this collection consists of six prime numbers is when one of them is 5, since this is the only multiple of 5 which is a prime! So we need to check q=3, and q=5, and the latter leads to a solution, while q=3 does not, since q+6=9 is not prime.

In fact, the question is true even if we remove one of the terms {q+2, q+6, q+8, q+12, q+14} from consideration. To check that you understand the argument, see if you can work out which term(s) could be removed without breaking the solution?

For the first part, you will probably need to write down some integers less than 200 and check which are and aren’t prime, and maybe you have some shortcuts to narrow down the pool of candidates for p, but this doesn’t have to appear on your final solution. It really is enough to write down your three proposals for p, perhaps indicating what {p, p+2, p+6, p+8, p+12} are in these cases as a sanity check!

Question 2

A sequence of integers a_1,a_2,a_3,\ldots satisfies the relation:

4a_{n+1}^2 - 4a_na_{n+1}-a_n^2 - 1 = 0 (*)

for all positive integers n. What are the possible values of a_1?

We can ‘solve’ the equation (*) by viewing it as a quadratic in the variable a_{n+1}, with coefficients involving a_n. We end up showing that

a_{n+1}=\frac{a_n+1}{2}\text{ or }\frac{a_n-1}{2}. (**)

Note that this is a problem if a_n is even, since neither of these possibilities for the value of a_{n+1} is actually an integer. So certainly a_1 cannot be even, since then we can’t even construct a sequence of length 2, let alone an infinite sequence!

Crucially, one of the options in (**) is odd, and one of the options is even. It’s helpful to think of trying to construct the sequence one term at a time starting from a_1, where at each step we must ‘pick the odd option next’ as otherwise we’d be stuck at an even number, and couldn’t extend the sequence.

The exact wording for how you write this argument isn’t essential. What’s important is that you are clear about what you are trying to assert. Hopefully you are trying to assert something like:

For every odd choice of a_1, there does exist an (infinite) sequence satisfying (*).

Why is this true? Because you can construct the sequence one term at a time. It’s not really necessary to set this up as a formal induction. Rather, you just need to tell the reader that you know how to ensure you never get stuck, namely by always choosing the odd option for a_{n+1}, so that you do indeed end up with an infinite sequence.

Question 3

Two circles S_1,S_2 are tangent at P. A common tangent, not through P, touches S_1 at A, and S_2 at B. Points C and D, on S_1,S_2 respectively, are outside triangle APB, such that P lies on line CD.

Prove that AC is perpendicular to BD.

Here’s a diagram for this problem, lifted from the video. Cropped Dominic is pointing out where AC and BD meet, which I’ve chosen to call X.

As so often in olympiad problems, especially in geometry, the key to success is choosing the right characterisation of the property you are required to prove. In this case, proving that the angle at X is a right angle is awkward via a direct calculation since we don’t really know anything about X except its literal definition. However, an alternative is to show that the angles at C and at D sum up to 90. This is a much better contender for a useful characterisation since we know a lot more about C and D.

For example, C and D both lie on a circle, so are eligible for using circle theorems. Since we have several tangents, the alternate segment theorem might well be useful here. Indeed, if can apply it twice for each of C and D, and in doing so, characterise all the angles of triangle APB in terms of the angles at C and D:

There’s no harm in being informal at this stage. Within triangle APB, we know that the sum of two copies of the red angle and two copies of the black angle is 180 degrees, and so we can read off that the sum of the angles at C and at D is 90 degrees.

For the video presentation, I added the common tangent at P, which did turn out to be useful, and the centres and radii of the circles, which didn’t turn out to be useful. It’s generally a good idea to be thoughtful and flexible about what to include on a diagram. You can probably imagine why it might have been a distraction if we’d had lots of extra lines like BC or PX added on the figure, without any evidence they would be useful to the solution.

EGMO 2015

It’s been a while since I last wrote anything substantial. There have been some posts in the pipeline, but mainly I’ve been working hard on technical things that don’t translate very well into blog posts, and when I haven’t been working on those, have felt like doing non-maths.

Anyway, among other things which happened recently were the UK’s IMO selection camp in Cambridge during the last week of March, and the fourth European Girls’ Mathematical Olympiad in Belarus this past weekend. At the former, I was quite busy organising various things, and so gave a session based on some of my favourite problems about points and lines that I’ve used in the past. My hope was that with each problem in turn the students would actively invest time in thinking about whether the ideas each other were having seemed likely to be profitable. The aim was that being critical about your approach is a necessary skill once you start having lots of ideas for approaches.

This is hard to practise at school just by reading around, whether regarding competition material or generally. In the competition world, official solutions often contain unmotivated magic. This is reasonable, since they are supposed to be a minimal elementary demonstration of the problem’s validity. Motivation is a luxury which space and time sometimes doesn’t permit. And the solutions you find on, for example, Mathlinks often give the impression that the solvers know how to do 25,000 specific types of problem, and the sole task is to identify which type the latest one belongs to. Relating problems to ones you’ve seen before is important, but can hide, or at least dilute the actual ideas in some cases. Knowing that a specific inequality is a special case of a big machine allows you to claim a ‘solution’ but doesn’t help you identify the relevant ideas.

Later in the camp, a conversation arose concerning to what extent the younger staff found these elementary methods and problems easier now that they had experienced various levels of higher education in mathematics than when they were at school. It’s a complicated question, and I don’t think there’s a simple answer. I think the students might suspect that doing a university degree teaches you ‘advanced techniques’ which immediately simplify some of these problems. In rare examples this can be the case, but the majority of the time, I personally feel the only advantage I have is perhaps better instincts for whether something is or isn’t going to work.

Probably a good test would be Euclidean geometry. Adult olympiad staff typically come in two flavours: those who used to be good at geometry and feel out of practice; and those who weren’t good at geometry and certainly had no inclination to remedy this after they left school. I’m probably closer to the first category and I definitely feel out of practice, but also have minimal inclination to remedy this. Nonetheless, on the rare occasions I try these questions (and it’s not going to be for 4.5 hours at this stage…) my progress rate is probably comparable to when I was in sixth form. I’ve no idea how to turn this into a more concrete assessment, but there must be something about doing abstract maths all the time that prevents you forgetting how to do this, so presumably it should be at least as helpful in the types of problem with non-zero overlap with things I actually do. I won’t discuss geometry in the rest of this post, but I did also enjoy the geometry questions – it’s simply that I feel anything I have to say about them would be less useful than saying nothing.

In any case, I enjoyed trying the problems from Belarus in between bouts of typing, and thought I would write some brief comments about how I decided whether my ideas were good or not. To achieve this, I’ve looked at my rough, and will tell you the ideas which didn’t work, as well as some of the ones which did. I’ve paraphrased the statements slightly to avoid having too many LaTeX tags.

WARNING: what follows will spoil questions {2,4,5} if you haven’t already looked at them, but would like to.

Question 2 – A domino is a 2 × 1 or 1 × 2 tile. Determine in how many ways exactly n^2 dominoes can be placed without overlapping on a 2n × 2n chessboard so that every 2 × 2 square contains at least two uncovered unit squares which lie in the same row or column.

The wording of the question prompted me to consider splitting the board naturally into n^2 2 x 2 squares. I then thought about this ‘at least’ in the question, and concluded that for these 2 x 2 squares, it should be ‘exactly’.

I tried doing an unusual colouring, when I coloured half the black squares green, and half blue and tried to show that either only greens or only blues were covered, but this was clearly not true, or fixable. I then tried to do something similar for the other set of 2 x 2 squares (those whose vertices have (odd, odd) coordinates). Roughly speaking, if too few of the outer cells on the original board are covered, you can’t achieve the bounds on these odd inner squares. But this didn’t really give any useful information.

However, it did get me thinking about how what happens in the far top-left affects the rest of the board, and from there most of the ideas came quickly. I described a 2 x 2 square as N, E, S, W depending on which ‘half’ of the square was covered. Then if a square is N, all the squares directly above it must be also be N (*).

I then made two mistakes, and if we’re looking for reasons where experience is helpful, it was probably here, as I spotted them fairly quickly, rather than wasting ages and ages.

First, I decided that either all squares were {N,E} or all were {S,W} which seemed plausible when I had been focusing on the top-left. This gave an answer of 2 \binom{2n}{n}, but after a bit more thought is clearly not symmetric enough.

Second, I thought condition (*) might be the only constraint, along with similar ones for E, S, W naturally too. I tried to count these using a similar method of enumerating lines between these regions, and I realised I didn’t know how to do these sensibly, for example if it looked like this:

EGMO15 Q2 diagram (3)

This led to another bit of meta-maths that I probably wouldn’t have considered if I was 16. Namely, that the idea of counting these monotone lines across the 2n x 2n grid was too nice not to be useful. Also, if I couldn’t see a way to adapt it for this more complicated setup, my setup was probably wrong. This thought was useful, and then by thinking about the interface between {N,E} and {S,W}, then the other way round, it made sense that the configuration could be parameterised by two monotone lines between different pairs of corners, giving an answer of \binom{2n}{n}^2.

So, if it’s possible to give a reason why I could do this, it’s probably because I had an arsenal of ways to check an answer for plausibility, which saved wasting time on something wrong, and also because I trusted that the answer would be nice, which saved wasting time on something which was also wrong, and would probably have been very complicated to resolve.

Question 4 – Determine whether there exists an infinite sequence a_1,a_2,\ldots of positive integers satisfying a_{n+2}=a_{n+1}+\sqrt{a_{n+1}+a_n}.

So, my first reaction was ‘no way’. Why? Because everything’s determined by the first two terms, and I couldn’t think of any reason why a cool choice of the first two terms would force all of the sums a_{n+1}+a_n to be perfect squares. It seemed just about possible that we could arbitrarily large finite sequences with this property. (Though this also turns out to be impossible.)

I imagine many contestants might have felt similarly and spent a while playing round with quadratic residues to get a sense for exactly how hard it is to make this work for the initial terms. But I was suspicious of the form of the recurrence. We know that if it had been defined linearly, then the terms would grow exponentially, but it’s natural to ask roughly how fast they grow in this example, even relaxing the restriction that the terms be integers.

The first observation was that they are indeed (strictly) growing! But how fast? Are there actually enough squares that every a_{n+1}+a_n can be a different one? Note that the squares themselves satisfy a similar recurrence relation (N+1)^2 = N^2 + 2\sqrt{N^2} + 1 > N^2 + \sqrt{ N^2 + N^2}. So this seemed like a very good idea, and my instinct was that this should work, and I felt glad that I hadn’t pursued the quadratic residues approach.

From here we were basically home. I asked whether the sequence grows faster than the sequence (\frac{n^2}{2}), and the answer was no as

a_{n+2}\le a_{n+1}+ \sqrt{2 a_{n+1}} \le (\sqrt{a_{n=1}} + \frac{1}{\sqrt 2})^2,

so if (after translating indices) a_n = \frac{x^2}{2}, we have a_{n+1}\le \frac{(x+1)^2}{2}. This is clearly a problem or at best a very tight constraint if all the \{a_n+a_{n+1}\} have to be perfect squares, so even though we aren’t completely finished, I am confident I can be in one or two lines, with a bit of care. Fiddling with small values of n looked like it would work, or showing that looking at a large enough initial subsequence that the effect of the first two terms dissipated, we must by the pigeonhole principle have a_{n+2}+a_{n+1}=(k+1)^2,\, a_{n+1}+a_n=k^2, which is enough by a parity argument, using the original statement.

This final bit looks and feels a bit messy, but by this stage it really is just finding any argument to justify why a sequence which grows at most as fast as n^2 can’t actually be n^2 eventually.

Probably the reason I did this quickly was because I trusted my instinct for ‘no’, and also because there seemed a quick way to qualify *roughly* how this sequence grew. Sensibly approximating things, and deciding whether it’s appropriate to approximate them is definitely something I feel I’ve got better at during my PhD, so I guess this was helpful, then just try and throw back the important condition that the elements were integers at the very end.

Question 5 – Anastasia partitions the integers [1,2m] into pairs. Boris tries to choose one integer from each pair so that the sum is n. Given n and m, prove Anastasia can select pairs so that Boris can’t succeed.

So I wasted some thought time by imagining that n was fixed and trying to come up with ideas for the choice of pairs which might avoid n. It struck me that there was no reason why a a typical (uniformly chosen) pairing should avoid any particular n unless this value was particularly big or small.

How big or how small? Well Boris can always choose the bigger element of a pair, so the way to make the minimum maximum is to pair as (1,2), (3,4), … , (2m-1,2m). Conveniently, this also achieves the maximum minimum. These can be calculated as m^2,m(m+1) respectively. Suddenly this seems great, because we’ve actually solved the problem for a huge range of n, ie everything not within between these extrema.

The key step here was to start considering special pairings, chosen to give a reduced set of possible sums. Once we’ve had this idea, it makes sense to consider other different special pairings. The reason we got a small set of possible sums is that there’s lots of overlap. We can achieve lots of overlap by looking at the difference between elements in a pair, and making as many of these the same as possible. For, consider pairs (a,a+d), (b,b+d). Then it’s the same to take a and b+d as to take a+d and b, so we get the same sums in lots of different ways.

The other way to have as many differences the same as possible is to go for (1,m+1), (2,m+2), … , (m,2m). Conveniently, we can parameterise the sums now because at each choice, we decide whether to add an extra m or not, so the sum must be 1+2+…+m, plus some multiple of m. So we can defeat Boris, except when n is \binom{m}{2}+km.

This is a good point to stop because what followed was basically book-keeping. We only have to consider a couple of specific cases when m is odd, and one when m is even, that happen not to fall into either of the categories we can now deal with. It wasn’t too hard to corrupt the examples we already have to deal with these.

The key here was responding to initial failure by trying to find any control at all over n. Perhaps given enough time I would have started trying special pairings? Equally, I might have tried small examples, which would not really have been hugely helpful for this question. In any case, trying to eliminate very small or very large n luckily worked well, as a) I didn’t need to use the word ‘very’ twice in the end; and b) the idea of looking at choices of pairings to minimise the number of sums quickly gave other useful deductions.

I also really enjoyed Question 3, though was suspicious that I’d used a bound a great deal weaker than one in the question. Along the way, I was trying something confusing and not especially useful that led me into some interesting theory about Latin squares, which I might write something about later in the week.

Hitting Probabilities for Markov Chains

This continues my previous post on popular questions in second year exams. In the interest of keeping it under 2,500 words I’m starting a new article.

In a previous post I’ve spoken about the two types of Markov chain convergence, in particular, considering when they apply. Normally the ergodic theorem can be used to treat the case where the chain is periodic, so the transition probabilities do not converge to a stationary distribution, but do have limit points – one at zero corresponding to the off-period transitions, and one non-zero. With equal care, the case where the chain is not irreducible can also be treated.

A favourite question for examiners concerns hitting probabilities and expected hitting times of a set A. Note these are unlikely to come up simultaneously. Unless the hitting probability is 1, the expected hitting time is infinite! In both cases, we use the law of total probability to derive a family of equations satisfied by the probabilities/times. The only difference is that for hitting times, we add +1 on the right hand side, as we advance one time-step to use the law of total probability.

The case of hitting probabilities is perhaps more interesting. We have:

h_i^A = 1,\; i\in A, \quad h_i^A=\sum_{j\in S}p_{ij}h_j^A,\; i\not\in A.

There are two main cases of interest: where the chain is finite but has multiple closed communicating classes, and where the chain is infinite, so even though it is irreducible, a trajectory might diverge before hitting 0.

For the case of a finite non-irreducible Markov chain, this is fairly manageable, by solving backwards from states where we know the values. Although of course you could ask about the hitting probability of an open state, the most natural question is to consider the probability of ending up in a particular closed class. Then we know that the hitting probability starting from site in the closed class A is 1, and the probability starting from any site in a different closed class is 0. To find the remaining values, we can work backwards one step at a time if the set of possible transitions is sparse enough, or just solve the simultaneous equations for \{h_i^A: i\text{ open}\}.

We therefore care mainly about an infinite state-space that might be transient. Typically this might be some sort of birth-and-death chain on the positive integers. In many cases, the hitting probability equations can be reduced to a quadratic recurrence relation which can be solved, normally ending up with the form

h_i=A+B\lambda^i,

where \lambda might well be q/p or similar if the chain is symmetric. If the chain is bounded, typically you might know h_0=1, h_N=0 or similar, and so you can solve two simultaneous equations to find A and B. For the unbounded case you might often only have one condition, so you have to rely instead on the result that the hitting probabilities are the minimal solution to the family of equations. Note that you will always have h^i_i=1, but with no conditions, h^i_j\equiv 1 is always a family of solutions.

It is not clear a priori what it means to be a minimal solution. Certainly it is not clear why one solution might be pointwise smaller than another, but in the case given above, it makes sense. Supposing that \lambda<1, and A+B=1 say, then as we vary the parameters, the resulting set of ‘probabilities’ does indeed vary monotically pointwise.

Why is this true? Why should the minimum solution give the true hitting probability values? To see this, take the equations, and every time an h_i^A appears on the right-hand side, substitute in using the equations. So we obtain, for i\not\in A,

h_i^A=\sum_{j\in A}p_{ij}+\sum_{j\not\in A} p_{ij}h_j^A,

and after a further iteration

h_i^A=\sum_{j_1\in A}p_{ij_1}+\sum_{j_1\not\in A, j_2\in A}p_{ij_1}p_{j_1j_2}+\sum_{j_1,j_2\not\in A}p_{ij_1}p_{j_1j_2}h_{j_2}^A.

So we see on the RHS the probability of getting from i to A in one step, and in two steps, and if keep iterating, we will get a large sum corresponding to the probability of getting from i to A in 1 or 2 or … or N steps, plus an extra term. Note that the extra term does not have to correspond to the probability of not hitting A by time N. After all, we do not yet know that (h_{i}^A) as defined by the equations gives the hitting probabilities. However, we know that the probability of hitting A within N steps converges to the probability of hitting A at all, since the sequence is increasing and bounded, so if we take a limit of both sides, we get h_i^A on the left, and something at least as large as the hitting probability starting from i on the right, because of the extra positive term. The result therefore follows.

It is worth looking out for related problems that look like a hitting probability calculation. There was a nice example on one of the past papers. Consider a simple symmetric random walk on the integers modulo n, arranged clockwise in a circle. Given that you start at state 0, what is the probability that your first return to state 0 involves a clockwise journey round the circle?

Because the system is finite and irreducible, it is not particularly interesting to consider the actual hitting probabilities. Also, note that if it is convenient to do so, we can immediately reduce the problem when n is even. In two steps, the chain moves from j to j+2 and j-2 with probability ¼ each, and stays at j with probability ½. So the two step chain is exactly equivalent to the lazy version of the same dynamics on n/2.

Anyway, even though the structure is different, our approach should be the same as for the hitting probability question, which is to look one step into the future. For example, to stand a chance of working, our first two moves must both be clockwise. Thereafter, we are allowed to move anticlockwise. There is nothing special about starting at 0 in defining the original probability. We could equally well ask for the probability that starting from j, the first time we hit 0 we have moved clockwise round the circle.

The only thing that is now not obvious is how to define moving clockwise round the circle, since it is not the case that all the moves have to be clockwise to have experienced a generally clockwise journey round the circle, but we definitely don’t want to get into anything complicated like winding numbers! In fact, the easiest way to make the definition is that given the hitting time of 0 is T, we demand that the chain was at state n at time T-1.

For convenience (ie to make the equations consistent) we take h_0=0, h_n=1 in an obvious abuse of notation, and then

h_j=\frac12h_{j-1}+\frac12 h_{j+1},

from which we get

h_j=a+bj \Rightarrow h_j=\frac{j}{n}.

Of course, once we have this in mind, we realise that we could have cut the circle at 0 (also known as n) and unfolded it to reduce the problem precisely to symmetric gambler’s ruin. In particular, the answer to the original problem is 1/2n, which is perhaps just a little surprising – maybe by thinking about the BM approximation to simple random walk, and that BM started from zero almost certainly crosses zero infinitely many times near we might have expected this probability to decay faster. But once it is unfolded into gambler’s ruin, we have the optimal stopping martingale motivation to reassure us that this indeed looks correct.

Teaching Probability in China

As I’ve mentioned in passing in a few recent posts, which have tended to focus on Markov chains more than might perhaps be expected, I’ve just finished a two-week trip teaching at Linyi University in Shandong province, about 600km south of Beijing. I’ve been describing some of my touristic and social observations of China on my personal blog, but it has been mainly a mathematical adventure, so I thought I would describe some of the main cultural differences in the approach to science here. I was surprised to find these a far larger obstacle than the more obvious language barrier in the classroom.

1) I think many of us were surprised by how different the gender divide was at Linyi to what we are accustomed to in Cambridge. Although of course it varies from year to year, from course to course, and in select cases from lecturer to lecturer, in general there is a significant male majority in lecture halls. In China, amongst my students (who had just finished their third year), the ratio seemed to have been reversed and then further extremised. I had about twelve students in attendance during the middle of the course, and only one was a boy.

2) Perhaps the most clear reason for this reversal of the Western trend became apparent when I asked some of my group what they were planning to do after leaving university. As most of them have just a year left, they all seemed to have a fairly concrete idea of what their plans were, and the answers were perhaps unexpected. One of the most able girls said she wanted to continue with postgraduate mathematics, but all of the rest wanted to become teachers. And not just high school either. In fact, all bar two or three planned to teach middle or even elementary school. I don’t want to get burned by talking about things I don’t much understand, but I understand this is very different to the UK. Here, I do not believe many maths graduates from top universities (in a national context) have ambitions in primary education. I do not know which way round the causality works. Are they encouraged towards a pure science degree as a route into schools, or are teaching jobs viewed the natural job to follow a hard quantitative course? In either case, it sounds like an excellent situation to be in from the point of mathematical education.

3) The role of the lecturer in China evidently carries a great deal more gravitas than comparable positions in England. Clearly his task is to stand at the board and write down the notes, which the audiences copies politely. So far, so like Cambridge Part III. But then it seems that the notion of interacting with the class doesn’t really apply here. Even an utterly uncontentious question such as ‘what is your name, and which year are you in?’ produced a flurry of nervousness, and then each student stood up stiffly in turn to answer. This continued even as the class size reduced, and my manner became more informal, for example starting to explain ideas while perching on a desk in the front row. I guess old habits die hard. It’s a bit of a shame because I feel the purpose of a lecture is not just to transcribe course material, but also to give motivation and a flavour of the mathematical landscape, as well as to inject some enthusiasm into an appreciation of the topic. And it’s very hard to start such ideas in a culture where it is totally unexpected.

4) This was even more evident in some of the research talks. Each member of the Cambridge group and various Linyi staff, postgraduates and masters students gave a talk outlining some aspect of their research interest. The motivation had been to try and recreate a seminar series, at least in environment if not in totality given the expansive range of areas of study represented. I was struck by the fact that what seemed like the entire maths undergraduate community had been forced to attend, but many of the talks were completely inaccessible to a non-specialist audience. A paper on a technical property of spectral theory read out at speaking pace from a powerpoint is not going to get younger students excited about research maths.

5) As suggested by their enthusiasm for PDEs and complicated graph theory construction problems in the research talks, the students seemed to relish calculation over theory. I found it very difficult to get across the idea that it was sometimes better to use the reuslt we had just shown rather than immediately attempting to diagonalise all visible matrices. I think some of the undergraduate teaching mirrors sections of Maths A level in this country, where the emphasis is on deploying solution techniques rather than ideas. For example, the class looked absolutely lost when I tried to use recurrence relations, to the extent that I decided just to start my explanation from the basics. It turned out that I had notated the general solution as h_i=\alpha \lambda^i+\beta \mu^i whereas their teacher had used h_i=\alpha \lambda^{i-1}+\beta \mu^{i-1}, and this had caused the major confusion. The problem is compounded by the fact that despite constant encouragement, they feel mortified at the thought of asking a question or seeking a clarification. Again, it seems to be a matter of respect, but of course this spirit can remain in a more enquiring environment.

Overall, it was a very interesting insight into a completely alien mathematical culture. I know which one I prefer, but I think my perspective is richer for having experienced something of mathematics in China. And hopefully any teaching I do in the near future will seem straightforward by comparison, if only for the property of sharing a language with the students…!