anna – Picture this maths

Three correlations and an algebraic classification

March 18, 2021 · anna

We are looking at a triple of correlations $x, y, z$ that relate three variables:

In my previous post, we saw these pictures of triples of correlations:

The yellow shape is the space of possible correlations $(x,y,z)$ .

On the left, we intersect with the plane $x=0$ (i.e. we set correlation $x$ to zero).
On the right, the plane is $x+y+z=1$ (i.e. the three correlations sum to one).

The intersection of the yellow shape with the blue plane is all correlations where these extra linear conditions hold. The intersection seems to look different in the two pictures. How can we describe the difference? What are the possible shapes of intersections that can occur?

These questions were studied in the paper Nets of Conics by C.T.C. Wall from 1977. In the paper, Wall gives an algebraic classification, with 26 possibilities for what the intersection can look like:

Where do our two examples fit into Wall’s classification?

To see this, we look at the points $(x,y,z)$ on the boundary of the yellow shape. Such a point gives a matrix

$\begin{bmatrix} 1 & x & y \\ x & 1 & z \\ y & z & 1 \end{bmatrix}$

with determinant equal to zero. To get Wall’s pictures, the boundary is extended to all points $(x,y,z)$ where the determinant vanishes, ignoring the requirement that $x, y, z$ come from correlations (e.g. we now get points with coordinates outside of the range -1 to 1). In our first example, we get the extended picture:

or, from a different angle

Wall’s pictures plot the intersection of the yellow shape with the blue plane, shading in the correlations. For our first example, we get a shaded circle:

All of Wall’s intersections have cubic (degree three) equations, but here we have a circle, which is degree two. We recover the cubic if we apply a change of coordinates:

In terms of equations, the full cubic polynomial is

${\rm det}\begin{bmatrix} a & 0 & y \\ 0 & a & z \\ y & z & a \end{bmatrix} = a^3-a y^2-a z^2$

which factors as $a (a^2-y^2-z^2)$ .

Table 1 of Wall’s paper tells us we are in type D*. Then, from Wall’s Figure 3 above, we see that our first example is sub-type D*c:

We follow the same recipe to find our second example in Wall’s classification. First, we extend the picture of the yellow shape and the blue plane:

Already we start to see some differences between the two examples. The intersection of the yellow shape with the blue plane gives the picture

The intersection is the cubic curve (or elliptic curve) with equation $x^2y + xy^2 +x^2+y^2-x-y = 0$ . This polynomial can’t be factored, and the curve has no singular points. This means it is in Wall’s type A, which has four sub-types. Wall’s Figure 2, above, shows that it is sub-type Ab or Ac, since it has a shaded region. We distinguish Ab from Ac using what Wall calls a “preferred point”.
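As a check on this equation, here is a sketch of the substitution. It uses the determinant of the correlation matrix, $1 + 2xyz - x^2 - y^2 - z^2$ , and puts $z = 1 - x - y$ on the blue plane:

$\begin{aligned} 1 + 2xy(1-x-y) - x^2 - y^2 - (1-x-y)^2 &= 2x + 2y - 2x^2 - 2y^2 - 2x^2y - 2xy^2 \\ &= -2\,(x^2y + xy^2 + x^2 + y^2 - x - y) . \end{aligned}$

So, on the plane, the determinant vanishes exactly where the cubic does.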

To find the preferred point, we apply a change of coordinates to convert the cubic to “canonical form” $y^2 = f(x)$ , where $f$ is a cubic (e.g. following the instructions here). In this example, we apply the change of coordinates

$\begin{bmatrix} a \\ x \\ y \end{bmatrix} \mapsto \begin{bmatrix} (2/3)a -x \\ (1/3)a + x \\ (1/6)a - x + y \end{bmatrix}$

to $x^2y + xy^2 +ax^2+ay^2-a^2 x-a^2 y$ and then set $a=1$ . This transforms the cubic to $y^2 = 2x^3 - (2/3)x + 11/108$ . In the new coordinates, the cubic is:

The preferred point is the rational root of the cubic $f(x)$ in the normal form, which is $2x^3 - (2/3)x + 11/108$ . It has a linear factor of $(6x-1)$ , which gives preferred point $( 1/6, 0)$ , the black point marked.
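The linear factor can be checked by expanding the product below (a quick verification, not something taken from Wall’s paper):

$2x^3 - \tfrac{2}{3}x + \tfrac{11}{108} = \tfrac{1}{108}\,(6x-1)(36x^2+6x-11) .$

The quadratic factor has discriminant $6^2 + 4 \cdot 36 \cdot 11 = 1620 = 18^2 \cdot 5$ , which is not a perfect square, so $x = 1/6$ is the only rational root.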


Going back to Wall’s pictures, we find that we’re in sub-type Ab:

In his paper, Wall says this classification is “intrinsically interesting, and involves some pleasant geometry”.

What do you think?

Three correlations and other statistical models

February 3, 2021 · anna

In my previous post we saw how the above shape (called the elliptope, or the samosa) shows up when looking at correlations between three variables. The samosa is all points $(x,y,z)$ that are a triple of correlations of three variables. For example, $(0.5, 0.5, 0.5)$ is in the samosa, because it is possible to have three variables with all correlations 0.5. But $(1,1,0)$ is not, because if two variables have perfect correlation with a third, they must also be perfectly correlated.

Put another way, the samosa is the set of points where the matrix

$\begin{bmatrix} 1 & x & y \\ x & 1 & z \\ y & z & 1 \end{bmatrix}$

has non-negative determinant, for $x, y, z$ in the range from -1 to 1. If we know two of the three correlations, we can use this description of the samosa to find the possibilities for the third correlation. This gives the line segments in the previous post.
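This description translates directly into a membership test. The following is a minimal Python sketch; the function names `correlation_det` and `in_samosa` are made up for this illustration. It checks the two examples mentioned above.

```python
def correlation_det(x, y, z):
    """Determinant of [[1, x, y], [x, 1, z], [y, z, 1]]."""
    return 1 + 2 * x * y * z - x * x - y * y - z * z


def in_samosa(x, y, z):
    """True if (x, y, z) is a possible triple of correlations."""
    in_range = all(-1 <= c <= 1 for c in (x, y, z))
    return in_range and correlation_det(x, y, z) >= 0


print(in_samosa(0.5, 0.5, 0.5))  # True: the determinant is 0.5
print(in_samosa(1, 1, 0))        # False: the determinant is -1
```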

What if we only know one of the three correlations? Maybe we know that correlation $x$ is zero. What are the possible values of correlations $y$ and $z$ ?

The space of three correlations intersected with the plane $x = 0$ .


We can intersect the plane $x=0$ with the samosa. The points that lie in the samosa, and on this plane, are triples of correlations $(0, y, z)$ .

If we knew the three correlations sum to one, we would intersect the samosa with $x + y + z = 1$ .

The space of three correlations intersected with the plane $x + y + z = 1$ .

The situation of intersecting the samosa with a plane arises in the study of certain statistical models. For example, it can be seen in this figure from “The Handbook of Graphical Models”:

Figure 9.2, The Handbook of Graphical Models (see here for a pdf). This is the version of the figure from Caroline Uhler’s chapter.

In the top-left picture, the yellow shape is the samosa. It has been intersected with a green plane. The points on the green plane and in the samosa are inside the black curve on the top-right picture.

These pictures are cartoons of graphical models (statistical models that follow the structure of a graph). A Gaussian graphical model arises by intersecting the samosa (or its higher-dimensional analogue) with a particular linear space. Each point in the samosa is now not a collection of correlations, but rather an inverse covariance matrix.

When studying such graphical models, it is helpful to look at the inverses of the matrices. This is where the second row of pictures in the above figure comes in. In the lower left picture, the red shape gives the inverses of matrices of the form

$\begin{bmatrix} a & x & y \\ x & a & z \\ y & z & a \end{bmatrix}.$

The red shape is obtained by projecting the inverse matrices using sufficient statistics.

Here is some Macaulay2 code I used to compute the surface. It is also described in Example 5.3 here.

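-- Ring with the entries of K (a, x, y, z) and of S (s11, ..., s33)
R = QQ[a,x,y,z,s11,s12,s13,s22,s23,s33];
K = matrix{{a,x,y},{x,a,z},{y,z,a}};
S = matrix{{s11,s12,s13},{s12,s22,s23},{s13,s23,s33}};
-- The entries of K*S - Id vanish exactly when S is the inverse of K
M = K*S - matrix{{1,0,0},{0,1,0},{0,0,1}};
I = minors(1,M);
-- Eliminate a, x, y, z to get the equations satisfied by the inverse matrices S
J = eliminate(I,{a,x,y,z});
-- New ring with coordinates t1, ..., t4 for the projection
R1 = QQ[s11,s12,s13,s22,s23,s33,t1,t2,t3,t4];
d = minors(3,S); d = substitute(d,R1);
J = substitute(J,R1);
-- Add det(S) and the linear equations defining t1, ..., t4, then eliminate the entries of S
J = J + d + ideal(t1 - s11 - s22 - s33, t2 - s12, t3 - s13, t4 - s23);
K = eliminate(J,{s11,s12,s13,s22,s23,s33});
dc = decompose K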

Three correlations and a samosa

October 31, 2020 · anna

Let’s imagine a hypothetical situation. There’s an infection going round, and we want to predict the future severity of someone’s illness.

There is a test that offers a good prediction. Let’s say the outcome of the test has a correlation of 0.78 with the patient’s severity of infection. The problem with the test is that it is expensive and time-consuming. But there’s an alternative test, which is much cheaper and faster. We don’t know how well the cheap test correlates with the severity of infection, but we know the correlation between the cheap test and the expensive test is quite high, at 0.89.

We have three related correlations – two known, and one unknown.

What can we say about how well the cheap test correlates with the severity of infection?

We might expect to be able to say something about the unknown correlation. For example, if the expensive test had a correlation of 1 with the severity of infection, and the cheap test also had a correlation of 1 with the expensive test, then everything is perfectly correlated and the cheap test must also have a correlation of 1 with the severity of infection.

But, let’s assume the expensive test only has a correlation of 0.5 with the severity of infection, and that the two tests are also only correlated with correlation 0.5. Now it isn’t clear whether we can say anything about the correlation of the cheap test with the severity of infection.

Let’s go back to the numbers in the original example.

We can organise our correlations into a $3 \times 3$ matrix. We have three variables:

severity of infection
expensive test outcome
cheap test outcome

And we build the matrix:

$\begin{bmatrix} 1 & 0.78 & y \\ 0.78 & 1 & 0.89 \\ y & 0.89 & 1 \end{bmatrix}.$

The $(i,j)$ entry of the matrix gives the correlation between variable $i$ and $j$ . Each variable has a perfect correlation with itself, so the diagonal entries of the matrix are equal to 1. In addition, the correlation between $i$ and $j$ is the same as the correlation between $j$ and $i$ , so the matrix is symmetric.

A matrix of correlations has to be positive semi-definite. This condition will allow us to find the range of possible values for $y$ .

But first, let’s consider the case where all three correlations are unknown. We then have correlation matrix

$\begin{bmatrix} 1 & x & y \\ x & 1 & z \\ y & z & 1 \end{bmatrix} .$

The region of possible values for the correlations $(x,y,z)$ is the set of values for which the matrix is positive semi-definite. This is a region of 3-dimensional space that looks like this:

It is called the elliptope, a name that dates back at least as far as 1996 (see here). It is also called the samosa, a name that dates back at least as far as 2011 (see here).

We can now fix values for our two known correlations, to see the possible values for the third correlation. The possibilities are all values that keep the triple of correlations inside the samosa.

If we know two correlations are $x = 0.78$ and $y = 0.89$ , the range of possibilities for the third correlation is given by the black line


We can find the upper and lower limit of the third correlation by seeing where the black line intersects the boundary of the samosa. The boundary is defined by setting the determinant of the matrix

$\begin{bmatrix} 1 & 0.78 & y \\ 0.78 & 1 & 0.89 \\ y & 0.89 & 1 \end{bmatrix}$

to be zero. We get a quadratic polynomial in $y$ with roots at approximately 0.41 and 0.98. So the third correlation has to lie in the range 0.41 to 0.98.

So for the infection example, although the cheap test has a high correlation with the expensive test, in the worst case it only offers a correlation of 0.41 with the severity of infection.

If the two known correlations had both been 0.5, a similar computation shows that the third correlation has to lie in the range -0.5 to 1. The third correlation could even be negative!

If two correlations take values 0.5, the third correlation could range from -0.5 to 1, the values along the black line
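The two computations above can be reproduced with a few lines of Python. This is a sketch rather than the code used for the pictures; the function name `third_correlation_range` is made up here. It solves the quadratic obtained by setting the determinant to zero.

```python
import math


def third_correlation_range(r1, r2):
    """Interval of values y that keep the triple of correlations in the samosa.

    Setting det = 1 + 2*r1*r2*y - r1**2 - r2**2 - y**2 to zero and solving
    the quadratic in y gives y = r1*r2 +/- sqrt((1 - r1**2) * (1 - r2**2)).
    """
    centre = r1 * r2
    half_width = math.sqrt((1 - r1 ** 2) * (1 - r2 ** 2))
    return centre - half_width, centre + half_width


low, high = third_correlation_range(0.78, 0.89)
print(round(low, 2), round(high, 2))      # 0.41 0.98
print(third_correlation_range(0.5, 0.5))  # (-0.5, 1.0)
```

The closed form also shows why the interval narrows as the known correlations approach 1: the half-width $\sqrt{(1-r_1^2)(1-r_2^2)}$ shrinks to zero.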

This is the first part of a small series of “correlated” posts that I hope to write about correlations – stay tuned for more!

Algorithms to find rectangles, for humans and computers

September 15, 2020 · anna

Cell blocks (or Shikaku) is a puzzle that looks like this:

It has a grid with some numbered squares. The goal is to divide the grid into rectangles, so that each rectangle contains only one numbered square, and the number on the square is the area of the rectangle.

From the locations and values of the numbers, we see where some of the rectangles should go: the “14” has to go in a rectangle of area 14, which (to avoid other numbered squares) has to be the 2×7 rectangle on rows 2 and 3. This cuts off row 1 from the rest, so we can find the rectangles on that row too. For other cell blocks puzzles it can be more difficult to work out where to start:

Finding rectangles in a grid may not seem particularly useful, but it can sometimes help us to analyse data. However, while cell blocks puzzles are designed to be solved (or even enjoyed) by a human, finding rectangles in other grids requires some computer assistance.

In a recent(-ish) project, we designed an algorithm to divide a grid into rectangles in accordance with certain rules. We used the rectangles that the algorithm found to analyse some biological data.

Here is an example of how it works. We have a grid that summarises some biological data.

The rows of the grid give information about different breast cancer cell lines, which we can think of as different sub-types of breast cancer. The columns of the grid are labelled by experimental conditions. Each cell line has been exposed to each experimental condition, to try to disentangle similarities and differences between them.

The colours on the grid give the type of response. For example, if a location on the grid is yellow, we think of this as saying that the cell line has a “high response” to a particular experimental condition. Green is “medium response” and blue is “low response”.

It is difficult to summarise the locations of the grid with each of the three different responses, which complicates the biological interpretation. To get around this, we run our algorithm to find rectangles on the grid that are “as close as possible” to the above picture, in the sense that as few locations as possible have changed colour. We obtain this picture:

The algorithm works by re-casting the search for rectangles as an integer optimisation problem, and then using the branch and cut method. The MATLAB code to find the constraints on the optimisation is given here (on pages 155-8). This enabled the computer to solve our challenging cell blocks style puzzle.

Simpson’s paradox as a semi-algebraic set

August 26, 2020 · anna

This post is about the second part of the talk I gave at It all adds up in January. It’s about some of the maths behind the idea of making decisions based on partial observations, something we do a lot. One important example of this is deciding medical treatment options based on results from a clinical trial.

Let’s say you are deciding between two medicines: whether to take the red pill or the blue pill.


You decide to take the option that gives you the greatest chance of being “happy”. You’re given some data to help you decide: you’re told that 62% of people who take the blue pill are happy, compared with 57% of people who choose the red pill.

But that’s not all: you’re also told the percentages broken down into older people and younger people:

for young people, 63% of those who take the blue pill are happy, but 80% of those who take the red pill are happy
for older people, 50% of those who take the blue pill are happy, and 56% of those who take the red pill are happy

The data seem incompatible. If you don’t take your age into account, you choose the blue pill to have the higher chance of happiness. If you take age into account, regardless of whether you are old or young, then you should take the red pill for the higher chance of happiness.

But the data are not inconsistent. Here is an example of a survey the percentages could have come from.


And here is the aggregated data across both age groups:


It seems difficult to make a decision based on this data.

This is an example of a phenomenon called Simpson’s paradox.
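To see concretely that the percentages are compatible, here is a small Python sketch. The counts are hypothetical: they were chosen only so that the rounded percentages match the ones quoted above, and they are not the survey from the original tables.

```python
# Hypothetical counts (happy, total) for each age group and pill.
counts = {
    ("young", "blue"): (63, 100),
    ("young", "red"): (4, 5),
    ("old", "blue"): (4, 8),
    ("old", "red"): (56, 100),
}


def rate(groups):
    """Fraction of happy people across the given (age, pill) groups."""
    happy = sum(counts[g][0] for g in groups)
    total = sum(counts[g][1] for g in groups)
    return happy / total


def overall(pill):
    return rate([("young", pill), ("old", pill)])


for age in ("young", "old"):
    print(age, round(100 * rate([(age, "blue")])), round(100 * rate([(age, "red")])))
print("all", round(100 * overall("blue")), round(100 * overall("red")))

red_wins_in_each_group = all(
    rate([(age, "red")]) > rate([(age, "blue")]) for age in ("young", "old")
)
blue_wins_overall = overall("blue") > overall("red")
print(red_wins_in_each_group and blue_wins_overall)  # True
```

In these made-up counts, almost all the young people took the blue pill and almost all the older people took the red pill, which is what makes the reversal possible.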

It can be described in terms of inequalities. We can turn each of the numbers in the tables above into probabilities, by dividing by the total number of respondents. The phenomenon occurs when the following inequalities hold:


Such examples are rare, but they do happen. We can sample the space of probabilities to compute that it happens approximately 1.7% of the time, when we have three binary variables.
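A sampling experiment of this kind can be sketched in Python as follows. Two assumptions are made here that the post does not spell out: the eight probabilities are drawn uniformly from the probability simplex, and a reversal in either direction counts as an instance of the paradox. All function names are made up for this sketch.

```python
import random


def association(t):
    """Sign-carrying measure of association in a 2x2 table t[a][b]."""
    return t[0][0] * t[1][1] - t[0][1] * t[1][0]


def is_simpson(p):
    """p[a][b][c] is the probability of outcome (a, b, c) of three binary variables."""
    slices = [[[p[a][b][c] for b in (0, 1)] for a in (0, 1)] for c in (0, 1)]
    marginal = [[p[a][b][0] + p[a][b][1] for b in (0, 1)] for a in (0, 1)]
    d0, d1 = association(slices[0]), association(slices[1])
    d = association(marginal)
    # Same sign of association in both groups, opposite sign after aggregating.
    return (d0 > 0 and d1 > 0 and d < 0) or (d0 < 0 and d1 < 0 and d > 0)


def random_distribution(rng):
    """Uniform sample from the probability simplex, via normalised exponentials."""
    w = [rng.expovariate(1.0) for _ in range(8)]
    s = sum(w)
    q = [x / s for x in w]
    return [[[q[4 * a + 2 * b + c] for c in (0, 1)] for b in (0, 1)] for a in (0, 1)]


def estimate(n_samples, seed=0):
    """Fraction of sampled distributions that exhibit Simpson's paradox."""
    rng = random.Random(seed)
    hits = sum(is_simpson(random_distribution(rng)) for _ in range(n_samples))
    return hits / n_samples


print(estimate(20_000))
```

The estimate fluctuates with the seed and the number of samples, so it should be read as a rough check rather than a precise value.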

This 1.7% is the volume of a semi-algebraic set: a region of space where the three inequalities hold. Here is a 3D picture of the algebraic conditions. The region between the three surfaces, which the arrow is pointing to, is the zone of Simpson’s paradox.


What can we learn from this? One thing is the importance of randomised trials for comparing medicines. The above data are not a well-designed trial, because almost all the young people took the blue pill, and almost all the older people took the red pill. More generally, as interpreting statistics from the news becomes an increasing part of our daily lives, this example also shows how easy it is, unfortunately, for such statistics to be misleading.

It all adds up

August 20, 2020 · anna

Time has flown since our last post on this blog, and I wanted to re-kindle things here with a post about It All Adds Up, a conference for girls in years 9 to 12 that takes place each year at the Mathematical Institute in Oxford.

Back in January, I gave the plenary talk to the two hundred or so year 10-11s. It was a great experience, and all the more memorable now, when a room of that many people seems like a world away.

One of the topics I spoke to them about was

how to detect an object from its shadows

Often an object of mathematical interest is high-dimensional and, when visualising or understanding it, we can find a “shadow” that captures important structure. For some examples inspired by statistics, see this post and this one too.

Let’s imagine we have a mystery object in a box. We can shine a light on the box in the three different directions and observe the shadow. Based on the shadows, we want to understand the object in the box.


For example, if each of the three shadows is a filled-in square, then one possibility for the object is a filled-in cube, but there are other options too.

Are there combinations of three shadows that can’t come from any object?

For example, does there exist an object that has its three shadows given by the three shapes shown: a square, a triangle, and a circle?


It turns out that such an object does exist! It looks like this: (see here for more angles of it and for the files we used to make the 3D print.)


We could also try to design an object that has three shadows given by these three letters (or your initials instead of mine):


How could we show that three shadows could not have come from a particular object? One tool is the Loomis-Whitney inequality, which relates the volume of an object to the area of its three shadows. It says that the areas of the shadows cannot be too small relative to the volume of the object:

$vol(object) \leq \sqrt{area(proj_1) area(proj_2) area(proj_3)}.$
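For objects built out of unit cubes, the inequality is easy to test by counting. Here is a minimal Python sketch; the voxel representation and the function names are choices made for this illustration, not part of the talk.

```python
def shadows(voxels):
    """Project a set of unit cubes (i, j, k) along each of the three axes."""
    return (
        {(j, k) for (i, j, k) in voxels},
        {(i, k) for (i, j, k) in voxels},
        {(i, j) for (i, j, k) in voxels},
    )


def satisfies_loomis_whitney(volume, areas):
    """Check volume <= sqrt(area_1 * area_2 * area_3), comparing squares."""
    return volume ** 2 <= areas[0] * areas[1] * areas[2]


cube = {(i, j, k) for i in range(3) for j in range(3) for k in range(3)}
areas = [len(s) for s in shadows(cube)]
print(len(cube), areas, satisfies_loomis_whitney(len(cube), areas))  # 27 [9, 9, 9] True

# A claimed object of volume 30 with the same three shadows is impossible.
print(satisfies_loomis_whitney(30, areas))  # False
```

The filled-in cube attains equality, which is why it is the largest object with three square shadows.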

We are often interested in higher-dimensional examples. For example, does there exist a four dimensional object that has these shadows?


This picture is from here, where it arises in the context of marginals of quantum systems.

For an even more artistic example, here is a sculpture by Matthieu Robert-Ortis, where one projection of the sculpture gives an elephant, while another gives two giraffes:


Thanks to Vicky Neale and Mareli Grady for organising It All Adds Up, and for inviting me to speak there. And thanks to Derek Moulton and Alain Goriely for help with the 3D printing.

Visualizing statistical models – it’s child’s play

July 17, 2018 · anna

Before you ask a mathematician if they can visualize the fourth dimension, ask them if they can truly visualize a three-dimensional object, like the boundary of a four-dimensional football. If they tell you it’s easy, and their name isn’t Maryna Viazovska, they’re probably lying.

Making an accurate picture of an object from a high dimensional space is very challenging. In this blog post we’ll see a surprising case where it turns out to be possible. We’ll visualize an interesting seven-dimensional object, which comes from a question in statistics.

Let’s consider the probability that each of the teams in the quarter-finals of the Men’s FIFA 2018 World Cup would win. The teams were (Uruguay, France, Brazil, Belgium, Russia, Croatia, Sweden, England). Today we know the probabilities of the teams winning, in that order, are $(0,1,0,0,0,0,0,0)$ , because France has already won. Back on 3rd July the probabilities (according to FiveThirtyEight) were $(0.06, 0.15, 0.3, 0.11, 0.05, 0.12, 0.07, 0.14)$ , and on 7th July the probabilities were $(0,0.29,0,0.26,0,0.18,0,0.27)$ .

In a recent project we were studying which probability distributions lie in a particular statistical model. We found out that our statistical model is given by inequalities that the eight probabilities need to satisfy. If we call the probabilities $(a,b,c,d,e,f,g,h)$ , the inequalities are:

$(ad-bc)(eh-fg) \geq 0, \quad (af-be)(ch - dg) \geq 0, \quad (ag-ce)(bh-df) \geq 0 .$
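As a quick illustration, the inequalities can be checked for a given list of eight probabilities. The Python sketch below assumes that the letters $a, \ldots, h$ are matched to the teams in the order listed above; the post does not fix this labelling, so the outcome for the World Cup numbers is only an example.

```python
def in_model(p):
    """Check the three inequalities for p = (a, b, c, d, e, f, g, h)."""
    a, b, c, d, e, f, g, h = p
    return (
        (a * d - b * c) * (e * h - f * g) >= 0
        and (a * f - b * e) * (c * h - d * g) >= 0
        and (a * g - c * e) * (b * h - d * f) >= 0
    )


july_3 = (0.06, 0.15, 0.3, 0.11, 0.05, 0.12, 0.07, 0.14)
final = (0, 1, 0, 0, 0, 0, 0, 0)
print(in_model(july_3))  # False: (af - be)(ch - dg) is negative
print(in_model(final))   # True: every product is zero
```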

The probabilities have to sum to 1, so $a + b + c + d + e + f + g + h = 1$ . We want to visualize the part of seven-dimensional space in which the inequalities hold. How can we do it?

The first step is to notice that some combinations of letters do not affect whether the inequalities hold or not. They are:

$(a + b + c + d) - (e + f + g + h) , \quad (a + c + e + g) - (b + d + f + h) , \quad (a + b + e + f) - (c + d + g + h)$

So we can apply a change of coordinates that removes these three directions, leaving something four-dimensional. Finally, to get something three-dimensional we can assume that the four remaining coordinates lie on the sphere.

We end up with a picture that looks like this:


The part of space that lies inside the statistical model consists of the points outside the blue blob, the green blob, and the yellow blob.

These days, we have an even better way to visualize the statistical model, truly in 3D. It even doubles-up as a handmade toy for children.

Order yours here.

We can’t help but wonder – which other children’s toys are really statistical models in disguise?

A duality of pictures

February 8, 2018 · anna

Duality relates objects that seem different at first but turn out to be similar. The concept of duality occurs almost everywhere in maths. If two objects seem different but are actually the same, we can view each object in a “usual” way, and in a “dual” way – the new vantage point is helpful for new understanding of the object. In this blog post we’ll see a pictorial example of a mathematical duality.

How are these two graphs related?

In the first graph, we have five vertices, the five black dots, and six green edges which connect them. For example, the five vertices could represent cities (San Francisco, Oakland, Sausalito etc.) and the edges could be bridges between them.

In the second graph, the role of the cities and the bridges has swapped. Now the bridges are the vertices, and the edges (or hyperedges) are the cities. For example, we can imagine that the cities are large metropolises and the green vertices are the bridge tolls between one city and the next.

Apart from swapping the role of the vertices and the edges, the information in the two graphs is the same. If we shrink each city down to a dot in the second graph, and grow each bridge toll into a full bridge, we get the first graph. We will see that the graphs are dual to each other.

We represent each graph by a labeled matrix: we label the rows by the vertices and the columns by the edges, and we put a $1$ in the matrix whenever the vertex is in the edge. For example, the entry for vertex $1$ and edge $a$ is $1$ , because edge $a$ contains vertex $1$ . The matrix on the left is for the first graph, and the one on the right is for the second graph.


We can see that the information in the two graphs is the same from looking at the two matrices – they are the same matrix, transposed (or flipped). The matrix of a hypergraph is the transpose of the matrix of the dual hypergraph.
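The transpose relationship can be checked on a small example. The Python sketch below uses a hypothetical graph with five vertices and six edges (the graph from the pictures is not reproduced here), builds its labelled matrix, forms the dual hypergraph by swapping the roles of vertices and edges, and compares the two matrices.

```python
# Hypothetical graph: five vertices, six edges labelled a to f.
vertices = [1, 2, 3, 4, 5]
edges = {
    "a": {1, 2},
    "b": {2, 3},
    "c": {3, 4},
    "d": {4, 5},
    "e": {1, 5},
    "f": {2, 5},
}


def incidence_matrix(vertices, edges):
    """Rows are vertices, columns are edges; entry 1 if the vertex is in the edge."""
    return [[1 if v in edges[name] else 0 for name in sorted(edges)] for v in vertices]


def dual(vertices, edges):
    """Swap roles: each old edge becomes a vertex, each old vertex a hyperedge."""
    new_vertices = sorted(edges)
    new_edges = {v: {name for name in edges if v in edges[name]} for v in vertices}
    return new_vertices, new_edges


def transpose(matrix):
    return [list(row) for row in zip(*matrix)]


m = incidence_matrix(vertices, edges)
dual_vertices, dual_edges = dual(vertices, edges)
m_dual = incidence_matrix(dual_vertices, dual_edges)
print(m_dual == transpose(m))  # True
```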

Mathematicians are always on the look-out for hidden dualities between seemingly different objects, and we are happy when we find them. For example, in a recent project we studied the connection between graphical models, from statistics, and tensor networks, from physics. We showed that the two constructions are the duals of each other, using the hypergraph duality we saw in this example.

Flattening a cube

August 15, 2017 · anna

If you conduct a survey, among some friends, consisting of three YES/NO questions, how can you summarize the responses?

I conducted a survey recently at a conference. The three questions were:

Is it your first time at the Mathematisches Forschungsinstitut Oberwolfach?
Do you like the weather?
Have you played any games?


There are eight options for how someone could respond to three YES/NO questions. Taking YES=1, and NO=0, the eight options are labelled by the binary strings: 000, 001, 010, 100, 011, 101, 110, 111.

We can think of 0 and 1 as coordinates in space, and arrange the eight numbers into a cube:


This 3D arrangement reflects the fact that there are three questions in the survey. Since our dataset is small, there’s not much need for further analysis to compress or visualize the data. But for a larger survey, we will summarize the structural information in the data using principal components.

The first step of principal component analysis is to restructure the 3D cube of data into a 2D matrix. This is called “flattening” the cube. We combine two YES/NO questions from the survey into a single question with four possible responses. There are three choices for which questions to combine, so there are three possible ways to flatten the cube into a matrix:

$\begin{bmatrix} p_{000} & p_{001} & p_{010} & p_{011} \\ p_{100} & p_{101} & p_{110} & p_{111} \end{bmatrix} \qquad \begin{bmatrix} p_{000} & p_{001} & p_{100} & p_{101} \\ p_{010} & p_{011} & p_{110} & p_{111} \end{bmatrix} \qquad \begin{bmatrix} p_{000} & p_{010} & p_{100} & p_{110} \\ p_{001} & p_{011} & p_{101} & p_{111} \end{bmatrix}$
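The three flattenings can be generated mechanically. In the Python sketch below (the function name `flatten` and the use of string labels are choices made for this illustration), one question indexes the rows and the other two are combined to index the columns.

```python
from itertools import product

# Labels p_000 ... p_111 stand in for the entries of the 2 x 2 x 2 cube.
cube = {bits: "p_" + "".join(map(str, bits)) for bits in product((0, 1), repeat=3)}


def flatten(cube, row_axis):
    """Flatten to a 2 x 4 matrix: row_axis indexes rows, the other two axes the columns."""
    col_axes = [axis for axis in range(3) if axis != row_axis]
    matrix = []
    for r in (0, 1):
        row = []
        for c in product((0, 1), repeat=2):
            bits = [None] * 3
            bits[row_axis] = r
            bits[col_axes[0]], bits[col_axes[1]] = c
            row.append(cube[tuple(bits)])
        matrix.append(row)
    return matrix


for axis in range(3):
    print(flatten(cube, axis))
```

Taking `row_axis` to be 0, 1 and 2 reproduces the three matrices above, in that order.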

Our analysis of the data depends on which flattening we choose! Generally speaking, it’s bad news if an arbitrary decision has an impact on the conclusions of an analysis.

So we need to understand…

How do the principal components depend on the choice of flattening?

This picture gives an answer to that question:

All points inside the star-shaped surface correspond to valid combinations of principal components from the three flattenings, while points outside are the invalid combinations. More details can be found here.

Tea with (Almond) Milk

March 17, 2017 · anna

Making a cup of tea in a hurry is a challenge. I want the tea to be as drinkable (cold) as possible after a short amount of time. Say, 5 minutes. What should I do: should I add milk to the tea at the beginning of the 5 minutes or at the end?


The rule we will use to work this out is Newton’s Law of Cooling. It says “the rate of heat loss of the tea is proportional to the difference in temperature between the tea and its surroundings”.

This means the temperature of the tea follows the differential equation $T' = -k (T - T_s)$ , where $k$ is a positive constant of proportionality. The minus sign is there because the tea is warmer than the room – so it is losing heat. Solving this differential equation, we get $T = T_s + (A - T_s) e^{-kt}$ , where $A$ is the initial temperature of the tea.

We’ll start by defining some variables, to set the question up mathematically. Most of them we won’t end up needing. Let’s say the tea, straight from the kettle, has temperature $T_0$ . The cold milk has temperature $m$ . We want to mix tea and milk in the ratio $L:l$ . The temperature of the surrounding room is $T_s$ .

Option 1: Add the milk at the start

We begin by immediately mixing the tea with the milk. This leaves us with a mixture whose temperature is $\frac{T_0 L + m l }{L + l}$ . Now we leave the tea to cool. Its cooling follows the equation $T = T_s +\left( \frac{T_0 L + m l }{L + l} - T_s \right) e^{-kt}$ . After five minutes, the temperature is

Option 1 $= T_s +\left( \frac{T_0 L + m l }{L + l}- T_s \right) e^{-5k} .$

Option 2: Add the milk at the end

For this option, we first leave the tea to cool. Its cooling follows the equation $T = T_s + (T_0 - T_s) e^{-kt}$ . After five minutes, it has temperature $T = T_s + (T_0 - T_s) e^{-5k}$ . Then, we add the milk in the specified ratio. The final concoction has temperature

Option 2 $= \frac{(T_s + (T_0 - T_s) e^{-5k}) L + m l }{L + l}.$

So which temperature is lower: the “Option 1” temperature or the “Option 2” temperature?

It turns out that most of the terms in the two expressions cancel out, and the inequality boils down to a comparison of $e^{-5k} (T_s l - ml)$ (from Option 2) with $(T_s l - ml)$ (from Option 1). The answer depends on whether $T_s l - ml > 0$ . For our cup of tea, it will be: the milk is colder than the surroundings ( $m < T_s$ ). [What does this quantity represent?] Hence, since $k$ is positive, we have $e^{-5k} < 1$ , and option 2 wins: add the milk at the end.
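To try this out with numbers, here is a small Python sketch of the two options. The parameter values are hypothetical (temperatures in degrees Celsius, $k$ per minute) and are chosen only for illustration; the function names are made up here.

```python
import math


def milk_first(T0, m, Ts, L, l, k, t=5.0):
    """Option 1: mix immediately, then cool for t minutes."""
    mixed = (T0 * L + m * l) / (L + l)
    return Ts + (mixed - Ts) * math.exp(-k * t)


def milk_last(T0, m, Ts, L, l, k, t=5.0):
    """Option 2: cool for t minutes, then mix."""
    cooled = Ts + (T0 - Ts) * math.exp(-k * t)
    return (cooled * L + m * l) / (L + l)


# Hypothetical values: hot tea, fridge-cold milk, a 9:1 mix, a mild cooling rate.
params = dict(T0=95.0, m=5.0, Ts=20.0, L=9.0, l=1.0, k=0.05)
option_1 = milk_first(**params)
option_2 = milk_last(**params)
print(option_1, option_2, option_2 < option_1)
```

Changing `m` to a value above `Ts` (milk warmer than the room) flips the comparison, which is a useful sanity check on the argument.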