I always get conflicted about reading an isolated study. I know I’m going to read it poorly. There will be lots of terms I don’t know; I won’t get the context of the results. I’m assured of misreading.
On the other side of the ledger, though, is curiosity, and the fun that comes from trying to puzzle these sorts of things out. (The other carrot is insight. You never know when insight will hit.)
So, when I saw Heidi talk about this piece on Twitter, I thought it would be fun to give it a closer read. It’s mathematically interesting, and much of it is obscure to me. Turns out that the piece is openly available, so you can play along at home. So, let’s take a closer look.
I.
The stakes of this study are both high and crushingly low. Back in 2014 when this was published, the paper caught some press that picked up on its ‘Math Wars’ angle. For example, you have NPR’s summary of the research:
Math teachers will often try to get creative with their lesson plans if their students are struggling to grasp concepts. But in “Which Instructional Practices Most Help First-Grade Students With and Without Mathematics Difficulties?” the researchers found that plain, old-fashioned practice and drills — directed by the teacher — were far more effective than “creative” methods such as music, math toys and student-directed learning.
Pushes all your teachery buttons, right?
But if the stakes seem high, the paper is also easy to disbelieve, if you don’t like the results.
Evidence about teaching comes in a lot of different forms. Sometimes, it comes from an experiment; y’all (randomly chosen people) try doing this, everyone else do that, and we see what happens. Other times we skip the ‘random’ part and find reasonable groups to compare (a ‘quasi-experiment’). Still other times we don’t try for statistically valid comparisons between groups, and instead a team of researchers will look very, very closely at teaching in a methodologically rich and cautious way.
And sometimes we take a big pile of data and poke at it with a stick. That’s what the authors of this study set out to do.
I don’t mean to be dismissive of the paper. I’m writing about it because I think it’s worth writing about. But I also know that lots of us in education use research as a bludgeon. This leads to educators reading research with two questions in mind: (a) Can I bludgeon someone with this research? (b) How can I avoid getting bludgeoned by this research?
That’s why I’m taking pains to lower the stakes. This paper isn’t a crisis or a boon for anyone. It’s just the story of how a bunch of people analyzed a bunch of interesting data.
Freed of the responsibility of figuring out if this study threatens us or not, let’s muck around and see what we find.
II.
The researchers lead off with a nifty bit of statistical work called factor analysis. It’s an analytical move that, as I read more about it, I find both supremely cool and metaphysically questionable.
You might have heard of socioeconomic status. Socioeconomic status is supposed to explain a lot about the world we live in. But what is socioeconomic status?
You can’t directly measure someone’s socioeconomic status. It’s a latent variable, one responsible for myriad other observable variables, such as parental income, occupational prestige, the number of books lying around your parents’ house, and so on.
None of these observables, on their own, can explain much of the variance in student academic performance. If your parents have a lot of books at home, that’s just it: your parents have a lot of books. That doesn’t make you a measurably better student.
Here’s the way factor analysis works, in short. You get a long list of responses to a number of questions, or a long list of measurements. I don’t know, maybe there are 100 variables you’re looking at. And you wonder (or program a computer to wonder) whether these can be explained by some smaller set of latent variables. You see if some of your 100 variables tend to vary as a group, e.g. when income goes up by a bit, does educational attainment tend to rise too? You do this for all your variables, and hopefully you’re able to identify just a few latent variables that stand behind your big list. This makes the rest of your analysis a lot easier; much better to compare 3 variables than 100.
That’s what we do for socioeconomic status. That’s also what the authors of this paper do for the instructional techniques teachers use with First Graders.
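To make the "vary as a group" idea concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the two latent variables, the six observed variables, and the weights are assumptions of mine, not anything from the study. It also swaps in a simpler technique than a full factor analysis: it looks at the eigenvalues of the correlation matrix (principal-components style) and counts how many are bigger than 1, which is a common rule of thumb for guessing how many latent variables stand behind a list of measurements.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000  # number of simulated respondents (arbitrary)

# Two hypothetical latent variables. Nobody gets to observe these directly.
latent_a = rng.normal(size=n)  # think "socioeconomic status"
latent_b = rng.normal(size=n)  # some second, unrelated latent trait

def noisy(latent, weight):
    """Observed variable = weight * latent + independent noise (unit variance)."""
    return weight * latent + np.sqrt(1 - weight**2) * rng.normal(size=n)

# Six observed variables: three driven by each latent variable.
observed = np.column_stack([
    noisy(latent_a, 0.8),  # e.g. parental income
    noisy(latent_a, 0.7),  # e.g. occupational prestige
    noisy(latent_a, 0.6),  # e.g. books in the house
    noisy(latent_b, 0.8),
    noisy(latent_b, 0.7),
    noisy(latent_b, 0.6),
])

# Which variables move together? The correlation matrix answers that.
corr = np.corrcoef(observed, rowvar=False)

# Eigenvalues of the correlation matrix, largest first.
eigenvalues = np.linalg.eigvalsh(corr)[::-1]

# Rule of thumb: keep one factor per eigenvalue above 1.
n_factors = int(np.sum(eigenvalues > 1.0))
print("eigenvalues:", np.round(eigenvalues, 2))
print("suggested number of factors:", n_factors)
```

With this simulated data the rule of thumb suggests two factors, matching the two latent variables that were baked in. Real survey data is messier, and I don't know which extraction method or cutoff the authors actually used, so treat this as a cartoon of the idea rather than a reconstruction of their analysis.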
I’m new to all this, so please let me know if I’m messing any of this up, but it sure seems to me tough to figure out what exactly these latent variables are. One possibility is that all the little things that vary together — the parental income, the educational attainment, etc. — all contribute to academic outcomes, but just a little bit. Any one of them would be statistically irrelevant, but together, they have oomph.
This would be fine, I guess, but then why bother grouping them into some other latent variable? Wouldn’t we be better off saying that a bunch of little things can add up to something significant?
The other possibility is that socioeconomic status is some real, other thing, and all those other measurable variables are just pointing to this big, actual cause of academic success. What this ‘other thing’ actually is, though, remains up in the air.
(In searching for other people who worried about this, I came across a piece from History and Philosophy of Psychology Bulletin called ‘Four Queries About Factor Reality.’ Leading line: ‘When I first learned about factor analysis, there were four methodological questions that troubled me. They still do.’)
So, that’s the first piece of statistical wizardry in this paper. Keep reading: there’s more!
III.
Back to First Graders. The authors of this paper didn’t collect this data; the Department of Education, through the National Center for Education Statistics, ran the survey.
The NCES study was immense. It’s longitudinal, so we’re following the same group of students over many years. I don’t really know the details, but they’re aiming for a nationally representative sample of participants in the study. We’re talking over ten thousand students; their parents; thousands of teachers; they measured kids’ height, for crying out loud. It’s an awe-inspiring dataset, or at least it seems that way to me.
As part of the survey, they ask First Grade teachers to answer questions about their math teaching. First, 19 instructional activities…

…and then, 29 mathematical skills.

Now, we can start seeing the outlines of a research plan. Teachers tell you how they teach; we have info about how well these kids performed in math in Kindergarten and in First Grade; let’s find out how the teaching impacts the learning.
Sounds good, except HOLY COW look at all these variables. 19 instructional techniques and 29 skills. That’s a lot of items.
I think you know what’s coming next…

FACTOR ANALYSIS, BABY!
So we do this factor analysis (beep bop boop boop) and it turns out that, yes, indeed some of the variables vary together, suggesting that there are some latent, unmeasured factors that we can study instead of all 48 of these items.
Some good news: the instructional techniques only got grouped with other instructional techniques, and skills got grouped with skills. (It would be a bit weird if teachers who teach math through music focused more on place value, or something.)
I’m more interested in the instructional factors, so I’ll focus on the way these 19 instructional techniques got analytically grouped:

The factor loadings, as far as I understand, can be interpreted as correlation coefficients, i.e. higher means a tighter fit with the latent variable. (I don’t yet understand Cronbach’s Alpha or what it signifies. For me, that’ll have to wait.)
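Here is a small sketch of why a loading can be read as a correlation. Again, the data is simulated and the names are hypothetical: one made-up latent "teaching style" and four made-up survey items. And again I'm using the principal-components version of the calculation, which may not be the extraction method the authors used. In that version, a loading is the eigenvector entry scaled by the square root of the eigenvalue, and it comes out equal to the correlation between each item and the factor score.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000  # simulated teachers (arbitrary)

# One hypothetical latent "teaching style" and four survey items it drives.
style = rng.normal(size=n)
weights = np.array([0.8, 0.7, 0.6, 0.5])  # made-up strengths
noise = rng.normal(size=(n, 4)) * np.sqrt(1 - weights**2)
items = style[:, None] * weights + noise

# Standardize each item, then take the top eigenpair of the correlation matrix.
z = (items - items.mean(axis=0)) / items.std(axis=0)
corr = np.corrcoef(z, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(corr)  # ascending order
top_value, top_vector = eigenvalues[-1], eigenvectors[:, -1]

# Loadings: eigenvector entries scaled by sqrt(eigenvalue).
loadings = top_vector * np.sqrt(top_value)

# Factor scores, and the plain correlation of each item with those scores.
scores = z @ top_vector
item_score_corr = np.array(
    [np.corrcoef(z[:, j], scores)[0, 1] for j in range(4)]
)

print("loadings:          ", np.round(loadings, 3))
print("item-score corr.:  ", np.round(item_score_corr, 3))
```

The two printed rows should agree. The overall sign is arbitrary (an eigenvector can point either way), so what matters is the size of each loading: items tied more tightly to the latent variable get loadings closer to 1 in absolute value.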
Some of these loadings seem pretty impressive. If a teacher says they frequently give worksheets, yeah, it sure seems like they also talk about frequently running routine drills. Ditto with ‘movement to learn math’ and ‘music to learn math.’
But here’s something I find interesting about all this. The factor analysis tells you what responses to this survey tended to vary together, and it helps you identify four groups of covarying instructional techniques. But — and this is the part I find so important — the RESEARCHERS DECIDE WHAT TO CALL THEM.
The first group of instructional techniques all focus on practicing solving problems: students practice on worksheets, or from textbooks, or drill, or do math on a chalkboard. The researchers name this latent variable ‘teacher-directed instruction.’
The second group of covarying techniques are: mixed ability group work, work on a problem with several solutions, solving a real life math problem, explaining stuff, and running peer tutoring activities. The researchers name this latent variable ‘student-centered instruction.’
I want to ask the same questions that I asked about socioeconomic status above. What is student-centered instruction? Is it just a little bit of group work, a little bit of real life math and peer tutoring, all mushed up and bundled together for convenience’s sake? Or is it some other thing, some style of instruction that these measurable variables are pointing us towards?
The researchers take pains to argue that it’s the latter. Student-centered activities, they say, ‘provide students with opportunities to be actively involved in the process of generating mathematical knowledge.’ That’s what they’re identifying with all these measurable things.
I’m unconvinced, though. We’re supposed to believe that these six techniques, though they vary together, are really a coherent style of teaching, in disguise. But there seems to me a gap between the techniques that teachers reported on and the style of teaching they describe as ‘student-centered.’ How do we know that these markers are indicators of that style?
Which leads me to think that they’re just six techniques that teachers often happen to use together. They go together, but I’m not sure the techniques stand for much more than what they are.
Eventually — I promise, we’re getting there — the researchers are going to find that teachers who emphasize the first set of activities help their weakest students more than teachers emphasizing the second set. And, eventually, NPR is going to pick up this study and run with it.
If the researchers decide to call the first group ‘individual math practice’ and the second ‘group work and problem solving’ then the headline news is “WEAKEST STUDENTS BENEFIT FROM INDIVIDUAL PRACTICE.” Instead, the researchers went for ‘teacher-directed’ and ‘student-centered’ and the headlines were “TEACHERS CODDLING CHILDREN; RUINING FUTURE.”
I’m not saying it’s the wrong choice. I’m saying it’s a choice.
IV.
Let’s skip to the end. Teacher-directed activities helped the weakest math students (MD = math difficulties) more than student-centered activities.

The researchers note that the effect sizes are small. Actually, they seem a bit embarrassed by this and argue that their results are conservative, and the real gains of teacher-directed instruction might be higher. Whatever. (Freddie deBoer reminds us that effect sizes in education tend to be modest, anyway. We can do less than we think we can.)
Also ineffective for learning to solve math problems: movement and music, calculating the answers instead of figuring them out, and ‘manipulatives.’ (The researchers call all of these ‘student-centered.’)
There’s one bit of cheating in the discussion, I think. The researchers found another interesting thing from the teacher survey data. When a teacher has a lot of students with math difficulties in a class, they are more likely to do activities involving calculators and movement/music than they otherwise might be:

You might recall that these activities aren’t particularly effective math practice, and so they don’t lead to kids getting much better at solving problems.
By the time you get to the discussion of the results, though, here’s what they’re calling this: “the increasing reliance on non-teacher-directed instruction by first grade teachers when their classes include higher percentages of students with MD.”
Naming, man.
This got picked up by headlines, but I think the thing to check out is that the ‘student-centered’ category did not correlate with the percentage of struggling math students in a class. That doesn’t sound to me like non-teacher-directed techniques get relied on when teachers have more weak math students in their classes.
The headline news for this study was “TEACHERS RELY ON INEFFECTIVE METHODS WHEN THE GOING GETS ROUGH.” But the headline probably should have been “KIDS DON’T LEARN TO ADD FROM USING CALCULATORS OR SINGING.”
V.
Otherwise, though, I believe the results of this study pretty unambiguously.
Some people on Twitter worried about using a test with young children, but that doesn’t bother me so much. There are a lot of things that a well-designed test can’t measure that I care about, but it certainly measures some of the things I care about.
Big studies like this are not going to be subtle. You’re not going to get a picture into the most effective classrooms for struggling students. You’re not going to get details about what, precisely, it is that is ineffective about ineffective teaching. We’re not going to get nuance.
Then again, it’s not like education is a particularly nuanced place. There are plenty of people out there who take the stage to provide ridiculously simple slogans, and I think it’s helpful to take the slogans at their word.
Meaning: to the extent that your slogan is ‘fewer worksheets, more group work!’, that slogan is not supported by this evidence. Ditto with ‘less drill, more real life math!’
(I don’t have links to people providing these slogans, but that’s partly because scrolling through conference hashtags gives me indigestion.)
And, look, is it really so shocking that students with math difficulties benefit from classes that include proportionally more individual math practice?
No, or at least based on my experience it shouldn’t be. But the thing that the headlines get wrong is that this sort of teaching is anything simple. It’s hard to find the right sort of practice for students. It’s also hard to find classroom structures that give strong and struggling students valuable practice to work on at the same time. It’s hard to vary practice formats, hard to keep it interesting. Hard to make sure kids are making progress during practice. All of this is craft.
My takeaway from this study is that struggling students need more time to practice their skills. If you had to blindly choose a classroom that emphasized practice or real-life math for such a student, you might want to choose practice.
But I know from classroom teaching that there’s nothing simple about helping kids practice. It takes creativity, listening, and a lot of careful planning. Once we get past some of the idealistic sloganeering, I’m pretty sure most of us know this. So let’s talk about that: the ways we help kids practice their skills that keep everybody in the room thinking and engaged, and that don’t make children feel stupid or feel like math hates them.
But as long as we trash-talk teacher-directed work and practice, I think we’ll need pieces like this as a correction.