How do students debug Python programs? - Raspberry Pi Computing Education Research Centre

If you were to describe your approach to debugging in one word, what would it be? What about when you were learning to program? In this blog, I’ll report on a research study on how school students debug Python programs and what this tells us about their struggles with fixing errors. This is the first in a mini-series of blogs about my recently completed PhD, which investigated the teaching and learning of debugging in secondary schools.

Why research debugging behaviour?

Using good strategies and behaviours when problem-solving is key, and debugging is no exception. The techniques we use to debug our programs can make the difference between fixing an error and getting stuck, or between enjoying programming and feeling frustrated. The more we understand students’ approaches to debugging, the more insight we have into the struggles they might have in the classroom, and the more targeted teaching approaches for debugging can be.

Currently, there are very few studies into school students’ debugging behaviour in text-based programming languages, even though students around the world are learning to program in these languages. So, back in 2023, I decided to investigate this in the first study of my PhD.

What we did

We gave several classes of 12- to 14-year-old students five Python programs, each of which contained some errors that beginners typically make (e.g., incorrect indentation and mixing up logical operators). Students then had to try to get these programs working by themselves in one of their computing lessons, using the Ada computer science code editor. Here’s one of the exercises below – see if you can debug it (or scroll to the end of the blog if you want to know the errors). We’ll be using this as an example later on.

This program checks if someone should apply to be a computing teacher using the steps below:

Input the user’s age.
Input the user’s response to the question “Do you have a passion for teaching computing? Enter ’yes’ or ’no’:”
If the user is 21 or over and does have a passion for teaching computing, the check should be a success. Otherwise, the check should be unsuccessful.
Print the result of the check.

This program has 4 errors – have a go at fixing them all.

A cool feature of the code editor was that it logged a snapshot of a program every time a student pressed the run button. This meant we could replay students’ debugging sessions post-hoc using the snapshot replayer tool I developed. In total, we collected over 7,000 runs from 73 students who attempted the debugging exercises. I spent many hours replaying these attempts to understand what sort of changes students were making to try to resolve the errors.
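To make the idea of run-by-run snapshots concrete, here is a minimal sketch of how such a log could be replayed as a sequence of line-level changes. It is not the actual replayer tool or the code editor's log format: the `Snapshot` class, its fields, the `replay` function, and the sample log are all hypothetical.

```python
from dataclasses import dataclass
import difflib


@dataclass
class Snapshot:
    """One logged run (hypothetical fields, not the real log format)."""
    seconds: float  # time since the exercise started
    code: str       # full program text when the run button was pressed


def replay(snapshots):
    """For each run after the first, list the lines removed and added since the previous run."""
    steps = []
    for prev, curr in zip(snapshots, snapshots[1:]):
        diff = difflib.ndiff(prev.code.splitlines(), curr.code.splitlines())
        # Keep only removed ("- ") and added ("+ ") lines; skip unchanged lines and hints.
        changes = [line for line in diff if line.startswith(("+ ", "- "))]
        steps.append((curr.seconds, changes))
    return steps


# Made-up log: one edit, followed by a re-run of the same program.
log = [
    Snapshot(5.0, "age = int(input())\nif age > 21:\n    print('ok')"),
    Snapshot(12.0, "age = int(input())\nif age >= 21:\n    print('ok')"),
    Snapshot(12.4, "age = int(input())\nif age >= 21:\n    print('ok')"),
]

for seconds, changes in replay(log):
    print(f"{seconds:>6.1f}s", changes or "no change (re-run)")
```

Storing the whole program on every run, rather than individual keystrokes, keeps the log simple and still lets each change be recovered by comparing neighbouring snapshots.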

What we found

Students made a huge variety of changes to the programs (you can view the full categorisation of them in the preprint of the academic paper about this study). Before I go through some of the common behaviours, here are some summary statistics:

44% of the exercise attempts were successfully completed, while 48% ended with logical errors and 23% with syntax errors.
90% of students resolved at least one error at some point in the exercises, compared to 95% of students who introduced at least one error.
Many students repeatedly ran the same program in quick succession, which accounted for 67% of all runs. The median time between these runs was only 0.38 seconds.
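As a rough illustration of how figures like the last one could be derived from run logs, the sketch below counts runs whose program text is identical to the previous run and takes the median time gap before them. The function name, the `(timestamp, code)` tuple format, and the choice of denominator are assumptions made for this example, not the study's actual analysis code.

```python
from statistics import median


def repeated_run_stats(runs):
    """runs: chronological list of (timestamp_seconds, code) tuples.

    Returns (share of runs identical to the previous run,
             median gap in seconds before those repeated runs).
    """
    gaps = [
        t_curr - t_prev
        for (t_prev, code_prev), (t_curr, code_curr) in zip(runs, runs[1:])
        if code_curr == code_prev
    ]
    share = len(gaps) / len(runs) if runs else 0.0
    return share, (median(gaps) if gaps else None)


# Made-up data: three runs of program "a", then two runs of program "b".
example = [(0.0, "a"), (0.5, "a"), (1.0, "a"), (20.0, "b"), (20.25, "b")]
share, gap = repeated_run_stats(example)
print(f"{share:.0%} of runs were repeats; median gap {gap} s")
```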

To give you a feel for the common behaviours, you can replay some students’ attempts at the debugging exercises using the snapshot replayer below. As you’re replaying, think about what could be preventing the student from successfully debugging, and what feedback you’d give on their approach.

Early changes

Many students started off by running the program and making changes soon after beginning the exercise. This is not always a bad approach; students may have run the program initially, used the error message to fix simple syntax errors, and then comprehended the program in detail to resolve the more complex errors. However, this was often not the case. Take a look at Alice’s (a pseudonym) behaviour below – just use the arrows to go through the snapshots of the program.

Notice how Alice runs the program 5 seconds into the exercise and makes their first change (which adds a syntax error) after 12 seconds. This is not enough time to understand what the program should do, let alone what it actually does. Most of Alice’s other changes were fairly rapid-fire too, and they ended with more errors in the program than they began with.

Since students were debugging a program they hadn’t seen before, reading the program description and comprehending the program were important steps to spend time on. Skipping these steps prevents students from viewing the program as a functional whole, making it difficult to consistently fix errors.

Tinkering

Another common approach to debugging resembled the practice of tinkering or trial-and-error debugging: the quick cycle of making small changes, running the program, and reverting changes if they are not correct. Bart’s attempt is a prime example of what tinkering looks like – flick through their runs and keep an eye on the time between each of them in the top right corner of the snapshot replayer.

Notice the enormous number of changes that Bart makes to the program. They add brackets, capitalise strings, and switch around inequalities, yet none of them resolves the original syntax error on line 6. Some of these changes are immediately reverted, while some linger in the program for periods of time. At the end of the exercise, Bart’s program is no closer to working.

Most of Bart’s changes were made to the line mentioned in the error message. This was a common theme, particularly among the first changes that students made. 80% of students’ first changes involved edits to the line mentioned in the error message. However, behaviours like Bart’s make it unclear whether students had truly isolated an error, whether they lacked the programming knowledge to make a correct fix, or whether they were simply guessing.
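One way to check whether a change touched the line named in the error message is to pull the line number out of the message and compare it with the lines that differ between two snapshots. The sketch below does this under stated assumptions: the helper names, the file name `main.py`, and the sample error message are hypothetical, and this is not the categorisation procedure used in the study.

```python
import difflib
import re


def error_line(message):
    """Return the last 'line N' number found in a Python error message, or None."""
    matches = re.findall(r"line (\d+)", message)
    return int(matches[-1]) if matches else None


def changed_lines(before, after):
    """1-based line numbers of `before` that were replaced, deleted, or inserted at."""
    matcher = difflib.SequenceMatcher(None, before.splitlines(), after.splitlines())
    lines = set()
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            lines.update(range(i1 + 1, i2 + 1))
        elif tag == "insert":
            lines.add(i1 + 1)
    return lines


def change_targets_error_line(message, before, after):
    """True if the edit between two snapshots touches the line named in the error."""
    line = error_line(message)
    return line is not None and line in changed_lines(before, after)


message = '  File "main.py", line 2\n    if age = 21:\nSyntaxError: invalid syntax'
before = "age = int(input())\nif age = 21:\n    print('yes')"
after = "age = int(input())\nif (age = 21):\n    print('yes')"
print(change_targets_error_line(message, before, after))  # True: line 2 was edited
```

As with Bart's edits, a result of `True` only shows where a student was looking; it says nothing about whether the change was a correct fix.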

Either way, Bart’s changes were made quickly, with intermittent periods of “spamming the run button” between runs 9-18 and 31-36. When students debug with such a lack of systematicity and reflection, they are again unlikely to reliably find and fix errors in their programs.

Intentional changes

Not all students struggled with the debugging exercises – 90% were able to resolve at least one error at some point in the study. We found that students used a much smaller range of behaviours when fixing errors. Students who fixed all the errors in a program tended to make less frequent but more targeted changes, rather than resorting to ineffective debugging behaviours, as shown by Carol’s behaviour.

It still takes Carol two and a half minutes to resolve the syntax error in line 6. However, Carol does not resort to tinkering. Instead, they resolve other logical errors on line 6 in their second and third runs, each of which takes notably longer than most of Alice’s and Bart’s changes. Carol ends the exercise with a correct program, and probably uses runs 7-9 to test it.

Some barriers to successfully debugging

Some of these behaviours were probably influenced by the setup of our study. Students were debugging Python programs they’d never seen with a code editor they’d never used before, which is different to debugging their own programs in the classroom. However, many other studies of debugging behaviour in different settings have found a similar tendency to debug unreflectively.

So, what are some barriers preventing students from debugging more successfully and reliably, and how can these be alleviated? From our results, we think there are a few main blockades:

Students may have fragile knowledge (“knowledge that is partial, hard to access, and often misused”) of the programming constructs they are debugging. This may prevent students from fixing errors even if they’ve found them.
Many students lack a systematic approach to debugging. Rapid-fire and small-scale changes can work, but are unlikely to consistently help.
The syntax of text-based programming languages may fixate students’ attention on a few tokens of a program and may prevent students from making correct fixes.
Students may experience negative emotions or have attitudes towards debugging that impede their ability to effectively debug.

Some implications for teaching debugging

If you’re a teacher who’s witnessed similar behaviours in your class, or just want to improve your students’ debugging ability, here are some approaches you could try out in your classroom:

Explicitly teach a systematic approach to debugging. This can be done by modelling the debugging process or teaching lower-level debugging “tactics”, such as effective print-debugging. If you’re looking for some resources to help, take a look at this paper on systematic debugging from some German researchers (Figure 4 has a nice classroom poster) or PRIMMDebug, an approach I developed in my PhD.
Discourage the use of tinkering and other ineffective behaviours when debugging. You could do this by modelling the consequences of such behaviours.
Your students could use a programming environment that reduces the scope of syntax errors, such as a frame-based environment. If you teach Python, take a look at the Strype environment.
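As one illustration of what modelling print-debugging might look like, the sketch below takes a deliberately buggy version of the teacher-check condition and prints each input and each part of the condition with a label. The function name, variable names, and `DEBUG` messages are made up for this example and are not taken from the resources mentioned above.

```python
def buggy_check(age, passion):
    """Deliberately buggy: uses > instead of >= and `or` instead of `and`."""
    old_enough = age > 21
    has_passion = passion == "yes"
    # Label every value and use !r so stray spaces or capital letters are visible.
    print(f"DEBUG age={age!r} passion={passion!r}")
    # Print each half of the condition separately to see which one is unexpected.
    print(f"DEBUG old_enough={old_enough} has_passion={has_passion}")
    return old_enough or has_passion


# A 30-year-old with no passion should fail the check, but this returns True.
# The prints show has_passion=False, which points at the `or`.
print(buggy_check(30, "no"))

# A 21-year-old with passion passes, but only by luck:
# the prints show old_enough=False, which points at the `>`.
print(buggy_check(21, "yes"))
```

Choosing inputs with a known expected result before running, and then reading the labelled output, is what separates this from rapid-fire tinkering.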

Perhaps your students use different strategies from what I’ve reported. Regardless, understanding students’ debugging strategies is key to improving them. As more research into this topic takes place, we’ll be able to better support students to debug with confidence and resilience.

Find out more

We’ve only looked at debugging from the student perspective in this blog post. In the next few weeks, I’ll be sharing another post on teachers’ experiences with debugging in the classroom.

If you want to find out more about this research, you can view the following links:

A research seminar where I demonstrate some more students’ debugging behaviours.
A preprint of the academic paper about this study.
The study repository, which contains all the materials we used to run the study.

(And if you wanted to check the errors for the debugging exercise above, here they are.)

Erroneous line	Type of error	Description	Example fix
Line 6	Syntax	The assignment operator (=) is used in a syntactically invalid position.	Replace the assignment operator (=) with the equality operator (==) on line 6.
Line 6	Logical	Incorrect inequality operator (>) causing the program to output the incorrect statement when age = 21.	Replace the greater than operator (>) on line 6 with a greater than or equal operator (>=).
Line 6	Logical	Incorrect logical operator (or) causes the program to output the incorrect error message when only one of the conditions in the if statement is true.	Replace the or operator on line 6 with an and operator.
Line 10	Logical	The line is indented, meaning the statement is only printed when its enclosing branch runs instead of always being printed.	Unindent line 10.
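The exercise's code is not reproduced in this post, so the sketch below is only a hypothetical reconstruction of what the decision logic could look like once the four fixes in the table are applied. The function name, variable names, and result strings are assumptions, and the inputs are passed as arguments rather than read with `input()` so the logic is easy to test.

```python
def run_check(age, passion):
    """Hypothetical corrected logic, following the four fixes in the table."""
    # Fixes 1-3: compare with ==, include age 21 with >=, require both conditions with `and`.
    if age >= 21 and passion == "yes":
        result = "success"
    else:
        result = "unsuccessful"
    # Fix 4: the print is not inside either branch, so it always runs.
    print(f"The check was: {result}")
    return result


# Test cases chosen to hit each logical error from the table.
assert run_check(21, "yes") == "success"       # boundary age: would fail with >
assert run_check(20, "yes") == "unsuccessful"  # only one condition true: would pass with `or`
assert run_check(30, "no") == "unsuccessful"   # only one condition true: would pass with `or`
```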