Tag: Artificial Intelligence

Paucity of high-quality causal studies supporting AI use in K-12

A paper released in the spring from Stanford’s SCALE Initiative looked at the research evidence backing AI in grades K-12. Out of 1,100 papers in Stanford’s AI Hub Research Repository, only 20 were found to be high-quality studies evaluating the impact of AI on student achievement or on educator performance. Three patterns emerged from these 20 papers: AI was most effective for students during math practice, writing tasks, and programming; AI was more effective when it provided hints after wrong answers rather than simply supplying the correct answer; and AI was most helpful to teachers for lesson preparation and for insights into student performance.

It is important to note the gaps in the research: there are “no high-quality causal studies of student AI use conducted in U.S. K-12 classrooms,” most studies examine short-term rather than long-term effects of AI, and there is little research on AI and student equity, wellness, and social development.

As AI tools rapidly enter classrooms, the limited causal evidence base raises important questions. While early findings suggest some promising uses, stronger research is needed to determine what works, for whom, and under what conditions in order to ensure that AI meaningfully improves student learning.

Leveraging artificial intelligence to predict young learners’ online learning engagement

By Zia Hassan, Center for Research and Reform in Education, Johns Hopkins University

With many schools rushing to adopt generative AI, it is important to consider the real learning gains (or lack thereof) that these tools offer. A 2023 study by Pardos & Bhandari examined the use of AI-generated hints as a scaffolding mechanism with algebra students.

Seventy-seven participants (high school graduates recruited via Amazon’s Mechanical Turk platform) were assigned to a control group (which received human-generated hints) or an experimental group (which received AI-generated hints). The researchers wanted to learn the rate of “low quality” AI-generated hints, as well as whether the hints produced learning gains compared to the control group. The questions from the lesson were fed, verbatim, to ChatGPT in order to generate the hints. Quality checks were performed manually to ensure that all AI-generated hints were correct and showed the proper steps. The control group’s hints, by contrast, were written by undergraduate tutors. Pre- and post-tests were administered to check for learning gains between the two groups.

The results showed that 70% of the hints generated by ChatGPT were judged to be of good quality, and that there was a statistically significant learning gain in the control group. A major limitation of the study is that the researchers did not prompt the AI to use any scaffolding strategies. The hints therefore differed between groups not only in who created them (human or AI) but also in pedagogical approach. Human tutors were probably more likely to employ Vygotsky-esque scaffolds, while ChatGPT was more likely to provide an immediate answer. Future work could improve upon the prompts used in this study and create a multi-tiered approach in which less revealing hints are shown first.
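A multi-tiered hint approach of the kind suggested for future work could be sketched roughly as follows. This is only an illustration, not anything taken from the Pardos & Bhandari study: the `TieredHints` class, its method names, and the sample algebra hints are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TieredHints:
    """Hints ordered from least to most revealing (hypothetical structure)."""
    hints: List[str]
    revealed: int = 0  # how many tiers have been shown so far

    def next_hint(self) -> Optional[str]:
        """Return the next tier, or None once every tier has been shown."""
        if self.revealed >= len(self.hints):
            return None
        hint = self.hints[self.revealed]
        self.revealed += 1
        return hint


# Made-up example for the problem "solve 2x + 3 = 11".
problem_hints = TieredHints([
    "Which operation would undo the + 3?",       # gentle nudge
    "Subtract 3 from both sides to get 2x = 8.",  # one worked step
    "Divide both sides by 2, so x = 4.",          # full answer, shown last
])

# A student who keeps asking for help sees one tier at a time.
hint = problem_hints.next_hint()
while hint is not None:
    print(hint)
    hint = problem_hints.next_hint()
```

The design choice here is simply that the full answer sits in the last tier, so a student has to pass through the gentler prompts before seeing it.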

Can AI reduce teacher workload? Early evidence from a UK trial with ChatGPT

By Carmen Pannone, University of Cagliari, Italy

Generative Artificial Intelligence (GenAI) tools like ChatGPT are becoming increasingly common in classrooms—not just for students, but also for teachers. In England, the Department for Education has acknowledged that educators are using GenAI more often to plan lessons, create teaching materials, and even write exam questions. A major reported advantage is the potential to save time, which is especially relevant as workload remains a key factor behind teacher attrition.

To explore whether AI can help reduce this burden, the National Foundation for Educational Research recently conducted a rigorous trial. The study involved 68 secondary schools and 259 science teachers, who were randomly assigned to prepare Year 7 and 8 science lessons either with or without ChatGPT. Teachers in the ChatGPT group were given a practical guide to support their use of the tool. Over a 10-week period in the summer term of 2024, they logged how much time they spent preparing lessons, with a particular focus on weeks 6 to 10—after an initial adaptation phase.

The findings were encouraging. On average, teachers using ChatGPT spent about 25 minutes less per week on lesson preparation than those in the non-AI group (56 minutes versus 81.5), a 31% time saving. Importantly, an independent expert panel found no difference in the quality of lesson materials between the two groups.
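For readers who want to check the arithmetic behind the time saving, here is a small sketch using only the two weekly averages quoted above. The function and variable names are illustrative and are not taken from the trial report.

```python
def percent_saving(baseline_minutes: float, treatment_minutes: float) -> float:
    """Return the reduction relative to the baseline, as a percentage."""
    if baseline_minutes <= 0:
        raise ValueError("baseline_minutes must be positive")
    return (baseline_minutes - treatment_minutes) / baseline_minutes * 100


non_ai_minutes = 81.5   # weekly preparation time, non-AI group (from the text)
chatgpt_minutes = 56.0  # weekly preparation time, ChatGPT group (from the text)

saved = non_ai_minutes - chatgpt_minutes
saving_pct = percent_saving(non_ai_minutes, chatgpt_minutes)

print(f"Minutes saved per week: {saved:.1f}")  # 25.5
print(f"Relative saving: {saving_pct:.1f}%")   # 31.3%
```

The rounded averages give a difference of 25.5 minutes, which is consistent with the reported figure of roughly 25 minutes and with the 31% saving.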

Use of the support guide also declined over time, suggesting that teachers grew more confident in integrating the tool into their practice. Looking ahead, future research could explore how GenAI tools like ChatGPT are used for other aspects of teachers’ work—such as administrative duties—and whether their impact differs across subjects or age groups, especially as new and more advanced versions continue to roll out.

Chatbot customization matters for skill retention

By Zia Hassan, School of Education, Johns Hopkins University

Schools across the nation are responding to emerging AI tools. Proponents of integrating generative AI (GenAI) tools like ChatGPT into the classroom argue that GenAI software can fine-tune its instruction to personalize learning, meeting each student where they are and building skills at the pace necessary for progress. Enthusiasm around AI’s potential for personalizing learning has led schools in Texas and Florida to implement a program where each student must work with a GenAI tutor for two hours daily.

Researchers in Turkey conducted a randomized controlled trial with approximately 1,000 participants to evaluate how effectively GenAI supports the development of independent skills. After a shared math lesson, students practiced skills in one of three conditions: using a standard ChatGPT interface that was capable of providing answers (GPT base), using a GPT tutor that had been configured in advance to scaffold skills (GPT tutor), or relying solely on class notes.

The results showed that AI tools dramatically improved performance during practice sessions and that students performed best with the GPT tutor, which scaffolded skills without giving the correct answer (a 127% increase in practice performance for the GPT tutor group vs. a 47% increase for the GPT base group, each compared to the control). However, on the subsequent assessment, taken without access to the technology, the GPT base group fared worse than the control, scoring 17% lower, while the GPT tutor group performed about the same as the control.

These results indicate that if schools are to integrate generative AI tools into curricula, careful thought must be given to customizing general-purpose bots such as ChatGPT. If schools become too permissive with access to GenAI tools at critical learning moments, they could unintentionally cause a dramatic drop in transferable skills.