Study finds teenagers rarely checked what AI told them during math learning

1 Jul

A German classroom study found that Grade 9 students entered an AI-supported math task with learning-focused goals, but monitoring and evaluation were nearly absent and post-test scores fell.

***A study of 98 Grade 9 students found that they frequently requested help from an AI math tutor but rarely monitored their understanding or evaluated its responses.***

Grade 9 students using an AI tutor for mathematics frequently asked for information but rarely made their understanding visible, checked whether the response met their goal, or evaluated what the system had told them, according to preliminary research.

The study followed 98 students aged 14 to 15 across three public Gymnasium schools in Baden-Württemberg, Germany. Students used a web-based Mistral Large tutor during a curriculum-aligned mathematical modeling activity designed to prepare them for an upcoming exam.

Researchers Rania Abdelghani, Peter Kaiser, and Kou Murayama of the University of Tübingen analyzed 1,616 chat turns, including 808 written by students, alongside pre- and post-test results, stated learning goals, and measures of cognitive load.

The work-in-progress paper has been accepted for the NextGen Learning Interfaces Workshop at AIED 2026 in Seoul and online. Its current behavioral results were coded using Gemini 2.5 Pro, with human validation by mathematics education and learning sciences specialists still underway.

The researchers are now extending the analysis to examine whether students noticed inaccurate or mismatched AI responses and whether they actively shaped the tutor's role, such as requesting hints instead of answers or asking the system to test them.

Learning intentions did not consistently shape AI use

Before using the tutor, students selected the types of support they wanted from eight available goals.

Most chose learning-focused options. Step-by-step examples were selected by 82.9 percent, tips and strategies by 80.3 percent, practice problems by 71.1 percent, and concept explanations and checks of understanding by 69.7 percent each.

Only 11.8 percent selected an option asking for final solutions, while 36.8 percent said they wanted to finish as quickly as possible.

Their conversations did not consistently reflect those intentions.

Requests accounted for an average of 72.9 percent of students' task-relevant messages. Planning represented 18.1 percent, while messages showing students monitoring their understanding or evaluating the AI's responses accounted for averages of 5.7 percent and 3.4 percent respectively.
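The wording suggests these figures are per-student shares averaged across students, rather than shares of all messages pooled together. The sketch below illustrates that reading under that assumption; the category labels, student IDs, and message codes are hypothetical and are not taken from the study's data.

```python
from collections import Counter

# Hypothetical category labels, loosely following the four behaviors reported above.
CATEGORIES = ("request", "planning", "monitoring", "evaluation")


def per_student_shares(codes):
    """Return each category's share of one student's coded messages."""
    counts = Counter(codes)
    total = len(codes)
    return {cat: counts[cat] / total for cat in CATEGORIES}


def mean_shares(students):
    """Average the per-student shares so that each student counts equally."""
    shares = [per_student_shares(codes) for codes in students.values() if codes]
    return {cat: sum(s[cat] for s in shares) / len(shares) for cat in CATEGORIES}


# Made-up example: two students with different numbers of messages.
students = {
    "s1": ["request", "request", "request", "planning"],
    "s2": ["request", "monitoring"],
}
result = mean_shares(students)
print(result)
```

Averaging per student keeps a talkative student from dominating the result. In this toy data, the averaged request share is 62.5 percent, while pooling all six messages would give about 66.7 percent.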

Among the 96 students who made at least one request, 75.1 percent of requests were categorized as instrumental, meaning they sought explanations or guidance that left some responsibility for learning with the student. The remaining 24.9 percent were categorized as executive requests seeking more complete solutions.

Procedural questions made up 37.4 percent of requests, followed by conceptual questions at 25.1 percent, answer-seeking at 21.1 percent, and verification at 16.3 percent.

The relatively low verification rate contrasted with the 69.7 percent of students who had said before the chat that they wanted the AI to check their understanding.

Stated learning-oriented intentions, such as wanting explanations, verification, or step-by-step guidance, did not significantly predict the corresponding behavior during the conversations.

The clearest match between stated intentions and behavior appeared among students who selected "just give me the final answers." Those learners recorded higher rates of executive requests and answer-seeking than students who did not choose that goal.

Abdelghani wrote on LinkedIn: "Our results suggest that prompt quality alone is not enough to understand students' ability to use AI in advancing their learning."

She added: "AI-supported learning should be understood as a process, not as a collection of isolated prompts."

Post-test performance fell after the AI-supported activity

Students completed a timed mathematics pre-test before the chat and an AI-free post-test after the activity. Average performance fell from 67.5 percent on the pre-test to 56.9 percent on the post-test, a drop of 10.6 percentage points. The difference was statistically significant.

The result does not establish that the AI tutor caused the decline. The research involved a single classroom session, did not include a comparison group completing the same activity without AI, and used pre- and post-tests around one mathematical modeling task.

Students also reported their cognitive load after the chat. Once prior mathematics knowledge was taken into account, extraneous cognitive load was the only significant predictor of post-test performance, with higher load linked to lower scores.

The researchers suggest that students may have faced additional demands associated with constructing prompts, managing the interaction, and deciding what to do with the AI's responses.

That finding places attention on more than whether a student asked a strong individual question. A learner may request a useful explanation but still fail to check whether it resolves the original uncertainty, connect it with existing knowledge, or decide whether further clarification is required.

The study's current results do not assess whether individual AI responses were accurate, whether students accepted incorrect information, or how the quality of their behavior changed from the beginning to the end of each conversation.

In her LinkedIn summary, Abdelghani said students performed better when their interactions maintained or shifted toward conceptual and procedural help-seeking and sustained mathematical work, rather than becoming more reliant on verification, answers, or validation.

The paper does not provide the full statistical results for that trajectory analysis, and the researchers describe the wider project as ongoing.

Researchers call for AI tutors to follow learning trajectories

The researchers use the term "epistemic proactivity" to describe a student's ability to identify a learning goal, decide how to seek support from AI, evaluate the response, and determine what further action is needed.

Their coding framework separates planning, monitoring, requesting help, and evaluating responses. It also distinguishes between asking for support that preserves the learner's role and seeking completed answers that bypass part of the reasoning process.
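One way to picture such a framework is as a small data structure. The sketch below is illustrative only: the class names, field names, and validation rule are assumptions made for this example, not the researchers' actual coding scheme or software.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Move(Enum):
    """The four behaviors the framework separates (labels are hypothetical)."""
    PLANNING = "planning"
    MONITORING = "monitoring"
    REQUESTING = "requesting"
    EVALUATING = "evaluating"


class RequestKind(Enum):
    """Whether a help request preserves or bypasses the learner's role."""
    INSTRUMENTAL = "instrumental"  # explanations, hints, guidance
    EXECUTIVE = "executive"        # more complete solutions


@dataclass(frozen=True)
class CodedTurn:
    """One student message with its assigned codes."""
    text: str
    move: Move
    request_kind: Optional[RequestKind] = None

    def __post_init__(self):
        # Assumed rule: only requesting turns carry a request kind.
        is_request = self.move is Move.REQUESTING
        if is_request != (self.request_kind is not None):
            raise ValueError("request_kind must be set only for requesting turns")


turn = CodedTurn("Can you give me a hint?", Move.REQUESTING, RequestKind.INSTRUMENTAL)
print(turn.move.value, turn.request_kind.value)
```

Keeping the behavior and the request kind as separate fields mirrors the two distinctions described above, and leaves room for further labels without changing existing ones.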

The next phase will add two AI-specific measures.

"Epistemic vigilance" will assess whether students notice and respond when the tutor provides an inaccurate or mismatched answer. "Agency over the AI" will capture attempts to control its role, response format, or level of support.

Examples include asking for a hint rather than a solution, requesting practice questions, or telling the AI to act as a tutor rather than an answer generator.

The researchers argue that teachers and product developers may need to look beyond isolated prompts and monitor how a student's behavior develops across an interaction.

An AI tutor could, for example, intervene when a learner repeatedly asks for answers, fails to check previous responses, or moves away from mathematical reasoning. It could also prompt students to explain their current understanding, evaluate an answer, or decide what they need next.
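A trajectory-based check of this kind could be sketched as a simple rule over a student's most recent coded messages. Everything in the example below is hypothetical, including the function name, the code labels, the window size, the thresholds, and the wording of the nudges; the paper does not specify an implementation.

```python
from typing import Optional, Sequence


def suggest_nudge(recent_codes: Sequence[str], window: int = 4) -> Optional[str]:
    """Return a prompt for the student, or None if no intervention seems needed.

    recent_codes holds hypothetical labels for the student's messages in order,
    e.g. "procedural", "conceptual", "answer_seeking", "monitoring", "evaluation".
    """
    last = list(recent_codes)[-window:]
    if len(last) < window:
        # Too little history to judge a trajectory.
        return None
    if all(code == "answer_seeking" for code in last[-3:]):
        # Assumed threshold: three answer requests in a row.
        return "Try asking for a hint, or explain what you have worked out so far."
    if not any(code in ("monitoring", "evaluation") for code in last):
        # No sign that recent responses were checked.
        return "Does the last explanation answer your original question?"
    return None


history = ["procedural", "answer_seeking", "answer_seeking", "answer_seeking"]
print(suggest_nudge(history))
```

The rule looks at a window of turns rather than a single prompt, which is the shift in focus the researchers describe. Whether nudges like these improve learning is not something the current study tests.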

Any product recommendations remain preliminary. The current behavioral findings rely on AI-generated coding, and the human validation process has not yet been completed.

The study also covered one mathematics lesson involving students from three schools in a single German state. It does not show how the same learners would use AI across a full course, how behavior differs by subject, or whether trajectory-based support improves learning.

The researchers will next validate the coding framework and complete the analysis of epistemic vigilance and student agency.
