Google Gemini Guided Learning raises math scores in Sierra Leone classroom trial

The randomized study involved 1,763 students across 12 schools and found that those who entered with stronger math skills recorded the largest gains.

Google DeepMind is building a wider portfolio of international trials examining how AI can be integrated into teacher-led classrooms

Google DeepMind and Fab AI have released results from a randomized controlled trial in Sierra Leone that found students assigned to use Gemini Guided Learning during teacher-led math lessons recorded higher assessment scores than students receiving standard instruction.

The preregistered trial involved 1,763 Grade 7 and Grade 8 students across 48 classrooms in 12 government-supported junior secondary schools in Port Loko District. Students were aged 13 or older, with 24 classrooms assigned to the intervention and 24 continuing with their usual math instruction.

The intervention ran from October 6 to December 5, 2025, with staggered school start dates. Teachers in participating classrooms were asked to incorporate Gemini Guided Learning into two of their four weekly math periods, amounting to a target of 12 hours over eight weeks.

The results, released in June 2026, showed an intent-to-treat effect of 0.258 standard deviations on math scores. The researchers estimated that this was broadly equivalent to moving a student from the 50th to the 60th percentile, or approximately 1.2 to 1.7 years of typical learning progress in low- and middle-income countries.
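The percentile comparison can be reproduced if test scores are treated as roughly normally distributed, which is an assumption of this sketch rather than something stated in the article. The function name `percentile_after_shift` is hypothetical and used only for illustration.

```python
from statistics import NormalDist


def percentile_after_shift(effect_sd: float, start_percentile: float = 50.0) -> float:
    """Return the percentile a student reaches after gaining `effect_sd` standard deviations.

    Assumes scores follow a normal distribution (an illustrative assumption).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Convert the starting percentile into a z-score.
    z_start = std_normal.inv_cdf(start_percentile / 100.0)
    # Shift by the effect size and convert back to a percentile.
    return 100.0 * std_normal.cdf(z_start + effect_sd)


if __name__ == "__main__":
    # Intent-to-treat effect reported in the article: 0.258 standard deviations.
    print(round(percentile_after_shift(0.258), 1))
```

Under the normality assumption, a median student who gains 0.258 standard deviations lands at roughly the 60th percentile, in line with the researchers' description.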

The Sierra Leone study is the first in a planned international series of preregistered trials examining how Guided Learning affects teaching and learning in different education systems. Google DeepMind and Fab AI have also released the teacher training materials and a rapid RCT playbook covering the design, implementation and analysis of the trial.

Students worked with Gemini in teacher-led lessons

All participating teachers completed the same five- to six-hour training session before classrooms were randomly allocated to the intervention or control groups.

The training covered tablet use, generative AI, the Gemini app, lesson preparation and strategies for managing Guided Learning in the classroom. Teachers were shown how to establish lesson objectives, create starter prompts and prepare question stems that students could use when they became stuck.

Lessons followed a four-part structure. Teachers introduced the learning objective, students worked with Gemini in pairs, the class discussed what had been learned and the teacher closed the session by summarizing the main points.

Students shared tablets or desktop computers at a ratio of approximately two students to one device. One student acted as the "driver," responsible for typing, while the other served as the "navigator," taking notes and helping decide what questions to ask. The roles were intended to switch between lessons.

Teachers remained responsible for planning lessons, setting objectives, directing classroom discussions and supporting students who needed additional help. Local field monitors prepared and distributed devices, recorded attendance and responded to technical problems, but were generally instructed to remain outside classrooms during teaching.

Irina Jurenka, Research Director at Google DeepMind, said in a LinkedIn post that the intervention was designed around teachers rather than the independent use of an AI system: "This wasn't just about dropping the AI into the classrooms, it was a sociotechnical intervention. Teachers were supported to stay in the lead and use AI as a targeted tool that complements traditional instruction."

Jurenka added that teachers were encouraged to use time released by Gemini to provide more targeted help and strengthen interactions with students who required additional support.

Conrad Sackey, Sierra Leone’s Minister of Basic and Senior Secondary Education, said:

"We look to be innovative and improve service delivery, but we must also rigorously study the results of our innovations…I am therefore delighted that we now have strong evidence that carefully designed AI can help improve learning outcomes in support of our many hard-working teachers."

The trial was authorized by the Sierra Leone Ministry of Basic and Senior Secondary Education and received approval from the Sierra Leone Ethics and Scientific Review Committee. Parental consent, teacher consent and student assent were collected before participation.

Oxford MeasurEd independently developed and administered curriculum-aligned baseline and endline assessments. It used item response theory to place scores from the two assessments on a shared scale and carried out the scoring without access to students’ treatment assignments.

Laterite managed data collection, while EducAid led local implementation. Fab AI and Google DeepMind jointly designed the research and impact evaluation.

Higher use was associated with larger gains

Of the 871 students assigned to Guided Learning classrooms, 69% completed at least the requested 12 hours of use. Treatment classrooms averaged approximately 15 hours, 25% above the level specified by the research team.

Students who reached the 12-hour threshold recorded an estimated treatment-on-the-treated effect of 0.380 standard deviations. The researchers also estimated that each additional hour of participation produced a 0.016 standard deviation improvement.

These dosage estimates use students’ randomized classroom assignment as an instrument to account for the fact that attendance, motivation and other characteristics may influence how much exposure an individual receives.
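The logic of using assignment as an instrument can be sketched with the simplest version of the technique, the Wald ratio: the intent-to-treat effect divided by the difference in exposure that assignment creates. This is a back-of-envelope illustration, not the researchers' estimation procedure. It assumes control classrooms had no Guided Learning exposure and ignores the covariate adjustments and clustering a full analysis would include, so it will not match the reported figures exactly. The function name `wald_estimate` is hypothetical.

```python
def wald_estimate(itt_effect: float, exposure_gap: float) -> float:
    """Wald (ratio) instrumental-variable estimate.

    itt_effect:   difference in outcomes between assigned groups (in SD).
    exposure_gap: difference in exposure between assigned groups, e.g. the
                  share reaching a threshold, or mean hours of use.
    """
    if exposure_gap <= 0:
        raise ValueError("assignment must increase exposure for the ratio to be defined")
    return itt_effect / exposure_gap


ITT = 0.258  # intent-to-treat effect reported in the article, in SD

# Assumption: no control student reached 12 hours, so the gap in the share
# reaching the threshold equals the 69% reported for treatment classrooms.
tot_threshold = wald_estimate(ITT, 0.69)

# Assumption: control classrooms averaged zero hours, so the gap in mean
# hours equals the roughly 15 hours reported for treatment classrooms.
per_hour = wald_estimate(ITT, 15.0)

if __name__ == "__main__":
    print(round(tot_threshold, 3), round(per_hour, 4))
```

With those assumptions the ratios come out at about 0.37 standard deviations for students reaching the threshold and about 0.017 per hour, close to the reported 0.380 and 0.016.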

The primary intent-to-treat result includes every student assigned to the intervention, regardless of how many sessions they attended. The 0.258 standard deviation effect remained statistically significant after adjustment for baseline math and reading scores, age and gender.

Students and Gemini exchanged 113,344 messages across 7,421 conversations during the trial. Researchers reported that 97.4% of messages remained on topic.

An analysis of the de-identified transcripts classified 91.4% of conversations as focused on developing mathematical understanding or skills. Five percent were categorized as primarily seeking a solution.

Gemini posed scaffolding questions in 76.4% of its messages and supplied direct solutions in 2.1%. The classifications were produced using Gemini 3.1 Flash-Lite after personally identifiable information had been removed from the transcripts.

Student behavior also changed during the intervention. Skill-seeking conversations accounted for 67.7% of interactions during the first week and exceeded 90% during most later weeks. Solution-seeking fell from 25.1% in the opening week to substantially lower levels during the remainder of the trial.

The transcript analysis could not be connected to individual assessment outcomes because students shared devices and different pairs sometimes used the same hardware.

Teachers participating in post-trial focus groups described strong student interest in the Guided Learning lessons. One teacher told researchers: "But the introduction of AI—I mean, let me confess, I’ve seen children rushing to attend classes."

Teachers also reported using Gemini to prepare lessons and find alternative ways to explain familiar topics, including fractions, ratios and place value. Some said the technology reduced preparation time, while one reported that reviewing the different options produced by Gemini could take longer than using previous approaches.

The focus groups also identified problems with traditional literacy, digital literacy, typing and access to devices. Teachers asked for more time and wider access to help students use Gemini effectively.

Stronger students recorded the largest benefits

The study found that students with higher baseline math scores benefited more from the intervention.

For every additional standard deviation in starting math proficiency, the treatment effect increased by an estimated 0.195 standard deviations. The research team said this pattern raised the risk that educational technology could widen existing attainment gaps rather than reduce them.
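To see what that slope implies, the sketch below treats the relationship as linear and assumes the 0.258 overall effect applies to a student at the average baseline score. Both are simplifying assumptions for illustration, not details confirmed by the article, and the function name `conditional_effect` is hypothetical.

```python
def conditional_effect(baseline_z: float,
                       effect_at_mean: float = 0.258,
                       slope_per_sd: float = 0.195) -> float:
    """Estimated treatment effect (in SD) for a student whose baseline math
    score is `baseline_z` standard deviations from the mean.

    Assumes a linear interaction centered on the mean baseline score.
    """
    return effect_at_mean + slope_per_sd * baseline_z


if __name__ == "__main__":
    for z in (-1.0, 0.0, 1.0):
        print(z, round(conditional_effect(z), 3))
```

Under these assumptions, a student one standard deviation below the average baseline would gain about 0.06 standard deviations, while a student one standard deviation above would gain about 0.45, which illustrates how the same intervention could widen an existing gap.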

Google DeepMind and Fab AI said improving model accuracy, explanations and personalization would not automatically produce the strongest gains among students who began behind. They identified motivation, reading literacy, digital literacy, activity design and the relational role of teachers as areas requiring further investigation.

The subgroup analysis did not identify statistically significant differences in the treatment effect by baseline reading proficiency, gender or age. However, the study was primarily powered to detect the overall effect, and the researchers described subgroup findings as exploratory rather than definitive.

The schools were not selected as a nationally representative sample of Sierra Leone. Participating institutions had previous involvement with EducAid, access to stable electricity and enough classrooms to support within-school randomization.

The researchers said the findings therefore apply most directly to schools with comparable infrastructure and institutional readiness. The trial also tested a combined intervention involving Gemini, teacher training, paired student work, prepared lesson structures and technical support. It was not designed to isolate the effect of Guided Learning from each of those other components.

The technical report is an initial publication from a wider series of trials rather than a peer-reviewed cross-country evaluation. The research team said it would delay broader conclusions until results were available from additional settings.

Jurenka said the research program would continue testing the approach in local contexts while making its processes available to other researchers and education organizations.

"The Sierra Leone RCT is the first in a global portfolio we are building to understand how AI can be effectively integrated into classrooms around the world. Our goal is to enable high quality evidence-based decision making in local contexts."

The accompanying teacher training package states that its effectiveness has only been tested within the specific Sierra Leone intervention and should not be treated as a general-purpose training curriculum without local adaptation.

Future trials will examine whether Guided Learning produces similar effects in other countries and whether the approach can better support students who begin with lower attainment. Google DeepMind and Fab AI also plan to investigate metacognition, relational intelligence and the links between specific tutoring behaviors and individual learning outcomes.
