AI matches teacher judgments in new No More Marking writing assessment trial

Research from EdTech company No More Marking finds that AI judges now align with human markers in over 80% of writing assessments across UK schools.

No More Marking, an education technology company providing online comparative judgement assessments for schools, has published results from its latest AI-enhanced assessment project.

The study, shared by Director of Education Daisy Christodoulou in a LinkedIn post, explored how large language models can be integrated into writing evaluation.

The company’s platform uses comparative judgement, an approach where assessors compare two pieces of writing to decide which is stronger. According to Christodoulou, this method produces more accurate and consistent results than traditional marking. “AI is uncannily good at judging student writing,” she wrote.
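Comparative judgement systems typically convert these pairwise "which is stronger?" decisions into a score per script by fitting a Bradley-Terry-style model. No More Marking's exact model is not described in the post, so the following is only an illustrative sketch of the general technique, using a simple iterative fitting procedure and made-up script IDs:

```python
def bradley_terry(scripts, comparisons, iterations=200):
    """Fit simple Bradley-Terry strengths from pairwise judgements.

    scripts: list of script ids.
    comparisons: list of (winner, loser) pairs from judges.
    Returns a dict mapping each script id to a relative strength.
    """
    strength = {s: 1.0 for s in scripts}
    wins = {s: 0 for s in scripts}
    for winner, _ in comparisons:
        wins[winner] += 1

    for _ in range(iterations):
        new = {}
        for s in scripts:
            # Sum 1 / (strength_s + strength_opponent) over every
            # comparison involving script s.
            denom = 0.0
            for w, l in comparisons:
                if s in (w, l):
                    other = l if s == w else w
                    denom += 1.0 / (strength[s] + strength[other])
            new[s] = wins[s] / denom if denom else strength[s]
        # Normalise so strengths stay on a comparable scale.
        total = sum(new.values())
        strength = {s: v * len(scripts) / total for s, v in new.items()}
    return strength

# Hypothetical example: A beats B and C, B beats C, so A > B > C.
result = bradley_terry(["A", "B", "C"],
                       [("A", "B"), ("A", "C"), ("B", "C")])
```

In practice the fitted strengths are then mapped onto a reporting scale (such as the 300–700 scale mentioned below); the key point is that no judge ever assigns an absolute mark, only comparisons.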

High agreement between AI and human judgments

The latest assessment involved around 70,000 writing samples from Year 7, 8, and 9 students across 177 UK secondary schools. In total, human judges made 133,983 comparative decisions. The AI agreed with 83% of them, a rate Christodoulou said mirrors the average agreement between human judges across similar projects.

Of the 22,913 instances where AI and human decisions differed, 90% were within 45 points on No More Marking’s fine-grained 300–700 scale. Only 324 cases, 1.4% of disagreements, were more than 80 points apart, representing just 0.24% of all judgments.
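The reported rates follow directly from the raw counts; a quick arithmetic check (illustrative only, using the figures quoted above):

```python
# Figures reported in the article.
total_judgments = 133_983   # human comparative decisions
disagreements = 22_913      # decisions where the AI disagreed
severe = 324                # disagreements more than 80 points apart

agreement_rate = 1 - disagreements / total_judgments
severe_of_disagreements = severe / disagreements
severe_of_total = severe / total_judgments

print(f"{agreement_rate:.0%}")           # 83%
print(f"{severe_of_disagreements:.1%}")  # 1.4%
print(f"{severe_of_total:.2%}")          # 0.24%
```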

“Some element of disagreement is always going to exist with assessments of extended writing, whoever is judging it. This is a very low rate of serious disagreement, and one that we think is acceptable,” Christodoulou said.

The company’s analysis found that larger discrepancies were often due to human error rather than AI misjudgment. Referring to one handwritten script from an example comparison, Christodoulou explained: “The piece on the right is hard to read, but the AI was able to make an accurate typed transcription of it which reveals it is a good piece of writing. We feel this is an unambiguous example of a human error.”

She noted that nearly 200,000 pieces of writing have now been evaluated using this combined method.

Scaling up AI-assisted marking

Following a series of successful trials last academic year, No More Marking has now integrated AI judges into all of its national projects. Schools can choose their preferred balance between human and AI evaluation, with the company recommending a 90% AI and 10% human ratio, which cuts teacher judging time by 90%.

The system has also been extended to allow individual schools to run custom assessments using AI judges under their own criteria. “We’ve found a few smaller disagreements where we think the AI has erred, but we also think these can be fixed with some tweaks to the judging prompt,” Christodoulou wrote.

She said further findings on predictive validity and accuracy will be published in future updates.

No More Marking plans to continue publishing detailed analysis of AI-assisted assessment through its Substack newsletter. The company is also running introductory webinars to help educators understand how comparative judgement and AI can be combined to streamline writing assessment at scale.

Christodoulou concluded: “We’ve assessed nearly 200,000 pieces of writing using this method and all of the really big disagreements are the result of human, not AI, error.”
