Researchers at Stanford University introduce a cost-effective and efficient way to evaluate AI language models
A new paper from Stanford researchers, published at the International Conference on Machine Learning, presents a cost-effective and efficient way to evaluate AI language models.

As growing numbers of new AI language models are launched each year, it can be difficult - and costly - to demonstrate the benefits of a new model.
“This evaluation process can often cost as much or more than the training itself,” explains co-author Sang Truong, a doctoral candidate at the Stanford Artificial Intelligence Lab (SAIL).
“We’ve built an infrastructure that allows us to adaptively select subsets of questions based on difficulty. It levels the playing field.”
“The key observation we make is that you must also account for how hard the questions are,” adds Sanmi Koyejo, an Assistant Professor of Computer Science at the School of Engineering, who led the research.
“Some models may do better or worse just by luck of the draw. We’re trying to anticipate that and adjust for it to make fairer comparisons.”
Koyejo, Truong, and colleagues borrowed a concept from education known as Item Response Theory, which Koyejo compares to standardized tests such as the SAT, where questions have differing levels of difficulty.
The team analyzes questions and scores the answers provided by AI language models, using generative AI to create questions tailored to different levels of difficulty. The researchers claim this approach has reduced the cost of testing by 50 to 80 percent.
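The underlying idea can be sketched in a few lines. The example below is a minimal illustration, not the authors' code: it assumes a standard two-parameter Item Response Theory model with made-up item parameters, and shows how the next question could be chosen adaptively as the one that is most informative about a model's current estimated ability.

```python
import numpy as np

# Two-parameter IRT model: the probability that a model with ability `theta`
# answers an item with discrimination `a` and difficulty `b` correctly
# follows a logistic curve.
def p_correct(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Fisher information of an item at the current ability estimate; higher
# values mean the item tells us more about this model's ability.
def fisher_information(theta, a, b):
    p = p_correct(theta, a, b)
    return (a ** 2) * p * (1.0 - p)

# Adaptive selection: among unanswered items, pick the most informative one.
def next_item(theta_hat, items, answered):
    candidates = [i for i in range(len(items)) if i not in answered]
    return max(candidates, key=lambda i: fisher_information(theta_hat, *items[i]))

# Toy usage: five questions as (discrimination, difficulty) pairs - hypothetical values.
items = [(1.0, -2.0), (1.2, -1.0), (0.9, 0.0), (1.1, 1.0), (1.3, 2.0)]
theta_hat = 0.0          # current ability estimate for the model under test
answered = set()
choice = next_item(theta_hat, items, answered)
print(f"Ask item {choice} next (difficulty {items[choice][1]})")
```

In a full system, item difficulties and discriminations would be estimated from scored responses, and the ability estimate would be updated after each answer, which is what lets a small, well-chosen subset of questions stand in for an entire benchmark.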
Koyejo has tested the system against 22 datasets and 172 language models and found that it adapts easily to both new models and new questions. The researchers say the findings will allow for better diagnostics and more accurate performance evaluations of AI language models.