MIT researchers push large language models to new reasoning heights with test-time training
MIT researchers have developed a method to make large language models (LLMs) more adaptable when tackling difficult tasks that require logical reasoning.
The study explores test-time training, a technique that temporarily updates a model’s internal parameters during deployment, leading to up to a sixfold improvement in accuracy on complex tasks such as IQ puzzles.
LLMs often perform well on familiar problems but fail when faced with new domains like strategic planning or process optimization. The MIT team found that this approach could make models more flexible in applications ranging from medical diagnostics to supply chain management.
Test-time training boosts adaptability
The research shows that combining test-time training with in-context learning can significantly enhance performance. In-context learning typically feeds a model a few examples as prompts, but this alone is often insufficient for tasks requiring deeper reasoning.
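To make the distinction concrete, here is a minimal sketch of in-context learning: solved examples are placed in the prompt itself, and the model is asked to continue the pattern with no parameter updates. The task and format are illustrative, not from the study.

```python
# In-context learning sketch: the model sees a few solved examples
# in its prompt, then the new query. Nothing is trained or updated.
examples = [("2 4 6 8", "10"), ("1 3 5 7", "9")]
query = "5 10 15 20"

prompt = "\n".join(f"Sequence: {x}\nNext: {y}" for x, y in examples)
prompt += f"\nSequence: {query}\nNext:"

print(prompt)  # this string would be sent to the LLM as-is
```

As the article notes, for tasks requiring deeper reasoning this prompting alone is often not enough, which is what motivates the temporary parameter updates described next.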
By creating task-specific datasets from example problems and slightly altering the inputs, the researchers were able to expand the data available for temporary model updates. They used low-rank adaptation, which updates only a small subset of model parameters, to keep the process efficient.
“Genuine learning — what we did here with test-time training — is something these models can’t do on their own after they are shipped. They can’t gain new skills or get better at a task. But we have shown that if you push the model a little bit to do actual learning, you see that huge improvements in performance can happen,” says Ekin Akyürek, PhD ’25, lead author of the study.
Future applications and efficiency challenges
The improvements come at a cost: a query that would normally take less than a minute might require up to 10 minutes with test-time training. Researchers believe the technique is best suited for particularly hard tasks rather than routine queries.
The team plans to use these insights to move toward models that can automatically decide when to apply test-time training. The long-term goal is an LLM that can switch between in-context learning and temporary parameter updates without human input.
“This is important because our method needs to be efficient if it is going to be deployed in the real world. We find that you can get huge improvements in accuracy with a very small amount of parameter training,” Akyürek says.