Microsoft backs 11 projects to expand AI access for Europe’s underrepresented languages
The LINGUA open call supports open datasets across 16 low-resource languages, highlighting the link between language, AI adoption, and digital inclusion across Europe.
Microsoft has announced the awardees of LINGUA, an open call launched by its AI for Good Lab to strengthen digital inclusion for Europe’s underrepresented languages. Eleven projects have been selected, spanning 16 languages and dialects across ten countries, with a combined reach of more than 65 million speakers.
The initiative reflects growing concern that limited language representation in AI systems can restrict access, skills development, and participation in AI-driven economies, even in regions with strong digital infrastructure.
The awardees were shared in a LinkedIn post by Catrin Hinkel, Corporate Vice President and CEO of Microsoft Switzerland.
Language access and AI adoption
In her post, Hinkel said language plays a decisive role in how widely AI tools are used. She wrote: “Our AI Diffusion Report shows that language is a key driver of AI adoption. Nations where low-resource languages dominate show lower usage, even after adjusting for GDP and internet access.”
She added that expanding language coverage in AI systems is “about access, inclusion, and participation in the opportunities AI creates.”
LINGUA was launched three months ago as an open call to address gaps in digital resources for European languages with limited online data. The selected projects focus on building fully open-licensed speech and text datasets that can be used to improve multilingual AI models and support fairer representation.
Universities and research institutions among awardees
The selected projects bring together universities, nonprofits, cultural organizations, a government language center, and a public broadcaster. Two of the awardees are based in Switzerland, a point Hinkel highlighted in her post.
She wrote that the University of Zurich Department of Computational Linguistics was selected for RhaetoChat, a project creating fine-tuning data for Romansh and Ladin. “In a country with four national languages, ensuring that Romansh speakers can fully participate in the AI era is essential,” she wrote.
The École Polytechnique Fédérale de Lausanne was also selected for Scaling Finweb2-HQ, a project focused on enhancing multilingual datasets across European languages.
The full list of LINGUA Open Call awardees is as follows:
BUDOVA: Building Ukrainian Domain-Specific, Open Voice & Text Archives – Kyiv National University of Construction and Architecture (Ukraine) – Ukrainian
Collection and Digitization of Romani Language Data in Greece: Laying the Foundations for Representation in Artificial Intelligence – ARSIS – Association for the Social Support of Youth (Greece) – Romani, Greco-Romani
Icelandic AI Safety Benchmarks: Creating Open Evaluation Datasets for LLM Safety in a Low-Resource Language – University of Iceland (Iceland) – Icelandic
LuxVLD: Luxembourgish Vision-Language Dataset for Education and Digital Inclusion – SnT, University of Luxembourg (Luxembourg) – Luxembourgish
PARLA CHIARO (Speak Clearly): Protecting Italian Dialect Speakers from AI-generated Health Misinformation – University of Naples Federico II (Italy) – Neapolitan, Sicilian, Roman
Protecting Kosovo’s languages through responsible AI – Radio Television of Kosovo (Kosovo) – Serbian, Turkish, Bosnian, Romani
RhaetoChat: LLM Fine-Tuning Data for Rhaeto-Romance Languages – University of Zurich Department of Computational Linguistics (Switzerland) – Romansh, Ladin
SaqWI: Korpus Malti ta’ Mistoqsijiet u Tweġibiet / A Maltese Corpus of Questions & Answers (SaqWI-QA) – Ċentru tal-Ilsien Malti (Malta) – Maltese
Scaling Finweb2-HQ: Multi-Signal Extraction and Quality Enhancement for European Language Models and Beyond – École Polytechnique Fédérale de Lausanne (Switzerland) – Multi-language
Speaking Ladino: Open Speech and Text Datasets for AI-Powered Language Preservation – Inalco Paris (France) – Ladino
Wikispeech for All: Basque Edition – Wikimedia Sverige (Sweden) – Basque
Microsoft says additional projects will also receive support through Azure compute credits.
Open data and skills implications
Microsoft says all LINGUA-funded projects commit to producing fully open-licensed datasets for text-to-text, speech-to-text, and text-to-speech use cases. The company frames this as foundational work to support open language models and reduce structural disadvantages faced by low-resource languages.
LINGUA was developed in coordination with the APERTUS project led by EPFL and ETH Zurich and in consultation with the Council of Europe. Microsoft also worked with Mila – Quebec Artificial Intelligence Institute, Mozilla, and EPFL during the evaluation process.
For education and skills development, the initiative underlines how language coverage shapes who can engage with AI tools, learn to use them effectively, and contribute data and research back into the ecosystem. As AI adoption accelerates, LINGUA positions language inclusion as a prerequisite for equitable participation rather than a secondary consideration.