Google DeepMind and Wellcome Sanger Institute form five-year AI genomics consortium
The partnership with Google.org will generate genomic datasets designed for machine learning models used in biological research.
The Wellcome Sanger Institute, Google DeepMind, and Google.org have formed a five-year artificial intelligence consortium to create genomic datasets for machine learning models used in biological research.
Announced at the AI x BIO conference, the consortium will focus on generating large-scale biological datasets in areas where existing data is limited or not yet structured for advanced AI systems.
The work will combine the Wellcome Sanger Institute’s genomics research and data-generation capabilities with Google DeepMind’s AI expertise. Google.org is supporting the initiative, although the announcement does not disclose a funding amount.
The founding organizations plan to establish a broader consortium and bring in additional collaborators. No further participating organizations have been named at this stage.
The datasets are intended to support AI models that can be trained and tested on genomic and molecular data. The consortium’s five-year program will focus on producing data that is suitable for machine learning, rather than adapting existing datasets after they have been created.
Five-year focus on data generation
The Wellcome Sanger Institute and Google DeepMind have already worked together through research collaborations, a joint AI in genomics fellowship, and projects focused on building AI research capability, including in low- and middle-income countries.
The new consortium shifts that relationship into a larger data-generation program. The partners say the work will support AI models used to predict biological processes and analyze areas of life sciences research that are not yet well covered by existing datasets.
Researchers at the Wellcome Sanger Institute are already using AI across genomics, including work that draws together information from genes, RNA, and proteins. They also use AI to analyze large datasets and to support experimental design and interpretation.
Dr Julia Wilson, Chief Innovation and Impact Officer at the Wellcome Sanger Institute, says: "By combining Sanger’s expertise in generating world-leading datasets with Google DeepMind’s leadership in artificial intelligence, we have an opportunity to accelerate the generation of biological data specifically designed to power the development of foundational AI models. Through this consortium, we aim to create resources that will be shared widely with the community to enable transformative scientific discoveries and deliver broad impact across the life sciences."
Google DeepMind’s role in biological AI
Google DeepMind’s role in the consortium centers on AI for science and model development. The announcement positions the partnership around datasets designed from the outset for AI training and evaluation.
Dr Pushmeet Kohli, Vice President of AI for Science at Google DeepMind, says: "Together with the Sanger Institute, we aim to build the data backbone needed to decode the complexities of biological processes. Ultimately, this could accelerate scientific discovery and unlock entirely new frontiers for researchers worldwide."
Anna Koivuniemi, Head of the Google DeepMind Impact Accelerator, adds: "Addressing the most significant challenges in biology will require collaboration across disciplines, sectors and institutions. We are excited to partner with the Sanger Institute in order to strengthen AI in genomics opportunities and ensure this data can help to accelerate breakthroughs that benefit humanity."
Open-access data foundations
Google.org’s involvement is focused on supporting open-access data foundations for future biological AI models. The announcement does not provide details on how the datasets will be accessed, governed, or released.
Leslie Yeh, Director of Google.org Scientific Progress, says: "Over the past decade, we’ve seen how deep learning can transform our understanding of complex biological challenges. With this new consortium, Google.org is supporting the open-access data foundations needed to fuel the next generation of biological AI models. By accelerating the integration of large-scale genomic datasets, we aim to support the global research community in achieving life-saving scientific breakthroughs."
The next stage is the formation of the wider consortium and the start of the five-year dataset program. The founding partners have not yet named additional collaborators or set out a publication schedule for the datasets.