Open Language Data Initiative
The Open Language Data Initiative (OLDI) empowers language communities around the globe to contribute to a database that forms the foundation of today’s machine translation and natural language processing work. We invite community, academic, and industry members to contribute to key datasets that are essential to the organic expansion of language technology’s reach.
Check out our website, oldi.org; the OLDI-Seed and FLORES+ datasets; the Open Data shared task at WMT25; and the Findings of the WMT 2024 Shared Task of the Open Language Data Initiative.
Subscribe to the OLDI newsletter
Subscribe to get full access to the newsletter and publication archives. We will notify you about updates to our datasets and notable events in machine translation and multilingual NLP, and share occasional tutorials on language technologies.


