Data is crucial to advances in AI, and Google and its big tech rivals want to tap information to help make products perform better and be more available to the widest possible audience.
“Imagine a new internet user in Africa speaking Wolof... using their phone to ask where is the nearest pharmacy,” said Johan Schalkwyk, a researcher at Google.
Such situations are ones “we take for granted,” Schalkwyk told reporters, adding that languages were “not available to everyone in the world.”
According to Schalkwyk, there are more than 7,000 languages globally.
However, Google offers translations for only a little more than 130 of them.
The search engine giant is aiming to widen this substantially and wants to mine data in new languages not only from texts available on the internet, but also from videos, images and speech.
The group is also looking to collect audio clips in languages for which there may not be much written material.
As progress is made on the project, which is estimated to take several years, Google plans to integrate its advances into its products, including YouTube and Google Translate.
Facebook parent Meta announced a similar plan earlier this year, called No Language Left Behind, designed to create translation systems covering hundreds of the world's languages.