
Meta releases speech-to-text and text-to-speech AI models for over 1,100 languages, and shares the models and data as open source

27 May 2023

Meta introduces its new AI tool, MMS

In a world where many languages face the risk of extinction, the limitations of current speech recognition and generation technology only contribute to this trend. To address this issue, artificial intelligence (AI) models have been developed that aim to make information more accessible and devices easier to use across a multitude of languages.

Introducing Massively Multilingual Speech (MMS) models, a groundbreaking development in the field of language technology. These models push the boundaries of text-to-speech and speech-to-text capabilities by expanding the number of supported languages from approximately 100 to over 1,100, more than ten times as many as before. Furthermore, these AI models can identify over 4,000 spoken languages, roughly a forty-fold increase compared to previous technology.
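To make that scale a little more concrete, here is a minimal sketch of what multilingual speech-to-text with an MMS model could look like in practice. It assumes the released checkpoints are used through the Hugging Face Transformers library; the checkpoint name facebook/mms-1b-all, the language code "fra", and the audio file path are illustrative assumptions rather than details from the announcement.

```python
# Minimal speech-to-text sketch, assuming the MMS checkpoints on Hugging Face
# and the transformers + torchaudio libraries. Checkpoint name, language code,
# and file path are illustrative assumptions.
import torch
import torchaudio
from transformers import AutoProcessor, Wav2Vec2ForCTC

model_id = "facebook/mms-1b-all"            # multilingual ASR checkpoint (assumed name)
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Switch to a target language by loading its adapter, e.g. French ("fra").
processor.tokenizer.set_target_lang("fra")
model.load_adapter("fra")

# Load a recording and resample it to the 16 kHz rate the model expects.
waveform, sample_rate = torchaudio.load("example_fra.wav")   # path is illustrative
waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

inputs = processor(waveform.squeeze().numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

ids = torch.argmax(logits, dim=-1)[0]
print(processor.decode(ids))                # transcription in the target language
```

The same model is reused across languages: only the small per-language adapter and tokenizer vocabulary change, which is what makes covering more than 1,100 languages with one checkpoint practical.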

The applications of speech technology are far-reaching and varied. From virtual and augmented reality experiences to messaging services, these AI models offer the potential for individuals to engage with these technologies in their preferred language, while also understanding and responding to a diverse range of voices.

One significant aspect of this development is the commitment to open-source the models and associated code. By sharing this research and technology with the wider community, the aim is to foster collaboration and encourage further advances in preserving endangered languages and bringing people closer together across linguistic divides.
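Because the models are openly available, text-to-speech can be tried in much the same way. The sketch below assumes the per-language MMS TTS checkpoints published on Hugging Face; the checkpoint name facebook/mms-tts-eng and the output file name are assumptions for illustration, not part of the announcement.

```python
# Minimal text-to-speech sketch, assuming per-language MMS TTS checkpoints
# on Hugging Face and the transformers + scipy libraries.
import torch
import scipy.io.wavfile
from transformers import AutoTokenizer, VitsModel

model_id = "facebook/mms-tts-eng"           # English TTS checkpoint (assumed name)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = VitsModel.from_pretrained(model_id)

inputs = tokenizer("Speech technology for more than a thousand languages.", return_tensors="pt")
with torch.no_grad():
    waveform = model(**inputs).waveform     # shape: (batch, samples)

# Save the generated audio at the model's native sampling rate.
scipy.io.wavfile.write("mms_tts_output.wav",
                       rate=model.config.sampling_rate,
                       data=waveform[0].numpy())
```

Swapping in a different language would, under these assumptions, simply mean loading the checkpoint for that language's code.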

The Approach:

The primary challenge in developing MMS models was collecting audio data for thousands of languages, as existing speech datasets covered a maximum of 100 languages. In a creative solution to this problem, researchers turned to religious texts, such as the Bible, which have been translated into numerous languages and extensively studied for text-based language translation research.

Many of these translations are accompanied by publicly available audio recordings of individuals reading the texts in different languages. As part of the MMS project, a dataset was created comprising readings of the New Testament in over 1,100 languages, providing an average of 32 hours of data per language.

To further expand the available languages, unlabeled recordings of various other Christian religious readings were incorporated. Through this approach, the number of supported languages surged to over 4,000. Although this data primarily features male speakers and pertains to religious content, analysis has shown that the models perform equally well with both male and female voices. Additionally, the models do not exhibit a bias toward producing religious language, despite the content of the audio recordings.
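The language-identification capability described above can be sketched in the same hedged way. This example assumes a roughly 4,000-language MMS identification checkpoint is published on Hugging Face under a name like facebook/mms-lid-4017; the checkpoint name and the audio file path are assumptions for illustration.

```python
# Minimal spoken-language identification sketch, assuming an MMS LID checkpoint
# on Hugging Face and the transformers + torchaudio libraries.
import torch
import torchaudio
from transformers import AutoFeatureExtractor, Wav2Vec2ForSequenceClassification

model_id = "facebook/mms-lid-4017"          # ~4,000-language classifier (assumed name)
extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_id)

# Load a clip in an unknown language and resample to 16 kHz.
waveform, sample_rate = torchaudio.load("unknown_language.wav")   # path is illustrative
waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

inputs = extractor(waveform.squeeze().numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

predicted_id = torch.argmax(logits, dim=-1).item()
print(model.config.id2label[predicted_id])  # language code of the detected language
```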

Future Goals:

While the achievement of supporting over 4,000 languages is impressive, the creators of the MMS models have ambitious plans to expand coverage even further. The next frontier involves addressing the challenge of dialects, which often pose difficulties for existing speech technology. By refining the models to handle various dialects, the aim is to enhance the inclusivity and effectiveness of speech technology in diverse linguistic contexts.

In conclusion, the introduction of Massively Multilingual Speech (MMS) models represents a significant milestone in supporting a multitude of languages and bridging linguistic gaps. By leveraging AI technology, these models have the potential to empower individuals to access information and interact with devices in their preferred language. Moreover, the decision to open-source the models and code demonstrates a commitment to collaboration and invites the global research community to contribute to the preservation and revitalization of endangered languages. As the journey continues, the future holds promise for even greater linguistic inclusivity and understanding.


Reference - https://about.fb.com/news/2023/05/ai-massively-multilingual-speech-technology/
