TECHNOLOGY, INTERNET TRENDS, GAMING

Meta Reinvents Voice Translation with Multimodal AI

By auroraoddi

Meta recently unveiled its new multimodal AI speech translation model, SeamlessM4T, which supports nearly 100 languages for text and 36 for speech. With an updated "v2" architecture, the tech giant is now expanding this tool to make conversational translations more spontaneous and expressive, the latter being a key missing ingredient for authentic conversation across languages.

SeamlessExpressive: A New Way to Express Yourself

The first of the two new functionalities is ‘SeamlessExpressive’, which, as the name suggests, carries the speaker’s expression over into the translated speech. That covers intonation, volume, emotional tone (excitement, sadness, or a whisper), speech rate, and pauses. Given that voice translations have so far tended to sound robotic, this could be a game changer, both in everyday life and in content production. The supported languages are English, Spanish, German, French, Italian, and Chinese, although at the time of writing, Italian and Chinese are missing from the demonstration page.

SeamlessStreaming: Quick Real-Time Translation

The second functionality is ‘SeamlessStreaming’, which begins translating while the speaker is still talking, so listeners hear the translation sooner. There’s still a delay of just under two seconds, but you no longer have to wait for someone to finish a sentence.

The challenge here is that different languages order their sentences differently, so Meta had to develop a dedicated algorithm that studies partial audio input and decides whether there is enough context to start generating translated output, or whether it should keep listening.
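To make that read-or-write decision concrete, here is a minimal sketch of the classic "wait-k" policy from the simultaneous-translation literature: emit a translated token only once k units of unconsumed source context have accumulated. This is an illustrative assumption, not Meta's actual algorithm (which uses a learned policy), and the `translate` stand-in simply uppercases tokens for demonstration.

```python
def waitk_policy(source_tokens, k=2, translate=None):
    """Simulate a wait-k simultaneous translation policy.

    Reads source tokens one at a time and WRITEs a translated token
    whenever the lag between tokens read and tokens translated
    reaches k; remaining tokens are flushed once input ends.
    """
    if translate is None:
        # Hypothetical stand-in for a real translation model.
        translate = lambda tok: tok.upper()

    output = []
    consumed = 0  # number of source tokens already translated
    for tokens_read in range(1, len(source_tokens) + 1):
        # WRITE while we have at least k tokens of unconsumed context.
        while tokens_read - consumed >= k:
            output.append(translate(source_tokens[consumed]))
            consumed += 1
    # Speaker finished: flush whatever is left.
    while consumed < len(source_tokens):
        output.append(translate(source_tokens[consumed]))
        consumed += 1
    return output
```

A larger k trades latency for context (and thus accuracy), which is exactly the tension the article describes: languages with very different word orders need more context before a safe translation can begin.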

Meta’s ‘Communication Without Barriers’ Suite

Meta’s latest evolution of its ‘Communication Without Barriers’ suite looks like an impressive solution, going well beyond the mobile interpretation tools offered by Google and Samsung. There’s no news yet on when the public will get access to these functionalities, but one can easily imagine Meta integrating them into its smart glasses, making them more practical than ever.