Last week, Meta announced Voicebox, an advanced artificial intelligence (AI) tool that can generate speech from text. The latest tool from Facebook parent Meta is claimed to produce high-quality audio clips and to edit pre-recorded audio while preserving the content and style of the original recording.
It is said to be multilingual and is claimed to deliver speech in six languages.
The machine learning model can also be used for noise removal.
Meta’s Voicebox also has the ability to replace misspoken words without having to re-record an entire speech.
Like other recent AI innovations such as ChatGPT and DALL-E, the new text-to-speech model is generative.
Meta Voicebox, the new generative AI model, can perform speech generation tasks such as editing, sampling, and stylising.
It is claimed to generate audio clips from a two-second audio sample and to edit pre-recorded audio while keeping its content and style.
The text-to-speech model can perform tasks like noise removal, content editing, style conversion, and diverse sample generation.
It is said to be able to modify any part of a given sample, recreating a portion of speech that has been interrupted by noise such as car horns or barking dogs.
The AI model can also be used to replace misspoken words without having to re-record an entire speech.
Meta Voicebox can synthesise speech across six languages: English, French, Spanish, German, Polish, and Portuguese.
It can create a reading of the text in any of those languages, even when the sample speech and the text are in different languages.
Meta claims that Voicebox outperforms Microsoft’s VALL-E while generating audio samples 20 times faster.
In a research paper, Meta AI stated:
“Our results show that speech recognition models trained on Voicebox-generated synthetic speech perform almost as well as models trained on real speech, with 1 percent error rate degradation as opposed to 45 to 70 percent degradation with synthetic speech from previous text-to-speech models”
Meta has also listed a few audio samples to show users how Voicebox works.
In its blog post, Meta claims that Voicebox can generate speech that is more representative of how people talk in the real world across the six languages mentioned above.
Meta believes that this capability could be used in the near future to generate synthetic data that helps train speech assistant models better.
Meta Voicebox is currently under development and is not available to the public.
Meta says it realises that, like other current AI innovations, this technology carries the potential for misuse and unintended harm.
To mitigate these possible risks, the company is said to be working on an effective classifier that can distinguish authentic speech from audio generated with Voicebox.