
      VALL-E: Microsoft Unveils Audio AI That Can Simulate Any Voice From 3-Second Prompts

      Microsoft researchers recently announced VALL-E, a new text-to-speech AI model that can accurately mimic a person’s voice when given a three-second audio sample. Once it has learned a specific voice, VALL-E can synthesise audio of that person saying anything, while attempting to retain the speaker’s emotional tone.

      When combined with other generative AI models like GPT-3, VALL-E’s creators believe it could be used for high-quality text-to-speech applications, audio content creation, and speech editing, in which a recording of a person could be edited and altered via a text transcript, making them say something they did not actually say.

      According to Microsoft, VALL-E is primarily a “neural codec language model,” and is based on EnCodec, which Meta revealed in October 2022.

      Rather than manipulating waveforms, as other text-to-speech methods typically do to synthesise speech, VALL-E generates discrete audio codec codes from text and acoustic prompts.

      It processes how a person sounds, breaks the relevant data down into discrete components (referred to as “tokens”) using EnCodec, and then uses training data to match what it “knows” about how that voice might sound if it spoke other phrases beyond the three-second sample.
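      The idea of turning continuous audio into discrete tokens can be illustrated with a toy vector quantizer. The sketch below is purely conceptual: the random codebook and frame dimensions are made-up stand-ins, and EnCodec itself uses a learned convolutional encoder with residual vector quantization rather than this single nearest-neighbour lookup.

```python
import numpy as np

# Toy illustration of audio tokenization: map each continuous "frame"
# of encoded audio to the index of its nearest codebook vector.
# The codebook here is random; a real neural codec learns it from data.

rng = np.random.default_rng(0)

CODEBOOK_SIZE = 1024   # number of distinct discrete tokens
FRAME_DIM = 128        # dimensionality of each encoded audio frame

codebook = rng.normal(size=(CODEBOOK_SIZE, FRAME_DIM))

def quantize(frames: np.ndarray) -> np.ndarray:
    """Return one integer token ID per frame (nearest codebook entry)."""
    # (num_frames, 1, dim) vs (1, codebook_size, dim) -> distances (num_frames, codebook_size)
    dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=-1)
    return dists.argmin(axis=1)

frames = rng.normal(size=(10, FRAME_DIM))   # stand-in for encoder output
tokens = quantize(frames)
print(tokens)  # ten discrete token IDs, each in [0, CODEBOOK_SIZE)
```

      A language model like VALL-E is then trained over sequences of such token IDs, much as a text model is trained over word tokens, before a decoder converts predicted tokens back into a waveform.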

      Microsoft trained VALL-E’s speech synthesis capabilities using Meta’s LibriLight audio library, which includes 60,000 hours of English-language speech from over 7,000 speakers, sourced primarily from LibriVox public-domain audiobooks.

      For VALL-E to produce a good result, the voice in the three-second sample should closely resemble a voice in its training data.

      Microsoft offers dozens of audio examples of the AI model in action on the VALL-E example website.

      The “Speaker Prompt” is the three-second audio sample given to VALL-E, which it must try to emulate.


      The “Ground Truth” is a previously recorded version of that same speaker saying a specific phrase, for comparative purposes (something like the “control” in an experiment).

      The “Baseline” sample is generated by a traditional text-to-speech synthesis method, while the “VALL-E” sample is generated by the VALL-E model.

      A block diagram of VALL-E, as shown on the example website by Microsoft researchers.

      To produce these results, the researchers supplied VALL-E with only the three-second “Speaker Prompt” sample and a text string specifying what they wanted the voice to say.

      Some VALL-E results sound computer-generated, but others could be mistaken for human speech, which is the model’s goal.

      Because of VALL-E’s potential to fuel mischief and deception, Microsoft has not made VALL-E’s code available for others to explore.

      The researchers appear to be aware of the potential social harm that this technology may cause.

      The researchers write in the paper’s conclusion:

      “Since VALL-E could synthesize speech that maintains speaker identity, it may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating a specific speaker. To mitigate such risks, it is possible to build a detection model to discriminate whether an audio clip was synthesized by VALL-E. We will also put Microsoft AI Principles into practice when further developing the models.”
