Alibaba’s EMO: revolution in videos that talk and sing

By auroraoddi

Recently, experts at Alibaba’s Institute for Intelligent Computing developed a new artificial intelligence system called “EMO.” The system animates a single portrait photo, generating video in which the person in the photo speaks or sings with striking realism.

How EMO works

EMO is built on a diffusion model, a type of generative AI that has proven remarkably capable of producing realistic synthetic images. Alibaba’s experts trained the model on a large dataset of more than 250 hours of video of people speaking, drawn from speeches, movies, television programs, and singing performances.
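To make the term "diffusion model" more concrete, here is a minimal sketch of the noising step that such models are generally trained to reverse. It is a generic textbook illustration, not EMO's code: the function names, the linear noise schedule, and the four-value stand-in "image" are all assumptions made for this example.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, evenly spaced (an assumed, common textbook choice)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def signal_fraction(betas, t):
    """alpha_bar_t: the product of (1 - beta) over the first t steps."""
    alpha_bar = 1.0
    for beta in betas[:t]:
        alpha_bar *= 1.0 - beta
    return alpha_bar

def add_noise(pixels, betas, t, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise."""
    alpha_bar = signal_fraction(betas, t)
    keep = math.sqrt(alpha_bar)
    spread = math.sqrt(1.0 - alpha_bar)
    return [keep * p + spread * rng.gauss(0.0, 1.0) for p in pixels]

betas = linear_beta_schedule(1000)
rng = random.Random(0)
image = [0.5, -0.25, 1.0, 0.0]  # hypothetical stand-in for pixel values

slightly_noisy = add_noise(image, betas, 10, rng)    # still close to the image
mostly_noise = add_noise(image, betas, 1000, rng)    # almost pure noise
```

A diffusion model learns to undo this corruption one step at a time, so that it can start from pure noise and work back toward a clean image. In a system like EMO, that denoising would additionally be guided by the portrait and the audio.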

Unlike traditional methods that rely on 3D face models or blend shapes to approximate facial movements, EMO converts the audio waveform directly into video frames. This lets it capture the subtle movements and identity-specific quirks that accompany natural speech.

Benefits of EMO

EMO represents a major breakthrough in generating audio-driven videos of people speaking. According to experiments described in the team’s research paper, EMO significantly outperforms existing methods in terms of video quality, identity preservation, and expressiveness.

Alibaba’s researchers also conducted a user study, which found EMO-generated videos to be more natural and more emotionally expressive than those produced by other systems.

Generating videos of people singing

In addition to conversational videos, EMO can animate portraits of people singing, producing appropriate mouth movements and evocative facial expressions synchronized with the song. The system can generate videos of arbitrary duration, with the length of the video determined by the length of the input audio.
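The link between audio length and video length comes down to simple arithmetic, sketched below. The function name, the 16 kHz sample rate, and the 30 frames-per-second rate are assumptions chosen for illustration; the article does not state which values EMO uses.

```python
def frames_needed(num_samples, sample_rate, frames_per_second):
    """Number of video frames required to cover an audio clip.

    Uses integer ceiling division so that a partial final frame
    still gets a frame of its own.
    """
    if num_samples < 0 or sample_rate <= 0 or frames_per_second <= 0:
        raise ValueError("sample count must be >= 0 and rates must be > 0")
    return -(-num_samples * frames_per_second // sample_rate)

# Hypothetical clip: 2.5 seconds of audio sampled at 16 kHz, rendered at 30 fps.
clip_frames = frames_needed(40_000, 16_000, 30)
```

A longer song or speech simply yields a larger frame count, which is why the output length follows the input audio rather than a fixed limit.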

Experimental results show that EMO can produce not only convincing videos of people speaking, but also videos of people singing in various styles, significantly outperforming existing methods in terms of expressiveness and realism.

Ethical implications

Despite the significant advances achieved by EMO and similar technologies, there are ethical implications to consider. The ability to synthesize personalized video content from a simple photo and audio snippet raises concerns about the misuse of this technology to impersonate people without their consent or spread misinformation.

Alibaba’s experts say they plan to explore methods for detecting synthesized videos in order to counter the potential spread of fake content.