
Microsoft’s VASA-1 can deepfake a person with one photo and one audio track

A sample image from Microsoft for "VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time." (credit: Microsoft)


    On Tuesday, Microsoft Research Asia unveiled VASA-1, an AI model that can create a synchronized animated video of a person talking or singing from a single photo and an existing audio track. In the future, it could power virtual avatars that render locally and don't require video feeds—or allow anyone with similar tools to take a photo of a person found online and make them appear to say whatever they want.

    "It paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors," reads the abstract of the accompanying research paper titled, "VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time." It's the work of Sicheng Xu, Guojun Chen, Yu-Xiao Guo, Jiaolong Yang, Chong Li, Zhenyu Zang, Yizhong Zhang, Xin Tong, and Baining Guo.

The VASA framework (short for "Visual Affective Skills Animator") uses machine learning to analyze a static image together with a speech audio clip. From those two inputs it generates a realistic video with precise facial expressions, head movements, and lip-syncing matched to the audio. It does not clone or simulate voices (as some other Microsoft research does) but relies on an existing audio input, which could be specially recorded or spoken for a particular purpose.
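The input/output contract described above can be sketched in a few lines. This is purely illustrative: VASA-1 is not publicly released, and every name below (the request type, the frame-count helper, the default frame rate) is a hypothetical stand-in, not Microsoft's API.

```python
from dataclasses import dataclass

# Hypothetical sketch of the contract described in the text: one static
# photo plus one speech clip in, a lip-synced video out. No names here
# come from Microsoft's paper or code.

@dataclass
class TalkingFaceRequest:
    portrait_path: str   # single static photo of the subject
    audio_path: str      # pre-recorded speech clip (VASA-1 does not clone voices)
    fps: int = 25        # assumed output frame rate for this sketch

def expected_frame_count(audio_seconds: float, fps: int) -> int:
    """Frames a generator would need to produce to stay synced to the audio."""
    return round(audio_seconds * fps)
```

For example, a 4-second clip at 25 fps would require 100 synchronized frames, which conveys why the audio track, not the photo, determines the length of the generated video.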

