# INFP

An audio-driven dual-sided interactive video generation framework developed by ByteDance.
## Definition of INFP
INFP is an audio-driven, dual-sided interactive video generation framework developed by ByteDance. It generates real-time, natural-looking video in which characters respond dynamically to audio input without manual role switching. The framework supports multi-language audio, a singing mode, and non-human avatars, making it well suited to applications such as video conferencing.
## INFP's Operational Stages
1. **Motion-Based Head Imitation**: Projects facial behaviors from real conversations into a low-dimensional motion latent space to animate static images.
2. **Audio-Guided Motion Generation**: Maps dual-person audio inputs to motion latent codes via denoising, enabling audio-driven head movements in interactive scenarios.
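The two stages above can be pictured as a small pipeline: encode facial behavior into a compact motion latent, predict that latent from dual-track audio via iterative denoising, then drive a static portrait with it. INFP has no public code, so every name, shape, and computation below is an illustrative assumption, not the actual implementation.

```python
import numpy as np

# Hypothetical sketch of INFP's two-stage design. All functions are
# stand-ins: the real system uses learned networks, not these placeholders.

MOTION_DIM = 64  # assumed size of the low-dimensional motion latent space


def encode_motion(face_frames: np.ndarray) -> np.ndarray:
    """Stage 1 (imitation): project facial behavior into motion latents.

    Placeholder: flatten each frame and truncate to MOTION_DIM features.
    """
    n_frames = face_frames.shape[0]
    flat = face_frames.reshape(n_frames, -1)
    return flat[:, :MOTION_DIM]


def denoise_audio_to_motion(audio_self: np.ndarray,
                            audio_other: np.ndarray,
                            steps: int = 10) -> np.ndarray:
    """Stage 2 (generation): map dual-person audio to a motion latent
    by iteratively denoising from random noise, conditioned on both tracks."""
    rng = np.random.default_rng(0)
    z = rng.standard_normal(MOTION_DIM)            # start from pure noise
    cond = np.concatenate([audio_self, audio_other])
    for _ in range(steps):
        # Placeholder denoiser: pull the latent toward an audio-derived target.
        target = np.tanh(cond[:MOTION_DIM])
        z = z + 0.3 * (target - z)
    return z


def animate(static_image: np.ndarray, motion_latent: np.ndarray) -> np.ndarray:
    """Render step: drive the static portrait with the predicted motion latent.
    Placeholder for the real image generator."""
    return static_image + motion_latent.mean()


motion = denoise_audio_to_motion(np.ones(MOTION_DIM), np.zeros(MOTION_DIM))
frame = animate(np.zeros((256, 256)), motion)
print(frame.shape, motion.shape)
```

The key structural point the sketch preserves is that stage 2 never touches pixels: it only predicts motion latents from audio, and a separate renderer (trained in stage 1) turns latents into frames.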
## INFP's Performance Metrics
INFP runs at over 40 FPS on an Nvidia Tesla A10 GPU, fast enough for real-time video generation in applications such as instant messaging and live video conferencing.
## INFP's Dataset
INFP introduces **DyConv**, a large-scale dataset of dyadic conversations collected from the internet, with separated audio tracks and annotations for research use.
## Accessibility of INFP
As of this writing, INFP remains a research project with no public code release or usage documentation. Its [official website](https://grisoon.github.io/INFP/) provides demonstrations but no installation instructions, so hands-on accessibility is limited.
## Comparison with DIM
Unlike DIM (which requires manual role assignment), INFP dynamically adapts to conversational states (speaker/listener) based on audio input, eliminating the need for explicit role switching and producing more natural interactions.
## Features of INFP
- **Motion Diversity**: Adapts outputs for the same image based on different audio inputs.
- **Out-of-Distribution Support**: Works with non-human and side-face images.
- **Real-Time Interaction**: Supports agent-agent and human-agent communication at >40 FPS.
- **Multimodal Output**: Generates lip-synced talking heads, expressive listening behaviors, and singing animations.
- **Multi-Language Support**: Processes audio inputs in various languages.
## Applications of INFP
INFP is designed for real-time communication scenarios, such as virtual assistants, AI avatars, video conferencing, and interactive media, where natural, audio-driven character interactions are required.
### Citation sources:
- [INFP](https://grisoon.github.io/INFP) - Official URL
Updated: 2025-04-01