Fashion-VDM - A virtual try-on technology using video diffusion models for high-quality dynamic garment visualization.
## Definition of Fashion-VDM
Fashion-VDM is a virtual try-on technology developed jointly by Google and the University of Washington. It uses a video diffusion model (VDM) to generate 512px try-on videos from a single garment image and a video of a person, maintaining temporal consistency and fine-grained garment details.
## Developers of Fashion-VDM
Fashion-VDM was collaboratively developed by **Google** and the **University of Washington**.
## Technical Methodology
Fashion-VDM employs:
1. **Video Diffusion Model (VDM) architecture** for high-quality generation
2. **Split classifier-free guidance** for enhanced conditional control
3. **Progressive temporal training** strategy for efficient long-sequence video generation (64 frames)
4. **3D-Conv and temporal attention blocks** to maintain temporal coherence
5. **Joint image-video training** for data efficiency
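To make item 2 more concrete, the sketch below shows one common way to split classifier-free guidance across several conditioning signals: each signal is switched on in turn, and the change it causes in the predicted noise gets its own weight. This is an illustrative formulation, not the authors' code. The function `split_cfg`, the `eps_fn` callback, the ordering of the signals, and the toy denoiser are all assumptions made for this example.

```python
def split_cfg(eps_fn, z_t, conds, weights, w_uncond=1.0):
    """Combine noise predictions using one guidance weight per conditioning signal.

    eps_fn(z_t, active) is a hypothetical denoiser callback that returns the
    predicted noise given a dict of the currently enabled conditioning signals.
    conds is an ordered list of (name, value) pairs; weights has one entry per pair.
    """
    if len(conds) != len(weights):
        raise ValueError("need exactly one weight per conditioning signal")

    active = {}
    prev = eps_fn(z_t, active)       # unconditional prediction
    guided = w_uncond * prev
    for (name, value), w in zip(conds, weights):
        active = {**active, name: value}   # enable one more signal
        cur = eps_fn(z_t, active)
        guided = guided + w * (cur - prev)  # weight this signal's contribution
        prev = cur
    return guided


def toy_eps(z_t, active):
    """Stand-in denoiser for illustration: adds up the active signal values."""
    return z_t * 0.0 + sum(active.values())


conds = [("person", 1.0), ("garment", 2.0)]
plain = split_cfg(toy_eps, 0.0, conds, weights=[1.0, 1.0])
boosted = split_cfg(toy_eps, 0.0, conds, weights=[2.0, 3.0])
```

With every weight set to 1 the sum telescopes to the fully conditioned prediction, so the per-signal weights can be read as independent controls for how strongly each input (for example the garment image versus the person video) steers the result.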
## Core Features
Key capabilities include:
- Generating 64-frame videos at 512px resolution in a single pass
- Preserving subject identity and motion fidelity
- Supporting multiple conditional inputs (garment-only, person+garment, person+garment+pose)
- Achieving better temporal consistency than prior video try-on methods, as reported by the authors
## Use Cases
Main applications include:
1. **Online retail**: Virtual try-on experiences for e-commerce
2. **Virtual fashion**: Dynamic garment presentation for digital fashion shows
3. **Personalized recommendations**: Customized try-on videos based on user preferences
## Training Methodology
The training involves:
1. **Spatial pre-training**: Initial training on image data
2. **Progressive temporal training**: Gradual extension to longer video sequences
3. **Adaptive batch sizing**: Training on batches that contain progressively more frames for efficient learning
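The staged curriculum above can be sketched as a simple schedule builder. Only the image pre-training stage and the 64-frame target come from the description here. The starting length of 8 frames, the doubling rule, and the name `progressive_schedule` are assumptions chosen for illustration, so the actual stages may differ.

```python
def progressive_schedule(first_video_frames=8, target_frames=64):
    """Return an ordered list of (stage_name, frames_per_clip) training stages.

    Assumed curriculum: one image-only stage, then video stages whose clip
    length doubles until the target length is reached.
    """
    if first_video_frames < 2 or target_frames < first_video_frames:
        raise ValueError("expected 2 <= first_video_frames <= target_frames")

    schedule = [("spatial_pretraining", 1)]  # single images, no temporal axis
    frames = first_video_frames
    while frames < target_frames:
        schedule.append(("temporal_training", frames))
        frames *= 2
    schedule.append(("temporal_training", target_frames))  # final full-length stage
    return schedule


for stage, frames in progressive_schedule():
    print(f"{stage}: {frames} frame(s) per clip")
```

Training on short clips first keeps early stages cheap, and the long 64-frame clips are only used once the model already handles shorter sequences.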
## Access Points
Available resources:
- [Project Website](https://johannakarras.github.io/Fashion-VDM/)
- [Research Paper](https://arxiv.org/abs/2411.00225)
- [Supplementary Materials](https://johannakarras.github.io/Fashion-VDM/static/pdf/Fashion_VDM_Supplementary.pdf)
- [UBC Benchmark Dataset](https://johannakarras.github.io/Fashion-VDM/static/data/ubc-benchmark.zip)
## Advantages Over Existing Methods
Fashion-VDM outperforms existing methods by:
1. Achieving higher temporal consistency through 3D-Conv and attention mechanisms
2. Generating longer videos (64 frames) at commercial-ready resolution (512px)
3. Maintaining better identity preservation and garment detail fidelity
4. Supporting flexible conditional control via split classifier-free guidance
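The temporal attention mentioned in item 1 can be illustrated with a minimal, dependency-free sketch. It attends across frames at a single spatial location, so each frame's features become a weighted mix of the features of all frames. This is a simplification for illustration only: the function name `temporal_attention` is hypothetical, and identity query/key/value projections stand in for the learned projections a real model would use.

```python
import math


def temporal_attention(frame_feats):
    """Self-attention along the time axis for one spatial location.

    frame_feats is a list with one feature vector (list of floats) per frame.
    Identity Q/K/V projections are assumed to keep the example short.
    """
    dim = len(frame_feats[0])
    mixed = []
    for query in frame_feats:
        # Scaled dot-product score between this frame and every frame.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in frame_feats
        ]
        # Numerically stable softmax over the frame axis.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted average of all frames' features.
        mixed.append([
            sum(w * feats[j] for w, feats in zip(weights, frame_feats))
            for j in range(dim)
        ])
    return mixed


static_clip = [[1.0, 2.0]] * 3
static_out = temporal_attention(static_clip)
varying_out = temporal_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Because every output frame is computed from all input frames, information such as garment texture can be shared across time, which is the basic mechanism that helps keep details from flickering between frames.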
### Citation sources:
- [Fashion-VDM](https://johannakarras.github.io/Fashion-VDM) - Official URL
Updated: 2025-04-01