What is the architecture of Stable Diffusion 3 Medium?

Answers ( 1 )

    2025-04-01T07:40:38+00:00

    **Stable Diffusion 3 Medium (SD3 Medium)** is built on a **Multimodal Diffusion Transformer (MMDiT)** architecture. It uses three text encoders (OpenCLIP-ViT/G, CLIP-ViT/L, and T5-XXL) to process the prompt, and a 16-channel VAE to encode and decode images. The wider latent space is meant to preserve more fine detail, particularly in hands and faces.
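To make the pieces above a bit more concrete, here is a rough shape-arithmetic sketch in plain Python. It does not load or run the model. The numbers in it are assumptions that are not stated in the answer above: encoder widths of 768 (CLIP-ViT/L), 1280 (OpenCLIP-ViT/G), and 4096 (T5-XXL), 77 tokens per encoder, an 8x VAE downsampling factor, and a 2x2 patch size. The class and function names (`SD3Shapes`, `text_context_shape`, and so on) are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SD3Shapes:
    """Illustrative dimensions; every default here is an assumption."""
    clip_l_dim: int = 768       # assumed CLIP-ViT/L hidden width
    clip_g_dim: int = 1280      # assumed OpenCLIP-ViT/G hidden width
    t5_dim: int = 4096          # assumed T5-XXL hidden width
    clip_tokens: int = 77       # assumed CLIP sequence length
    t5_tokens: int = 77         # assumed T5 sequence length (configurable)
    latent_channels: int = 16   # the 16-channel VAE from the answer
    vae_downsample: int = 8     # assumed spatial downsampling factor
    patch_size: int = 2         # assumed patch size for latent tokens


def pooled_dim(s: SD3Shapes) -> int:
    # The two CLIP pooled embeddings are assumed to be concatenated.
    return s.clip_l_dim + s.clip_g_dim


def text_context_shape(s: SD3Shapes) -> tuple[int, int]:
    # The CLIP token features are joined channel-wise, padded up to the
    # T5 width, then stacked with the T5 tokens along the sequence axis.
    clip_joint = s.clip_l_dim + s.clip_g_dim
    width = max(clip_joint, s.t5_dim)
    return (s.clip_tokens + s.t5_tokens, width)


def latent_shape(s: SD3Shapes, height: int, width: int) -> tuple[int, int, int]:
    # The VAE maps an RGB image to a smaller multi-channel latent.
    return (s.latent_channels, height // s.vae_downsample, width // s.vae_downsample)


def image_token_count(s: SD3Shapes, height: int, width: int) -> int:
    # The latent is cut into patches; each patch becomes one token.
    _, lh, lw = latent_shape(s, height, width)
    return (lh // s.patch_size) * (lw // s.patch_size)


if __name__ == "__main__":
    shapes = SD3Shapes()
    print("pooled text vector:", pooled_dim(shapes))
    print("text context (tokens, width):", text_context_shape(shapes))
    print("latent for 1024x1024:", latent_shape(shapes, 1024, 1024))
    print("image tokens for 1024x1024:", image_token_count(shapes, 1024, 1024))
```

Under these assumptions, the MMDiT blocks would see one sequence of text tokens and one sequence of image-latent tokens and attend over both jointly. If your setup uses a longer T5 sequence or a different resolution, change the corresponding fields and the counts follow.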
