Answers (1)

    2025-04-01T04:59:37+00:00

    Janus-Pro employs **SigLIP-L (ViT-L-16-SigLIP-384)** as its visual encoder, which:
    - Supports **384×384 resolution** image inputs
    - Handles the understanding role in Janus-Pro's decoupled visual encoding, which keeps understanding and generation on separate pathways
    - Splits each image into 16×16-pixel patches (a 16× reduction per side), so a 384×384 input yields a 24×24 grid of 576 patch tokens.
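
To make the resolution and downsampling figures concrete, here is a minimal sketch of the token-count arithmetic. It assumes a plain ViT-style patch embedding with no class token or pooling, and `patch_grid` is a hypothetical helper written for illustration, not part of the Janus-Pro code.

```python
def patch_grid(image_size: int, patch_size: int) -> tuple[int, int]:
    """Return (patches per side, total patch tokens) for a square image.

    Assumes non-overlapping square patches, as in a standard ViT patch embedding.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


# 384x384 input with 16x16 patches (the "16" and "384" in ViT-L-16-SigLIP-384)
side, tokens = patch_grid(384, 16)
print(side, tokens)  # 24 576
```

Under these assumptions, each 384×384 image reaches the language model as 576 visual tokens.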
