Which modalities does Gemma 3 support?

Question

Answers ( 1 )

    2025-04-01T15:25:09+00:00

    - **4B, 12B, and 27B models**: Support vision-language input (images + text) with text output
    - **1B model**: Text-only processing
    The multimodal models take images at a fixed 896x896 resolution; an adaptive windowing algorithm (Pan & Scan) splits larger or non-square images into crops of that size.
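
To get a feel for what the 896x896 figure means for a given image, here is a minimal sketch that counts how many 896x896 windows a plain fixed grid would need to cover it. This is only an illustration under a simplifying assumption (non-overlapping tiles on a regular grid); it is not the model's actual windowing algorithm, and `window_grid` and `needs_windowing` are hypothetical helper names, not part of any Gemma API.

```python
import math

WINDOW = 896  # per-window resolution quoted in the answer above


def needs_windowing(width: int, height: int, window: int = WINDOW) -> bool:
    """True if the image is larger than a single window in either dimension."""
    return width > window or height > window


def window_grid(width: int, height: int, window: int = WINDOW) -> tuple[int, int]:
    """Return (columns, rows) of window-sized tiles needed to cover the image.

    Assumption: a simple fixed grid, used here purely for illustration.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    return math.ceil(width / window), math.ceil(height / window)


if __name__ == "__main__":
    for w, h in [(640, 480), (896, 896), (1792, 896), (3000, 2000)]:
        cols, rows = window_grid(w, h)
        print(f"{w}x{h}: windowing needed={needs_windowing(w, h)}, grid={cols}x{rows}")
```

In practice the model's processor handles resizing and cropping for you; a check like this is mainly useful for estimating how much detail a large image will lose if it is squeezed into a single window.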
