Which modalities does Gemma 3 support?
Question
Answers ( 1 )
- **4B, 12B, and 27B models**: Support vision-language input (images + text) with text output
- **1B model**: Text-only processing
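The multimodal models feed images to the vision encoder at a fixed 896x896 resolution. For non-square or higher-resolution images, an adaptive windowing algorithm (called "Pan & Scan") splits the image into non-overlapping crops and resizes each crop to 896x896.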
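
To make the windowing idea concrete, here is a minimal sketch of how an image could be divided into non-overlapping crops before each one is resized to 896x896. This is not Gemma 3's actual implementation: the function name `crop_boxes`, the `max_crops` limit, and the rule of picking the crop count from the aspect ratio are all assumptions made for illustration.

```python
TARGET = 896  # encoder input side length, as stated in the answer


def crop_boxes(width, height, max_crops=4):
    """Return (left, top, right, bottom) boxes that tile the image.

    Hypothetical heuristic: split along the longer side into roughly
    square, non-overlapping windows, capped at max_crops. Each window
    would then be resized to TARGET x TARGET before encoding.
    """
    if width >= height:
        cols = min(max_crops, max(1, round(width / height)))
        rows = 1
    else:
        cols = 1
        rows = min(max_crops, max(1, round(height / width)))

    boxes = []
    for r in range(rows):
        for c in range(cols):
            # Integer boundaries keep the crops adjacent with no gaps.
            left = c * width // cols
            right = (c + 1) * width // cols
            top = r * height // rows
            bottom = (r + 1) * height // rows
            boxes.append((left, top, right, bottom))
    return boxes


if __name__ == "__main__":
    for box in crop_boxes(1792, 896):
        print(box)
```

A square image yields a single box covering the whole image, so it is simply resized; a wide or tall image yields several windows, which is why the multimodal models are not limited to square inputs.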