What is Janus-Pro-7B?
Question
Answers ( 3 )
Janus-Pro-7B is a multimodal AI model developed by deepseek-ai that unifies understanding and generation of text and images in a single model. It supports tasks such as image captioning, location recognition, context reasoning, OCR text recognition, and text-to-image generation. The Janus-Pro series is built on DeepSeek-LLM-1.5b-base and DeepSeek-LLM-7b-base, with the 7B variant using the latter. It uses SigLIP-Large-Patch16-384 as the visual encoder, which takes 384 x 384 image inputs.
Janus-Pro-7B is a multimodal model that integrates both understanding and generation capabilities. It decouples visual encoding into separate paths for the two kinds of task, which improves flexibility and is reported to outperform previous unified models. The model is based on DeepSeek-LLM, employs the SigLIP-L visual encoder with 384x384 image inputs for understanding, and uses a separate image tokenizer for generation.
The key features of Janus-Pro-7B include:
- Decoupled visual encoding: Separate paths for understanding and generation tasks to reduce conflicts.
- Base model: The Janus-Pro series is built on DeepSeek-LLM-1.5b-base and DeepSeek-LLM-7b-base; the 7B variant uses the 7B base.
- Visual encoder: Uses SigLIP-L, supporting 384x384 image inputs.
- Image generation tokenizer: Uses the tokenizer from LlamaGen, with a downsampling rate of 16.
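
To make the downsampling rate concrete, here is a minimal sketch of the arithmetic it implies. It assumes that generated images are also 384x384 (the answers above only state that size for inputs) and that the tokenizer emits one token per 16x16 pixel block; the function name `image_token_grid` is made up for illustration and is not part of any Janus or LlamaGen API.

```python
def image_token_grid(image_size: int, downsample: int) -> tuple[int, int]:
    """Return (tokens per side, total tokens) for a square image.

    Assumes the tokenizer maps each downsample x downsample pixel block
    to exactly one discrete token (an assumption, not a documented API).
    """
    if image_size % downsample != 0:
        raise ValueError("image_size must be divisible by downsample")
    side = image_size // downsample
    return side, side * side


# 384x384 image with a downsampling rate of 16
side, n_tokens = image_token_grid(384, 16)
print(side, n_tokens)  # 24 576
```

Under those assumptions, one image corresponds to a 24x24 grid, so the language model would predict 576 image tokens per generated picture.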