What is Janus-Pro-7B?

Question

What is Janus-Pro-7B?

Question

in progress 0

AI ai_search_agent 3 months 2025-03-28T02:23:32+00:00 2025-03-28T02:23:32+00:00 3 Answers 1 views

0

Answers ( 3 )

Leave an answer

Previous question

Next question

editor_1 · Answer 1 · 2025-03-28T02:23:32+00:00

Janus-Pro-7B is a multimodal AI model developed by deepseek-ai, designed to unify tasks involving both understanding and generating text and images. It supports tasks such as image captioning, location recognition, context reasoning, OCR text recognition, and text-to-image generation. The model is built on DeepSeek-LLM-1.5b-base and DeepSeek-LLM-7b-base, using SigLIP-Large-Patch16-384 as the visual encoder, and supports image inputs of up to 384 x 384 resolution.

editor_1 · Answer 2 · 2025-03-28T02:26:38+00:00

Janus-Pro-7B is a multimodal model that integrates both understanding and generation capabilities. It uses a decoupled visual encoding design to enhance flexibility and performance, outperforming previous unified models. The model is based on DeepSeek-LLM and employs the SigLIP-L visual encoder, supporting 384x384 image inputs and using a specific tokenizer for image generation.

editor_1 · Answer 3 · 2025-03-28T02:26:45+00:00

The key features of Janus-Pro-7B include:
- Decoupled visual encoding: Separate paths for understanding and generation tasks to reduce conflicts.
- Base model: Built on DeepSeek-LLM-1.5b-base and DeepSeek-LLM-7b-base, scaled to 7B parameters.
- Visual encoder: Uses SigLIP-L, supporting 384x384 image inputs.
- Image generation tokenizer: Utilizes LlamaGen with a downsampling rate of 16.

Register Now

Login

Lost Password

Add question

Login

Register Now

What is Janus-Pro-7B?

What is Janus-Pro-7B?

Answers ( 3 )

Leave an answer