Answers (3)

    2025-03-28T02:23:32+00:00

    Janus-Pro-7B is a multimodal AI model developed by DeepSeek (deepseek-ai) that unifies understanding and generation across text and images. It supports tasks such as image captioning, location recognition, contextual reasoning, OCR (text recognition), and text-to-image generation. The Janus-Pro series is built on DeepSeek-LLM-1.5b-base (1B model) and DeepSeek-LLM-7b-base (7B model), uses SigLIP-Large-Patch16-384 as the visual encoder, and takes image inputs at 384 x 384 resolution.
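
Because the visual encoder takes a fixed 384 x 384 input, images of other sizes have to be resized first. The helper below is a hypothetical sketch of one common approach: scale the longer side to 384, then pad the shorter side to make a square. It is an assumption for illustration, not confirmed as Janus-Pro's exact preprocessing, and `letterbox_size` is an illustrative name.

```python
from typing import Tuple


def letterbox_size(width: int, height: int, target: int = 384) -> Tuple[int, int, int, int]:
    """Hypothetical preprocessing: scale the longer side to `target`, then pad to a square.

    Returns (new_width, new_height, pad_x, pad_y), where pad_x / pad_y are the
    total padding added horizontally / vertically.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    return new_w, new_h, target - new_w, target - new_h


# A 4:3 photo keeps its aspect ratio and gets 96 rows of padding.
print(letterbox_size(1024, 768))  # (384, 288, 0, 96)
```

Padding instead of stretching keeps the aspect ratio intact; the trade-off is that some of the 384 x 384 input is spent on empty border.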

    2025-03-28T02:26:38+00:00

    Janus-Pro-7B is a multimodal model that integrates both understanding and generation. It uses a decoupled visual encoding design, with separate visual pathways for the two kinds of task, which improves flexibility and performance; DeepSeek reports that it outperforms previous unified models. The model is based on DeepSeek-LLM, uses the SigLIP-L visual encoder with 384x384 image inputs, and uses the LlamaGen VQ tokenizer (downsampling rate 16) for image generation.
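
A minimal, hypothetical sketch of the decoupling idea: each task routes the image through its own visual pathway, and both pathways feed one shared language model. The function names, task labels, and the strings standing in for features and codes are illustrative assumptions, not the Janus-Pro codebase's API. On the generation side, the codes stand for the target image's tokens (fed in during training, or produced one at a time at inference).

```python
from typing import Callable, Dict, List

# 384x384 image with stride 16 on both paths -> 24 x 24 = 576 visual positions
GRID = 384 // 16


def understanding_encoder(image_id: str) -> List[str]:
    """Hypothetical stand-in for the SigLIP-L path (continuous semantic features)."""
    return [f"siglip_feat[{image_id}:{i}]" for i in range(GRID * GRID)]


def generation_tokenizer(image_id: str) -> List[str]:
    """Hypothetical stand-in for the VQ tokenizer path (discrete codebook ids)."""
    return [f"vq_code[{image_id}:{i}]" for i in range(GRID * GRID)]


# Decoupling: each task has its own visual pathway; only the LLM is shared.
VISUAL_PATHS: Dict[str, Callable[[str], List[str]]] = {
    "understand": understanding_encoder,
    "generate": generation_tokenizer,
}


def shared_llm_inputs(task: str, text_tokens: List[str], image_id: str) -> List[str]:
    """Build the single sequence one shared transformer would consume.

    Text-then-image ordering is arbitrary in this sketch.
    """
    if task not in VISUAL_PATHS:
        raise ValueError(f"unknown task: {task!r}")
    return text_tokens + VISUAL_PATHS[task](image_id)


seq = shared_llm_inputs("understand", ["<question>"], "img0")
print(len(seq), seq[1])  # 577 siglip_feat[img0:0]
```

Keeping the pathways in a lookup table makes the point of the design visible: swapping or retraining one encoder does not touch the other, while the transformer in the middle stays the same.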

    2025-03-28T02:26:45+00:00

    The key features of Janus-Pro-7B include:
    - Decoupled visual encoding: separate paths for understanding and generation, which reduces conflicts between the two tasks.
    - Base model: the Janus-Pro series is built on DeepSeek-LLM-1.5b-base (1B model) and DeepSeek-LLM-7b-base (7B model); Janus-Pro-7B is the 7B-parameter variant.
    - Visual encoder: SigLIP-L, supporting 384x384 image inputs.
    - Image generation tokenizer: the LlamaGen VQ tokenizer, with a downsampling rate of 16.
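
The 384x384 input size and the stride of 16 on both paths (SigLIP-L's 16x16 patches and the generation tokenizer's downsampling rate of 16) determine how many visual tokens one image occupies. A short worked calculation; the helper name `visual_token_grid` is illustrative:

```python
from typing import Tuple


def visual_token_grid(image_side: int, stride: int) -> Tuple[int, int]:
    """Return (tokens_per_side, total_tokens) for a square image and a given stride."""
    if image_side <= 0 or stride <= 0 or image_side % stride != 0:
        raise ValueError("image side must be a positive multiple of the stride")
    per_side = image_side // stride
    return per_side, per_side * per_side


# Stride 16 on a 384x384 image: SigLIP-L patches or VQ downsampling rate 16
print(visual_token_grid(384, 16))  # (24, 576)
# For comparison only (not a Janus-Pro setting): a downsampling rate of 8
print(visual_token_grid(384, 8))  # (48, 2304)
```

So an image works out to a 24x24 grid, or 576 positions in the LLM's sequence, on either path, and generating one image means predicting 576 discrete tokens autoregressively. Halving the downsampling rate would quadruple that count.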
