
Janus-Pro-7B - A unified multimodal AI model for understanding and generating text and images.

## Overview of Janus-Pro-7B

Janus-Pro-7B is a multimodal AI model developed by deepseek-ai. It unifies the understanding and generation of text and images in a single model, and it supports tasks such as image captioning, location recognition, context reasoning, OCR text recognition, and text-to-image generation.

The Janus-Pro family is built on DeepSeek-LLM-1.5b-base and DeepSeek-LLM-7b-base. The 7B variant uses DeepSeek-LLM-7b-base. For multimodal understanding, it uses SigLIP-Large-Patch16-384 as the visual encoder, which takes 384 x 384 image inputs.

## Key Features of Janus-Pro-7B

The key features of Janus-Pro-7B include:

- **Unified Framework**: It integrates multimodal understanding and generation in one model, which reduces redundancy and supports diverse applications.
- **Decoupled Visual Encoding**: Understanding and generation use separate visual encoding paths. This mitigates conflicts between the two tasks and increases flexibility.
- **Task Support**: It covers the core functionality of a vision-language model, from image captioning and OCR to text-to-image generation.

## Training Process of Janus-Pro-7B

Janus-Pro-7B is trained in three stages:

1. **Adapter and Image Head Training**: Trains the adapters and image head, optimizing the model's ability to process multimodal inputs.
2. **Text-to-Image Pretraining**: Provides initial training for generation tasks.
3. **Supervised Fine-Tuning**: Improves performance using annotated data.

This staged approach improves the model's training stability and adaptability, particularly for text-to-image generation.

## Supported Tasks of Janus-Pro-7B

Janus-Pro-7B supports the following tasks:

- **Image Captioning**: Generating textual descriptions of images.
- **Location Recognition**: Identifying geographical locations in images.
- **Context Reasoning**: Inferring contextual information from images.
- **OCR Text Recognition**: Extracting text from images.
- **Text-to-Image Generation**: Creating images from textual prompts.
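Text-to-image generation in Janus-Pro is autoregressive. The model predicts discrete image tokens one at a time, and a VQ image decoder turns them into pixels. The repository's example generation script produces 576 tokens per 384 x 384 image, which is a 24 x 24 grid and a downsampling factor of 16. The temperature and CFG-weight parameters listed under Usage act at each of these token steps.

The following is a minimal pure-Python sketch of one sampling step with classifier-free guidance (CFG). It assumes the guided logits are computed as `uncond + cfg_weight * (cond - uncond)` and divided by the temperature before a softmax, as in the repository's example script. The function names and toy logits are hypothetical, and the sketch does not call the Janus code.

```python
import math
import random

# Hypothetical helpers illustrating the guidance formula; not part of the Janus API.


def cfg_logits(cond, uncond, cfg_weight):
    """Classifier-free guidance: start from the unconditional logits and move
    cfg_weight times along the direction of the prompt-conditioned logits."""
    return [u + cfg_weight * (c - u) for c, u in zip(cond, uncond)]


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; the max is subtracted for numerical stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(cond, uncond, cfg_weight=5.0, temperature=1.0, rng=None):
    """Sample one image-token id from the guided, temperature-scaled distribution."""
    if rng is None:
        rng = random.Random()
    probs = softmax(cfg_logits(cond, uncond, cfg_weight), temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def image_token_count(img_size=384, downsample=16):
    """Tokens for a square image: (img_size / downsample) squared."""
    side = img_size // downsample
    return side * side


if __name__ == "__main__":
    # Toy 3-token vocabulary; a real image-token vocabulary is much larger.
    cond = [2.0, 1.0, 0.5]    # hypothetical logits with the prompt
    uncond = [1.0, 1.0, 1.0]  # hypothetical logits with the prompt masked out
    print("tokens per image:", image_token_count())  # 576
    print("guided logits:", cfg_logits(cond, uncond, cfg_weight=5.0))  # [6.0, 1.0, -1.5]
    print("sampled token:", sample_token(cond, uncond, rng=random.Random(0)))
```

The two parameters behave as follows:

- **CFG weight**: A value of 1 leaves the conditional logits unchanged, while larger values push sampling further toward prompt-consistent tokens.
- **Temperature**: Lowering it concentrates probability on the highest-scoring token.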
## Usage of Janus-Pro-7B

To use Janus-Pro-7B, follow these steps:

1. **Installation**: With Python 3.8 or later installed, clone the GitHub repository [deepseek-ai/Janus](https://github.com/deepseek-ai/Janus) and install the dependencies with `pip install -e .`. For Gradio support, use `pip install -e .[gradio]`.
2. **Inference**: Load the model from the path `deepseek-ai/Janus-Pro-7B`. Text-to-image generation accepts parameters such as temperature, parallel size, and CFG weight, which are described in the repository documentation.
3. **Demo Tools**: Run the Gradio demo (e.g., `python demo/app_januspro.py`) or the FastAPI demo (e.g., `python demo/fastapi_app.py`). Online demos are also available on Hugging Face Spaces.

Note: Because of its architectural differences, the model cannot be used through Hugging Face's hosted inference API.

## Performance Benchmarks of Janus-Pro-7B

Janus-Pro-7B has been evaluated on the following benchmarks:

- **Multimodal understanding**: POPE, MME-Perception, GQA, and MMMU.
- **Text-to-image generation**: GenEval and DPG-Bench.

On these evaluations, it outperforms previous unified multimodal models and is competitive with task-specific models on some benchmarks. Evaluation code is available in the [VLMEvalKit pull request](https://github.com/open-compass/VLMEvalKit/pull/541).

## Licenses for Janus-Pro-7B

The code for Janus-Pro-7B is released under the MIT License; see the [DeepSeek-LLM Code License](https://github.com/deepseek-ai/DeepSeek-LLM/blob/HEAD/LICENSE-CODE). Use of the model is governed by the [DeepSeek-LLM Model License](https://github.com/deepseek-ai/DeepSeek-LLM/blob/HEAD/LICENSE-MODEL). For inquiries, open an issue on GitHub or contact DeepSeek by email.

### Citation sources

- [Janus-Pro-7B](https://hf-mirror.com/deepseek-ai/Janus-Pro-7B)

Updated: 2025-03-28