Answers ( 2 )

    2025-03-28T02:42:05+00:00

    LLaVA-NeXT is a multimodal model that builds on LLaVA-1.5. LLaVA-1.5 was released in October 2023, and LLaVA-NeXT followed in January 2024. It improves on its predecessor in visual reasoning, OCR, and multimodal instruction following. The model accepts higher-resolution input images and is available with language-model backbones such as Mistral-7B and Nous-Hermes-2-Yi-34B to improve performance.

    2025-03-28T02:42:14+00:00

    LLaVA-NeXT features include:
    - Higher input image resolutions in several aspect ratios (e.g., 672x672, 336x1344, 1344x336) via the 'AnyRes' technique.
    - Improved datasets, including high-quality user instruction data and multimodal document/chart data.
    - Support for several language-model backbones, including Vicuna-1.5, Mistral-7B, and the larger Nous-Hermes-2-Yi-34B.
    - Zero-shot Chinese language capability: state-of-the-art results on MMBench-CN without Chinese multimodal training data.
    - Open-source code, data, and models, supported by the A16Z Open Source AI Grants Program.
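
To make the resolution point more concrete, here is a rough sketch of how an 'AnyRes'-style scheme might pick a target resolution for an image and split it into tiles. This is not the project's actual code. The 336-pixel tile size (the LLaVA-1.5 input size), the function names, and the selection rule (keep the most image pixels after an aspect-preserving fit, then prefer the least padding) are all assumptions for illustration. Only the three candidate resolutions come from the list above.

```python
# Illustrative sketch only: names and the selection rule are assumptions,
# not the LLaVA-NeXT implementation.

TILE = 336  # assumed tile size (the vision encoder input size in LLaVA-1.5)

# Candidate (width, height) resolutions taken from the feature list above.
CANDIDATES = [(672, 672), (336, 1344), (1344, 336)]


def select_best_resolution(image_size, candidates=CANDIDATES):
    """Return the candidate that keeps the most image pixels.

    The image is scaled to fit inside each candidate while preserving its
    aspect ratio. Ties are broken by the smallest unused (padded) area.
    """
    w, h = image_size
    best, best_effective, best_waste = None, -1, float("inf")
    for cw, ch in candidates:
        scale = min(cw / w, ch / h)            # aspect-preserving fit
        sw, sh = int(w * scale), int(h * scale)
        effective = min(sw * sh, w * h)        # upscaling adds no information
        waste = cw * ch - effective            # area that would be padding
        if effective > best_effective or (
            effective == best_effective and waste < best_waste
        ):
            best, best_effective, best_waste = (cw, ch), effective, waste
    return best


def tile_grid(resolution, tile=TILE):
    """Return (columns, rows) of tile-sized crops covering the resolution."""
    cw, ch = resolution
    return cw // tile, ch // tile


if __name__ == "__main__":
    for size in [(1000, 1000), (2000, 500), (500, 2000)]:
        target = select_best_resolution(size)
        print(size, "->", target, "grid", tile_grid(target))
```

Under these assumptions, a square image maps to the 672x672 layout, a wide image to 1344x336, and a tall image to 336x1344. Each layout covers four 336x336 tiles, so the model sees four times as many pixels as a single 336x336 input.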
