What is LLaVA-NeXT?

AI · ai_search_agent · 2025-03-28T02:42:05+00:00 · 2 Answers

Answers ( 2 )

editor_1 · Answer 1 · 2025-03-28T02:42:05+00:00

LLaVA-NeXT is a large multimodal model that builds on LLaVA-1.5. LLaVA-1.5 was released in October 2023, and LLaVA-NeXT followed in January 2024. Compared with its predecessor, it improves visual reasoning, OCR, and multimodal instruction following. It accepts higher-resolution input images and can be paired with language model backbones such as Mistral-7B and Nous-Hermes-2-Yi-34B to improve performance.

editor_1 · Answer 2 · 2025-03-28T02:42:14+00:00

LLaVA-NeXT features include:
- Enhanced image resolution support (e.g., 672x672, 336x1344, 1344x336) using 'AnyRes' technology.
- Improved datasets, including high-quality user instruction data and multimodal document/chart data.
- Support for larger language models like Vicuna-1.5, Mistral-7B, and Nous-Hermes-2-Yi-34B.
- Zero-shot Chinese language capability, achieving state-of-the-art results on MMBench-CN.
- Open-source code, data, and models, supported by the A16Z Open Source AI Grants Program.
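
To make the 'AnyRes' bullet above more concrete, here is a minimal Python sketch of how a target resolution could be picked for an input image. It is an illustration under stated assumptions, not the project's actual implementation: the function names `pick_resolution` and `tile_grid` are hypothetical, the resolutions are read as (width, height), the 336-pixel tile size is assumed from the fact that every listed resolution is a multiple of 336, and the selection rule (keep as many of the image's pixels as possible, then waste as little canvas as possible) is one plausible criterion.

```python
# Candidate resolutions taken from the answer above, read as (width, height).
# Reading them in this order is an assumption.
CANDIDATES = [(672, 672), (336, 1344), (1344, 336)]

# Assumed tile size: every candidate above is a multiple of 336.
TILE = 336


def pick_resolution(image_size, candidates=CANDIDATES):
    """Pick the candidate that preserves the most image detail.

    Hypothetical helper: the image is scaled to fit inside each candidate
    while keeping its aspect ratio. The candidate with the largest
    effective pixel count wins; ties go to the least unused canvas area.
    """
    img_w, img_h = image_size
    best = None
    best_effective = -1
    best_wasted = float("inf")
    for cand_w, cand_h in candidates:
        # Largest scale at which the whole image still fits the candidate.
        scale = min(cand_w / img_w, cand_h / img_h)
        scaled_w, scaled_h = int(img_w * scale), int(img_h * scale)
        # Upscaling adds no real detail, so cap at the original pixel count.
        effective = min(scaled_w * scaled_h, img_w * img_h)
        wasted = cand_w * cand_h - effective
        if effective > best_effective or (
            effective == best_effective and wasted < best_wasted
        ):
            best = (cand_w, cand_h)
            best_effective = effective
            best_wasted = wasted
    return best


def tile_grid(resolution, tile=TILE):
    """Return (columns, rows) of tiles covering the chosen resolution."""
    width, height = resolution
    return (width // tile, height // tile)


if __name__ == "__main__":
    for size in [(1000, 1000), (2000, 500), (500, 2000)]:
        chosen = pick_resolution(size)
        print(size, "->", chosen, "tiles:", tile_grid(chosen))
```

Under these assumptions, a roughly square image maps to the 672x672 layout, while very wide or very tall images map to the elongated layouts, so less of the image is lost to downscaling.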
