What is LLaVA-NeXT?
Question
Answers ( 2 )
LLaVA-NeXT is an open-source large multimodal model that builds on LLaVA-1.5. LLaVA-1.5 was released in October 2023, and LLaVA-NeXT followed in January 2024. Compared with its predecessor, it improves visual reasoning, OCR, and multimodal instruction following. It accepts higher-resolution input images and pairs the vision encoder with stronger and larger language model backbones such as Mistral-7B and Nous-Hermes-2-Yi-34B.
LLaVA-NeXT features include:
- Enhanced image resolution support (e.g., 672x672, 336x1344, 1344x336) using 'AnyRes' technology.
- Improved datasets, including high-quality user instruction data and multimodal document/chart data.
- Support for more language model backbones, including Vicuna-1.5 (7B and 13B), Mistral-7B, and Nous-Hermes-2-Yi-34B.
- Zero-shot Chinese language capability (it emerges without Chinese multimodal training data), with state-of-the-art results on MMBench-CN at the time of release.
- Open-source code, data, and models, supported by the A16Z Open Source AI Grants Program.
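To make the 'AnyRes' idea more concrete, here is a minimal Python sketch of how an input image could be matched to one of the supported resolutions listed above and then split into tiles. It is an illustration under stated assumptions, not the project's actual code: the function names `select_best_resolution` and `count_tiles` are hypothetical, the resolutions are read as width x height, the 336-pixel tile size is inferred from those resolutions, and the selection rule (keep the most image detail, then waste the least padding) is one plausible choice.

```python
from typing import List, Optional, Tuple

Size = Tuple[int, int]  # (width, height) in pixels

# Assumption: tile side inferred from the resolutions named in the answer.
TILE = 336
# Assumption: the listed resolutions are interpreted as width x height.
CANDIDATES: List[Size] = [(672, 672), (336, 1344), (1344, 336)]


def select_best_resolution(original: Size, candidates: List[Size]) -> Size:
    """Pick the candidate that preserves the most original pixels.

    Ties are broken by choosing the candidate with the least unused
    (padded) area. This rule is an assumption made for illustration.
    """
    ow, oh = original
    best: Optional[Size] = None
    best_key: Optional[Tuple[int, int]] = None
    for cw, ch in candidates:
        # Scale the image to fit inside the candidate, keeping aspect ratio.
        scale = min(cw / ow, ch / oh)
        sw, sh = round(ow * scale), round(oh * scale)
        # Upscaling adds no information, so cap at the original pixel count.
        effective = min(sw * sh, ow * oh)
        wasted = cw * ch - effective
        key = (effective, -wasted)
        if best_key is None or key > best_key:
            best, best_key = (cw, ch), key
    if best is None:
        raise ValueError("candidates must not be empty")
    return best


def count_tiles(resolution: Size, tile: int = TILE) -> int:
    """Number of tile crops plus one downscaled overview of the whole image."""
    w, h = resolution
    return (w // tile) * (h // tile) + 1


if __name__ == "__main__":
    for size in [(1000, 300), (300, 1000), (800, 800)]:
        chosen = select_best_resolution(size, CANDIDATES)
        print(size, "->", chosen, "tiles:", count_tiles(chosen))
```

Under these assumptions, a wide image is routed to the 1344x336 layout, a tall one to 336x1344, and a roughly square one to 672x672. In each case the vision encoder would see four 336x336 tiles plus one overview image, which is how a fixed-resolution encoder can cover more pixels without being retrained at a larger input size.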