What datasets are used in the LLaVA-OneVision project?

Question

Answers (1)


    The LLaVA-OneVision project uses a large training dataset comprising 3.2M single-image samples and 1.6M multi-image and video samples, supplemented with high-quality synthetic data (e.g., roughly 4M high-quality knowledge samples). Sources include COCO118K, BLIP558K, and CC3M, along with 92K Chinese caption samples and 143K Evo-Instruct samples.
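    If you want to inspect the data yourself, a minimal sketch is below. It assumes the mixture is published on the Hugging Face Hub under a repo such as "lmms-lab/LLaVA-OneVision-Data" and that subsets are exposed as named configs; the repo id and the config name used here are illustrative, not stated in the answer above, so check the dataset card for the actual names.

    ```python
    from datasets import load_dataset

    # Each config corresponds to one source in the mixture (captions,
    # knowledge data, instruction data, ...). The repo id and config
    # name below are assumptions for illustration only.
    subset = load_dataset(
        "lmms-lab/LLaVA-OneVision-Data",  # assumed repo id
        "evol_instruct",                  # assumed config name
        split="train",
    )

    print(len(subset))        # number of samples in this subset
    print(subset[0].keys())   # typical fields, e.g. id, image, conversations
    ```

    Inspecting subsets one at a time like this keeps memory usage manageable, since the full mixture contains several million samples.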
