What types of documents are included in the olmOCR-mix-0225 dataset?

Answers ( 1 )

    2025-03-28T02:17:56+00:00

    The olmOCR-mix-0225 dataset covers a diverse range of document types. Based on a sample of web-crawled PDFs, the breakdown is: academic (60%), brochures (12%), legal (11%), tables (6%), diagrams (5%), slideshows (2%), and other types (4%). These shares add up to 100%.
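
If it helps to work with these numbers, here is a minimal Python sketch that stores the shares quoted above and converts them into expected document counts. The names `DOCUMENT_TYPE_SHARE` and `expected_counts` are made up for illustration and are not part of any olmOCR tooling, and the sample size of 200 is an arbitrary assumption, not a figure from the dataset.

```python
# Percent shares by document type, copied from the answer above.
# The variable and function names are hypothetical, for illustration only.
DOCUMENT_TYPE_SHARE = {
    "academic": 60,
    "brochures": 12,
    "legal": 11,
    "tables": 6,
    "diagrams": 5,
    "slideshows": 2,
    "other": 4,
}


def expected_counts(sample_size: int) -> dict[str, float]:
    """Scale each percent share to an expected count for a sample of the given size."""
    return {
        doc_type: sample_size * share / 100
        for doc_type, share in DOCUMENT_TYPE_SHARE.items()
    }


# Sanity check: the quoted shares cover the whole sample.
assert sum(DOCUMENT_TYPE_SHARE.values()) == 100

if __name__ == "__main__":
    # 200 is an arbitrary sample size chosen for the example.
    for doc_type, count in expected_counts(200).items():
        print(f"{doc_type}: {count:g}")
```

The counts are only expectations under the stated percentages; an actual random draw from the dataset would vary around them.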
