olmOCR - An open-source tool for extracting structured content from PDF documents

## Primary Purpose of olmOCR

olmOCR is an open-source PDF parsing tool designed to extract structured content such as chapters, tables, lists, and formulas. It uses vision language models (VLMs) and document anchoring techniques, fine-tuned on a large dataset, to improve accuracy and processing efficiency.

## How olmOCR Works

olmOCR combines a vision language model (VLM) with document anchoring techniques. It fine-tunes a 7B-parameter VLM on a large dataset and uses the SGLang and vLLM frameworks for efficient large-scale inference and hardware optimization.

## Document Types Supported by olmOCR

olmOCR handles a variety of document content, including graphics, handwritten text, and low-quality scans, which makes it suitable for diverse real-world scenarios.

## Large-Scale Batch Processing in olmOCR

olmOCR is optimized for large-scale batch processing and can convert PDF pages at a cost of roughly $190 per million pages. It achieves this by optimizing hardware utilization and inference efficiency.

## Hardware Requirements for olmOCR

olmOCR requires:

- A recent NVIDIA GPU (e.g., RTX 4090, L40S, A100, H100) with at least 20 GB of GPU RAM.
- 30 GB of free disk space.

## Key Features of olmOCR

- Fine-tuned 7B-parameter VLM trained on a diverse dataset.
- Support for various document types, including graphics and handwritten text.
- Optimization for large-scale batch processing.
- High cost-efficiency for large data processing.
- Open-source resources, including VLM weights, training code, and datasets.

## Installation and Local Usage of olmOCR

To install and use olmOCR locally:

1. Install the system dependencies: `poppler-utils`, `ttf-mscorefonts-installer`, `msttcorefonts`, `fonts-crosextra-caladea`, `fonts-crosextra-carlito`, `gsfonts`, `lcdf-typetools`.
2. Create and activate a Conda environment: `conda create -n olmocr python=3.11`, then `conda activate olmocr`.
3. Clone the repository and install it: `git clone https://github.com/allenai/olmocr.git`, `cd olmocr`, `pip install -e .[gpu]`.
4. Process PDFs with a command such as `python -m olmocr.pipeline ./localworkspace --pdfs tests/gnarly_pdfs/horribleocr.pdf`.

## Main Functionalities of olmOCR

- Extracting structured content such as chapters, tables, lists, and formulas.
- Supporting multiple languages and handwritten scripts.
- Handling complex layouts and low-quality images.
- Built-in error correction for automatic recognition fixes.
- Ensuring privacy by automatically deleting documents after processing.

## Resources for olmOCR

More information about olmOCR is available on its [official website](https://olmocr.allenai.org/), its [GitHub repository](https://github.com/allenai/olmocr), and the related [arXiv paper](https://arxiv.org/abs/2502.18443).

### Citation sources:

- [olmOCR](https://olmocr.allenai.org) - Official URL

Updated: 2025-03-28
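
## Example: Scripting a Local Run

The local usage step from the installation section can also be driven from a script. The sketch below is an illustration only, not part of olmOCR: the helper names `has_enough_disk` and `build_pipeline_command` are hypothetical, and passing several paths after `--pdfs` is an assumption that should be checked against the repository's README. It checks the 30 GB free-disk requirement and assembles the same `python -m olmocr.pipeline` command quoted in the installation steps.

```python
import shlex
import shutil
import sys

MIN_FREE_DISK_GB = 30  # free-disk requirement quoted in the hardware section


def has_enough_disk(path=".", min_free_gb=MIN_FREE_DISK_GB):
    """Return True if the filesystem holding `path` has at least `min_free_gb` GiB free."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return free_gb >= min_free_gb


def build_pipeline_command(workspace, pdfs):
    """Build the argument list for `python -m olmocr.pipeline <workspace> --pdfs <pdf>...`.

    Hypothetical helper: it only assembles the command line shown in the
    installation steps and does not import or call olmOCR itself.
    """
    pdfs = [str(p) for p in pdfs]
    if not pdfs:
        raise ValueError("at least one PDF path is required")
    return [sys.executable, "-m", "olmocr.pipeline", str(workspace), "--pdfs", *pdfs]


if __name__ == "__main__":
    cmd = build_pipeline_command(
        "./localworkspace", ["tests/gnarly_pdfs/horribleocr.pdf"]
    )
    if not has_enough_disk():
        print(f"warning: less than {MIN_FREE_DISK_GB} GB of free disk space")
    # Print the command instead of running it; inside the activated `olmocr`
    # environment it could be executed with subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems when PDF paths contain spaces. The GPU memory requirement is not checked here because doing so would need a GPU library that this sketch deliberately avoids.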