GOT-OCR2.0 - A powerful end-to-end OCR model supporting multiple languages and modes.
## Overview of GOT-OCR2.0
GOT-OCR2.0 is an end-to-end OCR (Optical Character Recognition) model developed by researchers from StepFun, Megvii Technology, University of Chinese Academy of Sciences, and Tsinghua University. It supports multiple languages and recognition modes, and it can recognize text, mathematical formulas, molecular formulas, charts, musical scores, and geometric shapes. The model uses a unified end-to-end architecture that pairs a high-compression encoder with a long-context decoder, and it is designed to handle high-resolution images of up to 1024×1024 pixels.
## Developers of GOT-OCR2.0
GOT-OCR2.0 was developed by researchers from StepFun, Megvii Technology, University of Chinese Academy of Sciences, and Tsinghua University.
## Key Features of GOT-OCR2.0
The key features of GOT-OCR2.0 include:
- **Architecture**: Unified end-to-end design with a high-compression encoder and long-context decoder.
- **Multi-language support**: Capable of processing text in multiple languages.
- **Multi-modal recognition**: Supports recognition of text, mathematical formulas, molecular formulas, charts, musical scores, and geometric shapes.
- **OCR types**: Provides plain text OCR, formatted text OCR, and fine-grained OCR (with `ocr_box` and `ocr_color` options).
- **Multi-crop functionality**: Supports multi-crop OCR for enhanced processing of complex images.
- **Rendering capability**: Can render formatted OCR results, such as saving them as HTML files.
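
The OCR types, multi-crop mode, and rendering option listed above can be sketched as calls on the loaded model. This is a minimal sketch, not a definitive implementation: it assumes the `model.chat` / `model.chat_crop` call pattern and the `ocr_type`, `ocr_box`, `ocr_color`, `render`, and `save_render_file` arguments published on the Hugging Face model card, so verify them against the card before relying on them. `build_chat_kwargs` and `run_ocr` are hypothetical helper names introduced here for illustration, and the expected string format of `ocr_box` and `ocr_color` is not covered.

```python
from typing import Any, Dict, Optional

MODEL_ID = "ucaslcl/GOT-OCR2_0"  # repository named in the citation link


def build_chat_kwargs(
    formatted: bool = False,
    box: Optional[str] = None,
    color: Optional[str] = None,
    render_to: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: collect keyword arguments for the chat call."""
    # Plain text OCR uses 'ocr'; formatted text OCR uses 'format'.
    kwargs: Dict[str, Any] = {"ocr_type": "format" if formatted else "ocr"}
    # Fine-grained OCR: restrict recognition by box or by color.
    if box is not None:
        kwargs["ocr_box"] = box
    if color is not None:
        kwargs["ocr_color"] = color
    # Rendering: save the formatted result, for example as an HTML file.
    if render_to is not None:
        kwargs["render"] = True
        kwargs["save_render_file"] = render_to
    return kwargs


def run_ocr(image_file: str, multi_crop: bool = False, **options: Any) -> str:
    """Load the model and run OCR on one image.

    Assumes a CUDA GPU and network access to download the weights.
    """
    # Imported lazily because transformers is a heavy dependency.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        MODEL_ID,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
        device_map="cuda",
        use_safetensors=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    model = model.eval().cuda()

    kwargs = build_chat_kwargs(**options)
    if multi_crop:
        # Only ocr_type is passed here; other options are left out as a
        # conservative assumption about what chat_crop accepts.
        return model.chat_crop(tokenizer, image_file, ocr_type=kwargs["ocr_type"])
    return model.chat(tokenizer, image_file, **kwargs)
```

For example, `run_ocr("page.jpg", formatted=True, render_to="./demo.html")` would request formatted OCR and save a rendered HTML file, while `run_ocr("page.jpg", multi_crop=True)` would use the multi-crop path. Keeping the argument assembly in a separate function makes it possible to inspect the options without loading the model.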
## Usage of GOT-OCR2.0
To use GOT-OCR2.0, follow these steps:
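1. **Install dependencies**: Requires Python 3.10 or higher. Install the following libraries at the pinned versions:
   - torch==2.0.1
   - torchvision==0.15.2
   - transformers==4.37.2
   - tiktoken==0.6.0
   - verovio==4.3.1
   - accelerate==0.28.0

   Installation command (assuming pip): `pip install torch==2.0.1 torchvision==0.15.2 transformers==4.37.2 tiktoken==0.6.0 verovio==4.3.1 accelerate==0.28.0`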
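
After installation, it can help to confirm that the environment matches the pinned versions before loading the model. The following is a minimal sketch using only the standard library; `PINNED` and `check_pins` are hypothetical names introduced here, and stripping a local build tag such as `+cu118` before comparing is an assumption about how the installed torch build may report its version.

```python
from importlib import metadata
from typing import Dict

# Versions pinned in the dependency list above.
PINNED: Dict[str, str] = {
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "transformers": "4.37.2",
    "tiktoken": "0.6.0",
    "verovio": "4.3.1",
    "accelerate": "0.28.0",
}


def check_pins(pins: Dict[str, str] = PINNED) -> Dict[str, str]:
    """Return {package: problem} for packages that are missing or mismatched."""
    problems: Dict[str, str] = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # Drop a local build tag (for example '+cu118') before comparing.
        if found.split("+")[0] != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems
```

Calling `check_pins()` returns an empty dictionary when every package is present at its pinned version; otherwise each entry names the package and what differs.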
### Citation sources:
- [GOT-OCR2.0](https://huggingface.co/ucaslcl/GOT-OCR2_0) - Official URL
Updated: 2025-03-26