DocLayout-YOLO-DocStructBench - A robust real-time document layout detection model based on YOLO-v10.
## Overview of DocLayout-YOLO-DocStructBench
DocLayout-YOLO-DocStructBench is a document layout detection model developed by the Shanghai AI Lab and built on the YOLO-v10 framework. It targets real-time, robust layout detection across varied documents through two ideas: diverse document pre-training and structural optimization. For pre-training, the Mesh-candidate BestFit algorithm synthesizes the DocSynth-300K dataset, which improves the model's fine-tuning performance across different document types.
## Technical Principles of DocLayout-YOLO-DocStructBench
DocLayout-YOLO-DocStructBench uses the Mesh-candidate BestFit algorithm to generate the DocSynth-300K dataset, which serves as diverse document pre-training data. The dataset is about 113 GB in size and significantly improves the model's performance after fine-tuning. In addition, the model includes the Global-to-Local Controllable Receptive Module, which handles document elements at multiple scales and improves detection accuracy.
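
The paper frames layout synthesis as a two-dimensional bin-packing problem: elements (candidates) are matched to free regions of the page (meshes), and the best-fitting pair is placed first. The sketch below illustrates that idea in a heavily simplified form. It is not the authors' implementation: the `Box` type, the `best_fit_layout` function, the `min_fill` parameter, and the rule for splitting leftover space are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle: top-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float

    @property
    def area(self) -> float:
        return self.w * self.h


def best_fit_layout(page: Box, candidates: list[tuple[float, float]],
                    min_fill: float = 0.0) -> list[Box]:
    """Greedily place (w, h) candidates into free cells, highest fill rate first.

    Hypothetical simplification: each placement goes in the top-left corner
    of its cell, and the leftover space is split into two new free cells.
    """
    free_cells = [page]
    remaining = list(candidates)
    placed: list[Box] = []
    while free_cells and remaining:
        best = None  # (fill_rate, cell_index, candidate_index)
        for ci, cell in enumerate(free_cells):
            for ei, (w, h) in enumerate(remaining):
                if w <= cell.w and h <= cell.h:
                    fill = (w * h) / cell.area
                    if best is None or fill > best[0]:
                        best = (fill, ci, ei)
        # Stop when nothing fits, or the best match wastes too much space.
        if best is None or best[0] < min_fill:
            break
        _, ci, ei = best
        cell = free_cells.pop(ci)
        w, h = remaining.pop(ei)
        placed.append(Box(cell.x, cell.y, w, h))
        # Leftover space: a strip to the right of the element and a strip below it.
        right = Box(cell.x + w, cell.y, cell.w - w, h)
        below = Box(cell.x, cell.y + h, cell.w, cell.h - h)
        free_cells.extend(c for c in (right, below) if c.area > 0)
    return placed


page = Box(0, 0, 100, 100)
layout = best_fit_layout(page, [(100, 40), (50, 60), (50, 60), (30, 30)])
for box in layout:
    print(box)
```

With these inputs the three large elements tile the page exactly and the 30×30 element is left unplaced, because no free cell remains for it.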
## Key Features of DocLayout-YOLO-DocStructBench
The key features of DocLayout-YOLO-DocStructBench include:
- Diverse document pre-training using the Mesh-candidate BestFit algorithm and the DocSynth-300K dataset.
- Structural optimization with the Global-to-Local Controllable Receptive Module for better handling of multi-scale document elements.
- Real-time performance: high mAP scores on D4LA, DocLayNet, and DocStructBench at a reported inference speed of 85.5 FPS.
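
To put the reported speed in perspective, 85.5 FPS corresponds to a budget of roughly 11.7 ms per image. The sketch below shows that conversion and a generic way to measure throughput for any inference callable. The helper names `fps_to_latency_ms` and `measure_fps` are hypothetical and not part of the project; the numbers you measure will depend on hardware, image size, and batch size.

```python
import time
from typing import Callable, Iterable


def fps_to_latency_ms(fps: float) -> float:
    """Average time per image, in milliseconds, for a given throughput."""
    return 1000.0 / fps


def measure_fps(infer: Callable[[object], object], inputs: Iterable[object],
                warmup: int = 0) -> float:
    """Run `infer` on every input and return images per second.

    The first `warmup` inputs are executed but excluded from the timing.
    """
    items = list(inputs)
    for item in items[:warmup]:
        infer(item)
    timed = items[warmup:]
    if not timed:
        raise ValueError("need at least one input after warm-up")
    start = time.perf_counter()
    for item in timed:
        infer(item)
    elapsed = time.perf_counter() - start
    return len(timed) / elapsed


print(round(fps_to_latency_ms(85.5), 2))  # 11.7
```

When timing a model on a GPU, synchronize the device before reading the clock; otherwise asynchronous kernel launches make the measured time look shorter than it is.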
## Main Functions of DocLayout-YOLO-DocStructBench
The main functions of DocLayout-YOLO-DocStructBench include:
- Real-time layout detection of various document types, suitable for document understanding systems.
- Detection of multiple element types, including text, images, and tables, strengthened by the diversity of the pre-training data.
- Performance validation on complex benchmarks like DocStructBench, demonstrating its applicability in diverse document scenarios.
## Usage of DocLayout-YOLO-DocStructBench
DocLayout-YOLO-DocStructBench can be used in document understanding tasks to locate the text, image, and table regions that downstream extraction relies on. Users can run inference with the model weights published on Hugging Face. The model supports batch inference, which makes it suitable for processing large volumes of documents. An online demo is available as a Hugging Face Space, and example images can be found in the `assets/example` folder of the GitHub repository.
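
A minimal inference sketch follows, based on the usage pattern shown in the project's README. Treat the details as assumptions to verify against the repository: the `doclayout_yolo` package and its `YOLOv10` class, the weights file name, and the `imgsz=1024` and `conf=0.2` settings. The `chunked` and `detect_layouts` helpers are hypothetical names introduced here for illustration.

```python
from typing import Iterator, Sequence


def chunked(paths: Sequence[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive batches of at most `batch_size` image paths."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(paths), batch_size):
        yield list(paths[start:start + batch_size])


def detect_layouts(image_paths: Sequence[str], batch_size: int = 8,
                   device: str = "cpu") -> list:
    """Download the weights and run batched layout detection.

    Requires the `doclayout-yolo` and `huggingface_hub` packages; they are
    imported lazily so the batching helper works without them.
    """
    from huggingface_hub import hf_hub_download
    from doclayout_yolo import YOLOv10

    weights = hf_hub_download(
        repo_id="juliozhao/DocLayout-YOLO-DocStructBench",
        # Assumed file name; check the model page if the download fails.
        filename="doclayout_yolo_docstructbench_imgsz1024.pt",
    )
    model = YOLOv10(weights)
    results = []
    for batch in chunked(image_paths, batch_size):
        # Assumption: predict() accepts a list of paths for batch inference.
        results.extend(model.predict(batch, imgsz=1024, conf=0.2, device=device))
    return results


print(list(chunked(["a.png", "b.png", "c.png"], batch_size=2)))
```

Calling `detect_layouts(["page1.png", "page2.png"], device="cuda:0")` would then return one result object per image. Processing in fixed-size batches keeps memory use bounded on large document collections.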
## Technical Resources for DocLayout-YOLO-DocStructBench
Users can find more technical details about DocLayout-YOLO-DocStructBench in the associated research paper titled "DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception" on arXiv. Additionally, the GitHub repository "opendatalab/DocLayout-YOLO" provides PyTorch implementations, training scripts, and dataset download instructions. The project also integrates with PDF-Extract-Kit for document content extraction.
### Citation sources:
- [DocLayout-YOLO-DocStructBench](https://huggingface.co/juliozhao/DocLayout-YOLO-DocStructBench) - Official URL
Updated: 2025-03-28