What is DocLayout-YOLO-DocStructBench?
Answers (4)
DocLayout-YOLO-DocStructBench is a document layout detection model developed by the Shanghai AI Lab and built on the YOLO-v10 framework; the "DocStructBench" suffix marks the checkpoint fine-tuned on the DocStructBench benchmark. It is designed for real-time, robust detection of layout elements across many kinds of documents, which it achieves through diverse document pre-training and structural optimization. Its pre-training data, the DocSynth-300K dataset, is generated with the Mesh-candidate BestFit algorithm and improves fine-tuning performance across document types.
DocLayout-YOLO-DocStructBench employs the Mesh-candidate BestFit algorithm to generate the DocSynth-300K dataset, which is used for diverse document pre-training. The dataset, 113 GB in size, significantly improves the model's performance during fine-tuning. Additionally, the model includes a Global-to-Local Controllable Receptive Module to handle document elements at multiple scales, which improves detection accuracy.
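Filling a synthetic page with document elements can be viewed as a 2D packing problem. The following minimal Python sketch illustrates greedy best-fit packing; it is a simplified stand-in, not the published Mesh-candidate BestFit algorithm, which works over its own mesh of candidate regions with its own scoring. The names `Rect` and `best_fit_layout`, the guillotine splitting of free space, and the fill-ratio score (element area divided by free-region area) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int

    @property
    def area(self) -> int:
        return self.w * self.h


def best_fit_layout(page_w, page_h, pool):
    """Greedily place (w, h) elements on a page, always choosing the
    (free region, element) pair with the highest fill ratio.

    Hypothetical simplification: `pool` is a list of element sizes and free
    space is tracked as non-overlapping rectangles (guillotine splits).
    """
    free = [Rect(0, 0, page_w, page_h)]
    remaining = list(pool)
    placed = []
    while remaining:
        best = None  # (fill_ratio, free_index, pool_index)
        for fi, region in enumerate(free):
            for pi, (w, h) in enumerate(remaining):
                if w <= region.w and h <= region.h:
                    fill = (w * h) / region.area
                    if best is None or fill > best[0]:
                        best = (fill, fi, pi)
        if best is None:
            break  # no remaining element fits any free region
        _, fi, pi = best
        region = free.pop(fi)
        w, h = remaining.pop(pi)
        # Place the element in the top-left corner of the chosen region.
        placed.append(Rect(region.x, region.y, w, h))
        # Guillotine split: a strip to the right of the element (same height)
        # and a full-width strip below it, so free regions never overlap.
        if region.w - w > 0:
            free.append(Rect(region.x + w, region.y, region.w - w, h))
        if region.h - h > 0:
            free.append(Rect(region.x, region.y + h, region.w, region.h - h))
    return placed
```

For example, `best_fit_layout(100, 100, [(100, 40), (50, 60), (50, 60), (30, 30)])` fills the 100 by 100 page exactly with the first three elements and leaves the 30 by 30 element unplaced, because no free space remains. Scoring by fill ratio favors elements that use a free region tightly, which keeps synthetic pages dense rather than sparsely scattered.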
The key features of DocLayout-YOLO-DocStructBench include:
- Diverse document pre-training using the Mesh-candidate BestFit algorithm and the DocSynth-300K dataset.
- Structural optimization with the Global-to-Local Controllable Receptive Module for better handling of multi-scale document elements.
- Real-time performance, achieving high mAP scores on datasets like D4LA, DocLayNet, and DocStructBench, while maintaining an inference speed of 85.5 FPS.
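One common way to give a single layer several receptive fields, useful when elements range from small captions to full-page tables, is to apply the same kernel at different dilation rates and blend the results. The 1-D Python sketch below illustrates that idea only as an assumption about how multi-scale handling can work: the function names, the fixed blend weights in `gates`, and the zero padding are hypothetical, and the actual Global-to-Local Controllable Receptive Module operates on 2-D feature maps with learned parameters whose exact design is not reproduced here. A kernel of size k at dilation d spans k + (k - 1)(d - 1) input positions.

```python
def effective_kernel_size(k, dilation):
    """Number of input positions spanned by a size-k kernel at this dilation."""
    return k + (k - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """Same-length 1-D dilated cross-correlation with zero padding.

    Assumes an odd kernel length so the output stays centered.
    """
    pad = (len(kernel) - 1) * dilation // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i - pad + j * dilation  # taps are `dilation` positions apart
            if 0 <= idx < len(signal):
                acc += weight * signal[idx]
        out.append(acc)
    return out


def multi_rate_fuse(signal, kernel, dilations, gates):
    """Apply one shared kernel at several dilation rates and blend the
    branches with per-branch weights (fixed here, learned in a real model)."""
    if len(gates) != len(dilations):
        raise ValueError("need one gate weight per dilation rate")
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(g * b[i] for g, b in zip(gates, branches)) for i in range(len(signal))]
```

With a unit impulse at position 4 and kernel `[1, 1, 1]`, the dilation-1 branch responds at positions 3 to 5, while the dilation-2 branch responds at positions 2, 4 and 6, so the same three weights reach farther at the higher rate. Sharing the kernel across rates keeps the parameter count constant while the blend weights decide how much local versus wider context to use.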
The main functions of DocLayout-YOLO-DocStructBench include:
- Real-time layout detection of various document types, suitable for document understanding systems.
- Support for multiple layout element types, with the diverse pre-training data improving detection of text, images, and tables.
- Performance validation on complex benchmarks like DocStructBench, demonstrating its applicability in diverse document scenarios.
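Benchmark scores such as mAP rest on matching predicted boxes to ground-truth boxes by intersection over union (IoU). The Python sketch below computes single-class average precision at one IoU threshold, matching confidence-sorted predictions greedily to the best unused ground-truth box and integrating an all-point-interpolated precision-recall curve. The (x1, y1, x2, y2) box format and the 0.5 threshold are assumptions; mAP averages such per-class values (and, in the COCO protocol, also over IoU thresholds from 0.50 to 0.95), and each benchmark's official evaluation code may differ in details.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x2 > x1 and y2 > y1."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, thr=0.5):
    """Single-class AP. preds: list of (score, box); gts: list of boxes."""
    preds = sorted(preds, key=lambda p: p[0], reverse=True)
    used = [False] * len(gts)
    tp = []
    for _, box in preds:
        # Match to the unused ground truth with the highest IoU >= thr.
        best_j, best_iou = -1, thr
        for j, gt in enumerate(gts):
            overlap = iou(box, gt)
            if not used[j] and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            used[best_j] = True
            tp.append(1)
        else:
            tp.append(0)  # false positive
    # Precision and recall after each prediction, in score order.
    precisions, recalls, hits = [], [], 0
    for k, t in enumerate(tp, start=1):
        hits += t
        precisions.append(hits / k)
        recalls.append(hits / len(gts) if gts else 0.0)
    # Interpolate: precision at each point is the max precision at any higher recall.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Area under the interpolated curve, summed over recall increments.
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_r) * p
        prev_r = r
    return ap
```

For example, with ground truths `(0, 0, 10, 10)` and `(20, 20, 30, 30)` and predictions scored 0.9 (an exact hit on the first box), 0.8 (a miss) and 0.7 (an exact hit on the second box), the function returns 5/6 ≈ 0.833: the miss ranked above a correct detection lowers precision at full recall.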