TransDLANet - A transformer-based document layout detection model inspired by the ISTR framework.
## Inspiration Behind TransDLANet
TransDLANet takes its overall design from the ISTR framework. On top of that design, it uses an adaptive element matching mechanism to strengthen the correlation between query vectors and document instances.
## Key Components of TransDLANet's Architecture
The architecture of TransDLANet includes:
- A CNN base network (ResNet-101 pretrained on ImageNet) for feature extraction.
- A Transformer encoder for self-attentive feature learning on query embedding vectors.
- A dynamic decoder that fuses query vectors with RoI features and image features.
- Shared multi-layer perceptron (MLP) branches for multi-task learning, which decode classification confidence, bounding box coordinates, and segmentation masks.
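To make the data flow through these four components concrete, the sketch below traces the tensor shapes each stage might produce for a single image. It is only a shape-bookkeeping illustration, not the authors' implementation: the query count, embedding width, RoI resolution, and mask resolution are assumed values, and `Config` and `trace_shapes` are hypothetical names. The only grounded numbers are the 2048-channel, stride-32 output of the last ResNet-101 stage and the 74 annotation categories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    num_queries: int = 100   # assumed number of learnable query embeddings
    hidden_dim: int = 256    # assumed embedding width
    num_classes: int = 74    # annotation categories, per the dataset description
    roi_size: int = 7        # assumed RoI pooling resolution
    mask_dim: int = 28 * 28  # assumed flattened mask resolution per instance


def trace_shapes(height: int, width: int, cfg: Config = Config()) -> dict:
    """Return the shape each stage would produce for one image.

    Assumes height and width are divisible by 32.
    """
    shapes = {}
    # 1. CNN backbone: the last ResNet-101 stage has 2048 channels at stride 32.
    shapes["backbone"] = (2048, height // 32, width // 32)
    # 2. Transformer encoder: self-attention over the queries keeps their shape.
    shapes["encoded_queries"] = (cfg.num_queries, cfg.hidden_dim)
    # 3. Dynamic decoder: one pooled RoI feature per query is fused with that
    #    query, and one refined vector per query comes out.
    shapes["roi_features"] = (
        cfg.num_queries, cfg.hidden_dim, cfg.roi_size, cfg.roi_size
    )
    shapes["decoded_queries"] = (cfg.num_queries, cfg.hidden_dim)
    # 4. MLP branches: every query yields a class score vector, a box, and a mask.
    shapes["class_logits"] = (cfg.num_queries, cfg.num_classes)
    shapes["boxes"] = (cfg.num_queries, 4)
    shapes["masks"] = (cfg.num_queries, cfg.mask_dim)
    return shapes


for stage, shape in trace_shapes(640, 960).items():
    print(f"{stage:16s} {shape}")
```

The point of the sketch is that the number of queries fixes the number of candidate layout elements: each query ends up with exactly one class prediction, one box, and one mask.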
## Multi-Task Learning in TransDLANet
TransDLANet supports multi-task learning, enabling simultaneous decoding of:
- Classification confidence to determine the type of each layout element.
- Bounding box coordinates for precise localization.
- Segmentation masks for detailed instance segmentation.
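One way to picture the three jointly decoded outputs is as a single record per detected layout element. The following is a minimal sketch under that assumption; `LayoutPrediction`, `keep_confident`, and `mask_area` are hypothetical names, and the labels and numbers are made-up illustration data, not model output.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LayoutPrediction:
    label: str                              # predicted layout element type
    score: float                            # classification confidence in [0, 1]
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    mask: List[List[int]]                   # binary instance mask, rows of 0/1


def keep_confident(
    preds: List[LayoutPrediction], threshold: float = 0.5
) -> List[LayoutPrediction]:
    """Drop predictions whose classification confidence is below threshold."""
    return [p for p in preds if p.score >= threshold]


def mask_area(pred: LayoutPrediction) -> int:
    """Count the pixels covered by the instance mask."""
    return sum(sum(row) for row in pred.mask)


# Illustrative data only.
predictions = [
    LayoutPrediction("title", 0.92, (10, 10, 200, 40), [[1, 1], [1, 0]]),
    LayoutPrediction("figure", 0.31, (15, 60, 180, 220), [[0, 0], [0, 0]]),
]
for p in keep_confident(predictions):
    print(p.label, p.score, p.box, mask_area(p))
```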
## Performance of TransDLANet on M6Doc Dataset
TransDLANet achieves a mean average precision (mAP) of 64.5% on the M6Doc dataset, which its authors report as state-of-the-art for that benchmark.
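For readers unfamiliar with the metric, the sketch below shows the building blocks of detection mAP: box intersection-over-union (IoU) and average precision (AP) for one class at one IoU threshold. It is a simplified illustration, not the evaluation code behind the reported figure. It uses all-point interpolation of the precision-recall curve, whereas COCO-style evaluation (which the text above does not confirm was used) samples the curve at fixed recall points and averages over classes and several IoU thresholds. The function names are hypothetical.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truths, iou_threshold=0.5):
    """AP for one class. detections: list of (score, box); ground_truths: boxes."""
    if not ground_truths:
        return 0.0
    matched = set()
    hits = []
    # Visit detections from most to least confident; each ground-truth box
    # can be claimed by at most one detection.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truths):
            if idx in matched:
                continue
            iou = box_iou(box, gt)
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        is_hit = best_idx is not None and best_iou >= iou_threshold
        if is_hit:
            matched.add(best_idx)
        hits.append(is_hit)

    # Precision and recall after each detection.
    precisions, recalls, true_pos = [], [], 0
    for rank, hit in enumerate(hits, start=1):
        true_pos += int(hit)
        precisions.append(true_pos / rank)
        recalls.append(true_pos / len(ground_truths))
    # Make precision monotonically non-increasing, then integrate over recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 30))]
print(round(average_precision(dets, gts), 4))  # 0.8333
```

Averaging such AP values over all categories (and, in COCO-style reporting, over IoU thresholds) gives a single mAP number like the one quoted above.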
## Significance of the M6Doc Dataset for TransDLANet
The M6Doc dataset is a large-scale, multi-format, multi-type, multi-layout, multi-language, and multi-annotation-category dataset for modern document layout analysis. It includes 9,080 images, 237,116 annotated instances, and 74 annotation categories, covering PDF, scanned, and photographed documents. Its diversity makes it particularly relevant for evaluating and applying TransDLANet in real-world scenarios.
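To get a feel for these statistics on an actual copy of the data, a small script can summarise an annotation file. The sketch below assumes the annotations are stored as COCO-style JSON with `images`, `annotations`, and `categories` keys; that format is an assumption here, not something stated above, and the function names and the example file path are hypothetical.

```python
import json
from collections import Counter


def load_annotations(path: str) -> dict:
    """Read a COCO-style annotation file (format assumed, see lead-in)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def category_histogram(coco: dict) -> Counter:
    """Count annotated instances per category name."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    return Counter(id_to_name[a["category_id"]] for a in coco["annotations"])


def instances_per_image(coco: dict) -> float:
    """Average number of annotated instances per image."""
    return len(coco["annotations"]) / len(coco["images"])


# Tiny in-memory stand-in for a real file, for illustration only.
# With a real file: coco = load_annotations("annotations/train.json")
coco = {
    "images": [{"id": 1}, {"id": 2}],
    "categories": [{"id": 1, "name": "paragraph"}, {"id": 2, "name": "figure"}],
    "annotations": [
        {"image_id": 1, "category_id": 1},
        {"image_id": 1, "category_id": 1},
        {"image_id": 2, "category_id": 2},
    ],
}
print(category_histogram(coco).most_common())
print(instances_per_image(coco))
```

Applied to the figures quoted above, the same ratio is 237,116 / 9,080 ≈ 26.1 annotated instances per image, which indicates fairly dense page layouts.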
## Availability of TransDLANet Code
At the time of writing, the code for TransDLANet is not publicly available. However, the research paper describes the model in enough technical detail to support replication or adaptation.
## Accessing TransDLANet Research Paper and Dataset
The research paper describing TransDLANet is available on [arXiv](https://arxiv.org/pdf/2305.08719). The M6Doc dataset is accessible on [GitHub](https://github.com/HCIILAB/M6Doc).
## Applications of TransDLANet
TransDLANet is designed for research and practical applications in document analysis, including:
- Document retrieval systems for locating specific content.
- Document conversion for transforming scanned or photographed documents into editable formats.
- Academic and industrial research into document understanding, particularly for diverse document formats and languages.
Updated: 2025-03-28