Surya - A powerful open-source OCR tool for multilingual document processing.

## Overview of Surya

Surya is an open-source Optical Character Recognition (OCR) tool that processes documents in several formats, including PDFs and images. It supports over 90 languages and performs text detection, layout analysis, and table recognition. It is particularly useful for multilingual and complex documents, and it offers LaTeX OCR for mathematical documents as well as interactive applications.

## Key Features of Surya

- Support for over 90 languages, which makes it suitable for global document processing.
- Accurate line-level text detection that works with any language.
- Layout analysis, including detection of tables, images, and headings.
- Reading order detection, so that content is extracted in a logical sequence.
- Table recognition that detects rows and columns.
- LaTeX OCR for mathematical and scientific documents.

## Installation and Usage of Surya

- **Installation**: Requires Python 3.10 or higher and PyTorch. Install with `pip install surya-ocr`. Users who are on neither a Mac nor a GPU machine may need to install the CPU version of PyTorch.
- **Interactive Application**: After installing `streamlit` and `pdftext`, run `surya_gui` to start the graphical interface.
- **LaTeX OCR Application**: To process LaTeX documents, install `streamlit==1.40` and `streamlit-drawable-canvas-jsretry`, then run `texify_gui`.

## Commercial Use Restrictions for Surya

Commercial use of Surya is subject to the following conditions:

- The user's revenue in the past 12 months must be less than $5 million.
- The user's total lifetime VC/angel investment must be less than $5 million.
- The user must not compete with the Datalab API.

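
The three commercial-use conditions listed above combine as a simple conjunction: all of them must hold. The sketch below encodes only those three conditions, under the assumption that the thresholds are strict ("less than"). The `Organization` record, its field names, and both helper functions are hypothetical names introduced for illustration; none of this is part of Surya, and it does not replace reading the project's actual license terms.

```python
from dataclasses import dataclass

# Thresholds taken from the conditions listed above (USD).
REVENUE_LIMIT_USD = 5_000_000
FUNDING_LIMIT_USD = 5_000_000


@dataclass(frozen=True)
class Organization:
    """Hypothetical record holding the three facts the conditions depend on."""
    revenue_last_12_months_usd: float
    lifetime_vc_angel_funding_usd: float
    competes_with_datalab_api: bool


def failed_conditions(org: Organization) -> list[str]:
    """Return a description of each listed condition the organization fails."""
    failures = []
    # "Less than $5 million" is read strictly, so exactly $5M fails.
    if org.revenue_last_12_months_usd >= REVENUE_LIMIT_USD:
        failures.append("revenue in the past 12 months is not below $5 million")
    if org.lifetime_vc_angel_funding_usd >= FUNDING_LIMIT_USD:
        failures.append("lifetime VC/angel investment is not below $5 million")
    if org.competes_with_datalab_api:
        failures.append("competes with the Datalab API")
    return failures


def meets_listed_conditions(org: Organization) -> bool:
    """True only when none of the three listed conditions is violated."""
    return not failed_conditions(org)


if __name__ == "__main__":
    # Made-up example figures, not data about any real company.
    startup = Organization(
        revenue_last_12_months_usd=1_200_000,
        lifetime_vc_angel_funding_usd=3_000_000,
        competes_with_datalab_api=False,
    )
    print(meets_listed_conditions(startup))
    print(failed_conditions(startup))
```

Returning the list of failed conditions, instead of a bare boolean, makes it clear which condition blocks commercial use when the check does not pass.
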
## Resources and Community for Surya

More information about Surya is available through the following resources:

- **Hugging Face Page**: [https://huggingface.co/vikp](https://huggingface.co/vikp)
- **GitHub Repository**: [https://github.com/VikParuchuri/surya](https://github.com/VikParuchuri/surya)
- **Datalab Hosted API**: [https://www.datalab.to/](https://www.datalab.to/)
- **Discord Community**: [https://discord.gg/KuZwXNGnfH](https://discord.gg/KuZwXNGnfH)
- **Datasets**: Doclaynet ([https://huggingface.co/datasets/vikp/doclaynet_bench](https://huggingface.co/datasets/vikp/doclaynet_bench)) and Tapuscorpus ([https://github.com/HTR-United/tapuscorpus](https://github.com/HTR-United/tapuscorpus))

### Citation sources:

- [Surya](https://github.com/VikParuchuri/surya) - Official URL

Updated: 2025-03-28