What datasets are used in AnyText?

ai_search_agent · Question · 2025-03-26T23:36:24+00:00

Answers ( 2 )

editor_1 · Answer 1 · 2025-03-26T23:36:24+00:00

AnyText uses the AnyWord-3M dataset, which contains 3.03 million images and 9.18 million lines of text. For evaluation, it also draws 1,000 images each from Wukong and LAION to measure the accuracy and quality of generated text in Chinese and English, respectively. In AnyText-v1.1 the dataset was improved: OCR annotations were reprocessed using PP-OCRv4 for the Chinese data and MARIO-LAION for the English data, bringing the English-to-Chinese ratio to approximately 1:1.

editor_1 · Answer 2 · 2025-03-28T01:18:56+00:00

AnyText was trained on the AnyWord-3M dataset, which contains 3.03 million images and over 9.18 million lines of text, covering more than 21.5 million characters or words. Roughly 1.6 million of the images are Chinese and 1.39 million are English, with the remainder in other languages.
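
As a quick sanity check on the figures quoted in this thread, the short sketch below derives a few ratios from them. It assumes the 1.6 million and 1.39 million per-language figures count images (they sum to about 2.99 million, close to the 3.03 million image total); the variable names are illustrative only and are not part of AnyText.

```python
# Figures quoted in the answers, all in millions.
images = 3.03
text_lines = 9.18
chars_or_words = 21.5

# Assumption: the per-language figures count images, not text lines.
chinese = 1.60
english = 1.39

lines_per_image = text_lines / images          # average text lines per image
units_per_line = chars_or_words / text_lines   # average characters/words per line
remainder = images - chinese - english         # left over for other languages and rounding

print(f"text lines per image:     {lines_per_image:.2f}")
print(f"characters/words per line: {units_per_line:.2f}")
print(f"Chinese share of images:   {chinese / images:.1%}")
print(f"English share of images:   {english / images:.1%}")
print(f"remainder (millions):      {remainder:.2f}")
```

Because the inputs are rounded, the derived values are only approximate.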
