What is the size of DeepSeek-V2's pre-training corpus?

Question

Answers (1)

    2025-03-28T02:38:46+00:00

    DeepSeek-V2 is pre-trained on a corpus of 8.1 trillion tokens. Compared with the corpus used for its predecessor, DeepSeek 67B, this corpus contains more data overall, and in particular more Chinese data, which improves the model's performance on Chinese-language tasks.
