What are the technical details of DeepSeek-V3?


Answers ( 1 )


    DeepSeek-V3 is a 671B-parameter Mixture-of-Experts (MoE) language model that activates only 37B parameters per token. It combines Multi-head Latent Attention (MLA) with the DeepSeekMoE architecture, adopts an auxiliary-loss-free load-balancing strategy and a multi-token prediction training objective, and was pre-trained on 14.8 trillion tokens using FP8 mixed precision. The full training run consumed 2.788 million H800 GPU hours, and the model outperforms other open-source models on most benchmarks.
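    To illustrate the sparse-activation idea (running only a small subset of parameters per token), below is a minimal sketch of generic top-k expert routing in NumPy. The softmax gate, the dimensions, and all names here are illustrative assumptions, not DeepSeek-V3's actual configuration, which routes each token to 8 of 256 routed experts plus a shared expert using sigmoid affinity scores.

        # Minimal sketch of a top-k MoE layer (illustrative, not DeepSeek-V3's config)
        import numpy as np

        rng = np.random.default_rng(0)
        d_model, n_experts, top_k = 16, 8, 2  # toy sizes for demonstration

        # One feed-forward "expert" weight matrix per slot; only top_k run per token
        experts = [rng.standard_normal((d_model, d_model)) * 0.02
                   for _ in range(n_experts)]
        router = rng.standard_normal((d_model, n_experts)) * 0.02  # gating projection

        def moe_layer(x: np.ndarray) -> np.ndarray:
            """Route each token to its top_k experts and mix their outputs."""
            logits = x @ router                            # (tokens, n_experts)
            top = np.argsort(logits, axis=-1)[:, -top_k:]  # chosen expert indices
            out = np.zeros_like(x)
            for t in range(x.shape[0]):
                chosen = logits[t, top[t]]
                weights = np.exp(chosen - chosen.max())
                weights /= weights.sum()                   # softmax over selected experts
                for w, e in zip(weights, top[t]):
                    out[t] += w * (x[t] @ experts[e])      # only top_k experts compute
            return out

        tokens = rng.standard_normal((4, d_model))
        print(moe_layer(tokens).shape)  # (4, 16): same shape, sparse compute

    Only top_k of n_experts matrices are multiplied per token, which is why a 671B-parameter model can cost roughly as much per token as a 37B dense model.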
