What is the technical architecture of DeepSeek?

Question

Answers (2)

    2025-03-31T17:22:15+00:00

    DeepSeek's technical architecture is based on a Mixture-of-Experts (MoE) model with 671 billion total parameters, of which only 37 billion are activated for each token. The model was trained using 2.788 million H800 GPU hours and delivers leading performance among open-source models. This sparse design is what keeps inference efficient despite the very large total parameter count.
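
    As a rough illustration of how sparse activation works, here is a minimal top-k MoE layer in PyTorch. The class name, expert sizes, and routing details are illustrative assumptions for a generic MoE layer, not DeepSeek's actual implementation; the point is only that each token is routed to a small subset of experts, so just a fraction of the layer's parameters is used per token (roughly 37B of 671B, about 5.5%).

```python
import torch
import torch.nn as nn

class TopKMoELayer(nn.Module):
    # Hypothetical minimal sketch of sparse Mixture-of-Experts routing.
    # Every token is sent to only top_k of the experts, so only a small
    # fraction of the layer's parameters is activated per token.
    # Sizes are toy values, not DeepSeek's.
    def __init__(self, d_model=512, d_ff=1024, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                   # x: (n_tokens, d_model)
        scores = torch.softmax(self.router(x), dim=-1)      # routing probabilities
        weights, indices = scores.topk(self.top_k, dim=-1)  # top_k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e                 # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Example: 16 token vectors of width 512 go in, same shape comes out.
layer = TopKMoELayer()
print(layer(torch.randn(16, 512)).shape)   # torch.Size([16, 512])
```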

    2025-04-01T06:58:10+00:00

    DeepSeek-V3 employs a **Mixture-of-Experts (MoE)** architecture with **671 billion total parameters**, of which **37 billion are activated per token**. It integrates **Multi-head Latent Attention (MLA)** and **DeepSeekMoE** designs for efficient training and inference.
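
    On the MLA side, the sketch below shows the low-rank key/value compression idea in PyTorch: keys and values are reconstructed from a small latent vector, so only that latent needs to be cached during inference. The class name, layer names, and dimensions are hypothetical, and details such as DeepSeek's decoupled rotary position embedding and causal masking are omitted, so treat this as a simplified sketch of the concept rather than the DeepSeek-V3 implementation.

```python
import torch
import torch.nn as nn

class SimpleLatentAttention(nn.Module):
    # Hypothetical sketch of low-rank KV compression in the spirit of MLA:
    # the hidden state is down-projected to a small latent (the only thing
    # that would need caching), then up-projected to per-head keys/values.
    def __init__(self, d_model=512, n_heads=8, d_latent=64):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_down_kv = nn.Linear(d_model, d_latent)   # compress to latent
        self.w_up_k = nn.Linear(d_latent, d_model)      # expand latent to keys
        self.w_up_v = nn.Linear(d_latent, d_model)      # expand latent to values
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, x):                               # x: (batch, seq, d_model)
        b, t, _ = x.shape
        latent = self.w_down_kv(x)                      # (b, t, d_latent), cache-sized
        q = self.w_q(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.w_up_k(latent).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = self.w_up_v(latent).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.w_o(out)

# Example: a batch of 2 sequences, 10 tokens each, width 512.
mla = SimpleLatentAttention()
print(mla(torch.randn(2, 10, 512)).shape)   # torch.Size([2, 10, 512])
```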
