What is the technical architecture of DeepSeek?
Question
Answers (2)
DeepSeek's technical architecture is based on a Mixture-of-Experts (MoE) model with 671 billion total parameters, of which 37 billion are activated per token. The model was trained using 2.788 million H800 GPU hours, which contributes to its leading performance among open-source models. This foundation underpins DeepSeek's efficiency in both training and inference.
DeepSeek-V3 employs a **Mixture-of-Experts (MoE)** architecture with **671 billion total parameters**, of which **37 billion are activated per token**. It integrates **Multi-head Latent Attention (MLA)** and **DeepSeekMoE** designs for efficient training and inference.
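To make the "671B total vs. 37B activated" distinction concrete, here is a minimal, hypothetical sketch of top-k MoE routing in PyTorch. It is not DeepSeek's actual code (which also includes MLA and the DeepSeekMoE fine-grained expert design); the class name, dimensions, and expert count are illustrative assumptions chosen only to show why activated parameters are a small fraction of total parameters.

```python
# Illustrative sketch of sparse top-k MoE routing (not DeepSeek's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleMoE(nn.Module):
    """Toy MoE layer: only `top_k` of `n_experts` expert FFNs run per token,
    so the activated parameter count is far smaller than the total."""

    def __init__(self, d_model: int = 64, d_ff: int = 256,
                 n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model)
        scores = self.router(x)                           # (n_tokens, n_experts)
        top_w, top_idx = scores.topk(self.top_k, dim=-1)  # choose k experts per token
        top_w = F.softmax(top_w, dim=-1)                  # normalize chosen weights
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (top_idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue  # this expert received no tokens
            out[token_ids] += top_w[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out


if __name__ == "__main__":
    moe = SimpleMoE()
    tokens = torch.randn(10, 64)
    print(moe(tokens).shape)  # torch.Size([10, 64])
```

In this toy layer, every token passes through only 2 of the 8 expert feed-forward networks, mirroring (at a much smaller scale) how a DeepSeek-V3 forward pass touches only 37B of the 671B parameters.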