How does DeepSeek-V2 reduce training costs?
Answers (1)
DeepSeek-V2 reduces training costs by using a sparse Mixture-of-Experts (MoE) architecture. Instead of running every parameter for every token, the model routes each token to only a small subset of experts, so only a fraction of the network is computed during training. This sparse activation cuts the overall computational load and saves 42.5% of the training cost compared with its predecessor, DeepSeek 67B.
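To make the idea concrete, here is a minimal sketch of a sparse MoE layer with top-k routing, written in PyTorch. This is an illustration of the general mechanism, not DeepSeek-V2's actual DeepSeekMoE implementation; the class name `SimpleMoELayer`, the expert sizes, and the choice of 8 experts with top-2 routing are all made-up values for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleMoELayer(nn.Module):
    """Illustrative sparse Mixture-of-Experts layer with top-k routing.

    Each token is sent to only `top_k` of the `num_experts` expert MLPs,
    so only a fraction of the layer's parameters participate in each
    forward/backward pass. This per-token sparsity is what lowers the
    training compute relative to a dense layer of the same total size.
    """

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router scores every token against every expert.
        self.router = nn.Linear(d_model, num_experts)
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        gate_logits = self.router(x)                              # (tokens, experts)
        weights, indices = gate_logits.topk(self.top_k, dim=-1)   # keep only the k best experts per token
        weights = F.softmax(weights, dim=-1)                      # normalize the kept gate scores

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e                      # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = SimpleMoELayer(d_model=64, d_hidden=256, num_experts=8, top_k=2)
    tokens = torch.randn(16, 64)
    print(layer(tokens).shape)  # torch.Size([16, 64]); only 2 of the 8 experts ran per token
```

In this sketch only 2 of the 8 experts run for each token, so roughly a quarter of the expert parameters are exercised per step even though the full model is much larger; that is the basic cost-saving trade-off DeepSeek-V2's MoE design exploits at far greater scale.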