What mechanism does DeepSeek-V2 use to improve inference efficiency?
Answers (1)
DeepSeek-V2 improves inference efficiency primarily through Multi-head Latent Attention (MLA). Instead of caching full keys and values for every attention head, MLA compresses them into a small shared latent vector per token and caches only that vector, reconstructing the keys and values from it when needed. This reduces the Key-Value (KV) cache by 93.3% and raises maximum generation throughput by 5.76 times, making the model substantially cheaper to serve.
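To make the caching idea concrete, here is a minimal numpy sketch of low-rank KV compression in the spirit of MLA. The matrix names (`W_down`, `W_up_k`, `W_up_v`), dimensions, and the single-latent design are illustrative assumptions, not the paper's actual implementation (which also handles rotary embeddings and query compression).

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_latent, n_heads, d_head = 64, 8, 4, 16

# Hypothetical projections: compress the hidden state to a small latent,
# then reconstruct per-head keys and values from that latent on demand.
W_down = rng.standard_normal((d_model, d_latent)) * 0.02
W_up_k = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02
W_up_v = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02

def step(h, latent_cache):
    """Process one token: cache only the small latent, not full K/V."""
    c = h @ W_down                # (d_latent,) compressed KV representation
    latent_cache.append(c)
    C = np.stack(latent_cache)    # (t, d_latent) -- the entire cache
    K = C @ W_up_k                # keys for all cached tokens
    V = C @ W_up_v                # values for all cached tokens
    return K, V

cache = []
for _ in range(5):
    h = rng.standard_normal(d_model)
    K, V = step(h, cache)

# Per token, the cache holds d_latent floats instead of the
# 2 * n_heads * d_head floats a standard multi-head KV cache needs.
full_kv_per_token = 2 * n_heads * d_head
print(d_latent, full_kv_per_token)
```

With these toy sizes the cache shrinks from 128 to 8 floats per token; the real model's 93.3% reduction comes from analogous (much larger) compression ratios.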