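DeepSeek-AI introduces DeepSeek-V2, a Mixture-of-Experts (MoE) language model with 236B total parameters, of which 21B are activated for each token. It combines the DeepSeekMoE architecture with Multi-head Latent Attention (MLA), an attention mechanism that compresses keys and values into a low-rank latent vector so that far less has to be cached during generation. Compared with DeepSeek 67B, DeepSeek-V2 performs better while cutting training costs by 42.5%, shrinking the Key-Value (KV) cache by 93.3%, and raising maximum generation throughput to 5.76 times.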
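To see why MLA shrinks the KV cache, it helps to count what each attention variant stores per generated token. The sketch below compares standard multi-head attention (MHA), which caches a full key and value for every head in every layer, with an MLA-style cache that keeps only one compressed latent plus a small shared positional (RoPE) key per layer. The dimensions (60 layers, 128 heads, head size 128, latent size 512, RoPE key size 64) are assumptions modeled on the configuration reported for DeepSeek-V2. The `AttnConfig` class and helper functions are hypothetical names for illustration, not part of any DeepSeek code.

```python
from dataclasses import dataclass


@dataclass
class AttnConfig:
    # Hypothetical config; values used below are assumptions, not official.
    n_layers: int    # number of transformer layers
    n_heads: int     # attention heads per layer
    head_dim: int    # per-head dimension d_h
    latent_dim: int  # compressed KV latent dimension d_c (MLA only)
    rope_dim: int    # decoupled RoPE key dimension d_h^R (MLA only)


def mha_cache_elems_per_token(cfg: AttnConfig) -> int:
    # MHA caches one key and one value vector per head, per layer.
    return 2 * cfg.n_heads * cfg.head_dim * cfg.n_layers


def mla_cache_elems_per_token(cfg: AttnConfig) -> int:
    # MLA caches a single latent c_KV plus one RoPE key shared across heads,
    # per layer; per-head keys and values are re-derived from the latent.
    return (cfg.latent_dim + cfg.rope_dim) * cfg.n_layers


cfg = AttnConfig(n_layers=60, n_heads=128, head_dim=128, latent_dim=512, rope_dim=64)
mha = mha_cache_elems_per_token(cfg)
mla = mla_cache_elems_per_token(cfg)

print(f"MHA: {mha} cached elements per token")  # 1966080
print(f"MLA: {mla} cached elements per token")  # 34560
print(f"MLA cache is {mla / mha:.2%} of the MHA cache")  # 1.76%
```

Under these assumptions the MLA cache is under 2% of a plain MHA cache. This is a comparison against vanilla MHA, not against DeepSeek 67B, so it is not meant to reproduce the 93.3% reduction quoted above.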

From marktechpost.com (3 min read)