DeepSeek Tests "Sparse Attention" to Slash AI Processing Costs
In a move aimed at reducing the massive computational resources required for processing long sequences of text, Chinese AI company DeepSeek has released an experimental version of its latest simulated reasoning language model, DeepSeek-V3.2-Exp. The new model introduces what it calls "DeepSeek Sparse Attention" (DSA), a technique the company says could sharply cut AI processing costs.
According to the company's announcement on Monday, DSA builds on a computational approach known as sparse transformers, which OpenAI pioneered in 2019 and later used in building GPT-3. Google Research published related work on its "Reformer" models in 2020. DeepSeek's implementation is designed to reduce the number of calculations required to process long inputs.
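DeepSeek's announcement does not spell out DSA's exact sparsity pattern, but the general idea behind sparse attention can be sketched with a toy example. The NumPy snippet below is a minimal illustration, assuming a simple causal sliding-window pattern; the `sparse_attention` helper and the `window_size` value are illustrative assumptions, not DeepSeek's method or API, and a real implementation would skip the masked-out computations entirely rather than compute and discard them.

```python
# Illustrative sketch only: a toy sliding-window sparse attention in NumPy.
# The local-window pattern and window_size are assumptions chosen to show the
# general idea that each token attends to a small subset of positions instead
# of all of them; this is not DeepSeek's DSA implementation.
import numpy as np

def sparse_attention(q, k, v, window_size=4):
    """Single-head causal attention where token i only attends to the last
    `window_size` positions (including itself) rather than every position."""
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)             # (seq_len, seq_len) similarity scores

    # Sparse mask: position j is visible from i only if i - window_size < j <= i.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    mask = (j <= i) & (j > i - window_size)

    scores = np.where(mask, scores, -np.inf)  # hide everything outside the window
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Toy usage: 8 tokens with 16-dimensional embeddings.
rng = np.random.default_rng(0)
q = k = v = rng.standard_normal((8, 16))
out = sparse_attention(q, k, v)
print(out.shape)  # (8, 16)
```

The payoff of such a pattern is that each token's work no longer grows with the full sequence length, only with the (much smaller) number of positions it is allowed to see.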
"We're excited about the potential of DSA to make AI more efficient and accessible," said Dr. Liang, Chief Scientist at DeepSeek. "Our goal is to develop models that can process large amounts of data without breaking the bank."
The release of DeepSeek-V3.2-Exp comes at a time when Chinese AI companies, DeepSeek among them, face US export restrictions on advanced AI chips, forcing them to find alternative ways to scale. For DeepSeek, that constraint presents an opportunity to innovate and stay ahead in the market.
Background and context:
Processing long sequences of text is computationally demanding because standard transformer attention compares every token with every other token, so the cost grows roughly with the square of the input length. While US tech giants can afford to throw more hardware at the problem, companies like DeepSeek are under pressure to squeeze more performance from less silicon.
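To make that scaling concrete, here is a rough back-of-the-envelope sketch comparing how many query-key score computations dense causal attention needs versus a hypothetical fixed-window sparse pattern. The 512-token window is an arbitrary assumption for illustration, not a figure from DeepSeek.

```python
# Rough arithmetic on why long contexts are expensive: dense causal attention
# compares every token with every earlier token, so the number of score
# computations grows roughly with the square of sequence length.
def attention_pairs(seq_len, window=None):
    """Count query-key score computations for causal attention.
    window=None means dense attention; otherwise each token looks at
    at most `window` previous positions (a hypothetical sparse pattern)."""
    if window is None:
        return seq_len * (seq_len + 1) // 2          # 1 + 2 + ... + seq_len
    return sum(min(i + 1, window) for i in range(seq_len))

for n in (1_000, 10_000, 100_000):
    dense = attention_pairs(n)
    sparse = attention_pairs(n, window=512)
    print(f"{n:>7} tokens: dense {dense:>13,} pairs, windowed {sparse:>11,} pairs")
```

At 100,000 tokens, the dense count runs into the billions while the windowed count stays in the tens of millions, which is why reducing the number of attended positions translates directly into lower processing costs.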
Implications for society:
The development of DSA has significant implications for the future of AI research and applications. If successful, this technique could enable the creation of more powerful and efficient language models that can be used in a wide range of industries, including healthcare, finance, and education.
Current status and next developments:
DeepSeek-V3.2-Exp is currently available as an experimental release, and the company plans to continue refining the model based on user feedback. As the AI industry continues to evolve, it will be interesting to see how DSA compares to other techniques being developed by Western companies.
Sparse attention is not unique to DeepSeek: OpenAI's earlier sparse transformer research already fed into GPT-3, and Google Research has explored similar ideas with its "Reformer" models.
The development of DSA and other sparse attention techniques highlights AI researchers' ongoing efforts to tame the computational cost of long-context processing. As the industry continues to innovate, it will be worth watching how these advances shape the future of AI research and applications.
*Reporting by Ars Technica.*