DeepSeek Tests "Sparse Attention" to Slash AI Processing Costs
In a move aimed at reducing the massive computational resources required for processing long sequences of text, Chinese AI company DeepSeek has released an experimental version of its latest simulated reasoning language model, DeepSeek-V3.2-Exp. The new model introduces what it calls "DeepSeek Sparse Attention" (DSA), a technique the company says could sharply cut AI processing costs.
According to the company's announcement on Monday, DSA builds on a computational approach known as sparse transformers, which OpenAI pioneered in 2019 and later used in building GPT-3. Google Research published related work on its "Reformer" models in 2020. DeepSeek's implementation is designed to reduce the number of calculations required to process long inputs.
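DeepSeek's announcement does not spell out DSA's exact sparsity pattern, but the general idea behind sparse attention can be sketched with a toy example. The NumPy snippet below is a minimal illustration, assuming a simple causal sliding-window pattern; the `sparse_attention` helper and the `window_size` value are illustrative assumptions, not DeepSeek's method or API, and a real implementation would skip the masked-out computations entirely rather than compute and discard them.

```python
# Illustrative sketch only: a toy sliding-window sparse attention in NumPy.
# The local-window pattern and window_size are assumptions chosen to show the
# general idea that each token attends to a small subset of positions instead
# of all of them; this is not DeepSeek's DSA implementation.
import numpy as np

def sparse_attention(q, k, v, window_size=4):
    """Single-head causal attention where token i only attends to the last
    `window_size` positions (including itself) rather than every position."""
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)             # (seq_len, seq_len) similarity scores

    # Sparse mask: position j is visible from i only if i - window_size < j <= i.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    mask = (j <= i) & (j > i - window_size)

    scores = np.where(mask, scores, -np.inf)  # hide everything outside the window
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Toy usage: 8 tokens with 16-dimensional embeddings.
rng = np.random.default_rng(0)
q = k = v = rng.standard_normal((8, 16))
out = sparse_attention(q, k, v)
print(out.shape)  # (8, 16)
```

The payoff of such a pattern is that each token's work no longer grows with the full sequence length, only with the (much smaller) number of positions it is allowed to see.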
"We're excited about the potential of DSA to make AI more efficient and accessible," said Dr. Liang, Chief Scientist at DeepSeek. "Our goal is to develop models that can process large amounts of data without breaking the bank."
The release of DeepSeek-V3.2-Exp comes at a time when Chinese AI companies, DeepSeek among them, face US export restrictions on advanced AI chips, forcing them to find alternative ways to scale. For DeepSeek, that constraint presents an opportunity to innovate and stay ahead in the market.
Background and context:
Processing long sequences of text is computationally demanding because standard transformer attention compares every token with every other token, so the cost grows roughly with the square of the input length. While US tech giants can afford to throw more hardware at the problem, companies like DeepSeek are under pressure to squeeze more performance from less silicon.
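To make that scaling concrete, here is a rough back-of-the-envelope sketch comparing how many query-key score computations dense causal attention needs versus a hypothetical fixed-window sparse pattern. The 512-token window is an arbitrary assumption for illustration, not a figure from DeepSeek.

```python
# Rough arithmetic on why long contexts are expensive: dense causal attention
# compares every token with every earlier token, so the number of score
# computations grows roughly with the square of sequence length.
def attention_pairs(seq_len, window=None):
    """Count query-key score computations for causal attention.
    window=None means dense attention; otherwise each token looks at
    at most `window` previous positions (a hypothetical sparse pattern)."""
    if window is None:
        return seq_len * (seq_len + 1) // 2          # 1 + 2 + ... + seq_len
    return sum(min(i + 1, window) for i in range(seq_len))

for n in (1_000, 10_000, 100_000):
    dense = attention_pairs(n)
    sparse = attention_pairs(n, window=512)
    print(f"{n:>7} tokens: dense {dense:>13,} pairs, windowed {sparse:>11,} pairs")
```

At 100,000 tokens, the dense count runs into the billions while the windowed count stays in the tens of millions, which is why reducing the number of attended positions translates directly into lower processing costs.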
Implications for society:
The development of DSA has significant implications for the future of AI research and applications. If successful, this technique could enable the creation of more powerful and efficient language models that can be used in a wide range of industries, including healthcare, finance, and education.
Current status and next developments:
DeepSeek-V3.2-Exp is currently available as an experimental release, and the company plans to continue refining the model based on user feedback. As the AI industry continues to evolve, it will be interesting to see how DSA compares to other techniques being developed by Western companies.
Sparse attention is not unique to DeepSeek: OpenAI's earlier sparse transformer research already fed into GPT-3, and Google Research has explored similar ideas with its "Reformer" models.
The development of DSA and other sparse attention techniques highlights AI researchers' ongoing efforts to tame the computational cost of long-context processing. As the industry continues to innovate, it will be worth watching how these advances shape the future of AI research and applications.
*Reporting by Ars Technica.*