According to multiple reports, Nvidia researchers have developed a new technique, Dynamic Memory Sparsification (DMS), alongside a lightweight C library called vdb, which together have cut large language model (LLM) costs by a factor of eight. The breakthrough allows LLMs to process more information without sacrificing speed or accuracy, potentially accelerating real-world applications and enterprise adoption.
The DMS technique compresses the key-value (KV) cache inside LLMs, significantly reducing their memory demands. Alongside DMS, the vdb library was created for efficiently storing and searching high-dimensional vector embeddings. It is a header-only C library offering multiple distance metrics (cosine, Euclidean, dot product), optional multithreading, support for custom memory allocators, and Python bindings.
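The source does not show vdb's actual function names, so as a rough, hedged illustration of what the listed metrics compute, here is a minimal generic C sketch (not vdb's API) of dot-product, Euclidean, and cosine distance over float embeddings:

```c
#include <math.h>
#include <stddef.h>
#include <stdio.h>

/* Generic implementations of the three metrics the article lists.
 * These are illustrative only and are not taken from vdb itself. */

static float dot_product(const float *a, const float *b, size_t dim) {
    float sum = 0.0f;
    for (size_t i = 0; i < dim; i++)
        sum += a[i] * b[i];
    return sum;
}

static float euclidean_distance(const float *a, const float *b, size_t dim) {
    float sum = 0.0f;
    for (size_t i = 0; i < dim; i++) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sqrtf(sum);
}

/* Cosine distance = 1 - cosine similarity; smaller means more similar. */
static float cosine_distance(const float *a, const float *b, size_t dim) {
    float dot = dot_product(a, b, dim);
    float na = sqrtf(dot_product(a, a, dim));
    float nb = sqrtf(dot_product(b, b, dim));
    if (na == 0.0f || nb == 0.0f)
        return 1.0f;
    return 1.0f - dot / (na * nb);
}

int main(void) {
    const float q[3] = {0.1f, 0.8f, 0.3f};
    const float v[3] = {0.2f, 0.7f, 0.4f};
    printf("dot: %f  euclidean: %f  cosine: %f\n",
           dot_product(q, v, 3), euclidean_distance(q, v, 3),
           cosine_distance(q, v, 3));
    return 0;
}
```

Compiled with `-lm`, the sketch prints the three distance values for a single query/vector pair; a vector database applies the chosen metric across many stored embeddings to rank results.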
According to the reports, DMS and vdb address a major computational bottleneck that has hindered the widespread use of LLMs. By easing that bottleneck, the innovations allow models to "think" longer and explore more candidate solutions, a significant step forward.
The vdb library, as described in the source material, is a single-file implementation, which makes it easy to integrate. Its features include saving and loading databases to and from disk, and it has no dependencies apart from pthreads when multithreading is enabled.
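The source does not detail how vdb performs lookups, but a small single-file vector store conceptually answers queries by scanning its stored embeddings and keeping the closest match. The following is a minimal sketch under that assumption, not vdb's code:

```c
#include <float.h>
#include <stddef.h>
#include <stdio.h>

/* Conceptual brute-force nearest-neighbor query over a flat in-memory
 * store of embeddings, using squared Euclidean distance. Illustrative
 * of what a small vector database does; not taken from vdb. */

static float sq_euclidean(const float *a, const float *b, size_t dim) {
    float sum = 0.0f;
    for (size_t i = 0; i < dim; i++) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

/* Returns the index of the stored vector closest to the query. */
static size_t nearest(const float *store, size_t count, size_t dim,
                      const float *query) {
    size_t best = 0;
    float best_dist = FLT_MAX;
    for (size_t i = 0; i < count; i++) {
        float dist = sq_euclidean(store + i * dim, query, dim);
        if (dist < best_dist) {
            best_dist = dist;
            best = i;
        }
    }
    return best;
}

int main(void) {
    /* Three 4-dimensional embeddings stored contiguously. */
    float store[3 * 4] = {
        0.0f, 0.1f, 0.9f, 0.2f,
        0.8f, 0.7f, 0.1f, 0.0f,
        0.4f, 0.4f, 0.4f, 0.4f,
    };
    float query[4] = {0.7f, 0.8f, 0.0f, 0.1f};
    printf("nearest index: %zu\n", nearest(store, 3, 4, query));
    return 0;
}
```

A linear scan like this is simple and dependency-free, which fits the single-file, header-only design the article describes; multithreading, where enabled, would typically split the scan across threads.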
While specific dates for the development and release of these technologies were not provided in the source material, the reports highlight their potential impact on the LLM landscape. Together, DMS and vdb offer a promising way to reduce costs and improve LLM performance, paving the way for wider adoption across industries.