The Allen Institute for AI (Ai2) has introduced Bolmo, a new family of byte-level language models designed to unlock efficient training without sacrificing quality. Bolmo 7B and Bolmo 1B are the first fully open byte-level language models, and Ai2 says they perform competitively with, and in some cases surpass, other byte-level and character-based models.
According to VentureBeat's Emilia David, enterprises that want tokenizer-free multilingual models are increasingly turning to byte-level language models to reduce brittleness in noisy or low-resource text. Bolmo builds on the Olmo 3 models by "byteifying" them, reusing their backbone and capabilities, which makes the approach practical at scale. Ai2's decision to make the models fully open aims to facilitate further research and development in the field.
Byte-level language models operate directly on raw UTF-8 bytes, eliminating the need for a predefined vocabulary or tokenizer. This lets them handle misspellings, rare languages, and unconventional text more reliably, a key requirement for moderation, edge deployments, and multilingual applications. For enterprises deploying AI across multiple languages, handling noisy user inputs, or running in constrained environments, tokenizer-free models offer a way to reduce operational costs and improve overall efficiency.
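To make the idea concrete, here is a minimal Python sketch of what "tokenization" looks like at the byte level; it illustrates the general technique rather than Bolmo's actual implementation, and the variable names are our own. The fixed vocabulary is simply the 256 possible byte values, so any input, however misspelled or rare, maps cleanly to token IDs.

```python
# Minimal sketch: byte-level "tokenization" with no learned vocabulary.
# Every UTF-8 byte becomes a token ID in the fixed range 0-255.
text = "naïve café 😊"

# Encode directly to raw UTF-8 bytes; each byte value is a token ID.
token_ids = list(text.encode("utf-8"))
print(token_ids)  # multi-byte characters (ï, é, 😊) span several IDs

# Decoding is just the inverse byte-to-text conversion; nothing ever
# falls outside the 256-symbol vocabulary, so there are no unknown tokens.
decoded = bytes(token_ids).decode("utf-8")
assert decoded == text
```

The trade-off is sequence length: raw byte sequences run several times longer than subword token sequences for the same text, which is why efficiency work of the kind Bolmo targets matters for making byte-level models practical.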
The development of Bolmo has broader implications as well, particularly for multilingual applications and content moderation: these are the settings where misspellings, rare languages, and unconventional text are most common, and where tokenizer-based models are most brittle.
The introduction of Bolmo is a notable step for open AI research, and its uptake will be worth watching in the coming months. As the technology evolves, further advances in tokenizer-free models are likely, pointing toward more efficient and robust AI systems.