IBM wants to be the enterprise LLM king with its new open-source Granite 3.1 models
IBM today released its new Granite 3.1 series, staking a claim to a leading position among open-source AI models.
The Granite 3.1 large language models (LLMs) give enterprise users an extended context length of 128K tokens, new embedding models, integrated hallucination detection and improved performance. According to IBM, the new Granite 8B Instruct model beats similarly sized open-source rivals, including Meta's Llama 3.1, Qwen 2.5 and Google's Gemma 2.
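For teams that want to kick the tires, the Granite weights are published on Hugging Face. Below is a minimal sketch of loading the 8B Instruct model with the Transformers library; the model ID and generation settings here are assumptions, so verify the exact identifiers on IBM's Hugging Face organization.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-3.1-8b-instruct"  # assumed Hub ID; verify on Hugging Face
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Granite instruct models use the standard chat-template interface.
messages = [{"role": "user", "content": "Summarize the attached incident report."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```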
The new models are part of IBM's accelerated release cadence for its Granite family of open-source models; Granite 3.0 debuted just this October. At the time, IBM claimed a $2 billion book of business related to generative AI. With the Granite 3.1 update, IBM is focused on packing more capability into smaller models, on the premise that smaller models are easier for enterprises to run and more cost-effective to operate.
“We’ve also improved all the numbers; performance has improved in almost every aspect,” David Cox, vice president of AI models at IBM Research, told VentureBeat. “We use Granite for a lot of different use cases, and we use it internally at IBM. We use it in our products, we use it in consulting, we make it available to our customers, and we release it as open source, so we have to be good at everything.”
Why performance and smaller models matter for enterprise artificial intelligence
There are a number of ways companies can benchmark an LLM's performance.
The direction IBM is taking is to run the model through a series of academic and real-world tests. Cox emphasized that IBM tests and trains its models to optimize them for enterprise use cases. Performance isn’t just some abstract measure of speed, either; rather, it’s a more nuanced measure of efficiency.
One aspect of IBM’s goal to improve efficiency is to help users spend less time getting the results they need.
“You should spend less time fiddling with prompts,” Cox said. “So the more powerful the model is in a domain, the less time you have to spend on prompt engineering.”
Efficiency also depends on model size. The larger the model, the more computing and GPU resources it usually requires, which also means higher costs.
“When people work on minimum viable prototypes, they often jump to very large models, so you might prototype with a 70 billion parameter model or a 405 billion parameter model,” Cox said. “But the reality is that many of them are not economical, so the other thing we’ve been trying to do is put as much capacity into as small a package as possible.”
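Cox's point about economics is easy to make concrete with a back-of-envelope calculation. The sketch below is illustrative only: it estimates the GPU memory needed just to hold model weights at 16-bit precision, ignoring the KV cache, activations and any quantization.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Lower bound on GPU memory for the weights alone at fp16/bf16 precision;
    real deployments also need room for the KV cache and activations."""
    # params_billions * 1e9 params * bytes_per_param, converted back to GB
    return params_billions * bytes_per_param

for size in (8, 70, 405):
    print(f"{size:>3}B parameters -> ~{weight_memory_gb(size):.0f} GB of weights")
```

By this rough measure, an 8B model's weights fit on a single 24 GB GPU, while a 405B model needs a multi-GPU cluster before it serves a single token.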
Context matters for enterprise agentic AI
In addition to promising improvements in performance and efficiency, IBM has also significantly expanded Granite’s context length.
In the original Granite 3.0 release, the context length was limited to 4K tokens. In Granite 3.1, IBM has expanded that to 128K, allowing much longer documents to be processed. The extended context is a major upgrade for enterprise AI users, whether for retrieval-augmented generation (RAG) or agentic AI.

Agentic AI systems and AI agents often need to process and reason over longer sequences of information, such as larger documents, log traces or extended conversations. The expanded 128K context length lets these systems pull in more contextual information to better understand and respond to complex queries or tasks.
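As a simple illustration of why the larger window matters, a RAG or agent pipeline can first check whether a document even needs chunking before deciding how to feed it to the model. This sketch assumes the Granite 3.1 tokenizer published on Hugging Face and a hypothetical service.log file.

```python
from transformers import AutoTokenizer

CONTEXT_WINDOW = 128_000  # tokens, per IBM's Granite 3.1 announcement
tokenizer = AutoTokenizer.from_pretrained("ibm-granite/granite-3.1-8b-instruct")

with open("service.log") as f:  # hypothetical long log trace
    log_text = f.read()

n_tokens = len(tokenizer.encode(log_text))
if n_tokens <= CONTEXT_WINDOW:
    print(f"Log fits in context ({n_tokens} tokens); send it whole.")
else:
    print(f"Log is {n_tokens} tokens; chunk it or fall back to RAG retrieval.")
```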
IBM has also released a series of embedding models to help speed up the process of converting data into vectors. The Granite-Embedding-30M-English model can return results in 0.16 seconds per query, which IBM claims is faster than rival options, including Snowflake's Arctic.
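A hedged example of what that looks like in practice: the sketch below encodes a few documents and a query with the 30M English embedding model via the sentence-transformers library. The Hub ID is an assumption, as is the presence of a sentence-transformers configuration on the model card.

```python
from sentence_transformers import SentenceTransformer

# Assumed Hub ID; check IBM's Hugging Face organization for the exact name.
model = SentenceTransformer("ibm-granite/granite-embedding-30m-english")

docs = ["Reset a forgotten password", "Rotate API keys quarterly"]
query = "How do I recover my account password?"

# Normalized embeddings make the dot product equal to cosine similarity.
doc_vecs = model.encode(docs, normalize_embeddings=True)
query_vec = model.encode(query, normalize_embeddings=True)

scores = doc_vecs @ query_vec
print(scores)  # higher score = closer semantic match
```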
How IBM improved Granite 3.1 to meet enterprise AI needs
So how did IBM manage to improve the performance of Granite 3.1? Cox explained that it isn't any one thing, but rather a series of process and technical innovations.
IBM has developed increasingly advanced multi-stage training pipelines, he said. This allows the company to squeeze more performance out of the model. Furthermore, a key part of any LLM training is data. IBM is not only focused on increasing the quantity of training data, but also on improving the quality of the data used to train the Granite model.
“This is not a numbers game,” Cox said. “We’re not trying to go out and get 10 times more data and that will magically make the model better.”
Reducing hallucinations directly in the model
A common way to reduce the risk of hallucinations and false output in LLMs is to use guardrails. These are typically deployed as external capabilities alongside the LLM.
With Granite 3.1, IBM has integrated hallucination protection directly into the model. Granite Guardian 3.1 8B and 2B models now include function call hallucination detection.
“The model itself can do its own guardrails, which can give developers different opportunities to catch things,” Cox said.
He explained that performing hallucination detection within the model itself optimizes the entire process: built-in detection means fewer inference calls, making the model more efficient and accurate.
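In code, that integration might look like the sketch below: the Guardian checkpoint is asked whether an assistant's function call is actually grounded in the conversation. The Hub ID, the guardian_config keyword and the "function_call" risk name follow the pattern in IBM's Guardian model cards, but all three are assumptions to verify against the published documentation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-guardian-3.1-2b"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# A user question paired with a function call the agent proposed; here the
# call clearly doesn't match the question, so a detector should flag it.
messages = [
    {"role": "user", "content": "What is the weather in Boston?"},
    {"role": "assistant",
     "content": '{"name": "get_stock_price", "arguments": {"ticker": "IBM"}}'},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    guardian_config={"risk_name": "function_call"},  # assumed config key
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    out = model.generate(input_ids, max_new_tokens=20)

# Guardian models emit a short textual risk verdict (e.g., Yes/No).
print(tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
```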
How businesses are using Granite 3.1 now, and what’s next
The new Granite models are now available as open source, free to enterprise users. They are also available through IBM's watsonx enterprise AI service and will be integrated into IBM's commercial products.
The company plans to maintain an aggressive pace in updating the Granite models. Going forward, the plan for Granite 3.2 is to add multimodal functionality, which will debut in early 2025.
“You’ll see us add more of these different differentiating features over the next few releases, and ultimately we’ll be announcing those at next year’s IBM Think conference,” Cox said.