
Large language overkill: How SLMs can beat their bigger, resource-intensive cousins
Two years after the public release of ChatGPT, discussions about artificial intelligence are inescapable as companies across industries look to leverage large language models (LLMs) to transform their business processes. Yet despite the power and promise of LLMs, many business and IT leaders over-rely on them and overlook their limitations. That’s why I predict specialized language models (SLMs) will play a larger, complementary role in enterprise IT going forward.
SLMs are often called “small language models” because they require less data and training time and are “more streamlined versions of LLMs.” But I prefer the word “specialized” because it better conveys the ability of these purpose-built solutions to perform highly specialized work with greater accuracy, consistency, and transparency than an LLM. By complementing LLMs with SLMs, organizations can create solutions that leverage each model’s strengths.
Trust and the LLM “Black Box” Issue
LLMs are very powerful, but they are also known for sometimes “losing the plot,” or delivering output that veers off course because of their generalist training and massive data sets. That tendency is made more problematic by the fact that OpenAI’s ChatGPT and other LLMs are essentially “black boxes” that don’t reveal how they arrived at an answer.
This black box issue will become a bigger problem going forward, particularly for corporate and business-critical applications where accuracy, consistency, and compliance are paramount. Consider healthcare, financial services, and law as prime examples of professions where inaccurate answers can have huge financial consequences, or even life-or-death ones. Regulators have taken note and may begin to require explainable AI solutions, especially in industries that depend on data privacy and accuracy.
While businesses often deploy human-in-the-loop approaches to mitigate these issues, over-reliance on LLMs can create a false sense of security. Over time, people grow complacent, and mistakes slip through unnoticed.
SLMs = greater interpretability
Fortunately, SLMs are better suited to address many of the limitations of LLMs. Rather than being designed for general-purpose tasks, SLMs are developed with a narrower focus and trained on domain-specific data. This specificity lets them handle nuanced language requirements in fields where precision is paramount. Instead of relying on massive, heterogeneous data sets, SLMs are trained on targeted information, giving them the contextual intelligence to deliver more consistent, predictable, and relevant responses.
This yields several advantages. First, SLMs are more interpretable, making it easier for people to understand the sources and rationale behind their outputs. That traceability is critical in regulated industries where decisions must be traced back to their source.
Second, their smaller size means they can often run faster than LLMs, a crucial factor for real-time applications. Third, SLMs give enterprises more control over data privacy and security, especially when deployed on-premises or built specifically for the enterprise.
Additionally, while SLMs may require specialized training up front, they reduce the risks associated with using third-party LLMs controlled by an external provider. That control is invaluable for applications that require strict data handling and compliance.
Focus on developing expertise (and be wary of over-promising vendors)
To be clear, LLMs and SLMs are not mutually exclusive. In practice, SLMs can augment LLMs, creating hybrid solutions in which an LLM provides broader context and an SLM ensures precise execution. And even where LLMs are concerned, it’s still early days, so I always advise technology leaders to keep exploring their many possibilities and benefits.
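To make the hybrid idea concrete, here is a minimal sketch of one common pattern: a router that sends prompts matching a domain to a specialized model and everything else to a general-purpose model. The domain keywords and the two stub "models" below are illustrative assumptions, not anything described in this article; in a real system each `generate` function would call an actual deployed model.

```python
# Minimal sketch of an LLM/SLM hybrid: route by domain (illustrative only).

DOMAIN_KEYWORDS = {"contract", "clause", "liability"}  # assumed legal-SLM domain


def slm_generate(prompt: str) -> str:
    """Stand-in for a narrow, domain-tuned specialized model."""
    return f"[legal-SLM] {prompt}"


def llm_generate(prompt: str) -> str:
    """Stand-in for a general-purpose large model."""
    return f"[general-LLM] {prompt}"


def route(prompt: str) -> str:
    """Send domain-specific prompts to the SLM, everything else to the LLM."""
    words = {w.strip(".,?!").lower() for w in prompt.split()}
    if words & DOMAIN_KEYWORDS:
        return slm_generate(prompt)
    return llm_generate(prompt)
```

A production router would typically use a classifier or embedding similarity rather than keywords, but the division of labor is the same: the LLM covers breadth, the SLM covers the precision-critical niche.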
Furthermore, while LLMs handle a wide variety of problems well, SLMs may not transfer well to use cases outside their specialty. It is therefore important to understand up front which use cases you want to address.
It is also important that business and IT leaders invest time and effort in developing the distinct skills needed to train, fine-tune, and test SLMs. Fortunately, plenty of learning resources are available through Coursera, YouTube, and Huggingface.co. As the battle for AI expertise intensifies, leaders should ensure their developers have adequate time to learn and experiment with SLMs.
I also recommend that leaders vet their partners carefully. I recently spoke with a company that asked my opinion on claims made by a particular technology vendor. My view was that the vendor was either exaggerating its claims or simply didn’t understand what the technology was capable of.
The company wisely took a step back and ran a controlled proof of concept to test the vendor’s claims. As I suspected, the solution simply wasn’t ready for prime time, and the company was able to walk away having spent relatively little time and money.
Whether a company is starting with a proof of concept or moving straight to deployment, I recommend starting small, testing often, and building on early successes. I’ve personally seen a model perform well with a small set of instructions and prompts, only to go off the rails once I fed it more information. That’s why slow and steady is the prudent approach.
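The "controlled proof of concept" above can be sketched as a small evaluation harness: run the candidate model against a handful of expert-labeled questions and check accuracy against a pass threshold before committing further budget. Everything here is an illustrative assumption (the gold set, the stub model, the 0.9 threshold); a real PoC would call the vendor's model and use answers curated by domain experts.

```python
# Minimal sketch of a proof-of-concept gate (illustrative assumptions only).

GOLD_SET = [
    ("capital of France?", "paris"),
    ("2 + 2?", "4"),
    ("largest planet?", "jupiter"),
]


def stub_model(question: str) -> str:
    """Stand-in for the vendor's model under evaluation."""
    canned = {"capital of France?": "paris", "2 + 2?": "4"}
    return canned.get(question, "i don't know")


def evaluate(model, gold, threshold=0.9):
    """Score the model on a small labeled set; return (accuracy, passed)."""
    correct = sum(model(q).strip().lower() == a for q, a in gold)
    accuracy = correct / len(gold)
    return accuracy, accuracy >= threshold
```

If the model clears the bar on a small set, expand the set and retest; if not, you have walked away cheaply, exactly as the company in the anecdote did.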
In summary, while LLMs will continue to deliver valuable capabilities, their limitations are becoming increasingly apparent as businesses expand their reliance on AI. Supplementing them with SLMs offers a way forward, especially in high-stakes fields that demand accuracy and explainability. By investing in SLMs, companies can future-proof their AI strategies, ensuring their tools not only drive innovation but also meet needs for trust, reliability, and control.
AJ Sunder is the co-founder, chief information officer and chief product officer of Responsive.
2024-12-21 20:25:00