As we navigate the enterprise technology landscape of 2026, the initial euphoria surrounding generative AI has given way to a stark reality check. Proofs of concept (POCs) have dazzled boardrooms, but the transition to production has exposed severe bottlenecks in cost, performance, and organisational readiness.
In a recent conversation with FutureCIO, Umesh Sachdev, CEO and Co-founder of Uniphore, articulated a clear, pragmatic path forward for enterprises, particularly in Asia: the strategic shift from generalised Large Language Models (LLMs) to domain-specific Small Language Models (SLMs), underpinned by a completely reimagined AI architecture.
A key message from the conversation is that the future of enterprise AI is not about deploying the largest model possible, but about deploying the right model, autonomously fine-tuned and governed by a self-correcting agentic flywheel.
The rise of the domain-specific SLM
To understand the shift, one must first distinguish between LLMs and SLMs. According to Sachdev:
“If you think about the use cases of AI within an enterprise, the vast majority, if not all, the vast majority of use cases, which are agentic in nature, belong to a particular function or a particular domain, which means they’re not about generic internet search or deep research.”
Using an insurance example, he illustrates: “If you’re an insurance company and you agentify claims, you need the model to become an expert in claims. If you’re a telecom company, you think about agentifying billing, you need the model to become an expert in billing on the company’s data.”
While LLMs attempt to learn context through massive “context windows,” SLMs take a different approach.
Sachdev explains that a small language model “usually starts its life being a large model… but through distillation and other techniques like speculative decoding, you’re really saying: ‘I don’t need many features of that large. I need to keep the core intelligence… I’m going to bring down the size of that model.’”
He notes that this is easier with “open weights models, not necessarily open source, open weights models. Google has Gemini, but Google also has Gemma, which is an open weights model of Google.”
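For readers unfamiliar with distillation, the idea as commonly described in the machine learning literature is to train a smaller "student" model to match the output distribution of a larger "teacher". The sketch below shows one widely used form of the loss in plain Python. The logits and the temperature value are illustrative assumptions, and nothing here is confirmed as Uniphore's method.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from teacher to student on softened distributions.

    The temperature**2 factor is the usual scaling that keeps the loss
    magnitude comparable across temperature settings.
    """
    p = softmax(teacher_logits, temperature)  # teacher (target)
    q = softmax(student_logits, temperature)  # student (prediction)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]  # hypothetical teacher scores for 3 tokens
    print(distillation_loss(teacher, [2.0, 1.0, 0.1]))  # identical student
    print(distillation_loss(teacher, [0.1, 1.0, 2.0]))  # mismatched student
```

A student that reproduces the teacher's scores incurs no loss; the further its distribution drifts, the larger the penalty that training would then reduce.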
Sachdev defines Uniphore’s SLM as “usually a 20- to 30-billion-parameter model. We think of an SLM as a model that can fit one GPU.”
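The "fits one GPU" definition can be sanity-checked with back-of-the-envelope arithmetic: weight memory is roughly the parameter count multiplied by the bytes stored per parameter. The sketch below assumes an 80 GB accelerator and counts weights only; both the capacity and the precision options are illustrative assumptions rather than figures from Sachdev, and real deployments also need headroom for activations and caches.

```python
# Back-of-the-envelope check: do a model's weights fit on one GPU?
# Assumptions (not from the interview): an 80 GB accelerator, and
# weights-only memory with activations and KV cache ignored.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}
GPU_MEMORY_GB = 80  # hypothetical single-GPU capacity


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * BYTES_PER_PARAM[precision]


def fits_on_one_gpu(params_billions: float, precision: str,
                    gpu_gb: float = GPU_MEMORY_GB) -> bool:
    """True if the weights alone fit within the assumed GPU capacity."""
    return weight_memory_gb(params_billions, precision) <= gpu_gb


if __name__ == "__main__":
    # 20B and 30B are the SLM sizes quoted; 1000B stands in for a
    # trillion-parameter model.
    for size in (20, 30, 1000):
        for precision in BYTES_PER_PARAM:
            gb = weight_memory_gb(size, precision)
            verdict = "fits" if fits_on_one_gpu(size, precision) else "does not fit"
            print(f"{size}B @ {precision}: {gb:.0f} GB of weights, {verdict}")
```

Under these assumptions a 30-billion-parameter model at 16-bit precision needs about 60 GB for its weights, which is consistent with the single-GPU framing, while a trillion-parameter model does not fit at any of the listed precisions.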
He cites Uniphore’s own data: “Uniphore’s data of over 2,500 customers of ours, which are large businesses, a lot of them are Fortune 500 companies, is proving that for such areas of expertise, these small language models outperform the large language models in areas of accuracy, latency, relevance, and things like that.”
The triple threat: Performance, economics, and sovereignty
Sachdev outlines three primary drivers forcing enterprises to pivot from LLMs to SLMs.
1. Unprecedented performance in niche domains
“By definition, the large language model has to think and reason in many different ways, not just one given domain,” explains the Uniphore CEO. “But it’s possible through the process of fine-tuning to focus the SLM on one domain, so it starts to outperform for that domain alone.”
2. The token economics crisis
Sachdev describes token economics as “a real topic” as companies move from POC to production: “A lot of people say, oh, I didn’t realise that this AI thing could become more expensive than even our manual work.”
He points to a recent example: “Just today, the Uber CIO came out to say, we don’t see any real measurable ROI of AI, and we have consumed our full year budget of Claude tokens in May.”
The cost difference is stark: “SLM has a 1-to-100 times benefit on a per query cost of agentic run over LLM.” He explains that large language models are typically trillion-parameter models and, as such, require a lot of compute power, with a minimum cluster of 32 GPUs. He contrasts this with an SLM, which can fit on one GPU: “1 is to 100 times difference on a per query basis.”
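To make the per-query comparison concrete, the sketch below attributes hardware cost to a single query. The GPU counts (32 versus 1) come from the interview; the hourly GPU price and the throughput figure are placeholder assumptions chosen only to show the arithmetic, not measured values.

```python
# Relative per-query serving cost for a GPU cluster versus a single GPU.
# From the interview: 32 GPUs (LLM minimum cluster) and 1 GPU (SLM).
# Assumed placeholders: the hourly GPU price and queries-per-hour figures.


def cost_per_query(gpu_count: int, gpu_hour_usd: float,
                   queries_per_hour: float) -> float:
    """Hardware cost attributed to one query, in USD."""
    return gpu_count * gpu_hour_usd / queries_per_hour


GPU_HOUR_USD = 2.0          # hypothetical rental price per GPU-hour
QUERIES_PER_HOUR = 1_000.0  # hypothetical throughput, same for both setups

llm_cost = cost_per_query(32, GPU_HOUR_USD, QUERIES_PER_HOUR)
slm_cost = cost_per_query(1, GPU_HOUR_USD, QUERIES_PER_HOUR)
ratio = llm_cost / slm_cost

if __name__ == "__main__":
    print(f"LLM: ${llm_cost:.4f} per query")
    print(f"SLM: ${slm_cost:.4f} per query")
    print(f"LLM/SLM cost ratio: {ratio:.0f}x")
```

With equal throughput the ratio simply tracks the GPU count. Where a deployment lands within the 1-to-100 range Sachdev quotes would depend on factors not modelled here, such as throughput per GPU and utilisation.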
3. AI sovereignty and optionality
“A lot of large enterprises are beginning to think about sovereignty of AI… not being locked into any compute, any environment, being able to run your AI where you prefer, being able to move from one environment to the other,” says Sachdev.
He posits that enterprises are wary of being locked in to specific providers. CIOs technically have the option to move to a different provider, but switching costs (the financial, technical and operational hurdles of migrating between AI models or infrastructure providers) can quickly make that impractical.
“If you want to switch from one LLM to the other, the switching cost is very high. Whereas from an SLM standpoint, the switching cost is near zero,” he explains.
His conclusion: “Sovereignty, cost, and performance are the three reasons where… the vast majority of enterprise use cases, the SLM is being preferred now over the LLM in enterprise.”
Overcoming the data scientist bottleneck
If SLMs are so superior, why hasn’t every enterprise adopted them? Sachdev identifies a critical bottleneck: data scientists.
“The SLM architecture requires fine-tuning. That means having data scientists. For each SLM to be fine-tuned on a single domain, we need 5-10 data scientists. Each domain has to be constantly updated, which means we need an army of data scientists to have 10, 15, 20 SLMs. And we don’t think we can afford that,” he elaborates.
Uniphore’s answer is autonomous fine-tuning: “Our platform does the fine-tuning. It does not wait for data scientists to create new datasets to fine-tune the model. We have innovated on that to say we will automate the process of fine-tuning SLM.”
He adds that when that one innovation reaches a customer who already understands its value, adoption of SLMs takes off.
Redefining the AI architecture: The agentic flywheel
Cxociety Research finds that the hurdles with any new technology typically arise when organisations try to integrate it into existing workflows. Sachdev, for his part, emphasises that adopting SLMs is not a simple replacement: “It’s an architecture. It’s not a point solution. It’s not LLM versus SLM. It’s a complete architecture.”
He outlines the four steps: “The architecture has to start with data. Second, how do we extract knowledge from the data? Then how do we fine-tune? And finally, how do we then allow agents to run?”
Enterprises’ workflows and infrastructure are as fragmented as the data they use. Sachdev acknowledges this reality. He offers three options:
“The first: a CDO, for example, may ask for three years to migrate all the data into one data warehouse. Most CEOs in today’s world will say we can’t wait that long.” The second: hiring firms like Palantir to build a static ontology, which is “human curated, takes time, takes money, susceptible to error, static. So, if something changes in the business, that ontology has to change again.”
The third, Uniphore’s approach: “autonomously and agentically prepare the data. Our data agents crawl all data sources, semantically understand what’s in the data, and then represent your understanding as a knowledge graph. A knowledge graph is a dynamic ontology.”
Then: “an AI model looks in this knowledge graph for a given domain, generates a question answer for each domain, feeds this to the SLM, and does it recursively.”
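The step from knowledge graph to training data can be pictured as turning graph facts into question-and-answer records for one domain. The sketch below is a simplified stand-in: it uses a fixed text template where Sachdev describes an AI model doing the generation, and the `Triple` structure, the toy graph and the `qa_pairs` function are all hypothetical names invented for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One fact in a toy knowledge graph (hypothetical structure)."""
    subject: str
    relation: str
    obj: str
    domain: str


# Hypothetical toy graph; a real one would be built from enterprise data.
GRAPH = [
    Triple("auto claim", "requires document", "police report", "claims"),
    Triple("auto claim", "is reviewed by", "claims adjuster", "claims"),
    Triple("invoice dispute", "is escalated to", "billing desk", "billing"),
]


def qa_pairs(graph, domain):
    """Turn each fact in the chosen domain into a question/answer record.

    A fixed template stands in for the AI model that would normally
    phrase the questions.
    """
    pairs = []
    for fact in graph:
        if fact.domain != domain:
            continue  # keep the dataset focused on a single domain
        pairs.append({
            "question": f"Complete the statement: {fact.subject} {fact.relation} ___",
            "answer": fact.obj,
        })
    return pairs


if __name__ == "__main__":
    for pair in qa_pairs(GRAPH, "claims"):
        print(pair["question"], "->", pair["answer"])
```

Filtering by domain before generating records mirrors the point of the approach: each SLM sees only the facts for the function it is meant to master.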
Critically, Sachdev cautions against stopping here. “We need reinforcement learning to come back to this SLM from those agentic runs. This recursive flywheel is how you create the whole architecture from data to knowledge to model to agent. That’s how this architecture can confidently move from POC to production with no problems.”
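The loop from data to knowledge to model to agent, with outcomes fed back in, can be sketched as a small skeleton. Every stage below is a deliberately trivial stand-in: deduplication replaces knowledge extraction, a version counter replaces fine-tuning, and feeding failed tasks back as data replaces reinforcement learning. The `Flywheel` class and its method names are hypothetical and do not describe Uniphore's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Flywheel:
    """Toy loop: data -> knowledge -> model -> agent -> feedback -> data."""
    records: list = field(default_factory=list)  # raw data plus fed-back items
    model_version: int = 0

    def extract_knowledge(self):
        # Stand-in for knowledge extraction: deduplicate the records.
        return sorted(set(self.records))

    def fine_tune(self, knowledge):
        # Stand-in for fine-tuning: bump a version counter per pass.
        self.model_version += 1
        return {"version": self.model_version, "facts": knowledge}

    def run_agents(self, model, tasks):
        # Stand-in for agent runs: a task succeeds if the model has seen it.
        return [(task, task in model["facts"]) for task in tasks]

    def cycle(self, tasks):
        model = self.fine_tune(self.extract_knowledge())
        outcomes = self.run_agents(model, tasks)
        # Feedback step: failed tasks become new data for the next pass.
        self.records.extend(task for task, ok in outcomes if not ok)
        return outcomes


if __name__ == "__main__":
    wheel = Flywheel(records=["claims intake"])
    tasks = ["claims intake", "fraud check"]
    print(wheel.cycle(tasks))  # "fraud check" fails on the first pass
    print(wheel.cycle(tasks))  # both succeed once feedback is absorbed
```

The point of the skeleton is the shape of the loop rather than any stage: what agents fail at in one pass becomes input to the next, so the model improves without a person assembling a new dataset each time.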
Real-world validation: The six-to-eight-week transformation
He cites the example of an existing US insurance client that narrowed its focus to three areas of the business: claims, underwriting, and marketing customer segmentation. During discovery, the client’s leadership confirmed that its data was in very poor shape, that is, fragmented.
He continues: “Instead of sending forward-deployed engineers (FDEs), we will use AI agents to prepare your data. We’ll build the knowledge graph, then create three SLMs. For each of those processes, it took six to eight weeks.”
Inserting AI agents into these processes revealed a complex weave of workflows: four desks handled claims, a different desk handled compliance, and another handled treasury. Each of those agents derives its intelligence from the claims SLM, which is now connected to the company’s data sources for its claims systems. “And this flywheel is ready in six to eight weeks.”
Don’t forget the human element
Sachdev is adamant that technology alone is insufficient: “The holy grail of delivering business outcomes with AI in the enterprise is to marry the deeply rooted knowledge of how AI works… with deep domain knowledge of these industries… It’s unlikely that the engineer who deeply understands the architecture of AI may also be an expert in these industries.”
Delivering AI solutions, Sachdev says, requires an ecosystem. “Just like with past technology trends, it requires an entire ecosystem to come together. The role of the system integrator who has served those companies for so long cannot be ignored.”
Uniphore has partnerships with KPMG, Capgemini, Cognizant, LT Mindtree, and Wipro – firms that have expertise in life sciences, financial services, and retail. Uniphore trains them in AI.
There is one other aspect that can’t be ignored – the human in the loop. Sachdev posits: “If these models are so smart, why hasn’t every enterprise already fully adopted them?” And for him, this is the challenge. “It’s a change management issue. It’s an education issue, deeply embedded context, tribal knowledge, which has to be slowly extracted into AI.”
Piece of advice
He advises CEOs: “This concept of a tiger team or a SWAT team. Take a handful of people who can be educated. Make them report to the CEO… Create early success stories. People will become believers when they see their jobs becoming easier. That’s when people start to think differently.”
He recalls: “Our most successful projects are where the CEO is directly involved in this transformation… It’s not a technology decision. It’s a change management decision.”
The convergence of Small Language Models, autonomous fine-tuning, dynamic knowledge graphs, and a self-correcting agentic flywheel provides a viable, economically sound path from POC to production.
Sachdev says realising this potential requires leaders to embrace ecosystem partnerships and CEO-led change management to truly transform their organisations.