
The Moat for Enterprise AI is RAG + Fine Tuning – Here’s Why



The hype around LLMs is unprecedented, but it’s warranted. From AI-generated images of the Pope in head-to-toe Balenciaga to customer support agents without pulses, generative AI has the potential to transform society as we know it. 

And in many ways, LLMs are going to make data engineers more valuable – and that’s exciting!

Still, it’s one thing to show your boss a cool demo of a data discovery tool or text-to-SQL generator – it’s another thing to use it with your company’s proprietary data, or even more concerning, customer data.

All too often, companies rush into building AI applications with little foresight into the financial and organizational impact of their experiments. And it’s not entirely their fault – executives and boards are to blame for much of the “hurry up and go” mentality around this technology (as with most new technologies). Remember NFTs?

For AI – particularly generative AI – to succeed, we need to take a step back and remember how any software becomes enterprise ready. To get there, we can take cues from other industries to understand what enterprise readiness looks like and apply these tenets to generative AI. 

In my opinion, enterprise ready generative AI must be: 

  • Secure & private: Your AI application must ensure that your data is secure, private, and compliant, with proper access controls. Think: SecOps for AI.

  • Scalable: Your AI application must be easy to deploy, use, and upgrade, as well as cost-efficient. You wouldn’t purchase – or build – a data application if it took months to deploy, was tedious to use, and was impossible to upgrade without introducing a million other issues. We shouldn’t treat AI applications any differently. 

  • Trusted: Your AI application should be sufficiently reliable and consistent. I’d be hard-pressed to find a CTO who is willing to bet her career on buying or building a product that produces unreliable code or generates insights that are haphazard and misleading.

With these guardrails in mind, it’s time we start giving generative AI the diligence it deserves. But it’s not so easy…

Why is enterprise AI hard to achieve?

Put simply, the underlying infrastructure to scale, secure, and operate LLM applications is not there yet. 

Unlike most applications, AI is very much a black box. We know what we’re putting in (raw, often unstructured data) and we know what we’re getting out, but we don’t know how it got there. And that’s difficult to scale, secure and operate. 

Take GPT-4 for example. While GPT-4 blew GPT-3.5 out of the water on some tasks (like taking the SAT and the AP Calculus AB exam), some of its outputs were riddled with hallucinations or lacked the context necessary to adequately accomplish these tasks. Hallucinations stem from a variety of factors, from poor embeddings to knowledge cutoffs, and frequently affect the quality of responses generated by publicly available or open LLMs trained on information scraped from the internet – which describes most models on the market. 

To reduce hallucinations and even more importantly – to answer meaningful business questions – companies need to augment LLMs with their own proprietary data, which includes necessary business context. For instance, if a customer asks an airline chatbot to cancel their ticket, the model would need to access information about the customer, about their past transactions, about cancellation policies and potentially other pieces of information. All of these currently exist in databases and data warehouses. 

Without that context, an AI can only reason with the public information, typically published on the internet, on which it was originally trained. And herein lies the conundrum – exposing proprietary enterprise data and incorporating it into business workflows or customer experiences almost always requires solid security, scalability, and reliability. 

The two routes to enterprise ready AI: RAG and fine tuning

When it comes to making AI enterprise ready, the most critical parts come at the very end of the LLM development process: retrieval augmented generation (RAG) and fine tuning.

RAG integrates real-time databases into the LLM’s response generation process, ensuring up-to-date and factual output. Fine-tuning, on the other hand, trains models on targeted datasets to improve domain-specific responses.

It’s important to note, however, that RAG and fine tuning are not mutually exclusive approaches, and should be leveraged – oftentimes in tandem – based on your specific needs and use case. 

When to use RAG

RAG is a framework that improves the quality of LLM outputs by giving the model access to a database while attempting to answer a prompt. The database – being a curated and trusted body of potentially proprietary data – allows the model to incorporate up-to-date and reliable information into its responses and reasoning. This approach is best suited for AI applications that require additional contextual information, such as customer support responses (like our flight cancellations example) or semantic search in your company’s enterprise communication platform.

RAG applications are designed to retrieve relevant information from knowledge sources before generating a response, making them well suited for querying structured and unstructured data sources, such as vector databases and feature stores. By retrieving information to increase the accuracy and reliability of LLMs at output generation, RAG is also highly effective at both reducing hallucinations and keeping training costs down. RAG also affords teams a level of transparency since you know the source of the data that you’re piping into the model to generate new responses.

One thing to note about RAG architectures is that their performance heavily relies on your ability to build effective data pipelines that make enterprise data available to AI models.
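To make the retrieve-then-generate flow above concrete, here is a minimal sketch in plain Python. The bag-of-words “embedding,” the sample knowledge base (echoing the flight-cancellation example), and the prompt template are all hypothetical stand-ins; a production system would use learned embeddings, a vector database, and an LLM API for the generation step.

```python
# Minimal RAG sketch: retrieve the most relevant document for a query,
# then build an augmented prompt for the LLM. Toy components throughout.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; real systems use learned vector embeddings.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Rank documents by similarity to the query; return the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Augment the prompt with retrieved context before calling the LLM.
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical knowledge base for the airline chatbot example.
knowledge_base = [
    "To cancel a ticket, submit a request within 24 hours of booking for a full refund.",
    "Baggage policy: one carry-on bag is included with every fare.",
]
prompt = build_prompt("How can I cancel my ticket?", knowledge_base)
```

Note that the retrieval step is also what gives RAG its transparency: the retrieved documents can be logged alongside each response, so you always know which source material shaped the model’s answer.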

When to use fine tuning

Fine tuning is the process of training an existing LLM on a smaller, task-specific and labeled dataset, adjusting model parameters and embeddings based on this new data. Fine tuning relies on pre-curated datasets that inform not just information retrieval, but the nuance and terminologies of the domain for which you’re looking to generate outputs.  

In our experience, fine tuning is best suited for domain-specific situations, like responding to detailed prompts in a niche tone or style, e.g., a legal brief or customer support ticket. It is also a great fit for overcoming information bias and other limitations, such as language repetitions or inconsistencies. Several studies over the past year have shown that fine-tuned models significantly outperform off-the-shelf versions of GPT-3 and other publicly available models. It has been established that for many use cases, a fine-tuned small model can outperform a large general-purpose model – making fine tuning a plausible path for cost efficiency in certain cases.

Unlike RAG, fine tuning often requires less data but at the expense of more time and compute resources. Additionally, fine tuning operates like a black box; since the model internalizes the new data set, it becomes challenging to pinpoint the reasoning behind new responses and hallucinations remain a meaningful concern.

Fine tuning – like RAG architectures – requires building effective data pipelines that make (labeled!) enterprise data available to the fine tuning process. No easy feat.
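The labeled-data requirement above can be sketched briefly. The example below prepares a small training file in the JSONL prompt/completion layout that several fine-tuning APIs accept; the field names, example rows, and validation rules are assumptions for illustration, so check your provider’s documentation for the exact schema it expects.

```python
# Sketch of preparing a labeled dataset for fine tuning as JSONL.
# Schema (prompt/completion field names) is a common convention, not universal.
import json

# Hypothetical labeled examples capturing a support team's tone.
examples = [
    {"prompt": "Draft a cancellation confirmation in our support tone.",
     "completion": "Hi there - your ticket has been cancelled and a refund is on its way."},
    {"prompt": "Summarize our 24-hour refund policy for a customer.",
     "completion": "You can cancel within 24 hours of booking for a full refund."},
]

def validate(example: dict) -> None:
    # Fine tuning needs clean labels: both fields present and non-empty.
    for field in ("prompt", "completion"):
        if not example.get(field, "").strip():
            raise ValueError(f"missing or empty field: {field}")

def to_jsonl(rows: list[dict]) -> str:
    # One JSON object per line, validated before serialization.
    for row in rows:
        validate(row)
    return "\n".join(json.dumps(row) for row in rows)

jsonl = to_jsonl(examples)
```

Even a sketch this small shows why the pipeline work matters: the curation and validation of labels, not the training call itself, is usually where fine-tuning projects succeed or fail.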


