Do me a favour. Watch the video before reading this article. It will help the rest of the article make sense.
So, what did we learn from Marina Danilevsky's presentation on Retrieval-Augmented Generation (RAG)?
For me, artificial intelligence is only as good as the information it is fed with, and unless I am confident of the sources of that information, I should not trust the information being supplied by the artificial intelligence (AI) engine on its own merit.
The 1860s adage "garbage in, garbage out", formalised by IBM programmer George Fuechsel in 1962, remains true today, 163 or 61 years later depending on which date you choose to accept.
Experts today acknowledge that large language models (LLMs), the algorithms that power AI engines, including generative AI (GenAI), are flawed because the foundational data behind the insights being spewed out is not necessarily vetted for accuracy, hence the word hallucination is implied in the disclaimer when using a generative AI tool.
IBM senior research scientist Danilevsky makes clear that LLM users face two challenges: you can't validate the source of the insight being shared by the LLM, and the data used may be out of date.
What is Retrieval-Augmented Generation?
Retrieval-augmented generation (RAG) is an AI framework that improves the accuracy of LLMs.
Kitman Cheung, APAC technical sales leader with IBM, says RAG improves LLM performance by providing it with current and reliable information. By grounding answer generation with search results from domain-specific corpora (collections of text), the RAG framework can significantly improve AI performance in domain-specific Q&A use cases.
Greg Statton, Office of the CTO – Data and AI at Cohesity, says using RAG allows individuals or enterprises to leverage their vast knowledge bases without the need for re-training or fine-tuning the language model (and keeping their data out of the model).
How does RAG overcome the limitations of traditional language models?
Statton explains that RAG overcomes the limitations of traditional language models by incorporating a retrieval step that allows the AI system to access relevant information from a knowledge base before generating a response. "This helps ensure that the generated responses are more accurate, contextually relevant, and coherent," he elaborated.
Concurring, Cheung explains that during the retrieval phase, the prompt or question from the user is used to search the corpus – a collection of written text – for relevant information. The search results are then appended to the user’s prompt and sent to the LLM.
Cheung says the second phase of RAG, referred to as generation, is where the LLM uses the search results and the representation of its pre-training data to generate a more accurate response. The response is then usually presented along with a link to the original information source.
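The two phases Cheung describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any vendor implements RAG: the corpus, file names, and the `retrieve`, `build_prompt` and `answer` helpers are hypothetical, retrieval is a simple word-overlap score rather than a real search engine or vector store, and `llm` is a placeholder for whatever model call an enterprise actually uses.

```python
import re
from collections import Counter

# Hypothetical in-memory corpus; a real system would query a search index.
CORPUS = [
    {"source": "claims-handbook.txt",
     "text": "A motor insurance claim must be filed within 30 days of the incident."},
    {"source": "travel-policy.txt",
     "text": "Travel insurance covers lost luggage up to the policy limit."},
    {"source": "office-guide.txt",
     "text": "The office cafeteria opens at eight in the morning."},
]


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(question, corpus, top_k=2):
    """Retrieval phase: rank documents by how many words they share with the question."""
    question_words = Counter(tokenize(question))
    scored = []
    for doc in corpus:
        doc_words = Counter(tokenize(doc["text"]))
        score = sum(min(count, doc_words[word]) for word, count in question_words.items())
        if score > 0:
            scored.append((score, doc))
    # Highest score first; ties keep their original corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_prompt(question, docs):
    """Append the search results to the user's question."""
    results = "\n".join(
        f"[{i}] ({doc['source']}) {doc['text']}" for i, doc in enumerate(docs, start=1)
    )
    return (
        f"Question: {question}\n\n"
        f"Search results:\n{results}\n\n"
        "Answer using the search results and cite the source."
    )


def answer(question, corpus, llm):
    """Generation phase: send the augmented prompt to the model and keep the sources."""
    docs = retrieve(question, corpus)
    reply = llm(build_prompt(question, docs))
    return {"answer": reply, "sources": [doc["source"] for doc in docs]}


if __name__ == "__main__":
    # Stand-in for a real model call; it only reports the size of the prompt it received.
    def placeholder_llm(prompt):
        return f"(model reply based on a prompt of {len(prompt)} characters)"

    print(answer("How long do I have to file a motor insurance claim?", CORPUS, placeholder_llm))
```

Returning the sources alongside the answer mirrors the point above that the response is usually presented with a link to the original information, which is what lets a reader check where the answer came from.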
Statton concurs, adding that you want to fine-tune a model for how it will respond, and use RAG for what it will respond with.
Can you provide examples where RAG-driven AI could boost productivity in enterprises?
Taking the example of a virtual assistant or chatbot, Cheung says that with a traditional implementation, chatbots are only able to provide answers to a fixed set of questions.
"Using RAG LLMs, chatbots can provide much more tailored answers with information from a reliable and current data source. For example, these RAG-driven chatbots can guide customers through a complex insurance claim or retrieve and summarise a claim for a claim manager," he elaborated.
RAG can augment the automation of customer support, said Statton. "RAG-driven AI can provide accurate and contextually relevant responses to customer queries, reducing the workload on human support agents and improving customer satisfaction. This can be achieved by using the Knowledge Bases and Technical Docs these teams already have," he explained.
In the area of personalised marketing and sales efforts, he noted that RAG-driven AI can analyse customer data to generate personalised content and recommendations, improving customer engagement and conversion rates.
Are there any ethical considerations associated with the use of RAG in generative AI applications?
There is growing acceptance that AI is biased – the bias is intentional only to the extent that the query engine has a finite pool of data from which to derive an answer to a question and therefore may be prone to hallucinations in offering an answer.
There is also concern about a lack of transparency in how AI tools make decisions. There is also the question of whether the sourced data respects the privacy of users.
Cheung acknowledges that unless there is visibility into the data used to train the LLMs, companies may have exposure to copyright and intellectual property infringement. He opined that the legal precedents are not well established in this area.
"It is also important to ensure PII data is not used to train or fine-tune LLMs," he goes on. "Otherwise, data privacy violations can occur and removing this data from the models is nearly impossible."
"With the use of publicly hosted generative AI services, companies should also consider the risk of sensitive information leaks. Employees may accidentally include proprietary and sensitive information in the prompts sent to these AI services," Cheung adds.
Statton concurs, adding that enterprises will need to pay close attention to the conscious and unconscious bias that exists in their documentation and knowledge bases. "This bias will permeate into the generated responses which could result in unintended consequences, furthering the reach and even amplifying the bias through the language model’s prompt response generation," he added.
What future developments or innovations can we expect to see in the field of AI-driven conversations?
Cheung believes that RAG-based conversational solutions will greatly improve companies’ ability to retrieve and present targeted information to their customers or employees. "With domain-specific Q&A use cases, this technology can greatly improve productivity. RAG frameworks can be used in scientific research to help researchers accelerate new discoveries," he added.
Statton says that as RAG-driven AI continues to advance, we can expect to see several developments and innovations in the field of AI-driven conversations, including:
Improved contextual understanding: RAG-driven AI systems will become better at understanding the context of conversations, enabling them to provide more relevant and coherent responses.
Enhanced domain-specific expertise: RAG-driven AI systems will be focused on bespoke domains or tasks, allowing them to provide more specialised and accurate responses in those areas.
Seamless integration with other AI technologies: RAG-driven AI systems will be integrated with other AI technologies, such as computer vision and speech recognition, to enable more natural and intuitive interactions.
Greater personalisation: RAG-driven AI systems will be able to tailor their responses to individual users based on their preferences, history, and context, providing a more personalised and engaging experience.
Ethical and responsible AI development: As RAG-driven AI systems become more advanced, there will be a greater focus on addressing ethical concerns and ensuring responsible AI development and deployment.
Alternative to RAG
One of the arguments against the use of RAG is its potential to introduce latency from the retrieval time, along with the infrastructure overhead of managing the data.
The alternative is fine-tuning, which continues the training of the LLM on domain-specific data to specialise the model's capabilities. The good news about this approach is that the model becomes more deeply ingrained in a specific knowledge base or niche.
The caution is that the data is subject to data drift over time. This will necessitate frequent retraining and monitoring to keep the fine-tuned model current. Naturally, access to high-quality in-domain training data is a must.
What is your poison?