The role of vector DBs in retrieval-augmented generation (RAG)
To fully understand RAG and the pivotal role vector DBs play within it, we must first acknowledge the inherent constraints of LLMs that paved the way for RAG techniques. This section sheds light on the specific LLM challenges RAG aims to overcome and the importance of vector DBs in addressing them.
First, the big question – Why?
In Chapter 1, we delved into the limitations of LLMs, which include the following:
- LLMs possess a fixed knowledge base determined by their training data; as of February 2024, ChatGPT’s knowledge is limited to information up until April 2023.
- LLMs can hallucinate, occasionally producing plausible-sounding narratives or facts that aren't real.
- They lack persistent memory, relying solely on what fits within the input context window. Take GPT-4-32K, for example; it can only process up to 32K tokens across the prompt and completion combined (we'll dive deeper into prompts, completions, and tokens in Chapter 5). A quick way to check a prompt against such a limit is sketched after this list.
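To make the context-window constraint concrete, here is a minimal sketch of counting a prompt's tokens with the tiktoken library; the 32K budget and the model name passed to it are illustrative assumptions, not fixed properties of every deployment:

```python
# A minimal sketch of checking a prompt against a context window using the
# tiktoken library (pip install tiktoken). The 32K figure and model name
# are illustrative assumptions.
import tiktoken

CONTEXT_WINDOW = 32_000  # GPT-4-32K's combined budget for prompt + completion

encoding = tiktoken.encoding_for_model("gpt-4")
prompt = "Summarize the key limitations of large language models."
token_count = len(encoding.encode(prompt))

# Whatever the prompt consumes is no longer available for the completion.
print(f"Prompt uses {token_count} tokens; "
      f"{CONTEXT_WINDOW - token_count} remain for the completion.")
```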
To counter these challenges, a promising avenue is enhancing LLM generation with retrieval components. These components can extract pertinent data from external knowledge bases—a process termed RAG, which we’ll explore further in this section.
So, what is RAG, and how does it help LLMs?
Retrieval-augmented generation (RAG) was first introduced in the paper Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (https://arxiv.org/pdf/2005.11401.pdf), published in 2020 by Facebook AI Research (now Meta AI). RAG is an approach that combines the generative capabilities of LLMs with retrieval mechanisms that extract relevant information from vast datasets. LLMs such as the GPT variants can generate human-like text based on patterns in their training data, but they lack the means to perform real-time lookups or reference specific external knowledge bases after training. RAG addresses this limitation by using a retrieval model to query a dataset and fetch relevant information, which then serves as the context for the generative model to produce a detailed and informed response. Grounding queries in retrieved information this way also reduces the chance of hallucinations.
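The following is a minimal sketch of the retrieve-then-generate pattern just described. The toy in-memory knowledge base and word-overlap retriever are stand-ins for illustration only, not the method from the original paper; a real system would embed its documents and send the augmented prompt to an actual LLM rather than printing it:

```python
# A minimal sketch of the retrieve-then-generate pattern, assuming a toy
# in-memory knowledge base and a word-overlap retriever as stand-ins.
# A real system would embed documents and call an actual LLM at the end.

KNOWLEDGE_BASE = [
    "The Eiffel Tower is 330 metres tall.",
    "RAG pairs a retriever with a generative model.",
    "Vector DBs index embeddings for fast similarity search.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Toy retriever: rank passages by word overlap with the query."""
    query_words = set(query.lower().split())
    return sorted(
        KNOWLEDGE_BASE,
        key=lambda passage: len(query_words & set(passage.lower().split())),
        reverse=True,
    )[:k]

def build_augmented_prompt(query: str) -> str:
    """Ground the query in retrieved context before it reaches the LLM."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# In a real pipeline, this augmented prompt is sent to the generative model.
print(build_augmented_prompt("How tall is the Eiffel Tower?"))
```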
The critical role of vector DBs
A vector DB plays a crucial role in the efficient retrieval that RAG depends on. In this setup, each piece of information in the dataset, such as text, video, or audio, is represented as a high-dimensional vector and indexed in a vector DB. When a user query comes in, it is converted into the same kind of vector representation. The vector DB then rapidly searches for the vectors (documents) in the dataset that are closest to the query vector, leveraging techniques such as ANN search. It then combines the query with the relevant retrieved content and sends the result to the LLM to generate a response. This ensures that the most relevant information is retrieved quickly and efficiently, providing a foundation for the generative model to build upon.
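To illustrate the retrieval step itself, here is a minimal sketch using numpy. Random vectors stand in for real embeddings, and the brute-force cosine-similarity scan stands in for the ANN indexes (such as HNSW or IVF) that a production vector DB would use to make this search fast at scale:

```python
# A minimal sketch of the vector DB retrieval step. Random vectors stand in
# for real embeddings, and the brute-force cosine-similarity scan stands in
# for the ANN indexes (e.g., HNSW or IVF) a production vector DB would use.
import numpy as np

rng = np.random.default_rng(42)
DIM = 768  # a common embedding dimensionality (an illustrative assumption)

documents = ["doc A", "doc B", "doc C", "doc D"]
doc_vectors = rng.normal(size=(len(documents), DIM))
# Normalize rows so that a dot product equals cosine similarity.
doc_vectors /= np.linalg.norm(doc_vectors, axis=1, keepdims=True)

def search(query_vector: np.ndarray, k: int = 2) -> list[str]:
    """Return the k documents whose vectors lie closest to the query."""
    q = query_vector / np.linalg.norm(query_vector)
    similarities = doc_vectors @ q              # one score per document
    top_k = np.argsort(similarities)[::-1][:k]  # highest similarity first
    return [documents[i] for i in top_k]

query_vector = rng.normal(size=DIM)  # in practice: embed the user's query
print(search(query_vector))          # these documents become the LLM's context
```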