“We definitely would have put more thought into the name had we known our work would become so widespread.”
That’s Patrick Lewis, the researcher who coined the term in a 2020 paper at Meta AI. RAG stands for Retrieval Augmented Generation, and it has since become the most repeated, least understood term in the AI industry.
We encounter it constantly. In product demos, technical articles, team discussions, LinkedIn posts. But here’s what we’ve noticed: people don’t all mean the same thing by it. Not even close.
What the RAG… actually means in AI
The idea behind RAG is simple. Before an AI model answers your question, it first goes and looks something up. That’s it. Retrieve, then generate.
Think of hiring two consultants:
- One memorized your company handbook six months ago.
- The other hasn’t memorized anything, but keeps a copy on their desk and checks it before answering.
The first is a fine-tuned model.
The second is using RAG.
Now, the quality of that second consultant’s work depends entirely on how well they search. Do they flip to the right chapter? Cross-reference multiple sections? Or just open a random page and hope for the best?
That’s the problem with the term. “We use RAG” tells you the consultant checks the handbook. It tells you nothing about whether they’re any good at it. And the differences between implementations are enormous.
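Retrieve, then generate really is the whole mechanism. A minimal sketch, assuming a toy in-memory handbook and a placeholder where the LLM call would go (the `HANDBOOK` contents and topic-matching rule are invented for illustration):

```python
# A minimal retrieve-then-generate loop. The handbook and the naive
# topic match are illustrative; the prompt would be sent to an LLM.

HANDBOOK = {
    "vacation": "Employees accrue 1.5 vacation days per month.",
    "expenses": "Expenses under $50 need no receipt.",
}

def retrieve(question: str) -> str:
    """Look the material up at query time -- the 'R' in RAG."""
    for topic, passage in HANDBOOK.items():
        if topic in question.lower():
            return passage
    return ""

def build_prompt(question: str) -> str:
    """Augment the prompt with whatever was retrieved -- the 'A'."""
    context = retrieve(question)
    return f"Context: {context}\n\nQuestion: {question}"

print(build_prompt("How many vacation days do I get?"))
```

Everything that separates the five levels below lives inside that `retrieve` step: how the lookup works, how relevance is judged, and how many passes the system makes.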
Five levels of RAG
When someone says “we use RAG,” they could mean any of these five things. The gap between level 1 and level 5 is like comparing a microwave to a restaurant kitchen.
| Level | What it does | What it feels like |
|---|---|---|
| 1. Keyword search + LLM | Basic keyword matching on your documents, results passed to the model | A search engine wearing a chatbot costume |
| 2. Semantic search + LLM | Vector embeddings find conceptually similar content, not just keyword matches | Smarter, but still a single lookup |
| 3. Chunked retrieval | Documents split into pieces, embedded, retrieved with relevance scoring | What most enterprise “RAG solutions” actually are |
| 4. Multi-step retrieval | Retrieves, evaluates, retrieves again, re-ranks, then generates | Much more accurate. Less common |
| 5. Agentic RAG | The system decides what to retrieve, from where, in what order, and acts on it | Cutting-edge. Rare despite widespread claims |
Most RAG implementations we’ve seen sit somewhere between levels 1 and 3. We use level 3 ourselves for our internal knowledge base, and it works well for what we need. The point isn’t to chase the highest level. It’s to know what you’re dealing with. A recent analysis counted 14 distinct RAG architectures, and the label “RAG” on its own tells you none of this.
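The gap between levels 1 and 2 comes down to how relevance is scored. A toy contrast, where the 2‑D “embeddings” are hand-picked stand-ins for what a real embedding model would produce:

```python
# Level 1 (keyword overlap) vs level 2 (embedding similarity).
# The 2-D vectors are invented for illustration; a real system
# would get them from a trained embedding model.
import math

DOCS = {
    "PTO policy":    {"text": "paid time off accrual rules", "vec": (0.9, 0.1)},
    "Server uptime": {"text": "monitoring and alerting setup", "vec": (0.1, 0.9)},
}

def keyword_score(query: str, text: str) -> int:
    """Level 1: count shared words. 'vacation' never matches 'paid time off'."""
    return len(set(query.lower().split()) & set(text.split()))

def cosine(a, b) -> float:
    """Level 2: cosine similarity between embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

query = "vacation days"
query_vec = (0.85, 0.15)  # assume the model embeds it near "paid time off"

for name, doc in DOCS.items():
    print(name, keyword_score(query, doc["text"]), round(cosine(query_vec, doc["vec"]), 2))
```

Keyword matching scores zero on both documents because no literal word overlaps, while the embedding comparison correctly ranks the PTO policy first. Levels 3 and up keep this scoring idea but add chunking, re-ranking, and repeated retrieval passes around it.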
What isn’t RAG
We’ve seen all three of these described as “RAG.” None of them are.
Fine-tuning is not RAG. When a model is “trained on your data,” that content was memorized during training. Nothing is looked up at query time. IBM puts it well: fine-tuning is like taking a cooking course in Italian cuisine. RAG is like handing the cook an Italian cookbook. Both useful. Different things.
Hardcoded prompt context is not RAG. If the system always sends the same background info regardless of the question, nothing is being retrieved. That’s a static prompt with extra text glued on.
A chatbot with a static FAQ is not RAG. Matching questions to pre-written answers is a decision tree. No generation, no retrieval.
If someone calls any of these “RAG,” they’re either confused about what the term means or using it loosely because it sounds more impressive.
Five questions to cut through the RAG noise
You don’t need to be technical. These five questions will tell you exactly where an implementation sits on the spectrum.
1. “What does the system retrieve, and from where?” “It indexes our Confluence pages, PDFs, and Slack messages” is a clear answer. “It connects to our data” is not.
2. “How does it decide what’s relevant?” This is the question that matters most. It separates keyword matching (level 1) from semantic search (level 2) from multi-step retrieval (level 4). If nobody can explain it clearly, that tells you something.
3. “What happens when it can’t find relevant information?” Good systems say “I don’t know.” Bad ones make something up with confidence. This tells you whether the system has guardrails or just generates plausible nonsense.
4. “Can you see the sources alongside the answer?” If you can’t see which documents informed the response, you can’t verify anything. Source attribution isn’t a nice-to-have.
5. “How are updates to the source material handled?” Real-time re-indexing? Nightly batch? Manual trigger? Stale data is one of the most common RAG failures in production, and the one people talk about least.
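The guardrail behind question 3 can be sketched in a few lines: if the best retrieval score falls below a threshold, refuse rather than generate. The threshold value and scores here are illustrative, not taken from any particular system:

```python
# Sketch of a retrieval guardrail: when nothing retrieved is relevant
# enough, say "I don't know" instead of generating from thin air.
# The threshold and the example scores are invented for illustration.

RELEVANCE_THRESHOLD = 0.75

def answer(question: str, scored_passages: list[tuple[float, str]]) -> str:
    best_score, best_passage = max(scored_passages, default=(0.0, ""))
    if best_score < RELEVANCE_THRESHOLD:
        return "I don't know -- no sufficiently relevant source was found."
    # In a real system the passage (and its source) would go to the LLM here.
    return f"Based on: {best_passage!r}"

print(answer("What is our Mars office policy?", [(0.31, "Earth office hours")]))
```

Systems without this check are the ones that answer every question with equal confidence, which is exactly what question 3 is designed to expose.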
Does the term RAG even matter?
Not really. What matters is whether the system gives accurate, source-grounded answers from the right data. The architecture label is secondary.
It’s the same problem we see with AEO and GEO: the label obscures more than it reveals. With RAG, that creates false equivalence. A basic keyword search with a chatbot wrapper and a multi-step agentic retrieval system are both called “RAG.” Same label. Completely different capabilities.
So stop fixating on the label. Look at the outcomes: does it get things right, can you see the sources, how current is the data, does it find the right information or just any information?
The next time someone says “we use RAG,” don’t nod. Ask which kind. If in doubt, get in touch with us.