Introduction to RAG
Unlocking External Knowledge with Retrieval-Augmented Generation
You’ve probably heard people say,
“RAG is essential!”
“You need to use RAG!”
But what exactly is RAG?
In this tutorial, we’ll explore RAG, why it matters, and how it helps large language models (LLMs) read and understand external documents to generate more accurate responses.
Before starting this tutorial, it helps to understand the following concepts:
→ Learn what LLMs are and what they can do
→ Understand the strengths, weaknesses, and limitations of LLMs
→ Learn how external documents are converted into machine-readable formats
RAG stands for Retrieval-Augmented Generation. It is a technique that combines:
Retrieval: Searching external databases for relevant information
Augmentation: Supplementing the LLM's input with that information
Generation: Producing a response using the enriched context
While traditional LLMs rely solely on pre-trained knowledge (limited to their training cutoff), RAG enables them to reference external documents, allowing for more accurate and up-to-date answers.
RAG is one of the most promising ways to address LLM limitations like hallucination and knowledge cutoff.
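Conceptually, the whole technique reduces to those three steps. The sketch below is illustrative Python only; `search_documents` and `call_llm` are hypothetical placeholders standing in for whatever retriever and LLM client you actually use.

```python
def answer_with_rag(question: str) -> str:
    # 1. Retrieval: search an external knowledge base for relevant passages
    passages = search_documents(question, top_k=3)  # hypothetical retriever

    # 2. Augmentation: inject the retrieved passages into the prompt
    context = "\n\n".join(passages)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

    # 3. Generation: the LLM produces the final answer from the enriched prompt
    return call_llm(prompt)  # hypothetical LLM client
```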
The value of RAG lies in its ability to augment knowledge. This happens in two main ways:
Keeps the LLM up-to-date with recent information not available during training
Ideal for dynamic or fast-changing topics like:
Tech product specs released after model training
Real-time data like stock prices or currency rates
Example:
An LLM trained in 2023 won't know about a laptop released in 2025—unless it's given access to updated documents via RAG.
Focuses on domain-specific or proprietary knowledge
Useful for:
Legal advice using internal legal documents
Corporate chatbots referencing internal manuals or policy docs
In short, RAG bridges the gap between general LLMs and custom, expert-level knowledge.
RAG is a system where:
External documents are pre-processed
When a user asks a question, related documents are retrieved
The LLM refers to those documents when generating an answer
Let’s explore the RAG pipeline, step by step.
Before implementation, carefully plan the pipeline:
Problem Definition
What exact problem is RAG solving? Be specific.
“Just building a chatbot” isn't a strong use case.
Data Planning
What documents will be used? What format?
How will the data stay current?
Security & Access Control
If access varies by user, set up permissions and restrictions
Evaluation Plan
Define how responses will be evaluated (e.g., QA sets, user feedback)
UI/UX Design
Choose an interface suitable for the task (chatbot, card UI, dashboard)
Now it’s time to prepare external documents for LLM processing.
Data Loading
Load documents (PDFs, Word, websites, etc.)
Tools like Upstage Document Parse can preserve structure as well as text
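As a minimal stand-in for a full parser such as Upstage Document Parse, the sketch below extracts plain text from a PDF with the open-source pypdf library; it recovers text only, not layout, tables, or structure, and the file name is just an example.

```python
from pypdf import PdfReader

def load_pdf_text(path: str) -> str:
    """Extract plain text from every page of a PDF (no layout or structure)."""
    reader = PdfReader(path)
    pages = [page.extract_text() or "" for page in reader.pages]
    return "\n".join(pages)

document_text = load_pdf_text("internal_policy.pdf")  # example file name
```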
Chunking
Split long texts into manageable “chunks”
Chunks that are too small lose context; chunks that are too large make processing slow, costly, or error-prone
Use smart chunking techniques to preserve meaning
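One simple, dependency-free approach is fixed-size chunking with overlap, sketched below. Production systems often use smarter, structure-aware splitters, and the sizes here are illustrative defaults rather than recommendations.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks so content near a boundary
    still appears with some surrounding context."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

chunks = chunk_text(document_text)  # text from the loading step above
```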
Embedding
Convert text chunks into vector embeddings (numeric representations)
Use tools like Upstage Solar Embedding
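The Upstage Solar Embedding API is not shown here; as a generic stand-in, the sketch below uses the open-source sentence-transformers library with the small `all-MiniLM-L6-v2` model (an assumption, not a recommendation). Any embedding model produces the same kind of output: one fixed-length vector per chunk.

```python
from sentence_transformers import SentenceTransformer

# Any embedding model works here; all-MiniLM-L6-v2 is a small open-source example.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Each chunk becomes a fixed-length numeric vector (384 dimensions for this model).
chunk_vectors = model.encode(chunks, normalize_embeddings=True)
print(chunk_vectors.shape)  # (num_chunks, 384)
```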
Indexing
Store the embeddings in a vector database (VectorStore)
Index them for fast retrieval later
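Any vector database can serve as the store. A minimal in-memory sketch using FAISS, with the normalized vectors from the embedding step so that inner product equals cosine similarity, looks like this:

```python
import faiss
import numpy as np

vectors = np.asarray(chunk_vectors, dtype="float32")

# Inner-product index; with normalized vectors this ranks by cosine similarity.
index = faiss.IndexFlatIP(vectors.shape[1])
index.add(vectors)

# Retrieval later: embed the query and fetch the 3 nearest chunks.
query_vec = model.encode(["What is the refund policy?"], normalize_embeddings=True)
scores, ids = index.search(np.asarray(query_vec, dtype="float32"), 3)
retrieved_chunks = [chunks[i] for i in ids[0]]
```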
This is where user questions get processed and answers are generated.
Prompt Engineering
Design prompts that guide the LLM to use retrieved context effectively
Techniques: few-shot examples, persona instructions, delimiters, etc.
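A minimal prompt template illustrating these ideas might look like the sketch below; the persona, delimiters, and refusal instruction are just one possible choice, not a fixed recipe.

```python
RAG_PROMPT = """You are a support assistant for our product documentation.
Answer the question using ONLY the text inside <context> ... </context>.
If the answer is not in the context, say you don't know.

<context>
{context}
</context>

Question: {question}
Answer:"""

prompt = RAG_PROMPT.format(
    context="\n\n".join(retrieved_chunks),  # chunks from the retrieval step
    question="What is the refund policy?",
)
```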
Workflow Engineering
Automate the full flow:
→ Receive query → Retrieve docs → Construct prompt → Generate answer
Modern systems may also use Agentic Workflows to call APIs when needed
(e.g., call a weather API if the query is “What’s the weather today?”)
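Wiring the previous pieces together, a minimal end-to-end query flow might look like the sketch below. It assumes an OpenAI-compatible chat client and reuses `model`, `index`, `chunks`, and `RAG_PROMPT` from the earlier sketches; the agentic API-calling branch is omitted, and the model name is illustrative.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes an OpenAI-compatible endpoint and API key

def rag_answer(question: str) -> str:
    # 1. Retrieve: embed the query and look up the nearest chunks in the index
    q_vec = model.encode([question], normalize_embeddings=True)
    _, ids = index.search(np.asarray(q_vec, dtype="float32"), 3)
    context = "\n\n".join(chunks[i] for i in ids[0])

    # 2. Construct the prompt from the template defined earlier
    prompt = RAG_PROMPT.format(context=context, question=question)

    # 3. Generate the answer with the LLM
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model; the name is illustrative
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(rag_answer("What is the refund policy?"))
```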
Demo Time!
Build user-facing interfaces using tools like Streamlit or Gradio
Let users interact with your RAG pipeline
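A few lines of Streamlit are enough for a demo front end. The sketch below assumes the `rag_answer` function from the workflow step and would be saved as `app.py` and launched with `streamlit run app.py`.

```python
import streamlit as st

st.title("RAG Demo")

question = st.text_input("Ask a question about the documents")
if question:
    with st.spinner("Retrieving and generating..."):
        answer = rag_answer(question)  # function from the workflow sketch above
    st.write(answer)
```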
Evaluating RAG outputs can be tricky—answers vary by user and context.
Build QA Sets
Create reference question-answer pairs
Consider expert/user interviews to create realistic QA pairs
Evaluate both:
Retrieval Quality: Was the proper document retrieved?
Answer Quality: Was the generated answer accurate?
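With a small QA set in hand, retrieval quality can be scored with a simple hit-rate check, as in the sketch below. The QA entries are made up for illustration, and the snippet reuses `model` and `index` from the ingestion sketches.

```python
import numpy as np

# Each QA item records the question and the id of the chunk that answers it.
qa_set = [
    {"question": "What is the refund policy?", "relevant_chunk_id": 12},
    {"question": "How do I reset my password?", "relevant_chunk_id": 40},
]

def hit_rate(qa_set, k: int = 3) -> float:
    """Fraction of questions whose relevant chunk appears in the top-k results."""
    hits = 0
    for item in qa_set:
        q_vec = model.encode([item["question"]], normalize_embeddings=True)
        _, ids = index.search(np.asarray(q_vec, dtype="float32"), k)
        if item["relevant_chunk_id"] in ids[0]:
            hits += 1
    return hits / len(qa_set)

print(f"Top-3 hit rate: {hit_rate(qa_set):.2f}")
```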
LLM-Based Evaluation
For large-scale evaluations, use LLMs to score generation quality
Provide LLMs with the query, answer, and reference answer, and ask them to score similarity or factual accuracy
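A simple LLM-as-judge setup passes the question, the generated answer, and the reference answer to a grading prompt. The sketch below reuses the OpenAI-compatible client from the workflow step; the rubric wording and model name are illustrative assumptions.

```python
JUDGE_PROMPT = """You are grading a RAG system's answer.

Question: {question}
Reference answer: {reference}
Generated answer: {generated}

Rate how well the generated answer agrees factually with the reference
on a scale of 1-5, then explain in one sentence.
Respond as: score: <1-5> - <reason>"""

def judge(question: str, reference: str, generated: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, reference=reference, generated=generated)}],
    )
    return response.choices[0].message.content
```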
In this section, we explored the concept and implementation of RAG.
🔹 What is RAG?
A framework that enhances LLM responses by retrieving relevant external documents
🔹 Why RAG?
LLMs have knowledge limitations. RAG fills the gap with real-time and domain-specific data.
🔹 RAG Implementation Stages:
1️⃣ Planning: Define goals, data, evaluation, and UX
2️⃣ Ingestion: Load → Chunk → Embed → Index
3️⃣ Querying: Retrieve → Prompt → Generate
4️⃣ Evaluation: Human/LLM-based answer scoring
📘 Now it's your turn—let's build your first RAG pipeline together in the upcoming hands-on practice!
YoungHoon Jeon | AI Edu | Upstage