Introduction to RAG
Unlocking External Knowledge with Retrieval-Augmented Generation
You’ve probably heard people say,
“RAG is essential!”
“You need to use RAG!”
But what exactly is RAG?
In this tutorial, we’ll explore RAG, why it matters, and how it helps large language models (LLMs) read and understand external documents to generate more accurate responses.
Before starting this tutorial, it helps to understand the following concepts:
→ Learn what LLMs are and what they can do
→ Understand the strengths, weaknesses, and limitations of LLMs
→ Learn how external documents are converted into machine-readable formats
RAG stands for Retrieval-Augmented Generation. It is a technique that combines:
Retrieval: Searching external databases for relevant information
Augmentation: Supplementing the LLM's input with that information
Generation: Producing a response using the enriched context
While traditional LLMs rely solely on pre-trained knowledge (limited to their training cutoff), RAG enables them to reference external documents, allowing for more accurate and up-to-date answers.
RAG is one of the most promising ways to address LLM limitations like hallucination and knowledge cutoff.
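Conceptually, the whole technique reduces to those three steps. The sketch below is illustrative Python only; `search_documents` and `call_llm` are hypothetical placeholders standing in for whatever retriever and LLM client you actually use.

```python
def answer_with_rag(question: str) -> str:
    # 1. Retrieval: search an external knowledge base for relevant passages
    passages = search_documents(question, top_k=3)  # hypothetical retriever

    # 2. Augmentation: inject the retrieved passages into the prompt
    context = "\n\n".join(passages)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

    # 3. Generation: the LLM produces the final answer from the enriched prompt
    return call_llm(prompt)  # hypothetical LLM client
```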
The value of RAG lies in its ability to augment knowledge. This happens in two main ways:
Keeps the LLM up-to-date with recent information not available during training
Ideal for dynamic or fast-changing topics like:
Tech product specs released after model training
Real-time data like stock prices or currency rates
Example:
An LLM trained in 2023 won't know about a laptop released in 2025—unless it's given access to updated documents via RAG.
Focuses on domain-specific or proprietary knowledge
Useful for:
Legal advice using internal legal documents
Corporate chatbots referencing internal manuals or policy docs
In short, RAG bridges the gap between general LLMs and custom, expert-level knowledge.
RAG is a system where:
External documents are pre-processed
When a user asks a question, related documents are retrieved
The LLM refers to those documents when generating an answer
Let’s explore the RAG pipeline, step by step.
Before implementation, carefully plan the pipeline:
Problem Definition
What exact problem is RAG solving? Be specific.
“Just building a chatbot” isn't a strong use case.
Data Planning
What documents will be used? What format?
How will the data stay current?
Security & Access Control
If access varies by user, set up permissions and restrictions
Evaluation Plan
Define how responses will be evaluated (e.g., QA sets, user feedback)
UI/UX Design
Choose an interface suitable for the task (chatbot, card UI, dashboard)
Now it’s time to prepare external documents for LLM processing.
Data Loading
Load documents (PDFs, Word, websites, etc.)
Tools like Upstage Document Parse can preserve structure as well as text
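As a minimal stand-in for a full parser such as Upstage Document Parse, the sketch below extracts plain text from a PDF with the open-source pypdf library; it recovers text only, not layout, tables, or structure, and the file name is just an example.

```python
from pypdf import PdfReader

def load_pdf_text(path: str) -> str:
    """Extract plain text from every page of a PDF (no layout or structure)."""
    reader = PdfReader(path)
    pages = [page.extract_text() or "" for page in reader.pages]
    return "\n".join(pages)

document_text = load_pdf_text("internal_policy.pdf")  # example file name
```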
Chunking
Split long texts into manageable “chunks”
Chunks that are too small lose context; chunks that are too large make processing slow, costly, or error-prone
Use smart chunking techniques to preserve meaning
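One simple, dependency-free approach is fixed-size chunking with overlap, sketched below. Production systems often use smarter, structure-aware splitters, and the sizes here are illustrative defaults rather than recommendations.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks so content near a boundary
    still appears with some surrounding context."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

chunks = chunk_text(document_text)  # text from the loading step above
```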
Embedding
Convert text chunks into vector embeddings (numeric representations)
Use tools like Upstage Solar Embedding
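The Upstage Solar Embedding API is not shown here; as a generic stand-in, the sketch below uses the open-source sentence-transformers library with the small `all-MiniLM-L6-v2` model (an assumption, not a recommendation). Any embedding model produces the same kind of output: one fixed-length vector per chunk.

```python
from sentence_transformers import SentenceTransformer

# Any embedding model works here; all-MiniLM-L6-v2 is a small open-source example.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Each chunk becomes a fixed-length numeric vector (384 dimensions for this model).
chunk_vectors = model.encode(chunks, normalize_embeddings=True)
print(chunk_vectors.shape)  # (num_chunks, 384)
```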
Indexing
Store the embeddings in a vector database (VectorStore)
Index them for fast retrieval later
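Any vector database can serve as the store. A minimal in-memory sketch using FAISS, with the normalized vectors from the embedding step so that inner product equals cosine similarity, looks like this:

```python
import faiss
import numpy as np

vectors = np.asarray(chunk_vectors, dtype="float32")

# Inner-product index; with normalized vectors this ranks by cosine similarity.
index = faiss.IndexFlatIP(vectors.shape[1])
index.add(vectors)

# Retrieval later: embed the query and fetch the 3 nearest chunks.
query_vec = model.encode(["What is the refund policy?"], normalize_embeddings=True)
scores, ids = index.search(np.asarray(query_vec, dtype="float32"), 3)
retrieved_chunks = [chunks[i] for i in ids[0]]
```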
This is where user questions get processed and answers are generated.
Prompt Engineering
Design prompts that guide the LLM to use retrieved context effectively
Techniques: few-shot examples, persona instructions, delimiters, etc.
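A minimal prompt template illustrating these ideas might look like the sketch below; the persona, delimiters, and refusal instruction are just one possible choice, not a fixed recipe.

```python
RAG_PROMPT = """You are a support assistant for our product documentation.
Answer the question using ONLY the text inside <context> ... </context>.
If the answer is not in the context, say you don't know.

<context>
{context}
</context>

Question: {question}
Answer:"""

prompt = RAG_PROMPT.format(
    context="\n\n".join(retrieved_chunks),  # chunks from the retrieval step
    question="What is the refund policy?",
)
```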
Workflow Engineering
Automate the full flow:
→ Receive query → Retrieve docs → Construct prompt → Generate answer
Modern systems may also use Agentic Workflows to call APIs when needed
(e.g., call a weather API if the query is “What’s the weather today?”)
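Wiring the previous pieces together, a minimal end-to-end query flow might look like the sketch below. It assumes an OpenAI-compatible chat client and reuses `model`, `index`, `chunks`, and `RAG_PROMPT` from the earlier sketches; the agentic API-calling branch is omitted, and the model name is illustrative.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes an OpenAI-compatible endpoint and API key

def rag_answer(question: str) -> str:
    # 1. Retrieve: embed the query and look up the nearest chunks in the index
    q_vec = model.encode([question], normalize_embeddings=True)
    _, ids = index.search(np.asarray(q_vec, dtype="float32"), 3)
    context = "\n\n".join(chunks[i] for i in ids[0])

    # 2. Construct the prompt from the template defined earlier
    prompt = RAG_PROMPT.format(context=context, question=question)

    # 3. Generate the answer with the LLM
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model; the name is illustrative
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(rag_answer("What is the refund policy?"))
```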
Demo Time!
Build user-facing interfaces using tools like Streamlit or Gradio
Let users interact with your RAG pipeline
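A few lines of Streamlit are enough for a demo front end. The sketch below assumes the `rag_answer` function from the workflow step and would be saved as `app.py` and launched with `streamlit run app.py`.

```python
import streamlit as st

st.title("RAG Demo")

question = st.text_input("Ask a question about the documents")
if question:
    with st.spinner("Retrieving and generating..."):
        answer = rag_answer(question)  # function from the workflow sketch above
    st.write(answer)
```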
Evaluating RAG outputs can be tricky—answers vary by user and context.
Build QA Sets
Create reference question-answer pairs
Consider expert/user interviews to create realistic QA pairs
Evaluate both:
Retrieval Quality: Was the proper document retrieved?
Answer Quality: Was the generated answer accurate?
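With a small QA set in hand, retrieval quality can be scored with a simple hit-rate check, as in the sketch below. The QA entries are made up for illustration, and the snippet reuses `model` and `index` from the ingestion sketches.

```python
import numpy as np

# Each QA item records the question and the id of the chunk that answers it.
qa_set = [
    {"question": "What is the refund policy?", "relevant_chunk_id": 12},
    {"question": "How do I reset my password?", "relevant_chunk_id": 40},
]

def hit_rate(qa_set, k: int = 3) -> float:
    """Fraction of questions whose relevant chunk appears in the top-k results."""
    hits = 0
    for item in qa_set:
        q_vec = model.encode([item["question"]], normalize_embeddings=True)
        _, ids = index.search(np.asarray(q_vec, dtype="float32"), k)
        if item["relevant_chunk_id"] in ids[0]:
            hits += 1
    return hits / len(qa_set)

print(f"Top-3 hit rate: {hit_rate(qa_set):.2f}")
```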
LLM-Based Evaluation
For large-scale evaluations, use LLMs to score generation quality
Provide LLMs with the query, answer, and reference answer, and ask them to score similarity or factual accuracy
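A simple LLM-as-judge setup passes the question, the generated answer, and the reference answer to a grading prompt. The sketch below reuses the OpenAI-compatible client from the workflow step; the rubric wording and model name are illustrative assumptions.

```python
JUDGE_PROMPT = """You are grading a RAG system's answer.

Question: {question}
Reference answer: {reference}
Generated answer: {generated}

Rate how well the generated answer agrees factually with the reference
on a scale of 1-5, then explain in one sentence.
Respond as: score: <1-5> - <reason>"""

def judge(question: str, reference: str, generated: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, reference=reference, generated=generated)}],
    )
    return response.choices[0].message.content
```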
In this section, we explored the concept and implementation of RAG.
🔹 What is RAG?
A framework that enhances LLM responses by retrieving relevant external documents
🔹 Why RAG?
LLMs have knowledge limitations. RAG fills the gap with real-time and domain-specific data.
🔹 RAG Implementation Stages:
1️⃣ Planning: Define goals, data, evaluation, and UX
2️⃣ Ingestion: Load → Chunk → Embed → Index
3️⃣ Querying: Retrieve → Prompt → Generate
4️⃣ Evaluation: Human/LLM-based answer scoring
📘 Now it's your turn—let's build your first RAG pipeline together in the upcoming hands-on practice!
YoungHoon Jeon | AI Edu | Upstage