Introduction to Embedding
📌 Table of Contents
What is a Vector?
What is Embedding?
Why Embedding?
Why Solar Embedding?
Demo: Sentence Embedding Hands-on
AI models must convert words, sentences, and documents into numerical representations before they can understand and process text. This transformation is performed by a technique called embedding, a core component of LLMs and essential for AI to interpret language effectively.
In this lecture, we will explore the concept of vectors and embeddings, why embeddings are crucial, and how Upstage’s Solar Embedding model stands out from others.
1. What is a Vector?
How Do Computers Understand Natural Language?
Computers can only recognize and process numbers. Therefore, human language—words and sentences—must be converted into numerical representations (vectors) for AI to comprehend them.
💡 What is a Vector?
A vector is a sequence of numbers representing data in a structured form.
AI transforms words and sentences into vectors to understand their meaning and relationships.
📌 Examples
"CAT" → [0.12, -0.45, 1.33, ...]
"DOG" → [0.14, -0.50, 1.28, ...]
"CAR" → [2.45, -0.98, 3.22, ...]
"CAT" and "DOG" receive similar vectors because their meanings are related, while "CAR" is far from both.
📌 What is Vector Space?

A vector space is a numerical coordinate space representing words. In this space, words with similar meanings are placed closer together, while words with different meanings are farther apart.
This allows AI to analyze and learn relationships between words mathematically.
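The idea of "closeness" in a vector space can be measured with cosine similarity. Here is a minimal sketch using the illustrative vectors from the example above, truncated to three dimensions (real embeddings have hundreds or thousands):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: values near 1.0 mean same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Illustrative vectors from the example above (not real model output)
cat = [0.12, -0.45, 1.33]
dog = [0.14, -0.50, 1.28]
car = [2.45, -0.98, 3.22]

print(f"CAT vs DOG: {cosine_similarity(cat, dog):.3f}")  # similar meanings, high score
print(f"CAT vs CAR: {cosine_similarity(cat, car):.3f}")  # different meanings, lower score
```

Because CAT and DOG point in nearly the same direction, their similarity is higher than CAT vs CAR, which is exactly what "closer together in vector space" means.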
2. What is Embedding?
💡 Embedding: Converting Text into Context-Aware Numerical Representations
Before embeddings, early methods for vectorizing text relied on surface-level approaches, such as matching spelling similarities or counting word occurrences in a sentence, and so failed to capture context and meaning.
However, embedding technology transforms words, sentences, and documents into vectors while preserving their semantic meaning, enabling AI to understand words and contexts more accurately.
📌 Examples
Consider the following three words:
"Sturdy" → [2.45, -0.98, 3.22, ...]
"Study" → [0.14, -0.50, 1.28, ...]
"Learn" → [0.12, -0.45, 1.33, ...]
An embedding model assigns "Study" and "Learn" similar vectors despite their different spellings, while "Sturdy" is placed far from both.
"Sturdy" and "Study" have similar spellings but different meanings.
"Study" and "Learn" have different spellings but similar meanings.
📌 Comparison: Traditional vs. Embedding Approach
Traditional Approach (No Context Awareness):
- "Sturdy" and "Study" are considered similar due to spelling resemblance.
- "Study" and "Learn" are treated as unrelated due to spelling differences.
Embedding Approach (Context Awareness):
- "Sturdy" and "Study" are correctly categorized as unrelated.
- "Study" and "Learn" are identified as semantically related.

Semantically similar words are placed closer together in a vector space, while unrelated words are placed farther apart.
💡 With embedding models, AI can more accurately interpret word meanings and sentence contexts!
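The contrast between the two approaches can be sketched in code. Spelling similarity (the traditional view) is approximated here with Python's `difflib`, and the "embedding" vectors are made-up illustrative numbers, not real model output:

```python
import difflib
import math

def spelling_similarity(a, b):
    """Surface-level similarity based on shared character sequences."""
    return difflib.SequenceMatcher(None, a, b).ratio()

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hypothetical embedding vectors: "study" and "learn" point the same way.
embeddings = {
    "sturdy": [0.9, 0.1, 0.2],
    "study":  [0.1, 0.8, 0.7],
    "learn":  [0.2, 0.9, 0.6],
}

# Traditional view: sturdy/study look alike, study/learn share no letters.
print(spelling_similarity("sturdy", "study"))  # high
print(spelling_similarity("study", "learn"))   # low

# Embedding view: study/learn are close in meaning, sturdy/study are not.
print(cosine_similarity(embeddings["study"], embeddings["learn"]))   # high
print(cosine_similarity(embeddings["sturdy"], embeddings["study"]))  # low
```

The spelling-based measure ranks "Sturdy"/"Study" as near-duplicates, while the (hypothetical) embedding vectors rank "Study"/"Learn" as the related pair.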
3. Why Embedding?
AI needs embeddings to understand not just words but their contextual meanings.
✅ Why is Embedding Necessary?
1️⃣ Sentence Similarity Comparison
"I love pizza" ↔ "Pizza is my favorite food" → Recognized as similar sentences.
2️⃣ Enhanced Text Search (Semantic Search)
AI can determine word and sentence similarity, improving search accuracy.
Due to semantic relevance, searching for "dog" may retrieve documents containing "pet" or "puppy."
3️⃣ Improved Natural Language Processing
Embeddings enhance chatbots, recommendation systems, and document summarization, enabling AI to generate more natural responses.
Embeddings allow AI to understand and process text more accurately!
Because embedding model performance significantly impacts LLM accuracy, choosing a powerful and precise embedding model is crucial.
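The semantic-search idea described above can be sketched as a toy ranking: embed the query and every document, then sort documents by cosine similarity. The document titles and vectors below are hand-made illustrative stand-ins for real embeddings:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy corpus: vectors stand in for real embeddings, chosen so that
# "puppy" and "pet" documents point near the "dog" query direction.
documents = {
    "Adopting a puppy checklist": [0.8, 0.6, 0.1],
    "Caring for your pet at home": [0.7, 0.7, 0.2],
    "How to change a car tire":    [0.1, 0.2, 0.9],
}
query_vector = [0.9, 0.5, 0.1]  # stands in for the embedding of "dog"

# Rank documents by similarity to the query, best match first.
ranked = sorted(documents.items(),
                key=lambda item: cosine_similarity(query_vector, item[1]),
                reverse=True)
for title, _ in ranked:
    print(title)
```

Even though none of the top documents contain the word "dog", they rank above the car-related document, which is how semantic search retrieves "pet" and "puppy" results for a "dog" query.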
4. Why Solar Embedding?
🔗 Read more about Solar Embedding
Upstage’s Solar Embedding model delivers faster and more accurate performance than traditional embedding models.
✅ What Makes Solar Embedding Unique?

✔ Superior Performance Compared to Other Embedding Models
Outperforms other models on English, Korean, and Japanese benchmarks.
Particularly excels in challenging search and document retrieval tasks.
✔ Multilingual Support
Handles English, Korean, and Japanese with consistently high accuracy, rather than being tuned for a single language.
💡 Solar Embedding offers top-tier accuracy and multilingual capabilities, making it ideal for diverse applications!
5. 🛠️ Demo: Sentence Embedding Hands-on
Let’s explore how text is transformed into vectors in a practical exercise.
📌 Hands-on Goal
Directly observe how embedding vectors are generated.
💡 Steps to Follow
1️⃣ Input a Sentence → "The weather is nice today."
2️⃣ Run Solar Embedding API → Convert the sentence into a vector.
3️⃣ View the Generated Embedding Vector
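The three steps above can be sketched as a small script. The real call would go to the Solar Embedding API; since the exact endpoint, model name, and client usage are not given here, a deterministic stand-in `embed()` function is used to keep the sketch self-contained:

```python
import hashlib

def embed(sentence: str, dims: int = 8) -> list[float]:
    """Stand-in for a Solar Embedding API call.

    A real version would send the sentence to the API (see Upstage's docs
    for the endpoint and model name) and return the model's vector.
    Here we derive deterministic pseudo-values from a hash instead.
    """
    digest = hashlib.sha256(sentence.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dims]]

# 1) Input a sentence
sentence = "The weather is nice today."
# 2) "Run" the embedding step (stand-in for the API call)
vector = embed(sentence)
# 3) View the generated embedding vector
print(len(vector), vector)
```

The same sentence always maps to the same vector, and in the real demo the printed list would be the model's embedding, typically with hundreds or thousands of dimensions.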

Wrap Up
In this module, we explored the concept and significance of embeddings and how Solar Embedding stands out.
🔹 Vector: The fundamental numerical representation AI uses to process text.
🔹 Embedding: Converting words, sentences, and documents into vectors while preserving meaning and context.
🔹 Why Embeddings Matter: Essential for improving search accuracy, sentence similarity comparison, and NLP applications.
🔹 Solar Embedding Advantages: Offers higher accuracy, multilingual support, and superior performance in complex search and document retrieval tasks.
YoungHoon Jeon | AI Edu | Upstage