📚 RAG: Retrieval-Augmented Generation

6 lessons · 47 min · ⭐ 4.8 · 0 enrolled · Verified 2026-06-12

Learn RAG: Retrieval-Augmented Generation on AI4AI — short, hands-on lessons with live AI runs, at three reading levels (beginner to expert). Free to start.

What you'll learn

RAG: The Open-Book Exam for AI (7 min)
Chunking Documents Well: Size, Overlap, and Structure-Aware Splitting (8 min)
Embeddings & Vector Search: Finding Similar Meaning (8 min)
Hybrid Search + Reranking: Getting the Right Chunks Every Time (8 min)
Grounded Generation & Citations: Answering Only from Retrieved Context (8 min)
Evaluating RAG: Retrieval Recall vs. Answer Faithfulness (8 min)

Lessons

RAG: The Open-Book Exam for AI

RAG (Retrieval-Augmented Generation) is an architecture that gives a language model access to an external knowledge source at inference time: the model looks up relevant documents before it writes its answer, rather than relying solely on patterns baked in during trai…
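The look-up-then-answer flow described above can be sketched in a few lines. This is a minimal illustration, not the course's implementation: the word-overlap retriever, the function names `retrieve` and `answer_with_rag`, and the `generate` callable (standing in for whatever language model you call) are all assumptions made for the example.

```python
from typing import Callable


def retrieve(question: str, documents: list[str], top_n: int = 2) -> list[str]:
    """Toy retriever (assumption): rank documents by shared lowercase words."""
    q_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_n]


def answer_with_rag(
    question: str,
    documents: list[str],
    generate: Callable[[str], str],  # hypothetical stand-in for a model call
    top_n: int = 2,
) -> str:
    """Retrieve first, then hand the retrieved text to the model with the question."""
    context = "\n".join(retrieve(question, documents, top_n))
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)
```

A real pipeline would replace the word-overlap scoring with an index over embedded chunks; the control flow (retrieve, build the prompt, generate) stays the same.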

Chunking Documents Well: Size, Overlap, and Structure-Aware Splitting

Chunking is the process of splitting source documents into smaller text segments before embedding and storing them in a vector database. The chunk size and strategy directly determine retrieval quality and are often the biggest lever in a RAG pipeline. Size: Chunks of 2…
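As a rough sketch of fixed-size chunking with overlap, the function below splits on whitespace and lets each chunk repeat the last few words of the previous one. Counting words rather than tokens, and the name `chunk_text`, are assumptions for illustration; the sizes are deliberately left as required arguments because the right values depend on your documents and embedding model.

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

Structure-aware splitting would replace the plain `text.split()` window with boundaries taken from headings, paragraphs, or sentences.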

Embeddings & Vector Search: Finding Similar Meaning

An embedding is a fixed-length list of floating-point numbers (a vector) that represents the meaning of a piece of text. Embedding models — such as OpenAI's text-embedding-3-small or open-source models like nomic-embed-text — are trained so that semantically similar texts produc…
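To make "similar meaning" concrete, here is a small sketch of vector search over vectors that have already been computed. It assumes cosine similarity as the comparison (a common choice, though not the only one) and uses hand-written toy vectors instead of calling a real embedding model; `cosine_similarity` and `top_k` are illustrative names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def top_k(query: list[float], vectors: dict[str, list[float]], k: int) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(
        vectors,
        key=lambda doc_id: cosine_similarity(query, vectors[doc_id]),
        reverse=True,
    )
    return ranked[:k]
```

This brute-force scan compares the query against every stored vector; vector databases typically use approximate nearest-neighbor indexes to avoid that cost at scale.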

Hybrid Search + Reranking: Getting the Right Chunks Every Time

Hybrid search combines two retrieval methods: BM25 (a keyword-based algorithm that scores documents by term frequency and inverse document frequency) and dense vector search (which uses embedding similarity to find semantically related chunks). Running both in parallel captures …
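One way to merge the BM25 and dense result lists is reciprocal rank fusion (RRF), sketched below. The lesson text shown here does not say which fusion method it uses, so RRF is an assumption for illustration, as are the function name and the constant `k = 60` (a conventional default). The inputs are ranked lists of chunk ids produced by whichever BM25 and vector search implementations you run.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked id lists: each list adds 1 / (k + rank) to an id's score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

Because RRF uses only ranks, it sidesteps the problem that BM25 scores and embedding similarities live on different scales. A reranker would then rescore the top of this fused list.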

Grounded Generation & Citations: Answering Only from Retrieved Context

Grounded generation means constraining the model to produce answers derived exclusively from a supplied context window — the retrieved chunks — rather than from parametric knowledge baked in during training. Without this constraint, models blend retrieved facts with memorized pr…
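A common way to apply this constraint is through the prompt itself. The sketch below numbers the retrieved chunks and instructs the model to answer only from them and cite chunk numbers. The exact wording and the name `build_grounded_prompt` are assumptions for illustration, not a template taken from the lesson, and a prompt alone does not guarantee the model complies.

```python
def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Build a prompt that restricts the answer to numbered, citable context chunks."""
    numbered = "\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the numbered context below. "
        "Cite the chunk number for each claim, for example [1]. "
        "If the context does not contain the answer, say you do not know.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Numbering the chunks gives the model stable labels to cite, which also makes it possible to check afterwards whether each cited chunk supports the claim attached to it.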

Evaluating RAG: Retrieval Recall vs. Answer Faithfulness

Evaluating a RAG pipeline requires two distinct measurements because failures happen in two independent stages. Retrieval recall measures whether the chunks actually containing the answer were fetched, typically computed as the fraction of 'gold' relevant chunks that appear…
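The retrieval-side measurement can be sketched as recall at k: the share of gold chunks found among the first k retrieved results. Evaluating against a top-k cutoff is an assumption about how the truncated definition above continues, and `recall_at_k` is an illustrative name.

```python
def recall_at_k(retrieved: list[str], gold: set[str], k: int) -> float:
    """Fraction of gold chunk ids that appear in the first k retrieved ids."""
    if not gold:
        raise ValueError("gold set must not be empty")
    if k < 0:
        raise ValueError("k must be non-negative")
    top = set(retrieved[:k])
    return len(top & gold) / len(gold)
```

For example, with `retrieved = ["c7", "c2", "c9", "c4"]` and `gold = {"c2", "c4"}`, only one of the two gold chunks is in the top 3, so recall at 3 is 0.5, while recall at 4 is 1.0. Faithfulness is measured separately on the generated answer, so a pipeline can score well on one and poorly on the other.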

AI4AI — Academic Institute For Artificial Intelligence · Built by mAIb Tech · Courses · Docs · support@maib.io