Learn RAG: Retrieval-Augmented Generation on AI4AI — short, hands-on lessons with live AI runs, at three reading levels (beginner to expert). Free to start.
RAG (Retrieval-Augmented Generation) is an architecture that gives a language model access to an external knowledge source at inference time: the model looks up relevant documents before it writes its answer, rather than relying solely on patterns baked in during trai…
Chunking is the process of splitting source documents into smaller text segments before embedding and storing them in a vector database. Chunk size and strategy directly determine retrieval quality and are often the biggest lever in a RAG pipeline. **Size:** Chunks of 2…
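As a rough illustration of chunking, here is a minimal sketch of a fixed-size splitter. It counts whitespace-separated words rather than model tokens, and the `chunk_words` name, the default `chunk_size`, and the `overlap` parameter are assumptions made for this example, not values taken from the lesson.

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in at least one chunk (illustrative choice).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

Larger chunks keep more context together but dilute what each embedding represents. Smaller chunks are more precise but can split an answer across segments, which is why the size is worth tuning against real queries.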
An embedding is a fixed-length list of floating-point numbers (a vector) that represents the meaning of a piece of text. Embedding models — such as OpenAI's text-embedding-3-small or open-source models like nomic-embed-text — are trained so that semantically similar texts produc…
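To make the idea of comparing embeddings concrete, the sketch below computes cosine similarity, one common way to compare two vectors. The three-dimensional vectors are hand-written toy values, not the output of any real embedding model, and the variable names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for this sketch: a zero vector matches nothing
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings (illustrative values only).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))
```

Real embedding vectors have hundreds or thousands of dimensions, but the comparison works the same way.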
Hybrid search combines two retrieval methods: BM25 (a keyword-based algorithm that scores documents by term frequency and inverse document frequency) and dense vector search (which uses embedding similarity to find semantically related chunks). Running both in parallel captures …
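A minimal sketch of the hybrid idea follows. It scores documents with a small BM25 implementation and then merges the keyword ranking with a dense ranking using reciprocal rank fusion. The fusion method is an assumption chosen for illustration (the lesson preview does not say how results are merged), and the dense ranking is passed in as a precomputed list rather than produced by a real embedding model.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with BM25 (whitespace tokens)."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / (term_freq[term] + length_norm)
        scores.append(score)
    return scores


def rank(scores: list[float]) -> list[int]:
    """Return document indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Merge several rankings; each document earns 1 / (k + rank) per list."""
    fused: dict[int, float] = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + position)
    return sorted(fused, key=fused.get, reverse=True)


docs = [
    "error code E42 in the payment service",
    "how to reset a forgotten password",
    "billing failures and declined cards",
]
keyword_ranking = rank(bm25_scores("E42 error", docs))
dense_ranking = [2, 0, 1]  # hypothetical output of a vector search
print(reciprocal_rank_fusion([keyword_ranking, dense_ranking]))
```

Rank-based fusion sidesteps the problem that BM25 scores and similarity scores live on different scales. A weighted sum of normalized scores is another common option.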
Grounded generation means constraining the model to produce answers derived exclusively from a supplied context window — the retrieved chunks — rather than from parametric knowledge baked in during training. Without this constraint, models blend retrieved facts with memorized pr…
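One simple way to apply this constraint is through the prompt itself. The sketch below assembles a prompt that numbers the retrieved chunks and tells the model to answer only from them. The `build_grounded_prompt` name and the instruction wording are assumptions for illustration; no model API is called here.

```python
def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Build a prompt that restricts the answer to the supplied chunks."""
    # Number each chunk so the model (and a reviewer) can point at a source.
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the numbered context below.\n"
        "If the context does not contain the answer, reply \"I don't know.\"\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_grounded_prompt(
    "What is the refund window?",
    ["Refunds are accepted within 30 days.", "Shipping takes 5 days."],
)
print(prompt)
```

An explicit fallback such as "I don't know" gives the model an allowed way out when retrieval misses, which tends to be easier to detect than a confident guess.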
Evaluating a RAG pipeline requires two distinct measurements because failures happen in two independent stages. **Retrieval recall** measures whether the chunks actually containing the answer were fetched — typically computed as the fraction of 'gold' relevant chunks that appear…
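Retrieval recall can be sketched in a few lines. The function below assumes the metric is computed over the top `k` retrieved chunk IDs, a common convention that is assumed here rather than taken from the lesson text. The chunk IDs in the example are made up.

```python
def recall_at_k(retrieved_ids: list[str], gold_ids: set[str], k: int) -> float:
    """Fraction of gold chunks that appear among the top-k retrieved chunks."""
    if not gold_ids:
        raise ValueError("gold_ids must contain at least one relevant chunk")
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(gold_ids)) / len(set(gold_ids))


# Two chunks are known to contain the answer; only one was retrieved.
print(recall_at_k(["c3", "c1", "c9"], {"c1", "c2"}, k=3))  # 0.5
```

A low score here points at chunking, embeddings, or search configuration rather than at the generation step, which is the reason for measuring the two stages separately.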