RAG LLM App Architecture

[Diagram: RAG LLM App Architecture, general architecture]

About This Architecture

A retrieval-augmented generation (RAG) architecture split into two distinct paths that share one vector database.

Online query-time path: embed the user's query, retrieve the top-k matching chunks from the vector database, combine them with the original query, and send that context to an LLM, which returns a grounded answer with citations.

Offline indexing path: chunk the source documents, embed each chunk, and write the vectors into the same vector database that the online path queries.
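
The two paths can be sketched in a few dozen lines of Python. This is a minimal illustration, not the implementation behind the diagram: the word-count `embed` function stands in for a real embedding model, `VectorStore` is an in-memory list rather than a real vector database, and `stub_llm` is a placeholder for an actual LLM call. All names here (`embed`, `chunk`, `VectorStore`, `index_documents`, `build_prompt`, `stub_llm`, `answer`) are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding (assumption): lowercase word counts instead of a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk(text, size=8):
    """Split a document into fixed-size word windows (one simple chunking choice)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


class VectorStore:
    """In-memory stand-in for the vector database shared by both paths."""

    def __init__(self):
        self.rows = []  # each row: (doc_id, chunk_text, vector)

    def add(self, doc_id, chunk_text, vector):
        self.rows.append((doc_id, chunk_text, vector))

    def top_k(self, query_vector, k):
        ranked = sorted(self.rows, key=lambda r: cosine(query_vector, r[2]), reverse=True)
        return ranked[:k]


def index_documents(store, documents):
    """Offline indexing path: chunk each document, embed each chunk, write to the store."""
    for doc_id, text in documents.items():
        for piece in chunk(text):
            store.add(doc_id, piece, embed(piece))


def build_prompt(query, hits):
    """Combine the retrieved chunks with the original query."""
    context = "\n".join(
        f"[{i}] ({doc_id}) {text}" for i, (doc_id, text, _) in enumerate(hits, 1)
    )
    return (
        "Answer using only the context and cite sources by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


def stub_llm(prompt):
    """Placeholder for a real LLM call (assumption): returns a fixed string."""
    return "stub answer citing [1]"


def answer(store, query, k=2, llm=stub_llm):
    """Online query-time path: embed query, retrieve top-k, build prompt, call the LLM."""
    hits = store.top_k(embed(query), k)
    prompt = build_prompt(query, hits)
    return {
        "answer": llm(prompt),
        "citations": [doc_id for doc_id, _, _ in hits],
        "prompt": prompt,
    }
```

In use, `index_documents(store, {"vectors.txt": "..."})` would run ahead of time, and `answer(store, "some question")` would run per request against the same `store`. Swapping in a real embedding model or vector database only changes `embed` and `VectorStore`; the shape of the two paths stays the same.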

Generated by Diagrams.so, an AI architecture diagram generator with native Draw.io output.

Tags: Auto, Curated Template, Data Pipeline

Created: July 2, 2026
Updated: July 2, 2026 at 5:24 PM
Type: architecture
