RAG LLM App Architecture


About This Architecture

This retrieval-augmented generation (RAG) architecture is split into two distinct paths. The online, query-time path embeds the user's query, retrieves the top-k matching chunks from a vector database, combines them with the original query, and sends that context to an LLM, which returns a grounded answer with citations. The offline indexing path chunks the source documents, embeds each chunk, and writes the vectors into the same vector database that the online path queries.
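The two paths can be illustrated with a minimal, dependency-free sketch. Everything named here is hypothetical and chosen only for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, `VectorStore` is an in-memory list standing in for a vector database, `chunk` splits on a fixed word count, and `llm` is any callable that takes a prompt string. A real deployment would likely replace each of these with a proper component.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding (assumption): lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def chunk(text, max_words=8):
    """Split a document into fixed-size word windows (hypothetical chunking rule)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

class VectorStore:
    """In-memory stand-in for the vector database shared by both paths."""
    def __init__(self):
        self.rows = []  # each row: (chunk_id, text, vector)

    def add(self, chunk_id, text):
        self.rows.append((chunk_id, text, embed(text)))

    def top_k(self, query, k=2):
        q = embed(query)
        ranked = sorted(self.rows, key=lambda r: cosine(q, r[2]), reverse=True)
        return [(cid, text) for cid, text, _ in ranked[:k]]

def index_documents(store, docs):
    """Offline path: chunk each document, embed each chunk, write to the store."""
    for doc_id, text in docs.items():
        for n, piece in enumerate(chunk(text)):
            store.add(f"{doc_id}#{n}", piece)

def build_prompt(query, hits):
    """Combine the retrieved chunks with the original query."""
    context = "\n".join(f"[{cid}] {text}" for cid, text in hits)
    return (
        "Answer using only the context and cite chunk ids.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

def answer(store, query, llm, k=2):
    """Online path: retrieve top-k chunks, build the prompt, call the LLM."""
    hits = store.top_k(query, k)
    return llm(build_prompt(query, hits)), [cid for cid, _ in hits]

# Usage: index two small documents, then ask a question with a stubbed LLM.
store = VectorStore()
index_documents(store, {
    "faq": "Refunds are issued within five days of the return request.",
    "ops": "The warehouse ships orders every weekday morning.",
})
reply, cited = answer(store, "When are refunds issued?", llm=lambda prompt: "stub answer")
```

The point of the sketch is the shared store: `index_documents` is the only writer and `answer` is the only reader, which mirrors the offline and online split described above. Returning the chunk ids alongside the reply is one simple way to support citations.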


Generated by Diagrams.so, an AI architecture diagram generator with native Draw.io output.
