RAG LLM App Architecture
About This Architecture
This retrieval-augmented generation (RAG) architecture is split into two paths that share one vector database. The online, query-time path embeds the user's query, retrieves the top-k matching chunks from the vector database, combines them with the original query, and sends that context to an LLM, which returns a grounded answer with citations. The offline indexing path chunks source documents, embeds each chunk, and writes the vectors into the same vector database that the online path queries.
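The two paths can be illustrated with a small, self-contained sketch. Everything named here is an assumption made for illustration and not part of the diagram: `embed` is a toy normalized bag-of-words function standing in for a real embedding model, the "vector database" is a plain in-memory list, `stub_llm` is a placeholder for an actual LLM call, and the chunk size of 8 words is arbitrary.

```python
import math
import re
from collections import Counter
from typing import Callable

Vector = dict[str, float]               # sparse toy vector: token -> weight
Store = list[tuple[Vector, str, str]]   # (vector, chunk text, source id)


def embed(text: str) -> Vector:
    """Toy stand-in for an embedding model: L2-normalized token counts."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def chunk(text: str, size: int = 8) -> list[str]:
    """Split a document into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


# Offline indexing path: chunk -> embed -> write to the vector store.
def index_documents(docs: dict[str, str], store: Store, size: int = 8) -> None:
    for source, text in docs.items():
        for piece in chunk(text, size):
            store.append((embed(piece), piece, source))


# Online path, step 1: embed the query and retrieve the top-k chunks.
def retrieve(query: str, store: Store, k: int = 2) -> list[tuple[float, str, str]]:
    q = embed(query)
    scored = [
        # Vectors are normalized, so the dot product is the cosine similarity.
        (sum(w * vec.get(tok, 0.0) for tok, w in q.items()), text, source)
        for vec, text, source in store
    ]
    scored = [s for s in scored if s[0] > 0]  # drop chunks with no overlap
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:k]


# Online path, step 2: combine retrieved chunks with the original query.
def build_prompt(query: str, hits: list[tuple[float, str, str]]) -> str:
    context = "\n".join(
        f"[{i}] ({source}) {text}" for i, (_, text, source) in enumerate(hits, 1)
    )
    return (
        "Answer using only the numbered context and cite it.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Online path, step 3: send the prompt to an LLM and return answer + citations.
def answer(query: str, store: Store, llm: Callable[[str], str], k: int = 2) -> dict:
    hits = retrieve(query, store, k)
    return {
        "answer": llm(build_prompt(query, hits)),
        "citations": [source for _, _, source in hits],
    }


def stub_llm(prompt: str) -> str:
    """Hypothetical placeholder; a real system would call an LLM here."""
    return "stub answer; see cited context [1]"


if __name__ == "__main__":
    store: Store = []
    index_documents({"rust.txt": "Rust uses ownership and borrowing to manage memory safely."}, store)
    print(answer("How does Rust manage memory?", store, stub_llm))
```

Both paths call the same `embed` function and touch the same `store`. That shared dependency is what ties the offline and online halves of the diagram together: a query can only match chunks that were embedded the same way at indexing time.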
Architecture prompt
Retrieval-augmented generation architecture split into two distinct paths: the online query-time path embeds the user's query, retrieves top-k matching chunks from a vector database, combines them with the original query, and sends that context to an LLM that returns a grounded answer with citations; the offline indexing path chunks source documents, embeds each chunk, and writes the vectors into the same vector database that the online path queries.
Generated by Diagrams.so, an AI architecture diagram generator with native Draw.io output.