About This Architecture
A multi-AZ AWS GenAI RAG platform combining EKS, Bedrock Claude Opus, and the Pinecone vector database for production-grade retrieval-augmented generation. User requests flow through Route 53, CloudFront, WAF, and API Gateway to an Application Load Balancer, which distributes traffic across EKS pods running the RAG API, Ingestion, and Embedding Service components in two availability zones.

On the data path, Lambda-based document preprocessing feeds S3 and SQS, while the RAG API queries Bedrock for LLM responses, ElastiCache for prompt caching, DynamoDB for session state, and Pinecone for semantic vector search across ingested documents. The architecture demonstrates high-availability GenAI workloads with cross-AZ redundancy, managed vector search, and observability via CloudWatch, X-Ray, and OpenSearch.

Fork and customize this diagram on Diagrams.so to adapt the topology to your document corpus size, query-latency requirements, or an alternative vector database. The design separates the presentation, application, and data tiers across subnets with explicit security boundaries, so teams can scale embedding throughput and LLM inference independently.
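The RAG API's query path described above (check the prompt cache, embed the query, retrieve context from the vector store, then call the LLM) can be sketched as a small orchestration function. This is an illustrative sketch, not code from the diagram: the name `answer_query` and its callback parameters are hypothetical, with the real Bedrock, Pinecone, and ElastiCache clients injected as callables so the control flow stays visible.

```python
from typing import Callable, List, Optional

def answer_query(
    query: str,
    cache_get: Callable[[str], Optional[str]],      # e.g. ElastiCache lookup
    cache_put: Callable[[str, str], None],          # e.g. ElastiCache write
    embed: Callable[[str], List[float]],            # e.g. Bedrock embedding model
    search: Callable[[List[float], int], List[str]],# e.g. Pinecone top-k query
    generate: Callable[[str], str],                 # e.g. Bedrock Claude invocation
    top_k: int = 4,
) -> str:
    """Orchestrate one RAG request: cache check, retrieval, generation."""
    # Serve a cached answer if this exact prompt was seen before.
    cached = cache_get(query)
    if cached is not None:
        return cached

    # Embed the query and fetch the top-k most similar document chunks.
    vector = embed(query)
    chunks = search(vector, top_k)

    # Assemble a grounded prompt from the retrieved context.
    prompt = (
        "Answer using only the context below.\n\nContext:\n"
        + "\n---\n".join(chunks)
        + f"\n\nQuestion: {query}"
    )
    answer = generate(prompt)
    cache_put(query, answer)
    return answer
```

Injecting the clients this way also makes the orchestration logic testable without AWS credentials or network access.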
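On the ingestion side, the preprocessing step typically splits each document into overlapping chunks before the Embedding Service vectorizes them for Pinecone. A minimal sketch of that chunking step, assuming fixed-size character windows (the function name and parameters are illustrative, not from the diagram):

```python
from typing import List

def chunk_document(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split a document into overlapping fixed-size chunks for embedding.

    Overlap preserves context that would otherwise be cut at chunk
    boundaries, at the cost of some duplicated tokens.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each chunk would then be embedded and upserted into the vector index with its source-document metadata, so retrieval can cite where a passage came from.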