AbbVie AWS Data Lakehouse Architecture
About This Architecture
Enterprise data lakehouse on AWS built on a multi-AZ VPC spanning three Availability Zones, integrating EKS, ECS Fargate, EMR (Spark ETL), Glue, Redshift Serverless, SageMaker, and MWAA for orchestration. Traffic enters through a public Application Load Balancer (ALB) and reaches workloads in private app subnets (EKS, ECS, MWAA). Data-processing services (EMR, Glue, Redshift Serverless) run in isolated data subnets, with ElastiCache for caching and VPC endpoints for private access to AWS services. The design emphasizes high availability, least-privilege networking, and separation of concerns across the application, compute, and data planes. EMR clusters use managed scaling (minimum 3, maximum 20 nodes), and gateway endpoints for S3 and DynamoDB keep that traffic inside the AWS network to lower data transfer cost and latency.
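The three-tier, three-AZ layout can be sketched as a CIDR plan. The snippet below is a minimal illustration using Python's standard `ipaddress` module. The VPC CIDR (`10.0.0.0/16`), the AZ names, and the /20 subnet size are assumptions, since the diagram description does not specify them, and `plan_subnets` is a hypothetical helper rather than part of any AWS tooling.

```python
import ipaddress

# Illustrative assumptions: the source does not specify the VPC CIDR,
# the region/AZ names, or the subnet size.
VPC_CIDR = "10.0.0.0/16"
AZS = ["us-east-1a", "us-east-1b", "us-east-1c"]
# Tiers from the diagram: public (ALB), app (EKS/ECS/MWAA),
# data (EMR/Glue/Redshift Serverless).
TIERS = ["public", "app", "data"]
SUBNET_PREFIX = 20  # /20 = 4,096 addresses; AWS reserves 5 in every subnet


def plan_subnets(vpc_cidr, azs, tiers, prefix):
    """Hypothetical helper: assign one non-overlapping subnet per (tier, AZ)."""
    vpc = ipaddress.ip_network(vpc_cidr)
    needed = len(tiers) * len(azs)
    available = 2 ** (prefix - vpc.prefixlen)
    if needed > available:
        raise ValueError(f"{vpc} holds only {available} /{prefix} subnets, need {needed}")
    blocks = vpc.subnets(new_prefix=prefix)  # yields consecutive /prefix blocks
    # Tier is the outer loop, so each tier gets its own consecutive run of blocks.
    return {(tier, az): next(blocks) for tier in tiers for az in azs}


if __name__ == "__main__":
    for (tier, az), net in plan_subnets(VPC_CIDR, AZS, TIERS, SUBNET_PREFIX).items():
        print(f"{tier:<7} {az:<11} {net}")
```

Giving each tier its own run of blocks keeps the public, app, and data tiers easy to tell apart in route tables and NACL rules; changing `SUBNET_PREFIX` trades subnet size against the address space left for later tiers.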
People also ask
How do you design a secure, scalable AWS data lakehouse with EKS, EMR, Glue, and Redshift across multiple availability zones?
This diagram shows a production data lakehouse spanning three AZs, with public subnets for the ALB, private app subnets for EKS, ECS, and MWAA, and isolated data subnets for EMR, Glue, and Redshift Serverless. Interface VPC endpoints (Glue, Athena, STS, KMS, Secrets Manager, ECR, EventBridge, SQS, SNS) provide access to AWS services without internet egress, while gateway endpoints for S3 and DynamoDB reduce data transfer costs.
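The endpoint split above can be summarized in a short planning script. This is a sketch that assumes the `us-east-1` region, and the helper names are hypothetical. The service-name format `com.amazonaws.<region>.<service>` and the split of ECR into `ecr.api` and `ecr.dkr` follow AWS PrivateLink conventions; S3 and DynamoDB are the only services AWS offers as gateway endpoints.

```python
# Illustrative assumption: the source does not name a region.
REGION = "us-east-1"

# S3 and DynamoDB are the only services AWS offers as gateway endpoints;
# the other services in the diagram use interface (PrivateLink) endpoints.
GATEWAY_SERVICES = {"s3", "dynamodb"}

# Endpoint service-name suffixes for the services named in the diagram.
# ECR is split into ecr.api (registry API) and ecr.dkr (Docker image pulls).
DIAGRAM_SERVICES = [
    "s3", "dynamodb",
    "glue", "athena", "sts", "kms", "secretsmanager",
    "ecr.api", "ecr.dkr", "events", "sqs", "sns",  # "events" = EventBridge
]


def endpoint_plan(region, services):
    """Hypothetical helper: map each service to (service name, endpoint type)."""
    return [
        (f"com.amazonaws.{region}.{svc}",
         "Gateway" if svc in GATEWAY_SERVICES else "Interface")
        for svc in services
    ]


def interface_eni_count(plan, az_count):
    """Interface endpoints place one ENI in each AZ subnet they are enabled in."""
    return sum(1 for _, kind in plan if kind == "Interface") * az_count


if __name__ == "__main__":
    plan = endpoint_plan(REGION, DIAGRAM_SERVICES)
    for name, kind in plan:
        print(f"{kind:<9} {name}")
    print("interface ENIs across 3 AZs:", interface_eni_count(plan, 3))
```

Gateway endpoints attach to route tables and carry no hourly charge, whereas each interface endpoint is billed per AZ per hour plus per GB processed, so the ENI count gives a rough sense of the fixed endpoint footprint of a three-AZ layout.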
- Domain: Cloud (AWS)
- Audience: AWS solutions architects designing enterprise data lakehouse platforms
Generated by Diagrams.so, an AI architecture diagram generator with native Draw.io output.