AbbVie Data Lake AWS Architecture
About This Architecture
AbbVie's multi-AZ data lake architecture on AWS ingests structured data from SAP, Veeva, Fieldglass, and Compass, alongside external clinical and health authority sources, through Lambda, EventBridge, SQS, and Amazon MSK (managed Kafka), and lands it in raw S3 buckets. Data then flows through EMR Spark ETL, serverless AWS Glue jobs, and Redshift Spectrum for transformation and querying, with the Glue Data Catalog providing Hive-compatible metadata and Lake Formation enforcing tag-based row- and column-level security. The design spans three Availability Zones in us-east-1, with private subnets for EKS, ECS Fargate, MWAA orchestration, and SageMaker, secured by interface VPC endpoints, KMS multi-Region key encryption, and CloudTrail audit logging. Hybrid connectivity to legacy on-premises Cloudera through Transit Gateway and Site-to-Site VPN enables gradual cloud migration while maintaining governance. Fork this diagram to customize data source connectors, add cross-region replication policies, or adapt the security posture for your regulated data environment.
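One way to picture the raw landing step is as a key-naming convention. The sketch below is a minimal illustration, not the layout used in this architecture: the `raw/` prefix, the `source=`, `dataset=`, and `ingest_date=` partition names, and the `raw_object_key` helper are all assumptions made for the example. It uses Hive-style `key=value` path segments, a common convention that lets a Hive-compatible catalog treat the segments as partition columns.

```python
from datetime import datetime, timezone

# Source systems named in the architecture description.
KNOWN_SOURCES = {"sap", "veeva", "fieldglass", "compass"}


def raw_object_key(source: str, dataset: str, event_time: datetime, filename: str) -> str:
    """Build a Hive-style partitioned object key for the raw zone.

    The prefix and partition names are hypothetical, chosen for illustration.
    """
    source = source.lower()
    if source not in KNOWN_SOURCES:
        raise ValueError(f"unknown source system: {source!r}")
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")
    # Normalize to UTC so every producer partitions by the same calendar day.
    day = event_time.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"raw/source={source}/dataset={dataset}/ingest_date={day}/{filename}"


key = raw_object_key(
    "SAP",
    "purchase_orders",
    datetime(2024, 1, 15, 23, 30, tzinfo=timezone.utc),
    "part-0001.json",
)
print(key)  # raw/source=sap/dataset=purchase_orders/ingest_date=2024-01-15/part-0001.json
```

Partitioning by source and ingest date keeps each connector's output isolated and lets downstream Spark or Glue jobs prune by date, at the cost of fixing the layout early.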
People also ask
How do you design a secure, multi-AZ AWS data lake that ingests from SAP and external sources while maintaining governance and hybrid on-prem connectivity?
AbbVie's architecture uses Lambda, EventBridge, and Amazon MSK (managed Kafka) to ingest data into S3 buckets across three AZs, then processes it with EMR Spark and Glue using Hive-compatible metadata. Lake Formation enforces tag-based row- and column-level security, KMS provides multi-Region key encryption, and Transit Gateway bridges legacy on-premises Cloudera systems, enabling governed, scalable analytics with audit trails via CloudTrail.
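The tag-based column security mentioned above can be illustrated with a toy model. This is a conceptual sketch only, not the Lake Formation API: the `Table` class, the `matches` and `visible_columns` helpers, and the table, column, and tag names are hypothetical. It assumes the usual tag-based pattern in which columns inherit table tags unless overridden, and a grant matches when every tag key in its expression has an allowed value on the resource. Row-level filtering is not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    tags: dict[str, str]  # tags set at table level
    # Per-column tag overrides, keyed by column name.
    column_tags: dict[str, dict[str, str]] = field(default_factory=dict)

    def effective_tags(self, column: str) -> dict[str, str]:
        # Columns inherit table tags; a column-level value overrides the inherited one.
        return {**self.tags, **self.column_tags.get(column, {})}


def matches(expression: dict[str, set[str]], tags: dict[str, str]) -> bool:
    """True when every key in the grant expression has one of its allowed values."""
    return all(tags.get(key) in allowed for key, allowed in expression.items())


def visible_columns(table: Table, columns: list[str], expression: dict[str, set[str]]) -> list[str]:
    """Return the columns whose effective tags satisfy the grant expression."""
    return [c for c in columns if matches(expression, table.effective_tags(c))]


# Hypothetical table: one column is tagged more restrictively than the rest.
table = Table(
    name="hcp_interactions",
    tags={"domain": "commercial", "sensitivity": "internal"},
    column_tags={"hcp_name": {"sensitivity": "restricted"}},
)

# Hypothetical grant: commercial data at public or internal sensitivity.
analyst_grant = {"domain": {"commercial"}, "sensitivity": {"public", "internal"}}

visible = visible_columns(table, ["interaction_id", "hcp_name", "visit_date"], analyst_grant)
print(visible)  # ['interaction_id', 'visit_date']
```

Granting on tag expressions rather than on named tables means newly cataloged datasets pick up the right permissions as soon as they are tagged, which is the main appeal of this approach for a lake with many source systems.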
- Domain: Cloud AWS
- Audience: AWS solutions architects designing enterprise data lake platforms
Generated by Diagrams.so, an AI architecture diagram generator with native Draw.io output.