Metadata-Driven Configurable AWS Data Pipeline

aws · architecture diagram.

About This Architecture

This metadata-driven AWS data pipeline ingests data from Kafka and Oracle CDC logs into S3 raw and staging buckets, then validates schemas and transforms records with Lambda functions. AWS Glue ETL jobs load batch data into Redshift, while Glue streaming jobs push real-time data through Kinesis Firehose to MSK. Amazon MWAA (Airflow) orchestrates the jobs, with EventBridge handling scheduling. Configuration and metadata are stored in dedicated S3 buckets, so pipeline behavior can change without code changes for both batch and streaming workloads. CloudWatch provides monitoring and SNS provides alerting. Fork this diagram on Diagrams.so to customize the metadata schema, add data quality checks, or integrate additional source systems such as DynamoDB Streams or RDS change data capture.
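To make "dynamic pipeline behavior without code changes" concrete, here is a minimal sketch of how a metadata document could select the processing path for each source. The metadata layout, the field names (`type`, `mode`, `target`, `schema_check`), and the step names are all assumptions made for illustration; the diagram does not specify a schema. In a real deployment the document would presumably be read from the configuration S3 bucket rather than embedded in code.

```python
import json
from dataclasses import dataclass

# Hypothetical metadata document. In the architecture this would live in the
# configuration S3 bucket; every field name here is an assumption.
PIPELINE_METADATA = json.loads("""
{
  "sources": [
    {"name": "orders_cdc", "type": "oracle_cdc", "mode": "batch",
     "target": "redshift", "schema_check": true},
    {"name": "clickstream", "type": "kafka", "mode": "streaming",
     "target": "firehose", "schema_check": true},
    {"name": "ref_data", "type": "oracle_cdc", "mode": "batch",
     "target": "redshift", "schema_check": false}
  ]
}
""")


@dataclass(frozen=True)
class Plan:
    """Ordered list of (hypothetical) step names for one source."""
    source: str
    steps: tuple


def build_plan(entry: dict) -> Plan:
    """Derive the processing steps for one source purely from its metadata."""
    steps = ["land_in_s3_raw"]
    if entry.get("schema_check", True):
        steps.append("lambda_validate_schema")
    if entry["type"] == "oracle_cdc":
        steps.append("lambda_transform_cdc")
    steps.append("write_s3_staging")

    # Batch and streaming sources take different Glue paths.
    if entry["mode"] == "batch":
        steps.append("glue_batch_etl")
    elif entry["mode"] == "streaming":
        steps.append("glue_streaming_job")
    else:
        raise ValueError(f"unknown mode: {entry['mode']!r}")

    steps.append(f"deliver_to_{entry['target']}")
    return Plan(entry["name"], tuple(steps))


def build_all_plans(metadata: dict) -> dict:
    """Map each source name to its plan."""
    return {e["name"]: build_plan(e) for e in metadata["sources"]}


if __name__ == "__main__":
    for name, plan in build_all_plans(PIPELINE_METADATA).items():
        print(name, "->", " > ".join(plan.steps))
```

Adding a source or switching one from batch to streaming then means editing the metadata document only; `build_plan` stays unchanged. An orchestrator such as Airflow could turn each plan into tasks, though that mapping is not shown here.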

People also ask

How do I build a metadata-driven data pipeline on AWS that handles both batch and streaming workloads with Kafka and Oracle CDC sources?

Store metadata and configuration in S3 buckets and use them to drive the Lambda schema validators and CDC transformers. Use Amazon MWAA (Airflow) to orchestrate AWS Glue batch ETL jobs that load Redshift and Glue streaming jobs that feed Kinesis Firehose. Because behavior is read from metadata, the pipeline can change without code changes, as shown in this architecture diagram.
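As a rough illustration of a metadata-driven Lambda schema validator, the sketch below checks incoming records against a schema definition and separates valid records from rejected ones. The `SCHEMAS` layout, the event shape (`dataset`, `records`), and the return format are assumptions for this example, not something the diagram defines. The schema is kept inline so the example is self-contained; in practice it would more likely be loaded from the metadata S3 bucket.

```python
# Hypothetical schema metadata, keyed by dataset name. In the architecture
# this would be loaded from the metadata S3 bucket instead of hard-coded.
SCHEMAS = {
    "orders": {
        "order_id": {"type": "int", "required": True},
        "amount": {"type": "float", "required": True},
        "note": {"type": "str", "required": False},
    }
}

# Map schema type names to Python types. Integers are accepted for "float".
_TYPES = {"int": int, "float": (int, float), "str": str, "bool": bool}


def validate_record(record: dict, schema: dict) -> list:
    """Return a list of error strings; an empty list means the record is valid."""
    errors = []
    for field, rule in schema.items():
        value = record.get(field)
        if value is None:
            if rule.get("required", False):
                errors.append(f"missing required field: {field}")
            continue
        # bool is a subclass of int in Python, so reject it for non-bool fields.
        is_bool_mismatch = isinstance(value, bool) and rule["type"] != "bool"
        if is_bool_mismatch or not isinstance(value, _TYPES[rule["type"]]):
            errors.append(f"wrong type for {field}: expected {rule['type']}")
    return errors


def lambda_handler(event, context):
    """Lambda-style entry point: split records into valid and rejected."""
    schema = SCHEMAS[event["dataset"]]
    valid, rejected = [], []
    for record in event["records"]:
        errors = validate_record(record, schema)
        if errors:
            rejected.append({"record": record, "errors": errors})
        else:
            valid.append(record)
    return {"valid": valid, "rejected": rejected}


if __name__ == "__main__":
    sample = {
        "dataset": "orders",
        "records": [
            {"order_id": 1, "amount": 9.5},
            {"order_id": "x", "amount": 1},
        ],
    }
    print(lambda_handler(sample, None))
```

Because the validation rules live in `SCHEMAS` rather than in the handler, supporting a new dataset or field would only require a metadata change.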

Tags: AWS, advanced, Data Engineering, ETL, Glue, MWAA, Kafka
Domain: Data Engineering
Audience: data engineers building metadata-driven ETL pipelines on AWS

Created: February 22, 2026
Updated: March 9, 2026 at 5:15 AM
Type: architecture
