Dataflow Automator - Data Engineering Platform
About This Architecture
Dataflow Automator is a data engineering platform that automates multi-layer ETL pipeline creation on Azure Databricks using data contracts and file masks. The system ingests raw CSV, Parquet, and JSON files from DBFS, validates them against schemas and quality rules, then orchestrates a Bronze-Silver-Gold medallion architecture through DLT pipelines with automated code generation. A React frontend supports pipeline creation and data discovery, while a FastAPI backend handles file scanning, contract parsing, validation, and Databricks integration via the REST API and CLI. Azure DevOps CI/CD deploys Databricks Asset Bundles and Terraform infrastructure, so data workflows stay reproducible and version-controlled, with monitoring and quarantine handling for rejected records.
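The file-mask matching and contract validation described above can be sketched in plain Python. This is a minimal illustration, not the platform's actual implementation: the `Column` and `DataContract` classes, the `matches_mask` and `validate_rows` helpers, and the sample file names are all hypothetical, since the source does not specify the contract format.

```python
import csv
import io
from dataclasses import dataclass
from fnmatch import fnmatchcase


# Hypothetical contract shape; the real contract format is not given in the source.
@dataclass
class Column:
    name: str
    type: type
    required: bool = True


@dataclass
class DataContract:
    file_mask: str
    columns: list


def matches_mask(path: str, contract: DataContract) -> bool:
    """Return True when the file name (last path segment) matches the file mask."""
    return fnmatchcase(path.rsplit("/", 1)[-1], contract.file_mask)


def validate_rows(rows, contract: DataContract):
    """Split rows into (valid, quarantined) according to the contract."""
    valid, quarantined = [], []
    for row in rows:
        errors, typed = [], {}
        for col in contract.columns:
            raw = row.get(col.name)
            if raw is None or raw == "":
                if col.required:
                    errors.append(f"{col.name}: missing")
                typed[col.name] = None
                continue
            try:
                typed[col.name] = col.type(raw)
            except ValueError:
                errors.append(f"{col.name}: not a valid {col.type.__name__}")
        if errors:
            # Rejected records keep the original row plus the reasons for rejection.
            quarantined.append({"row": row, "errors": errors})
        else:
            valid.append(typed)
    return valid, quarantined


contract = DataContract(
    file_mask="orders_*.csv",
    columns=[
        Column("order_id", int),
        Column("amount", float),
        Column("note", str, required=False),
    ],
)

sample = "order_id,amount,note\n1,9.99,first\nabc,5.00,\n3,,late\n"
rows = list(csv.DictReader(io.StringIO(sample)))
valid, quarantined = validate_rows(rows, contract)
```

In this sample, the first row passes, while the second (non-numeric `order_id`) and third (missing `amount`) are routed to the quarantine list along with their error messages.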
People also ask
How do you automate multi-layer ETL pipeline creation on Azure Databricks with data contracts and DLT code generation?
Dataflow Automator automates pipeline creation by scanning DBFS files, validating them against data contracts and quality rules, then generating DLT Python code and Databricks Asset Bundles. The platform orchestrates Bronze (raw validated), Silver (cleaned), and Gold (business-ready) medallion layers, with Azure DevOps CI/CD deploying infrastructure as code and monitoring rejected records in quarantine.
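As a rough sketch of what generating DLT Python code could look like, the function below renders pipeline source text from a table name and a set of quality rules. The `generate_dlt_source` function, its parameters, and the table naming scheme are assumptions for illustration. The emitted code uses the `@dlt.table`, `@dlt.expect_or_drop`, and `dlt.read` calls from the Delta Live Tables Python API and only runs inside a DLT pipeline, where `spark` is provided. Note that `expect_or_drop` discards failing rows; routing them to a quarantine table is not shown here.

```python
def generate_dlt_source(table: str, source_path: str, file_format: str,
                        expectations: dict) -> str:
    """Render DLT pipeline source for a Bronze and a Silver table (hypothetical generator)."""
    bronze = f"bronze_{table}"
    silver = f"silver_{table}"
    lines = [
        "import dlt",
        "",
        "",
        f"@dlt.table(name={bronze!r}, comment='Raw files validated against the contract')",
    ]
    # One expectation decorator per quality rule; repr() keeps the quoting safe.
    for name, rule in expectations.items():
        lines.append(f"@dlt.expect_or_drop({name!r}, {rule!r})")
    lines += [
        f"def {bronze}():",
        f"    return spark.read.format({file_format!r}).load({source_path!r})",
        "",
        "",
        f"@dlt.table(name={silver!r}, comment='Cleaned rows')",
        f"def {silver}():",
        "    # Stand-in cleaning step; real Silver logic would come from the contract.",
        f"    return dlt.read({bronze!r}).dropDuplicates()",
        "",
    ]
    return "\n".join(lines)


source = generate_dlt_source(
    table="orders",
    source_path="dbfs:/landing/orders/",  # hypothetical path
    file_format="parquet",
    expectations={"valid_amount": "amount >= 0"},
)

# Check that the generated text is at least syntactically valid Python.
compile(source, "<generated>", "exec")
print(source)
```

Generating source text rather than building tables dynamically keeps the output reviewable and committable, which fits the version-controlled deployment flow described above.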
- Domain: Data Engineering
- Audience: Data engineers building automated ETL pipelines on Azure Databricks
Generated by Diagrams.so, an AI architecture diagram generator with native Draw.io output.