About This Architecture
This serverless patent data pipeline runs scheduled scrapes of Google Patent Search using Lambda Web Scraper, which sits in a private subnet and reaches the internet through a NAT Gateway. EventBridge Scheduler triggers a scrape every hour. The scraper stores raw HTML/JSON in S3 Bucket Raw Data, and each new object invokes Lambda Data Processor, which parses it and loads structured records into an RDS PostgreSQL db.t3.micro instance. Secrets Manager stores the database credentials, and CloudWatch Logs captures scraper errors and processing metrics. Fork this diagram on Diagrams.so to customize the scraping frequency, add DynamoDB for deduplication, or swap RDS for Aurora Serverless to handle variable workloads.
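
The S3-to-processor step described above can be sketched in Python as follows. This is a minimal illustration, not the diagram's actual implementation: the function names, the bucket and key values, and the record fields (`patent_id`, `title`, `assignee`) are all assumptions made for the example. It shows only how a processor might read the S3 event notification and shape one raw JSON document into a row. Fetching the object from S3 and inserting into PostgreSQL are left as a comment.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record.get("s3", {})
        bucket = s3_info.get("bucket", {}).get("name")
        key = s3_info.get("object", {}).get("key")
        if bucket and key:
            # S3 URL-encodes object keys in event notifications,
            # so decode them before using them to fetch the object.
            objects.append((bucket, unquote_plus(key)))
    return objects


def to_patent_row(raw_json):
    """Map one raw JSON document to a flat row.

    The field names here are hypothetical; the source does not
    specify the scraped schema.
    """
    doc = json.loads(raw_json)
    return {
        "patent_id": doc["patent_id"],
        "title": doc.get("title", "").strip(),
        "assignee": doc.get("assignee"),
    }


def handler(event, context):
    """Lambda entry point: list the objects this invocation should process."""
    objects = extract_s3_objects(event)
    # A real processor would download each object, call to_patent_row on its
    # contents, and insert the rows into PostgreSQL using credentials read
    # from Secrets Manager. Those steps are omitted from this sketch.
    return {"objects": [{"bucket": b, "key": k} for b, k in objects]}
```

Keeping the event parsing and row mapping as plain functions makes them easy to test locally without AWS access, which may be useful before wiring in the S3 and database calls.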