Multi-Cloud Architecture Patterns

Multi-cloud isn't a strategy you choose. It's a reality you manage. These patterns address the actual reasons teams run workloads across AWS, Azure, GCP, and OCI simultaneously.

Why teams actually go multi-cloud

Most multi-cloud deployments don't start with a PowerPoint slide about avoiding vendor lock-in. They start with an acquisition. Company A runs on AWS. Company B runs on Azure. After the merger, both environments exist and neither team has the budget or political will to migrate everything to one provider. IT leadership declares a 'multi-cloud strategy' and calls it intentional. That's the most common path, and it's fine.

The second most common reason is best-of-breed selection. A data team chooses BigQuery for analytics because its serverless model and columnar performance beat Redshift for their query patterns. The application team stays on AWS because ECS Fargate and API Gateway serve their microservices well. The ML team uses Azure because an existing Microsoft Enterprise Agreement makes Azure Machine Learning cheaper than SageMaker. Each decision is rational in isolation, and the aggregate result is multi-cloud.

Regulatory requirements drive the third pattern. A European bank runs customer-facing systems on a sovereign cloud provider to satisfy data residency requirements while using AWS for internal tooling that doesn't process customer data. A healthcare company uses GCP for its AI capabilities but keeps patient records on Azure because its compliance team already audited Azure for HIPAA.

Vendor negotiation is the fourth reason, and enterprises with $10M-plus annual cloud spend know it well. Running production workloads on two clouds gives procurement real alternatives during contract renewals. When AWS proposes a 5% price increase, showing that 30% of your workloads already run on Azure changes the negotiation dynamics.

The fifth reason is geographic coverage. As of 2024, AWS has 33 regions, Azure has over 60, GCP has 40, and OCI has 48. A workload that needs presence in South Africa, the UAE, and Indonesia might require multiple providers because no single provider has regions in all target locations with the required compliance certifications.

Active-active versus active-passive multi-cloud and what each demands

Active-active multi-cloud means your application serves production traffic from two or more cloud providers simultaneously. Users on the East Coast hit AWS. Users in Europe hit Azure. Both environments are fully operational, independently deployable, and handle their own failures. This is the hardest pattern to implement correctly.

Active-active requires your application to be stateless at the compute layer, or to replicate state across clouds in near-real-time. That means either a distributed database like CockroachDB or YugabyteDB that spans both clouds with acceptable cross-cloud latency, or an event-driven architecture where each cloud processes events independently and a reconciliation layer handles conflicts. The networking cost alone is significant: cross-cloud traffic incurs egress fees from both providers. A workload transferring 10 TB per month in each direction pays roughly $870 to AWS and $870 to Azure, nearly $21,000 annually just in data transfer.

Active-passive multi-cloud is more common and more practical for most teams. Your primary workload runs on one cloud. The second cloud hosts a standby environment that activates during a major outage or planned maintenance windows. The standby can be cold (infrastructure-as-code templates ready to deploy), warm (infrastructure running but not receiving traffic), or hot (infrastructure running with data replication but behind a DNS failover).

Active-passive works well with managed database replication. You can run PostgreSQL on AWS RDS as the primary and maintain a logical replication target on Azure Database for PostgreSQL; the replication lag determines your RPO. DNS-based failover using Cloudflare or Route 53 health checks handles the traffic switching.

The key architectural difference is data synchronization. Active-active requires bidirectional, low-latency data sync with conflict resolution. Active-passive requires unidirectional replication with a clear primary. Most teams that start with active-active ambitions end up implementing active-passive, because the data consistency challenges of true active-active across clouds are brutal once you factor in 20-40 ms inter-cloud latency.
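The egress arithmetic above is worth making explicit. A quick back-of-envelope cost model, assuming list prices of about $0.087/GB on both sides (actual rates vary by region, tier, and committed-use discounts):

```python
def cross_cloud_egress_cost(tb_each_way_per_month: float,
                            aws_per_gb: float = 0.087,
                            azure_per_gb: float = 0.087) -> dict:
    """Rough active-active transfer cost: each cloud charges egress on
    the traffic leaving it. Prices here are assumed list rates."""
    gb = tb_each_way_per_month * 1000  # decimal TB for rough math
    monthly_aws = gb * aws_per_gb
    monthly_azure = gb * azure_per_gb
    return {
        "monthly_aws": round(monthly_aws),
        "monthly_azure": round(monthly_azure),
        "annual_total": round((monthly_aws + monthly_azure) * 12),
    }

# 10 TB each way per month: ~$870 + ~$870 monthly, ~$20,880 per year
print(cross_cloud_egress_cost(10))
```

The point of a model like this is that egress scales linearly with replication volume, so active-active data transfer costs are driven almost entirely by how chatty your cross-cloud sync is.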

Cross-cloud networking: connecting AWS, Azure, GCP, and OCI

Cross-cloud networking starts with dedicated interconnects and ends with a network fabric that routes traffic between clouds without touching the public internet. Each cloud provider offers a dedicated connection service: AWS Direct Connect, Azure ExpressRoute, GCP Cloud Interconnect, and OCI FastConnect. These are physical fiber connections from your network edge to the cloud provider's edge.

The fastest path to multi-cloud connectivity runs through a colocation provider like Equinix or Megaport. At an Equinix data center, you provision a cross-connect from your cabinet to the AWS Direct Connect port, another to the Azure ExpressRoute MSEE router, and another to a GCP Partner Interconnect port. Megaport's software-defined network simplifies this further: you provision virtual cross-connects between clouds through their portal, and Megaport handles the physical routing.

Oracle and Azure have a direct partnership that provides low-latency interconnect between OCI and Azure in specific paired regions (Ashburn-East US, London-UK South, Amsterdam-Netherlands, and others). The OCI-Azure Interconnect doesn't traverse the public internet, and Oracle waives egress charges for traffic between the two clouds. It's the tightest multi-cloud network integration available between any two major providers.

For teams that can't justify the cost of dedicated interconnects (starting at $200-300 per month per connection, plus port fees), site-to-site VPN provides an encrypted tunnel over the internet. Each cloud supports IPsec VPN: AWS Site-to-Site VPN, Azure VPN Gateway, GCP Cloud VPN, and OCI IPSec VPN. You can connect clouds directly using the VPN gateways in each provider, but bandwidth caps at roughly 1.25 Gbps per tunnel and latency depends on internet routing.

The network architecture for multi-cloud typically uses a transit pattern. A central hub network in your primary cloud connects to the other clouds via dedicated interconnects or VPN. All cross-cloud traffic routes through this hub, where you can inspect it with a network virtual appliance, apply routing policies, and log traffic flows. This is simpler to manage than a full mesh where every cloud connects directly to every other cloud.
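Because each IPsec tunnel tops out around 1.25 Gbps, VPN-based cross-cloud links are usually sized by running several tunnels in parallel and spreading traffic across them with ECMP. A minimal sizing sketch (the 1.25 Gbps cap matches AWS and GCP documented per-tunnel limits; the 70% utilization headroom is an assumption, not a provider requirement):

```python
import math

def tunnels_needed(required_gbps: float,
                   per_tunnel_gbps: float = 1.25,
                   target_utilization: float = 0.7) -> int:
    """How many parallel IPsec tunnels (traffic spread via ECMP) are
    needed to carry the required cross-cloud bandwidth with headroom.
    target_utilization is an assumed safety margin, not a hard limit."""
    usable_per_tunnel = per_tunnel_gbps * target_utilization
    return math.ceil(required_gbps / usable_per_tunnel)

# e.g. 3 Gbps of steady cross-cloud traffic needs 4 tunnels at 70% load
print(tunnels_needed(3))
```

Past a handful of tunnels, the operational overhead usually tips the decision toward a dedicated interconnect instead.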

Identity federation across clouds: one IdP to rule them all

The single biggest operational headache in multi-cloud environments is identity. Each cloud has its own identity system: AWS IAM with IAM Identity Center, Azure with Microsoft Entra ID, GCP with Google Cloud Identity, and OCI with IAM Identity Domains. Without federation, your engineers maintain separate credentials for each cloud, MFA enrollment is duplicated four times, and offboarding requires disabling accounts in four places.

The fix is a single identity provider that federates to all clouds using SAML 2.0 or OIDC. Microsoft Entra ID (formerly Azure AD) is the most common choice in enterprises because most already use Microsoft 365. Entra ID supports SAML federation to AWS IAM Identity Center, to Google Cloud Identity, and to OCI IAM. Engineers sign in once to Entra ID and get access to all clouds based on group membership.

Okta is the second most common choice, especially in organizations without a Microsoft 365 footprint. Okta's integration catalog includes pre-built SAML and SCIM connectors for AWS, Azure, GCP, and OCI. SCIM provisioning means that when you add a user to an Okta group, the corresponding IAM identity is automatically created in each cloud; when you remove them, the accounts are deactivated.

The federation architecture maps IdP groups to cloud-specific roles. An Okta group called 'Platform-Engineers' might map to the 'AdministratorAccess' role in AWS, the 'Contributor' role in Azure, 'roles/editor' in GCP, and the 'Administrators' group in OCI. The mapping isn't one-to-one across clouds because each provider's permission model differs: AWS uses policy-based IAM with explicit allow and deny, Azure uses RBAC with built-in and custom role definitions, GCP uses IAM roles bound to resource hierarchies, and OCI uses compartment-scoped policy statements.

Your IdP handles authentication; each cloud's IAM handles authorization. This separation means your security team manages who can access which cloud, while each cloud's native tools manage what they can do once they're in.

For service-to-service authentication across clouds, you can't use SAML. Use workload identity federation instead. AWS supports assuming IAM roles using OIDC tokens from external IdPs, including GCP and Azure. GCP supports Workload Identity Federation with AWS and Azure tokens. This means a service running on GKE can access an S3 bucket using its Kubernetes service account token, without storing long-lived AWS credentials.
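On the AWS side, workload identity federation is often wired up through the SDK's standard environment variables: with `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` set, credential resolution calls sts:AssumeRoleWithWebIdentity automatically using the externally issued OIDC token. A minimal sketch; the role ARN and token path are illustrative, not real values:

```python
import os

def configure_aws_web_identity(role_arn: str, token_file: str,
                               session_name: str = "gke-workload") -> dict:
    """Point AWS SDK credential resolution at an external OIDC token.
    With these variables set, the SDK exchanges the token for temporary
    credentials via sts:AssumeRoleWithWebIdentity, so the workload never
    holds long-lived AWS keys. Values passed in are illustrative."""
    env = {
        "AWS_ROLE_ARN": role_arn,
        "AWS_WEB_IDENTITY_TOKEN_FILE": token_file,
        "AWS_ROLE_SESSION_NAME": session_name,
    }
    os.environ.update(env)
    return env

# A GKE pod with a projected service-account token might configure:
configure_aws_web_identity(
    "arn:aws:iam::123456789012:role/gke-s3-reader",  # hypothetical role
    "/var/run/secrets/tokens/gcp-oidc-token",        # projected token path
)
```

The IAM role's trust policy must list the cluster's OIDC issuer as a federated principal; the environment variables alone grant nothing.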

Multi-cloud Kubernetes: Anthos, Azure Arc, and EKS Anywhere compared honestly

Kubernetes is the most common workload abstraction layer in multi-cloud environments because it provides a consistent API for deploying containers regardless of where the cluster runs. But the three major multi-cloud Kubernetes platforms differ significantly in what they actually deliver.

Google Anthos was first to market and remains the most opinionated. Anthos manages GKE clusters on GCP, Anthos clusters on AWS (running on EC2), Anthos clusters on Azure (running on Azure VMs), Anthos on bare metal, and Anthos on VMware. The management plane runs on GCP, so all clusters phone home to the GCP console for fleet management, policy enforcement, and observability. Anthos Service Mesh (built on Istio) provides cross-cluster service discovery and mTLS. The honest trade-off: Anthos gives you a genuinely unified Kubernetes experience across clouds, but it creates a dependency on GCP's control plane. If GCP's management APIs go down, you can still run workloads on existing clusters, but you can't deploy new configurations or view fleet status. Anthos licensing is also expensive, charged per vCPU per month on non-GCP clusters.

Azure Arc extends Azure's management plane to Kubernetes clusters running anywhere: other clouds, on-premises, or edge locations. Arc-enabled clusters appear in the Azure portal alongside AKS clusters, and you can deploy Azure services like App Service, Azure Functions, and Azure SQL Managed Instance onto them using Arc extensions. Arc doesn't require a specific distribution; any CNCF-conformant cluster can be Arc-enabled. The trade-off: Arc's multi-cloud story is thinner than Anthos's. It excels at bringing Azure services to non-Azure locations, but it doesn't provide cross-cloud service mesh or unified networking out of the box.

AWS EKS Anywhere runs EKS distributions on your own infrastructure: VMware vSphere, bare metal (via the Tinkerbell provisioner), Apache CloudStack, or Nutanix. It uses the same Kubernetes version and Amazon Linux node images as managed EKS, and clusters connect to AWS for centralized management through the EKS Connector. The trade-off: EKS Anywhere is designed for on-premises and hybrid scenarios, not multi-cloud. Running it on Azure VMs or GCP Compute Engine is technically possible but not a supported configuration.

For true multi-cloud Kubernetes, most teams run each cloud's native managed service (EKS, AKS, GKE) and use a GitOps tool like Argo CD or Flux to maintain consistent deployments across clusters. This gives up the unified management plane but avoids lock-in to any single multi-cloud product.
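In practice, the GitOps approach comes down to stamping out one Argo CD Application per cluster from a shared template, so every cloud syncs the same Git path. A minimal sketch; the cluster names, API endpoints, and repo URL are hypothetical, while the field names follow Argo CD's argoproj.io/v1alpha1 Application schema:

```python
import json

# Hypothetical cluster API endpoints registered in Argo CD, one per cloud
CLUSTERS = {
    "eks-prod": "https://eks-prod.example.com",
    "aks-prod": "https://aks-prod.example.com",
    "gke-prod": "https://gke-prod.example.com",
}

def argo_application(name: str, cluster_server: str,
                     repo: str, path: str) -> dict:
    """One Argo CD Application per target cluster, all syncing the same
    Git path so every cloud runs an identical deployment."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {"repoURL": repo, "path": path,
                       "targetRevision": "main"},
            "destination": {"server": cluster_server, "namespace": "web"},
            "syncPolicy": {"automated": {"prune": True,
                                         "selfHeal": True}},
        },
    }

apps = [argo_application(f"web-{cluster}", server,
                         "https://github.com/example/deploy.git", "web")
        for cluster, server in CLUSTERS.items()]
print(json.dumps(apps[0], indent=2))
```

Argo CD's ApplicationSet controller with a cluster generator does this templating natively; the sketch just shows the shape of what it produces.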

Diagramming multi-cloud architectures clearly

Multi-cloud architecture diagrams fail when they try to show everything in one view. The most effective approach uses layered diagrams: a high-level connectivity diagram showing how the clouds connect, then per-cloud detail diagrams showing the workloads within each provider.

The connectivity diagram shows each cloud as a large bounded region with its dedicated interconnect or VPN connections drawn between them. Label each connection with its type (ExpressRoute, Direct Connect, Cloud Interconnect), its bandwidth, and whether it's primary or failover. Show the colocation facility or network exchange point (Equinix, Megaport) as a distinct node between the clouds if you're using one. Draw the DNS layer at the top showing how traffic routes between clouds, whether through Cloudflare, Route 53 health checks, or Azure Traffic Manager. The identity layer should appear as a separate horizontal band showing the IdP federating to each cloud's IAM; this makes it clear that authentication is centralized while authorization is distributed.

For each cloud's detail diagram, use that provider's official icon set and show the internal architecture following that provider's best practices. Don't mix AWS icons with Azure icons on the same diagram; it creates visual confusion. Instead, use consistent connector styles to show cross-cloud traffic: dashed lines for async replication, solid lines for synchronous API calls.

Diagrams.so generates multi-cloud architecture diagrams from text descriptions. Describe your cross-cloud topology, specify which workloads run where, and select the primary cloud provider for the initial icon set. The AI renders the connectivity between clouds, places workloads in the correct provider regions, and marks the interconnect types. The output is native .drawio XML that supports editing each cloud's section independently. Architecture warnings flag missing redundancy in cross-cloud connections and single points of failure in the federation layer.
