Reflection on the AWS Outage and the Risks of Vendor Concentration

subishrestha
Oct 22, 2025

On 20 October 2025, AWS’s US-EAST-1 region experienced a major disruption. The root cause was traced to a DNS resolution failure affecting the Amazon DynamoDB service. The result was a cascade of infrastructure failures across compute launches, database access, and identity services, and thousands of services globally were impacted. Even after AWS declared “all services returned to normal operations”, many customers were still processing backlogs, and recovery effects lingered.

What caught my attention was not just the cause or the size, but the ripple effect—smart-beds failing, banks unable to process transactions, children unable to access their online classrooms. For many, this may have felt like “just another outage”. But for anyone managing risk, it felt like a fault in the foundation.

The more insidious threat today is vendor concentration:

Single point of failure at scale: many organisations host workloads in AWS, either explicitly or implicitly through their key SaaS suppliers. So the impact, while localised to one provider region, became global for business operations.
Hidden transitive dependencies: a SaaS vendor you trust may lean on AWS’s database, DNS, or edge services, so your risk lies not just with your direct supplier but with your supplier’s supplier. Few organisations map these dependencies.
Cascades of failure: the root was DNS automation, but the effect rippled into compute, database, and identity because these layers share infrastructure.
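
The transitive dependency point can be made concrete with a small graph walk. This is a minimal sketch, not a real inventory: the service names, vendor names, and the DEPENDS_ON map are all hypothetical, chosen only to show how a business service with no direct AWS contract can still be exposed to one region.

```python
from collections import deque

# Hypothetical dependency map: each key depends directly on the listed items.
DEPENDS_ON = {
    "online-classroom": ["video-saas", "auth-saas"],
    "payments": ["core-banking", "auth-saas"],
    "video-saas": ["aws:us-east-1"],
    "auth-saas": ["managed-db-vendor"],
    "managed-db-vendor": ["aws:us-east-1"],
    "core-banking": ["on-prem-dc"],
}


def transitive_dependencies(service, graph):
    """Return every direct and indirect dependency of `service` (breadth-first)."""
    seen = set()
    queue = deque(graph.get(service, []))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue  # already visited; also guards against cycles
        seen.add(dep)
        queue.extend(graph.get(dep, []))
    return seen


def exposed_to(target, graph, services):
    """List the services whose dependency chain reaches `target`."""
    return sorted(s for s in services if target in transitive_dependencies(s, graph))


business_services = ["online-classroom", "payments"]
print(exposed_to("aws:us-east-1", DEPENDS_ON, business_services))
```

In this made-up map, "payments" runs on an on-premises core but still reaches the region through its authentication vendor's database supplier, which is exactly the kind of link that rarely appears in a vendor list.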

Working with clients, I often frame it this way: resilience is not a feature you turn on; it’s an architecture you build consciously.

Practice accountability: know where your critical services live, what they depend on, and how failure propagates.
Map your dependency chain: critical business operation > SaaS vendor > underlying cloud provider > region. Do you know all the links?
Segment and diversify where it matters: I’m not saying “move everything to multiple clouds”; that’s expensive and may be confusing. But pick your critical systems and make sure they either span providers or regions, or have fallback plans that go beyond a single-provider path.
Governance & risk register: vendor concentration risk must sit in your enterprise risk register, with Board-level visibility and mitigation strategies.
Test: simulate provider outages regularly so your organisation stays prepared rather than reactive.
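
A provider-outage test can start as a tabletop exercise before anyone touches production. The sketch below is one possible way to model it, under stated assumptions: the CriticalSystem class, the system names, and the region assignments are hypothetical, and each "path" is simply the set of provider regions that must all be up for that path to work.

```python
from dataclasses import dataclass, field


@dataclass
class CriticalSystem:
    name: str
    # Each path lists the provider regions that must ALL be up for it to work.
    paths: list = field(default_factory=list)

    def survives(self, failed):
        """True if at least one path avoids every failed provider region."""
        return any(not (set(path) & failed) for path in self.paths)


def simulate_outage(systems, failed):
    """Return {system name: still operational?} for a set of failed regions."""
    failed = set(failed)
    return {s.name: s.survives(failed) for s in systems}


# Hypothetical inventory, for illustration only.
systems = [
    CriticalSystem("payments", paths=[["aws:us-east-1"], ["aws:eu-west-1"]]),
    CriticalSystem("customer-portal", paths=[["aws:us-east-1"]]),
    CriticalSystem("ledger", paths=[["aws:us-east-1", "azure:westeurope"]]),
]

for name, ok in simulate_outage(systems, {"aws:us-east-1"}).items():
    print(f"{name}: {'operational' if ok else 'DOWN'}")
```

Note the "ledger" case: using two providers in a single path does not add resilience, because both must be up at once. Only an independent fallback path, as in "payments", keeps a system running in this model.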

Cloud providers have delivered immense economic and operational value, but such outages are a sober reminder: resilience must be intentionally designed, governed, and funded. The failure modes we are seeing are configuration, automation, and control-plane issues, amplified by concentration. Organisations that treat vendor concentration as a board-level systemic risk and implement targeted diversification and rigorous dependency testing will be materially more resilient when the next provider incident occurs.
