It's always us-east-1


Welcome to Runtime! Today: Monday's big AWS outage illustrates yet again why us-east-1 is the biggest problem in cloud computing, further evidence that bitcoin mining is so very 2019 these days, and the latest funding rounds in enterprise tech.

(Please forward this email to a friend or colleague! And if it was forwarded to you, sign up here to get Runtime each week.)


Virginia grim

As the world's largest cloud infrastructure provider, AWS's success is a double-edged sword: when something goes wrong, everyone notices very quickly. But it's also clear after sifting through the fallout from a massive outage Monday that both AWS and its customers are still putting too many of their eggs into one very old and very worn basket.

Major services such as Canva, Reddit, and even Amazon.com itself went down for several hours Monday after an early-morning outage linked to AWS's flagship data center complex in Northern Virginia, known as us-east-1. It took most of the day for sites and services dependent on that region to recover, in what appeared to be the worst AWS outage since 2021.

  • The issue was traced to the DNS configuration for Amazon DynamoDB, which prompted hundreds of people who know exactly one cloud-outage joke to activate their internet posting machines.
  • DNS is the system that allows computers to find each other on the internet, and if the DNS records for us-east-1's DynamoDB endpoint suddenly stop resolving, any application looking for data from that database in that region is going to have problems (see the sketch after this list).
  • However, even after resolving the DNS issue in about three hours, AWS continued to struggle with a cascading series of problems in us-east-1 that affected its core EC2 computing service as well as its Network Load Balancer service.
  • After reporting the initial disruption at 11:49pm Sunday night Pacific Time, AWS announced that services were operating normally as of 3:01pm Monday, which is a very, very long time.
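
For application developers, that kind of failure shows up in an unusual place: the SDK can't resolve the regional endpoint at all, so calls fail on the client side before a request ever reaches DynamoDB. Here's a minimal sketch of what that looks like, assuming a hypothetical "users" table and boto3's standard exceptions; it illustrates the failure mode, not what broke inside AWS.

```python
# Minimal sketch: how a DNS failure for dynamodb.us-east-1.amazonaws.com
# surfaces in application code. Table and key names are hypothetical.
import boto3
from botocore.exceptions import EndpointConnectionError, ClientError

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

def get_user(user_id: str):
    try:
        resp = dynamodb.get_item(
            TableName="users",                    # hypothetical table
            Key={"user_id": {"S": user_id}},
        )
        return resp.get("Item")
    except EndpointConnectionError:
        # The endpoint's DNS records aren't resolving, so no request was
        # ever sent; retrying against the same region just burns time.
        return None
    except ClientError:
        # Service-side errors (throttling, missing table, etc.) land here
        # and look very different from an endpoint that has vanished.
        raise
```

The distinction matters because most retry logic assumes the service is reachable but unhealthy, not that its name has temporarily disappeared from DNS.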

Any time something goes wrong at AWS it's pretty safe to assume it involves us-east-1, which was AWS's first U.S. region, launched nearly two decades ago, and remains the default region for customers that don't specify where they'd like to run their apps. However, despite that region's long history of issues, Monday's outage shows that far too many AWS customers (including AWS itself) don't have adequate failover strategies in place for when something inevitably breaks.

  • "It’s possible to keep running when a cloud region goes down," noted former Netflix and AWS engineer Adrian Cockcroft on LinkedIn, pointing to Netflix and Capital One as major AWS customers that appeared to do just fine amid Monday's chaos.
  • AWS, like all the major cloud providers, offers multiple regions inside the U.S. and several availability zones within each region, so customers can limit the damage when something goes wrong in a particular location.
  • However, re-architecting an application to run across multiple regions or zones is not easy, and those configurations can be more expensive when everything is working, which is the vast majority of the time (see the sketch after this list).
  • Still, it's getting really hard to understand why AWS itself runs so many of its core services in us-east-1, which in Monday's case helped extend the outage to AWS services that relied on the us-east-1 version of DynamoDB.
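
As a rough illustration of what that re-architecting involves, here's a sketch of a read path that falls back to a second region, assuming a DynamoDB global table replicated to us-west-2; the table name and region list are assumptions, and production setups usually handle this at the routing layer rather than per call.

```python
# Sketch of a per-call regional fallback against a DynamoDB global table.
# Assumes the table is replicated to every region listed below.
import boto3
from botocore.exceptions import EndpointConnectionError, ConnectTimeoutError

REGIONS = ["us-east-1", "us-west-2"]   # primary first, then replicas
clients = {r: boto3.client("dynamodb", region_name=r) for r in REGIONS}

def resilient_get(table_name: str, key: dict):
    last_error = None
    for region in REGIONS:
        try:
            resp = clients[region].get_item(TableName=table_name, Key=key)
            return resp.get("Item")
        except (EndpointConnectionError, ConnectTimeoutError) as err:
            last_error = err          # region unreachable, try the next one
    raise last_error

# Example call against a hypothetical table:
# item = resilient_get("users", {"user_id": {"S": "42"}})
```

The extra cost comes from keeping replicated data and warm capacity in a region that sits mostly idle, which is exactly the trade-off many of Monday's victims appear to have decided against.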

AWS will probably release a more detailed report on the factors that caused the outage over the next day or so, which might not quiet critics who should know better but helps reinforce the culture of reliability that is a huge part of cloud infrastructure computing. The incident comes at an interesting time for AWS, which has been growing much more slowly than its rivals during the AI boom.

  • It also appears to be the first major incident under CEO Matt Garman, who took over the top job in June 2024.
  • Over at The Register, Corey Quinn of the Duckbill Group suggested that a "brain drain" has thinned AWS's ranks of experienced engineers following years of layoffs and return-to-office mandates and could have exacerbated the outage, which is a bit much without any direct knowledge of what happened.
  • But it's also clear that AWS has yet to deal with its us-east-1 issues despite many wake-up calls, and an increased focus on cost-cutting won't help.

Coin operated

CoreWeave is probably the best-known example of the bitcoin mining community pivoting to AI amid a once-in-a-generation tech gold rush, but there are many such cases. CleanSpark announced plans Monday to join the party, sending its stock price up 15%, according to Bloomberg.

CleanSpark, which touts itself as "America's Bitcoin Miner," had 1.03 gigawatts of power under contract as of last month and operates data centers in four U.S. states. That's not a lot of capacity for the AI industry, which is talking about deploying gigawatt-scale data centers to serve its needs, but for the time being demand is still well ahead of supply.

Core Scientific is another bitcoin-mining operation with AI aspirations, but a proposed deal with CoreWeave has faced pushback from investors who believe CoreWeave is undervaluing the company. For its part, CoreWeave CEO Michael Intrator told CNBC Tuesday that Core Scientific is "a nice-to-have" as opposed to an integral part of its strategy, which made it sound like Core Scientific will soon be chasing AI business on its own.


Enterprise funding

Deel raised $300 million in Series E funding, valuing the payroll service provider at $17.3 billion.

LangChain scored $125 million in Series B funding for its agent-development platform.

Anrok landed $55 million in Series C funding as it continues to develop automated tax-compliance software.

Serval raised $47 million in Series A funding for its IT automation software, which uses agents to help companies process help-desk tickets.

Hyro landed $45 million in new funding to further develop agentic AI software for healthcare companies.

Keycard launched with $38 million in seed and Series A funding for its identity management software, which works to verify the authenticity of AI agents.


The Runtime roundup

Microsoft CEO Satya Nadella made $96.5 million in total compensation during Microsoft's last fiscal year, a 22% bump that tracks with the 23% increase in the value of its shares over that period.

Anthropic and Google Cloud are in talks about a new infrastructure computing deal, according to Bloomberg, which probably isn't a headline AWS executives wanted to see after a couple of rough days.


Thanks for reading — see you Thursday!
