AI-generated code still needs a human touch

Welcome to Runtime! Today: Amazon shows how even the most advanced tech organizations need better guardrails in place during the AI coding boom, Anthropic sues the Department of Defense, and the latest funding rounds in enterprise tech.

Please forward this email to a friend or colleague! If it was forwarded to you, sign up here to get Runtime each week, and if you value independent enterprise tech journalism, click the button below and become a Runtime supporter today.


Under review

One way to tell when a big shift in enterprise technology is under way is that a few lessons learned the hard way over previous years get discarded by less-experienced employees excited by the new thing. Turns out even Amazon isn't above having to go through that process, and that should be a wake-up call for everybody else.

Amazon made the weekly internal meeting for developers working on its retail operation mandatory on Tuesday to address several recent outages that were caused by "Gen-AI assisted changes," according to The Financial Times. Dave Treadwell, senior vice president for eCommerce services at Amazon, told employees the meeting would be a "deep dive into some of the issues that got us here as well as some short immediate term initiatives," according to the report, which was not disputed by Amazon.

  • "Folks, as you likely know, the availability of the site and related infrastructure has not been good recently," Treadwell wrote, likely referring to a six-hour outage last week caused by "a software code deployment."
  • That outage had nothing to do with AWS, but the cloud side of the Amazon conglomerate suffered its own outage in December when employees allowed its Kiro development assistant to make changes to a production service without proper review.
  • Junior and mid-level engineers at Amazon will now have to get a senior engineer to sign off on any proposed changes that were created with AI, according to the report.

It's a little hard to understand how people working inside tech organizations as experienced and sophisticated as Amazon thought they could just let AI-generated code fly into one of the biggest revenue-generating machines ever devised without a proper review. Developers are excited about AI coding tools because those tools let them generate far more code than they could write by hand, but we're starting to see the second-order effects of the AI coding boom play out.

  • It's not clear whether this is still in place across Amazon, but as of 2020 AWS had an extensive CI/CD system that automated much of the software-development pipeline yet still mandated that "all changes going to production start with a code review and must be approved by a team member before merging into the mainline branch." (hat tip to Cindy Sridharan).
  • One question is whether those code reviewers are drowning in a flood of AI-generated code, and whether Amazon's decision to lay off thousands of people in recent months played a role.
  • But other users of AI coding tools have noticed that they can fail in weird, unique ways that might not be detectable by a code reviewer looking for the more common ways that people make mistakes.
  • Treadwell's email nodded to that last point, listing "novel GenAI usage for which best practices and safeguards are not yet fully established" as a "contributing factor" to Amazon's recent problems.
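The policy described above is straightforward to picture as a merge gate. Here's a minimal sketch in Python; every name in it (change_may_merge, SENIOR_ENGINEERS, the ai_assisted flag) is invented for illustration and is not taken from Amazon's actual tooling:

```python
# Hypothetical merge-gate sketch, loosely modeled on the reported policy:
# every change needs at least one reviewer's approval, and AI-assisted
# changes additionally need sign-off from a senior engineer.
# All identifiers here are invented for illustration.

SENIOR_ENGINEERS = {"alice", "bob"}

def change_may_merge(change: dict) -> bool:
    """Return True if a proposed change satisfies the review policy."""
    approvers = set(change.get("approvals", []))
    if not approvers:
        return False  # no change merges without a human review
    if change.get("ai_assisted", False):
        # AI-assisted changes require at least one senior approver
        return bool(approvers & SENIOR_ENGINEERS)
    return True
```

The interesting design question is the one Amazon is now wrestling with: the gate is trivial to enforce, but it assumes reviewers can actually keep up with the volume of AI-generated changes flowing through it.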

As you might expect, AI companies think your engineers should be using AI to review your AI-generated code. Anthropic launched Code Review in Claude Code on Monday, "which dispatches a team of agents on every [pull request] to catch the bugs that skims miss, built for depth, not speed," the company said in a blog post.

  • A startup called Crafting introduced a similar product Monday that gives coding agents "production-like environments with real dependencies and real data to operate in and test their code" without actually putting that code into production, according to The New Stack.
  • However, even Anthropic doesn't use Code Review to actually push changes to its production environment; "It won't approve PRs — that's still a human call — but it closes the gap so reviewers can actually cover what's shipping," it said in the blog post.

No one should be surprised that there will be glitches and a few outright disasters as companies incorporate AI coding agents into their workflows, because that's how the adoption of almost every new enterprise tech platform has gone.

  • Those platforms only became stable after developers learned some painful lessons about scale and process, which feels like something Amazon should have already known.
  • And anybody betting that SaaS companies were going to disappear by the end of the year might want to rethink that assumption, because even with all the progress over the last six months, software development is still very hard.

Claude Court

As expected, Anthropic filed a lawsuit against the Department of Defense Monday seeking a court order overturning the decision to label it a supply chain risk, a step that has never been taken against a U.S. company. While the Pentagon's formal designation stopped short of arguing that no government contractor should do business with Anthropic, as it originally tried to do, government agencies have been pulling its Claude models out of their systems in response.

The lawsuit also revealed that Anthropic has generated $5 billion in revenue since its inception, and spent $10 billion on model training costs. The company has projected $14 billion in revenue for 2026, according to the AP, but argued in its lawsuit that businesses are starting to get cold feet about new contracts thanks to the Pentagon's order.

On Tuesday, Anthropic received support from an unlikely place: OpenAI champion Microsoft, which filed a brief with the court urging it to block the government's decision when it comes to existing contracts with the military to "enable a more orderly transition and avoid disrupting the American military’s ongoing use of advanced AI," according to CNBC. But Google Cloud rushed to fill the void at the Department of Defense, rolling out Gemini agents for military use, according to Bloomberg.


Enterprise funding

Nscale raised $2 billion in Series C funding, valuing the neocloud at $14.6 billion in what it said was the largest funding round in European history.

Nexthop AI scored $500 million in Series B funding for its data-center networking technology, which was designed for AI workloads.

Eridu launched with $200 million in Series A funding for its own data-center networking technology, which, believe it or not, was also designed for AI workloads.

Armadin launched with $189.9 million in Series A funding for its agent-powered cybersecurity technology, led by Mandiant founder Kevin Mandia.

Jazz launched (another big launch week) with $61 million in seed and Series A funding for its data-loss prevention technology, which uses agents to inspect business processes.

Validio raised $30 million in Series A funding for its data-management technology.


The Runtime roundup

Oracle beat expectations for revenue and profit and raised its guidance for the current quarter, which sent its stock up more than 8% in after-hours trading.

Nvidia will show off an open-source platform for managing AI agents next week at GTC that is somewhat akin to OpenClaw, according to Wired.


Thanks for reading — see you Thursday!
