Anthropic moves the AI coding race forward
Welcome to Runtime! Today: Anthropic's new Claude models deliver what the company says is the best AI coding performance on the market, how two employees of a security software company pulled off a massive breach of federal data, and the latest moves in enterprise tech.
Magnum Opus?
At this point it's becoming clear that AI models and coding tools will be used to build or retrofit a sizable portion of the world's software over the next several years. As the sophistication of both the underlying models and the code editors that employ them continues to increase, that transition could happen sooner rather than later.
Anthropic unveiled its latest Claude models Thursday, declaring that Claude Opus 4 "is our most powerful model yet and the best coding model in the world, leading on SWE-bench (72.5%) and Terminal-bench (43.2%)." While benchmarks were made to be gamed, GitHub's decision to make the other model introduced Thursday (the less-powerful but cheaper Claude Sonnet 4) the default model for the new coding agent it introduced Monday speaks volumes, given the close-but-fraying relationship between GitHub parent Microsoft and OpenAI.
- The major breakthrough with the new models appears to be their ability to complete tasks unsupervised over long periods of time, as compared to earlier generations that glitched out pretty fast.
- "Rakuten validated its capabilities with a demanding open-source refactor running independently for 7 hours with sustained performance," Anthropic said in its launch blog post.
- Anthropic pulled this off by allowing developers to point applications built around Claude at locally stored data, so that "the models can create and update 'memory files' to track progress and things they deem important over time," Ars Technica reported.
- The company also announced Thursday that its own vibe-coding service, Claude Code, is now generally available, and developers using Visual Studio Code or JetBrains will be able to add Claude Code directly into their code editors.
Anthropic argued that this breakthrough represents a real step toward agentic coding: while the agents still require a fair amount of upfront configuration work to make them sing, developers can let them run in the background with far less supervision. "The more agents are able to go ahead and do something over extended periods of time, the more helpful they will be, if I have to intervene less and less," DeepFlow's Stefano Albrecht told MIT Tech Review.
- So much of the discussion about AI agents in software development focuses on using them to create net-new software, but there are so many opportunities to use coding agents to update older software that would otherwise be ignored given the time required to get results, as in the Rakuten example.
- If the agents can be trusted to the point where developers can point them at a code base and assign them a ton of boring, time-consuming maintenance work, those developers will have a lot more time to spend building new projects for their companies.
- For several years, vendors and developers have been hoping that generative AI models and tools could be directed at this problem, but only the most sophisticated engineering organizations have been able to pull it off, like Amazon did in 2024.
- As developers put the new models through real-world testing, we'll get a sense of whether Anthropic has figured out how to help customers eliminate a ton of technical debt.
Software development is probably the most lucrative potential use for generative AI technology we've seen to date, and Anthropic's rival model developers are almost certainly working on models that could match or exceed its performance on coding tasks over the rest of the year. And one decision Anthropic made in the run-up to the launch of the new Claude 4 models underscored the growing competition.
- Cursor was prominently included in Anthropic's launch event Thursday, but its rival, Windsurf, was left out.
- "Unfortunately, Anthropic did not provide our users direct access to Claude Sonnet 4 and Opus 4 on day one," Windsurf CEO Varun Mohan said on X. "We are actively working to find capacity elsewhere so we can continue to provide the most versatile and powerful AI assistance platform, period."
- That's almost certainly because Windsurf is hanging in limbo at the moment after Bloomberg reported earlier this month that OpenAI has reached a deal to acquire it for $3 billion, but neither company has confirmed they have a final agreement.
- The early days of AI coding models and tools were fairly collegial by most standards, but that probably won't last as the money starts rolling in.
Inside job
Twin brothers working for government cybersecurity contractor Opexus used their position inside the company to steal and alter data belonging to several government agencies, including the IRS, during 2023 and 2024, Bloomberg reported Wednesday. The incident underscores that while so much attention in recent years has been paid to software supply chain security threats, insider threats remain a huge problem.
"The damage attributed to the brothers includes the destruction of more than 30 databases and the removal of more than 1,800 files related to one government project," Bloomberg reported, citing Opexus' own report on the incident. For what it's worth, Muneeb and Suhaib Akhter told Bloomberg they didn't do anything wrong, but it's unclear how they were ever hired in the first place after being convicted of hacking crimes a decade ago while working for federal agencies.
Enterprise moves
Woodson Martin is the new CEO of OutSystems, joining the low-code company after 18 years at Salesforce, most recently as executive vice president and general manager of Salesforce AppExchange.
Geir Engdahl is the new chief technology officer for AI and James Sirota is the new global head of engineering at Cognite.
Christophe Frenet is the new chief product officer at Dashlane, joining the password-management company after six years in a similar role at Botify.
Michael Campbell is the new chief product officer at Hyland, the latest new face in a recent wave of executive turnover at the content-management company.
The Runtime roundup
Snowflake's stock rose more than 13% Thursday, one day after the company reported earnings that beat Wall Street expectations and raised its guidance for the full year.
The House of Representatives passed a budget bill that could prevent states from regulating AI in any fashion for up to a decade. While it needs Senate approval to become law, the measure could block all kinds of tech regulation, given how pervasive AI has become in software over the last couple of years.
Thanks for reading — see you Saturday!