Welcome to Runtime! Today: how cloud providers are scrambling to manage the surging cost of the generative AI boom, why the cloud repatriation movement is so 2021, and the quote of the week.
(Was this email forwarded to you? Sign up here to get Runtime each week.)
On the margins
Over the last 15 years, cloud providers have gotten very good at commoditizing the enterprise infrastructure needed to launch and run a business in the 21st century. This year, however, the generative AI boom has knocked them out of their comfort zone.
The cost of providing high-performance AI computing in the cloud is surging, from the expensive (and hard-to-find) Nvidia AI chips where the magic happens to the energy needed to run the whole show. It's also clear that AI workloads are evolving differently on the cloud, and that the infrastructure used to dial in general-purpose CPU-driven workloads at scale might need to be reinvented for the AI era.
AWS, first in traditional cloud computing but scrambling to catch up in AI, signaled this week that it plans to compete for AI business on price.
- “These models are expensive,” Dilip Kumar, vice president of AWS Applications, told Reuters this week. “We’re taking on a lot of that undifferentiated heavy lifting, so as to be able to lower the cost for our customers.”
- He was referring to AWS's custom chips for AI workloads, such as its Inferentia and Trainium chips, which (you guessed it) handle inference and model-training tasks, respectively.
- AWS was the first cloud provider to design its own CPU for low-cost cloud workloads, and it's clearly hoping it can pull off the same trick to avoid as much of the Nvidia tax as possible.
- Building and maintaining a chip design team is not exactly cheap, but should customers find the results useful, the effort could save AWS a ton in the long run.
But everyone is trying to get a handle on AI infrastructure costs.
- Google Cloud has been working on custom AI chips for several years, and probably has the most experience of the Big Three with running AI workloads outside Nvidia's orbit.
- Two University of Washington researchers with ties to Microsoft just published a description of a "chiplet" architecture that could dramatically reduce the cost of running AI models at scale.
- Even IBM is kicking the tires on chips designed internally for AI workloads in hopes of reducing operating costs.
It's still hard to tell how much real end-user demand there is for AI services, however, and that will obviously have a big impact on pricing.
- CIOs recently surveyed by Jefferies ranked AI well below kitchen-table enterprise spending priorities like security and application development.
- But it's clear there is an AI startup arms race, with money pouring into the sector amid an otherwise dismal time for venture-capital investment.
- Startups developing these models can't get their hands on enough AI computing capacity, which combined with GPU rationing should keep prices relatively high for the rest of the year.
Still, regular businesses might balk at the current price tag for those services, especially coming out of a year in which they've been asked to scrutinize every dollar spent on technology.
- During the early days of the cloud buildout, vendors continuously one-upped each other with price cuts.
- If the Jefferies survey is accurate about CIO ambivalence toward AI, history will likely need to repeat itself before enterprises embrace generative AI as the Shiny New Thing.
- But that will be a tricky dance for the cloud providers, who have invested billions in their AI strategies and will need to justify that expense to their own investors at some point.
Enterprise tech is slower to embrace change, even change that makes a ton of sense, than most people involved would like.
- Tech companies have laid the AI FOMO on very thick this year.
- If AI is going to realize its potential sooner rather than later, they're going to need to give customers more incentives to take the plunge without busting their budgets.
As it is wont to do, Andreessen Horowitz kicked over a hornet's nest two years ago by suggesting that it might be time for established tech companies to build and maintain their own infrastructure, rather than throwing millions at AWS every quarter. The idea of "cloud repatriation" sparked a lot of discussion, but it does not appear to have sparked a data-center renaissance, according to new stats from Synergy Research.
The amount of data center capacity owned and operated by enterprise companies fell from 60% of the world's total capacity in 2017 to 40% in 2022, and looks set to decline further over the next five years. Overall, "...spending on data center hardware and software has only grown by an average 2% per year, while spending on cloud services has ballooned, growing by an average 42% per year to reach $227 billion in 2022," Synergy said.
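To see why those two growth rates settle the repatriation question, here's a quick back-of-the-envelope sketch of how the gap compounds. It assumes Synergy's averages (42% for cloud, 2% for data center hardware and software) hold steadily over the 2017-2022 window, which is an illustrative simplification; Synergy's actual year-by-year figures will differ.

```python
# Back-of-the-envelope: compounding the two average growth rates
# Synergy cited (illustrative; assumes steady annual growth).

CLOUD_2022_B = 227.0   # Synergy's 2022 cloud services spend, in $B
CLOUD_GROWTH = 0.42    # average annual growth in cloud services spend
DC_GROWTH = 0.02       # average annual growth in data center hw/sw spend
YEARS = 5              # 2017 -> 2022

# Implied 2017 cloud services base, working backward from 2022
implied_2017 = CLOUD_2022_B / (1 + CLOUD_GROWTH) ** YEARS
print(f"Implied 2017 cloud services spend: ~${implied_2017:.0f}B")

# How each dollar of 2017 spend grew by 2022
print(f"$1 of cloud spend in 2017 -> ${(1 + CLOUD_GROWTH) ** YEARS:.2f} in 2022")
print(f"$1 of data center spend in 2017 -> ${(1 + DC_GROWTH) ** YEARS:.2f} in 2022")
```

In other words, a dollar of cloud spend in 2017 grew more than fivefold by 2022, while a dollar of on-premises data center spend barely moved, which is exactly the dynamic behind the shrinking enterprise share of capacity.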
Big Cloud data centers are not cheap, and construction wouldn't be surging unless underlying demand for cloud services over roll-your-own infrastructure was continuing to accelerate. "On-premise (sic, and sigh) data centers will not disappear any time soon, but their scale is being increasingly dwarfed by hyperscale and colocation companies," researchers wrote.
Quote of the week
"It’s curious to me that they’re choosing to increase the price list at this moment given there’s still a lot of discounting happening, and a lot of quarter end deals and discounts happening.” — Peter Nebel, chief technology officer for applications at Salesforce partner AllCloud, on Salesforce's decision to raise prices this week by up to 9%, in CRN.
The Runtime roundup
Hugging Face is entertaining offers to raise "at least" $200 million at a $4 billion valuation, and VCs are scrambling to make their bids, Forbes reported.
Honeycode, AWS's answer to the no-code movement launched in 2020, is struggling, according to Business Insider.
Thanks for reading — see you Tuesday!