Don't rank Grok

Welcome to Runtime! Today: xAI's new Grok 4 model looks impressive assuming you can ignore everything else about the company, MCP's security flaws are becoming apparent, and the latest enterprise moves.

(Was this email forwarded to you? Sign up here to get Runtime each week.)

Let that sink in

If there's one thing enterprise tech buyers love more than seafood towers in Las Vegas on a vendor's dime, it's choice. And almost three years into the generative AI boom, anyone trying to build applications around large-language models has an impressive array of choices at their disposal.

Last night, in a rambling press conference of sorts that started an hour late, Elon Musk and xAI unveiled Grok 4, the latest edition of its large-language model. Anyone lucky enough to have woken up Thursday after being in a coma for the last six months might have watched a replay of the presentation and come away impressed that xAI appears to have caught up to OpenAI, Anthropic, and the rest of the frontier model developers.

Two new models were introduced, the single-agent Grok 4 and Grok 4 Heavy, which "spawns multiple agents to work on a problem simultaneously, and then they all compare their work 'like a study group' to find the best answer," Musk said, according to TechCrunch.
Artificial Analysis, which tracks the performance of LLMs, said Grok 4 "is now the leading AI model," noting that "Grok 3 scored competitively with the latest models from OpenAI, Anthropic and Google - but Grok 4 is the first time that our Intelligence Index has shown xAI in first place."
And the pricing of the new models is also quite competitive with xAI's rivals, according to a chart assembled by VentureBeat.

Given that Meta is scrambling to overhaul its AI division after the disastrous launch of Llama 4, Grok 4 could be a tempting choice for developers and enterprises looking for competitive but lower-cost alternatives to OpenAI and Anthropic, especially after Microsoft agreed to add Grok to its list of supported models in May at Build. But after the last six months, it's impossible to understand why any serious business would put Grok at the heart of their AI strategy.

After Musk tweaked the algorithm last weekend, Grok went on an incredibly racist and antisemitic tirade for several days, eventually dubbing itself "MechaHitler."
That follows an earlier incident in May when Grok suddenly started bemoaning the plight of white South African farmers in response to totally unrelated questions.
And xAI continues to run Grok from a data center in Memphis that is running gas turbines without proper pollution controls, which local officials — dazzled by Musk's billions — have allowed it to do despite a pending lawsuit over the disproportionate impact that pollution is having on the city's Black community.

Most people who have used LLMs understand that you can't necessarily trust the outputs they produce, but Grok is unsafe at any speed for business use. It is subject to the whims of a deeply disturbed man, it is contributing to a slow-moving environmental disaster in Tennessee, and it is a questionable steward of corporate data.

None of that will stop companies that would use any tool that delivered better performance or cost savings, of course, and people like self-styled enterprise AI soothsayer Aaron Levie seemed impressed.
But given that generative AI app adoption continues to be a work in progress (at best), companies that are still assessing their options have little reason to stake their reputations on Grok, even for internal use.
As Wharton professor Ethan Mollick put it, "Grok 3 was a very good model, and Grok 4 might be amazing but having a very good model is not enough - there are a lot of really good models out there. You actually want to trust the model you are building on."

Many critical problems

AI developers have swiftly embraced Anthropic's Model Context Protocol as a glue technology that promises to help AI agents tap into data, but the standard remains a work in progress. As with many promising but still-evolving technologies, MCP's security could be a lot better.

CSO published a great analysis of the current state of MCP security this week, which included this observation from F5's Lori MacVittie: “MCP is … breaking core security assumptions that we’ve held for a long time.” One big problem is that "MCP also lacks required message signing or verification mechanisms, which allows for message tampering," according to CSO.
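
To make the tampering concern concrete, here is a minimal sketch of what message signing could look like if two parties layered it on top of MCP's JSON-RPC messages themselves. Nothing here is part of the MCP specification: the shared key, the envelope format with its "signature" field, the helper names, and the "read_file" tool are all assumptions made for illustration.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret; MCP itself defines no such key or signature field.
SHARED_KEY = b"demo-key-not-for-production"


def canonical_bytes(message: dict) -> bytes:
    """Serialize deterministically so both sides hash the same bytes."""
    return json.dumps(message, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_message(message: dict, key: bytes = SHARED_KEY) -> dict:
    """Wrap a message in an envelope carrying its HMAC-SHA256 tag (assumed format)."""
    tag = hmac.new(key, canonical_bytes(message), hashlib.sha256).hexdigest()
    return {"payload": message, "signature": tag}


def verify_message(envelope: dict, key: bytes = SHARED_KEY) -> bool:
    """Recompute the tag over the payload and compare in constant time."""
    expected = hmac.new(key, canonical_bytes(envelope["payload"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope.get("signature", ""))


# A JSON-RPC 2.0 request shaped like an MCP tool call; the tool name is made up.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "read_file", "arguments": {"path": "notes.txt"}},
}

envelope = sign_message(request)
untouched_ok = verify_message(envelope)

# Simulate an attacker rewriting the request in transit.
envelope["payload"]["params"]["arguments"]["path"] = "/etc/passwd"
tampered_ok = verify_message(envelope)

print("untouched message verifies:", untouched_ok)
print("tampered message verifies:", tampered_ok)
```

A shared-secret HMAC is the simplest option to show. A real deployment would more likely want asymmetric signatures and key distribution, which this sketch does not attempt.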

And two new MCP vulnerabilities published this week by JFrog and Tenable show how attackers could take over agentic AI systems and wreak all kinds of havoc. "Because LLMs can orchestrate these tools without human oversight, a compromised agent can silently chain together actions — reading files, calling APIs, even triggering infrastructure changes — all under the radar," GitGuardian's Soujanya Ain told Dark Reading.

Enterprise moves

Lauren Nemeth is the new chief revenue officer at New Relic, joining the observability company from Pinecone.

Tiffany Buchanan is the new chief financial officer at Dataminr, as the event-monitoring company looks to improve its "readiness for future capital market opportunities."

Matt Parson is the new chief financial officer at SAS, joining the data-management company after financial leadership roles at ExtraHop and CloudBees.

Jevan Soo Lenox is the new chief people officer at Writer, following similar roles at Insitro and Stitch Fix.

The Runtime roundup

Amazon might plow another several billion dollars into Anthropic, according to the Financial Times, deepening its ties to OpenAI's rival.

Customers of PJM Interconnection, which manages the largest electrical grid in the U.S., could see rate increases of up to 20% thanks to demand for data centers, according to Reuters.

Thanks for reading — see you Saturday!

Tom Krazit