Table stakes: Does data format unity matter?
Welcome to Runtime! Today: Ahead of this year's Snowflake/Databricks annual events, a look at the push to unify Delta Lake and Iceberg, Salesforce snaps up Informatica, and the latest funding rounds in enterprise tech.
Catalog of bits
Format wars tend to produce a winner; think VHS over Betamax, Blu-ray over HD DVD, or LTE over WiMAX. A clear winner makes it easier for late adopters to know they're picking the right horse, but it creates a real problem for early adopters who aligned themselves with the losing format.
Another option is to forge compatibility between dueling formats, which is exactly what the cloud data management industry is currently trying to do with Delta Lake (created by Databricks) and Iceberg, which counts support from Databricks' main rival Snowflake as well as a parade of other companies that have endorsed the format. Last year at his company's Data & AI Summit, Databricks CEO Ali Ghodsi devoted several minutes of his post-keynote press conference to evangelizing the idea of a "USB-C format" for data.
- Key to that idea was Databricks' acquisition of Tabular, which was founded by the creators of Iceberg: "Our ulterior motive with [the deal] is to bring the formats closer so that we can get interoperability between these two formats, Apache Iceberg and Delta Lake," he said last year.
- In a recent interview with Runtime, Ghodsi reiterated that goal. "I think I'm way more bullish on this project now, on this sort of grand unification of formats, than I was when we even did the acquisition," he said.
- But the formats were developed for different use cases using different specifications, and settling on a lowest-common-denominator format might actually hurt the performance of certain types of applications.
- And data engineers are increasingly adopting catalogs and other emerging technologies that allow companies to work with their data without having to worry about formats.
Delta Lake and Iceberg are both built on top of Parquet, an open-source "column-oriented data file format" maintained by the Apache Software Foundation. They are known as table formats, which means they operate on top of file formats and add context about the content of files. That context makes it easier for the query engines inside products like Databricks and Snowflake to read from and write to data warehouses and data lakes.
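For readers who think in code, here is a toy sketch of what "adding context about the content of files" can look like. It is not how Delta Lake or Iceberg are actually specified: the `ToyTable` and `DataFile` names, the file paths, and the single date-range statistic are all made up for illustration. The point is only that a table format keeps a record of which data files belong to each version of a table, plus enough per-file statistics for a query engine to skip files it does not need.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataFile:
    """One immutable data file (e.g. a Parquet file) plus per-file stats."""
    path: str              # hypothetical path, for illustration only
    row_count: int
    min_order_date: str    # ISO dates compare correctly as plain strings
    max_order_date: str


@dataclass
class ToyTable:
    """Toy table format: an ordered list of snapshots, each a tuple of files."""
    snapshots: list = field(default_factory=list)

    def commit_append(self, new_files):
        # A commit never rewrites old files; it records a new snapshot that
        # references the previous files plus the newly written ones.
        current = self.snapshots[-1] if self.snapshots else ()
        self.snapshots.append(tuple(current) + tuple(new_files))

    def plan_scan(self, date_from, date_to, snapshot_id=-1):
        # The metadata lets an engine skip any file whose date range
        # cannot overlap the range the query asks for.
        return [
            f.path
            for f in self.snapshots[snapshot_id]
            if f.max_order_date >= date_from and f.min_order_date <= date_to
        ]


table = ToyTable()
table.commit_append([DataFile("data/jan.parquet", 1000, "2025-01-01", "2025-01-31")])
table.commit_append([DataFile("data/feb.parquet", 1200, "2025-02-01", "2025-02-28")])

# Only the February file can contain rows in this range.
february_files = table.plan_scan("2025-02-10", "2025-02-20")
# Reading an older snapshot ("time travel") sees only the January file.
first_snapshot_files = table.plan_scan("2025-01-01", "2025-12-31", snapshot_id=0)
print(february_files)
print(first_snapshot_files)
```

The real formats track far more than this (schemas, partitioning, deletes, concurrent writers), and they differ in how they lay that metadata out, which is part of why unifying them is hard.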
Delta Lake and Iceberg are more similar than they are different, and the data community has made great strides toward bringing the formats closer together over the last several years. And the latest version of Iceberg, which has yet to be released but is almost fully baked, addresses a lot of previous incompatibility issues, Ghodsi said.
- But one thorny problem remains: When it comes to writing data to the storage layer, Delta Lake and Iceberg use incompatible techniques that were designed to achieve different goals.
- "Customers don't want to choose one format; they need fast writes and need fast reads," said Chris Child, vice president of product at Snowflake, in a recent interview. "Our view is that the more we can standardize, the better it becomes."
While Ghodsi remains committed to working out the remaining compatibility issues between Delta Lake and Iceberg, he acknowledged that efforts like Tabular's support for the REST catalog protocol, which allows different catalogs to talk to each other, could have a more lasting impact.
- "One thing that's really interesting here is that more and more capabilities going forward can be put on this catalog interface," he said.
- "Maybe all that's needed is you talk to the catalog, you tell it, 'I want this data', and then it gives it back to you in the format you want it. And how did it actually store it behind that catalog? You don't need to know, as long as it's serving you what you want."
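A rough sketch of the idea Ghodsi describes, under heavy assumptions: the `ToyCatalog` class, its `register` and `load_table` methods, and the example bucket path are hypothetical and do not correspond to the Iceberg REST catalog protocol or any vendor's API. It only illustrates a client naming a table and a preferred format while the catalog keeps the physical storage details to itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableEntry:
    """What this hypothetical catalog knows about one table."""
    storage_format: str   # how the table is physically stored
    location: str         # where its metadata lives (made-up path)


class ToyCatalog:
    """Hypothetical catalog that hides the storage format from callers."""

    def __init__(self):
        self._tables = {}

    def register(self, name, storage_format, location):
        self._tables[name] = TableEntry(storage_format, location)

    def load_table(self, name, wanted_format):
        entry = self._tables[name]
        # The caller only states what it wants. Whether the catalog has to
        # translate metadata behind the scenes is an internal detail.
        return {
            "name": name,
            "served_as": wanted_format,
            "location": entry.location,
            "translated": entry.storage_format != wanted_format,
        }


catalog = ToyCatalog()
catalog.register("sales.orders", "delta", "s3://example-bucket/sales/orders")

# An Iceberg-speaking engine asks for the table without knowing it is Delta.
response = catalog.load_table("sales.orders", wanted_format="iceberg")
print(response["served_as"], response["translated"])
```

In this toy version the translation is just a flag; in practice that step is where the remaining incompatibilities between the formats would have to be handled.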
One year after kicking the tires on Informatica, Salesforce announced Tuesday it has a deal in place to acquire the data-management company for $8 billion. The company said in a press release that it hopes to use Informatica's data tools to help Salesforce customers embrace its Agentforce service, which, like a lot of agentic enterprise software, has yet to get off the ground.
Informatica's software helps companies organize corporate data spread across different applications and operating environments, including cloud providers. It basically invented the "extract, transform, load" process back in 1993, but has struggled to grow over the last several years as newer data tools captured the attention of the market.
Still, Informatica has a lot of experience helping other companies get their data in order, which is one of the biggest problems that companies hoping to build generative AI apps have run into over the last year. Salesforce CEO Marc Benioff spent countless hours over the last year insisting his company had the tools to bring enterprises into the agentic AI era, but must have realized at some recent point that it was going to need more help.
Enterprise funding
Superblocks raised $23 million in additional Series A funding to continue its work on a chat-based software development agent focused on security.
Pixee landed $15 million in Series A funding for its software security agent, which scans a company's code base and looks for vulnerabilities.
Traceloop raised $6.1 million in seed funding and launched its flagship product, which helps companies test their AI agents before putting them into production.
Bito scored $5.7 million in additional seed funding for its AI Code Review Agent, which does pretty much what the name of the product suggests.
The Runtime roundup
Security researchers found a serious prompt-injection vulnerability in GitHub's MCP server, underscoring that while enterprise tech increasingly believes MCP will help companies deploy AI agents, it has several security challenges that will need to be solved before it can assume that role.
Twelve prominent leaders have left CISA in the last month, according to Cybersecurity Dive, which will complicate its mission to keep U.S. government and corporate assets as secure as possible.
Thanks for reading — see you Thursday!