Now that the business world has fully embraced the strategic importance of data, simply collecting and storing that data isn't enough. Forward-thinking companies are embracing a more nimble data strategy, one that enables their businesses to surface that data wherever and whenever internal teams need it to make decisions.
That shift has been under way for several years, but it's starting to gain momentum as data volumes continue to explode and business challenges mount. The speed, flexibility, and efficiency provided by newer data tools have become much more important than simply dumping "Big Data" into a storage facility and running relatively simple analytical queries.
Companies like Snowflake and Databricks provided the foundation, making it much easier to work with large data sets on the cloud, and their products enjoyed widespread adoption. A new crop of startups hopes to find similar success by building tools around that foundation that let customers work with their data in new ways.
"Everybody that needs a data warehouse needs three things that come along with it," said Tristan Handy, founder and CEO of dbt Labs, an early entrant to this emerging space. "You need data ingestion, a way to get data into that warehouse; data transformation, a way to actually take the tremendous amount of data that you eventually dump into the warehouse and turn it into distilled meaning; and then you need some type of analytical tool to actually visualize or otherwise analyze data."
But while businesses know they need to find better ways to work with their data, they're also starting to learn painful lessons about embracing expensive new data tools too quickly without understanding how much they're actually spending, as Snowflake's earnings results made clear last week.
"I do still subscribe to the 'data is the new oil' [philosophy], but in the refinement process of shipping out that new oil, you should probably be making sure that you don't have barrels falling off the back of the truck and that you don't have leaks in your pipes," said Sean Knapp, founder and CEO of Ascend.
Drowning in data
Benn Stancil, chief technology officer of Mode, has probably done more than anyone over the last several years to articulate, on his Substack, what efforts to build "the modern data stack" are actually trying to accomplish.
"It’s trying to figure out why growth is slowing before tomorrow’s board meeting; it’s getting everyone to agree to the quarterly revenue numbers when different tools and dashboards say different things; it’s sharing product usage data with a customer and them telling you their active user list somehow includes people who left the company six months ago; it’s an angry Slack message from the CEO saying their daily progress report is broken again," he wrote in August 2021, and almost two years later those problems remain.
Understanding corporate data is getting more complex as applications become more complex, and the sheer amount of data is piling up; Theory Ventures' Tomasz Tunguz estimates that data volumes are growing 45% a year. Organizing and processing data often falls to developers or operations teams that don't have a background in data science, or gets pushed to data scientists who don't necessarily know how to program or configure application infrastructure.
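Compounding makes Tunguz's estimate starker than it sounds: at 45% annual growth, a company's data volume roughly doubles every two years. The arithmetic:

```python
growth = 1.45  # 45% annual growth in data volume

# After two years, volume has slightly more than doubled:
print(growth ** 2)  # → 2.1025

# Over five years, it grows more than sixfold:
print(round(growth ** 5, 2))  # → 6.41
```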
The earliest customers for tools like dbt's — which focus on the data transformation leg of the above-mentioned triad — were companies just like it: venture-backed Silicon Valley tech companies that intuitively grasped the need for faster and more flexible data tools five to six years ago, Handy said. But with the rise of the cloud data warehouse vendors, larger companies also started to realize that they needed better tools for data transformation: extracting data collected in one format and turning it into another format for analysis, he said.
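In code, that kind of transformation is often just reshaping: taking records in whatever nested format an application emitted them and flattening them into rows an analyst or BI tool can group and filter on. A hypothetical sketch in plain Python (the field names are invented for illustration):

```python
# Raw data as an application might log it: nested JSON-style records.
raw_events = [
    {"user": {"id": "u1", "plan": "pro"}, "event": "login", "ts": "2023-06-01"},
    {"user": {"id": "u2", "plan": "free"}, "event": "login", "ts": "2023-06-01"},
    {"user": {"id": "u1", "plan": "pro"}, "event": "export", "ts": "2023-06-02"},
]

# Transformation: flatten the nesting into analysis-ready rows,
# one column per attribute.
def flatten(events):
    return [
        {
            "user_id": e["user"]["id"],
            "plan": e["user"]["plan"],
            "event": e["event"],
            "date": e["ts"],
        }
        for e in events
    ]

rows = flatten(raw_events)
print(rows[0])
# → {'user_id': 'u1', 'plan': 'pro', 'event': 'login', 'date': '2023-06-01'}
```

Real transformation layers do this at warehouse scale, in SQL, across hundreds of sources, which is where purpose-built tooling earns its keep.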
Data upstarts that have seen their valuations soar with the rush to modern data tools include Fivetran and Airbyte — which focused on data ingestion, or getting data from apps to a database or data warehouse — and data analytics companies like Tableau and Looker, acquired by Salesforce and Google, respectively, for big money over the past several years.
And some of those companies, like Ascend and Elementl (which just raised $33 million last week), are now betting that customers want a single tool that automates several steps in the process of getting data from storage to analytics, known as the data pipeline.
This trend now presents the data industry's version of a question that has plagued CIOs for years: Do I want to use multiple tools that each do one job but do it very well, or a package of tools from a single vendor that might not be best-in-class at each task but work well together?
"We have this saying internally, which is, the 'Hello World' for data pipelines is really easy. The productionalization and operationalization of data products is very, very hard," Knapp said.
Check please
In similar fashion to what has taken place across enterprise tech in 2023, data companies are now seeing the end of what Snowflake CEO Frank Slootman described last week as the "let it rip" mentality when it comes to spending on data products.
Months of layoffs at enterprise tech companies have made it clear that businesses around the world are taking a cautious approach to spending after gorging themselves during the height of the pandemic. While those businesses will still need to modernize their approach to data as they modernize their approach to application infrastructure, they're going to be taking a much closer look at how they proceed.
"We are now seeing companies who have been using dbt for two, three, four years and they're asking questions like, "Gosh, how do I more effectively manage this investment? How do I see what I'm spending on the compute that dbt is orchestrating for me?" Handy said. "We've gone beyond the initial phase of, 'Is this a good idea?' And now we're in the phase of, 'This is such a good idea that we need to give people tools to manage their investments.'"
Knapp agreed.
"Over the last few years, the cost of capital was very low and the reward for innovation was very, very high," he said. "We saw people bias towards innovation. (But) what we've seen folks really turn their attention to over the last nine months is, 'All right, we know we still have to go build a bunch of data products, but we really have got to get our costs under control."
Slowing growth could lead to a lot of consolidation among these new data companies, which might have to rethink their own cost structures as data efficiency becomes important for the first time in several years. And 2023's other omnipresent trend is being watched closely in this sector.
These companies, which have raised enormous amounts of money building tools that simplify and automate the steps in the data processing journey, are grappling with the potential impact that generative AI and large language models could have on their business models. It's far from clear how that will pan out, but chat-based interfaces to data stores that let anyone inside a business get quick answers to data questions could upend this entire sector.
"Though I don't believe the tools that make up the modern data stack will fail, the modern data stack as a movement seems incompatible with the rise of AI," Stancil wrote in April. "It's a philosophy that was designed for a world in which reasoning through a data problem with a robot was a fantasy."
Tom Krazit has covered the technology industry for over 20 years, focusing on enterprise technology during the rise of cloud computing over the last ten years at Gigaom, Structure, and Protocol.