"The more significant decision is, 'what is your strategy for centralizing the data?' Because these data platforms do not do that; they are the place that it gets centralized to, but they do not centralize anything."
There aren't a lot of neuroscientists in enterprise software, but George Fraser is one of them.
Fraser, co-founder and CEO of Fivetran, holds a PhD in neuroscience from the University of Pittsburgh but has spent the last 12 years building links between the original sources of corporate data and the storage facilities that help companies to extract meaning from that data. Fivetran calls them "connectors," and last week the company unveiled its 500th connector that can pull data from a variety of household-name enterprise tech products and move it to the data warehouse of your choice.
Snowflake and Databricks have been the two main beneficiaries of the move to cloud-based data warehouses over the last several years as data stacks continue to evolve. The two rivals compete fiercely for a chance to be the center of their customers' data strategies, but they've become more similar over time, and neither product can fix the errors companies introduce into their data when they try to build their own data pipelines, Fraser said in a recent interview.
"The more significant decision is, 'what is your strategy for centralizing the data?'" Fraser said. "Because these data platforms do not do that; they are the place that it gets centralized to, but they do not centralize anything."
In the interview, Fraser also pronounced the death of Hadoop, compared Fivetran to Stripe, and pledged to remain an independent company through an eventual IPO.
This interview has been edited and condensed for clarity.
How did you get from a healthcare research background into this world of data pipelines and enterprise tech?
It doesn't really make a lot of sense. When we started Fivetran the idea was okay, we're going to build this data analysis platform, and the target market was people like myself; scientists doing the kind of data analysis that scientists do. And we pretty quickly figured out that that was a terrible idea.
We ultimately kind of discovered this unsolved problem that data movement tools were just really terrible. It was a huge problem and nobody really wanted to work on it. I always joke that the secret of our success is that we were the smartest people who were willing to work on this problem.
Historically it's mainly been a DIY sector, people build their own. We're mainly just trying, company by company, to persuade the world, "hey, don't do that." We can sell you standardized connectors that are much better than anything you're going to write yourself.
How have data management strategies changed over the last several years?
In terms of data management in a business context, it's mostly relational data by importance, not by bytes. People always are like, "oh, there's like X petabytes of data." A lot of that is images and stuff that has relatively low value density. Data from Salesforce has, for example, a lot higher value density than the video feed from the camera in your office.
If you look at Snowflake, Databricks, BigQuery, you name it, they all really work the same way under the hood when they're executing queries against relational data, which is a lot of the workload. If you're doing data frame stuff, it's a different creature. But for that core use case, there's been a lot of convergence around the same idea in the last few years.
Hadoop has really died. Hadoop was an insane idea, it's like people waking up after a bender and being like, "what were we thinking?" It was this weird phase of Google envy, where everyone was like, "if Google did it, it must be smart." Google did some pretty dumb things, I don't have trouble convincing people that now. But back then there was this weird thing where people assumed it was a good idea as everyone was copying Google's technology. And then all the database management system people are looking at it being like, "what are you doing? We've had better ideas than this since the 80s."
Data storage and query platforms have gotten better and better; there's been a lot of convergence in the last 10 years. There's a bunch of great data platforms out there. One of the things we're now trying to explain is that people spend all this time weighing whether their company should prioritize Snowflake, or Databricks, or sometimes BigQuery, or Azure Fabric, or the AWS ecosystem, and they're all good.
The more significant decision is, "what is your strategy for centralizing the data?" Because these data platforms do not do that; they are the place that it gets centralized to, but they do not centralize anything. And how you do that is incredibly consequential.
If you do that badly, you will generate so much downstream work from the teams working on these datasets chasing data integrity, bugs, and incomplete datasets. But the problem that actually caused it all is upstream, it's creating a single view of the world. If your connectivity is not good, then everything else suffers.
I mean, it's the problem that we solve, so obviously I wish that people would focus on that. But even if it wasn't, just knowing what I know, everyone should spend less time worrying about which data warehouse they use, and spend more time thinking about, "how am I going to get all my data in one place and make sure that it's complete and correct and timely?" Because if you do that well, all of these destinations will work for you.
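To make "complete and correct and timely" concrete, here is a minimal sketch of how a team might check a table after it lands in a warehouse. It is only an illustration under stated assumptions: the function `check_sync`, the field names `id`, `amount`, and `synced_at`, and the one-hour lag threshold are all hypothetical, and none of this reflects Fivetran's product or API.

```python
from datetime import datetime, timedelta, timezone


def check_sync(source_ids, warehouse_rows, now, max_lag=timedelta(hours=1)):
    """Report on a loaded table (hypothetical schema: id, amount, synced_at).

    - missing: ids present in the source but absent from the warehouse (completeness)
    - invalid: ids of loaded rows whose required `amount` field is None (correctness)
    - stale:   True if no rows were loaded or the newest row is older than max_lag (timeliness)
    """
    loaded_ids = {row["id"] for row in warehouse_rows}
    missing = sorted(set(source_ids) - loaded_ids)
    invalid = sorted(row["id"] for row in warehouse_rows if row.get("amount") is None)
    newest = max((row["synced_at"] for row in warehouse_rows), default=None)
    stale = newest is None or (now - newest) > max_lag
    return {"missing": missing, "invalid": invalid, "stale": stale}


# Example inputs (made up for illustration).
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    {"id": 1, "amount": 10.0, "synced_at": now - timedelta(minutes=5)},
    {"id": 2, "amount": None, "synced_at": now - timedelta(minutes=10)},
]
report = check_sync([1, 2, 3], rows, now)
```

A check like this only detects problems after the fact. Fraser's point is that a reliable pipeline upstream keeps such reports empty in the first place.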
How has the generative AI boom changed the way people use Fivetran, or the way you're developing Fivetran?
We have some customers who are big players in that agency, they have grown a lot. [But] we don't always know what people are doing with the data, which is a funny position. We're the ones who move the data, we don't really know what workloads are running on top of it.
Undoubtedly, lots of people are doing generative AI workloads on top of data that Fivetran delivers. For example, we have a Slack connector that will replicate the entire Slack message history for your organization into your data warehouse. I'm sure someone is using it for [generative AI], but they don't feel the need to tell us; we're the ones who centralize the data.
Honestly, it's at an early stage, internal adoption of these things. There's a ton of stuff happening in product development. People embed us; there's a whole category of companies that use Fivetran for data movement under the hood. Companies building tools or AI-based products that need data connectivity will use Fivetran internally to solve that problem. And sometimes you can't even tell: you'll sign up for a product (there are lots of things that work like this), it says, "okay, connect to your Salesforce to ingest that data," and when you click through, you're actually clicking through Fivetran. It's sort of like Stripe.
But in general, you think genAI has been a growing category as opposed to the more traditional uses of Fivetran?
Yeah, it's really nice, because you can create a much better user experience. Like you say you're going to create a tool for finance teams to understand what's happening in their business instead of a generic BI tool on top of the data warehouse. You can say — this is a real example — I'm going to use Fivetran to get connectivity into NetSuite, Oracle, Salesforce, you name it, all the systems that matter for finance teams. Fivetran will ingest the data into a uniform schema, and then I'll build my product on top of that.
It's going to have much more sophisticated analysis than can be done with generic BI tools and SQL queries, right? A product company can invest so much more into creating a great user experience than an analytics team operating a BI tool can. No shade on analytics teams, it's just there's only so much you can do as an internal team building an internal tool.
But if it's a product that's going to be used by hundreds of companies, absolutely. And then those companies get this great user experience, where you just take it out of the box and there's connectivity to all the systems, which is powered by Fivetran under the hood. It gives you the sort of insights that would take an analytics team of 100 to achieve, and they'll sell it to you for whatever dollars a month. Those things are great, and I think we'll see more and more of them over time.
You've raised a substantial amount of money, and obviously the IPO market has been relatively closed for a while. As you look over the coming year, do you have a vision for where you want to be?
The goal is to build a long-standing independent company, and that means going public when the time is right. We think there's a lot of value to the world in having a data movement vendor that is not tied to any one of the platforms and hyperscalers, because real large companies have lots of data platforms.
I think the number one thing we're trying to do right now is get people to focus on this decision: Think hard not just about where does your data go, or where do you analyze it, but think about your data pipeline, think about data movement. That's actually the most consequential decision you're going to make in your data stack, because if you do it well or badly, the benefit or cost of that will ripple down to all the other layers multiplied by 1000.