Every company understands the value of their corporate data, but it's easy to lose track of priorities when trying to update their toolsets, especially during the generative AI frenzy. Here's how eight experts think companies should navigate the tricky road to the modern data stack.
General Partner, Felicis
There are two themes that enterprises can improve upon when modernizing their data stack: 1) data quality and monitoring and 2) data as a product.
Companies spend a significant amount of time, money, and resources collecting data and storing it. Data remains a critical input to business decisions. However, we often find that teams don’t have the instrumentation in place to know if their data is high-quality and accurate. How do you know the data you’re using is right? The need to monitor data for accuracy and correctness is increasingly important. We recommend teams adopt real-time data observability and monitoring systems. This need becomes more important as teams curate data for training and fine-tuning AI models.
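As a rough illustration of what basic data-quality instrumentation could look like, here is a minimal sketch in Python. It is not taken from any particular observability product. The function name `check_quality`, the field names, and the freshness threshold are all hypothetical, and a real monitoring system would track many more signals than completeness and freshness.

```python
from datetime import datetime, timedelta, timezone


def check_quality(rows, required_fields, timestamp_field, max_age, now=None):
    """Compute simple completeness and freshness metrics for a batch of records.

    All names here are illustrative assumptions, not a real product's API.
    """
    now = now or datetime.now(timezone.utc)
    total = len(rows)

    # Completeness: count records where a required field is missing or empty.
    missing = {field: 0 for field in required_fields}
    for row in rows:
        for field in required_fields:
            if row.get(field) in (None, ""):
                missing[field] += 1
    null_rate = {
        field: (missing[field] / total if total else 0.0)
        for field in required_fields
    }

    # Freshness: the newest record must be no older than max_age.
    timestamps = [row[timestamp_field] for row in rows if row.get(timestamp_field)]
    newest = max(timestamps) if timestamps else None
    is_fresh = newest is not None and (now - newest) <= max_age

    return {"row_count": total, "null_rate": null_rate, "is_fresh": is_fresh}
```

A team might run a check like this on each incoming batch and alert when a null rate crosses an agreed threshold or when `is_fresh` turns false.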
Often, data lives in a walled garden, limiting its utility. Making data into a product with dedicated owners responsible for self-service via APIs that have SLAs, strong ergonomics, and access control makes data more accessible and valuable. This enables teams to better leverage existing assets, move faster, and innovate more independently while maintaining governance.
SVP, Global Technology Services, TEKsystems
A big mistake occurring across enterprises is the failure to understand that sequence matters in the data and AI dynamic. In simple terms, put data first and AI second. Without respecting this sequence, leaders fall into FOMO, grasping at AI-driven cures for competitive or budget pressures, and they jump straight to AI tool adoption before conducting any honest self-assessment of the health and readiness of their data estate.
This phenomenon is not unlike the cloud migration craze of about seven years ago, when we saw many organizations jumping straight to cloud-native services (after hasty lifts-and-shifts, mind you) — all prior to assessing or refactoring any of the target workloads. This sequential dysfunction results in poor downstream app performance since architectural flaws in the legacy on-prem state are repeated in the cloud.
Fast-forward to today: AI is a great “truth serum” that reveals the quality, maturity, and stability of an organization’s existing data estate. Rather than avoiding those unflattering truths, invest in holistic AI data readiness first, before AI tools.
VP of Digital Workplace Services, GenAI, Kyndryl
One of the most common mistakes is failing to identify the correct stakeholders and existing data dependencies. Often, decision makers don’t have firsthand knowledge of how the current stack has developed organically over years or decades. Especially in large environments that have grown over time, modern forensic techniques for determining data access are not always available. This leads to the loss of institutional learnings, a lack of appropriate security controls, or an oversimplified approach that doesn’t truly modernize the data architecture.
Pruning data, developing the right concentric circles of access around your most precious data by employee persona, and ensuring data quality are all key. In the eagerness to adopt AI, many are missing that good AI with bad data just creates unusable insights faster. More importantly, the overall cost of the application and of the decisions built on that data will be higher. The generative AI investment requirement is high, but a strong data foundation will right-size energy and storage costs and support the use of valuable data for deeper insights into your business.
CIO, Juniper Networks
The biggest mistake is believing that you have to move all the data into the new stack all at once. The process takes too long. Instead, focus on proving the new stack’s value by solving a problem that couldn't be solved before or by shortening time-to-value on a given use case and set of data. Then, migrate from the old stack to the new one based on the highest value and most in-demand use cases. Additionally, incentivize users to transition by gradually increasing the cost of the old platform to encourage any stragglers to join the party and adopt the new stack.
CIO, OutSystems
Enterprises often make the mistake of not fully understanding their organization’s data readiness. Many worry that data is AI's Achilles' heel. Do we have enough data? Are we aware of its source and lineage? It's essential to recognize that AI processes are neither uniform nor comparable in many instances.
AI-ready data goes beyond standalone metrics and provenance assessments. Ensuring data readiness is an ongoing journey, and without a deep understanding of this dynamic process, our efforts at data assurance might amount to nothing substantial. CIOs must focus on ensuring that their organization’s data fully represents the problem they’re trying to solve. Metadata is crucial to this as it explains the meaning of the data.
While algorithms or large language models (LLMs) are available off-the-shelf, the differentiation lies in the data. Instead of fixating on data ownership, CIOs must focus on use cases and patterns.
CTO, Intuit
A common mistake that enterprises make is thinking that modernizing the data stack is about modernizing the tech stack. It’s not. In fact, a successful, modern data environment isn’t merely a tech stack; it’s a shared cultural understanding of the importance and subtlety of data — and the built-in processes to make data production and consumption as natural as breathing. Enterprises also neglect key parts of the lifecycle of data. It neither begins nor ends in the lake.
Producing clean, governed, and well-structured data begins with every user action and external connection, and delivers through any consumption mode in software development, analytics, and reporting. Finally, enterprises might also fail to account for one or more of the following aspects when modernizing their data stacks: quality, lineage, ownership, documentation, change management, usability, and/or duplication.
Chief Product and Technology Officer, FICO
Many companies are moving to cloud-based tools for ingestion, data warehouse, data pipelines, and business intelligence. However, just because you have a modern data stack doesn't mean you can deliver business value.
When companies embark on the journey to modernize their data stacks, forgetting to focus on data management can be a significant pitfall. Data modeling, data governance, and data quality act as the blueprint for structuring and organizing data to ensure its accuracy, integrity, efficiency, and usability in generative AI models.
Without a thoughtful approach to data management, GenAI models won’t have accurate patterns, trends, or relationships to learn from, not only inhibiting a company’s ability to derive insights from their data, but potentially resulting in legal and compliance issues. Because the impact of poor data quality on GenAI models can be significant, it’s important to have a comprehensive data management strategy in place to not only fully understand the complexity and interconnectedness of the data, but to ensure its quality and that it is governed and tagged correctly.
CIO and Chief Digital Officer, CNH Industrial
When leveraging generative AI, whether it be creating your own large language model (LLM) or using a pre-existing model, companies must not overlook data hygiene. This starts with evaluating the quality of the data, whether it be structured or unstructured, and preparing it to produce the best results. After all, quality inputs yield quality outputs. Data owners should also take the time to understand the data and review the completeness of the master data definitions.
Organization is key. For example, if a data owner wanted to build an LLM on customer data and transaction history, they would need standard naming conventions for name, address, phone number, customer type, and so on to ensure accurate answers. This same level of detail should also apply when reviewing the data catalog and annotating the metadata. Ensuring that all data sets included in the LLM are relevant will limit the potential for false positives. The more detailed and thorough data owners are in preparation, the better the AI model will perform.
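To make the naming-convention point concrete, here is a small Python sketch that maps inconsistent source field names onto one canonical set. The canonical names follow the fields mentioned above. The alias lists and the function `normalize_record` are hypothetical, chosen only for illustration. In practice the mapping would come from the organization's own data catalog.

```python
# Canonical field names and the source-system aliases that map to them.
# The aliases are illustrative assumptions, not a standard.
CANONICAL_FIELDS = {
    "name": {"name", "customer_name", "full_name", "cust_name"},
    "address": {"address", "addr", "street_address"},
    "phone_number": {"phone_number", "phone", "tel", "telephone"},
    "customer_type": {"customer_type", "cust_type", "segment"},
}

# Reverse lookup: alias -> canonical name.
ALIAS_TO_CANONICAL = {
    alias: canonical
    for canonical, aliases in CANONICAL_FIELDS.items()
    for alias in aliases
}


def _key(raw_name):
    """Normalize case and separators so 'Customer Name' matches 'customer_name'."""
    return raw_name.strip().lower().replace(" ", "_").replace("-", "_")


def normalize_record(record):
    """Return (record with canonical field names, list of unrecognized fields)."""
    normalized, unknown = {}, []
    for raw_name, value in record.items():
        canonical = ALIAS_TO_CANONICAL.get(_key(raw_name))
        if canonical is None:
            unknown.append(raw_name)  # flag for review instead of guessing
        else:
            normalized[canonical] = value
    return normalized, unknown
```

Unrecognized fields are returned rather than silently dropped, so a data owner can decide whether each one belongs in the catalog before the data reaches a model.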