"We're not ready for AI. Our data isn't clean enough." We hear this in almost every enterprise conversation. It's partly true, partly an excuse, and entirely the wrong framing.
What You Need to Know
- "Our data isn't ready" is the most common reason enterprises delay AI adoption, but it's usually a misdiagnosis of the actual problem.
- You don't need perfect data. You need data that's good enough for the specific capability you're building. A claims triage model has different data requirements than a customer sentiment tool.
- The biggest data problem isn't quality. It's accessibility. Knowledge trapped in SharePoint, email, legacy systems, and people's heads can't be used by any model, no matter how good.
- Waiting for "data readiness" before starting AI is like waiting for perfect fitness before exercising. The activity itself improves the condition.
- Start with the data you have, for the problem you've chosen, and let the AI initiative drive data improvement.
73% of enterprise AI initiatives are delayed by data quality issues. (Source: Gartner, Top Strategic Technology Trends for 2023, October 2022)
The Three Data Problems (Only One Is Really About Data)
When enterprises say "our data isn't ready," they're usually conflating three different problems:
1. Data Quality: The Real But Overstated Problem
Yes, enterprise data is messy. Duplicate records, inconsistent formats, missing fields, outdated entries. This is real, and for certain AI applications (predictive analytics, financial modelling, compliance reporting) it matters enormously.
But for the generative AI capabilities that most enterprises are exploring right now? Data quality matters less than you think. Large language models are remarkably tolerant of messy, unstructured data. They can extract meaning from poorly formatted documents, inconsistent naming conventions, and even partial information.
The question isn't "is our data perfect?" It's "is our data good enough for this specific use case?"
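To make "tolerant of messy data" concrete, here's a minimal sketch that asks a model to pull structured fields out of a deliberately messy claim note. The client library, model name, and sample text are illustrative assumptions, not a recommendation; any LLM API with a chat endpoint works the same way.

```python
# Illustrative only: extract structured fields from messy, inconsistent
# text. The model name and sample claim are assumptions; substitute
# whatever model your enterprise has actually deployed.
import json
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messy_claim = """
CLAIM#: c-2024/0117   claimant: J. Smith
dmg to rear bumper + tail lite, est repair $1,240.00 (see attched photos)
incident dt: 03/02/24 (reported 5th March)
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative choice
    messages=[{
        "role": "user",
        "content": "Extract claim_id, claimant, damage_description, "
                   "estimated_cost, and incident_date as JSON from:\n"
                   + messy_claim,
    }],
    response_format={"type": "json_object"},
)
print(json.loads(response.choices[0].message.content))
```

The point isn't the specific API. It's that no data cleansing happened before the call, and the model copes anyway.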
2. Data Accessibility: The Actual Bottleneck
This is the real problem, and it has nothing to do with data quality.
In a typical enterprise, knowledge lives in:
- SharePoint sites that nobody can navigate
- Email threads that only the participants can find
- Legacy systems with APIs that predate REST
- PDF documents that are technically digital but functionally paper
- People's heads (the most inaccessible database of all)
On average, each enterprise department uses 47 SaaS applications. (Source: Productiv, State of SaaS Spend Report, 2023)
The best AI model in the world can't use knowledge it can't access. Before you worry about data quality, ask a more basic question: can your AI actually reach the data it needs?
This is an integration problem, not a data quality problem. And it's solvable, with document processing pipelines, API integrations, knowledge extraction from legacy systems, and structured approaches to capturing tacit knowledge.
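To show how modest that integration work can be, here's a minimal sketch of a document-accessibility pipeline: walk a shared drive of PDFs and write whatever text is recoverable to a machine-readable corpus. The folder path and the pypdf dependency are assumptions; adapt both to wherever your documents actually live.

```python
# A minimal sketch, not production code: turn a folder of PDFs into a
# JSON lines corpus that any downstream model or search index can read.
import json
from pathlib import Path

from pypdf import PdfReader  # pip install pypdf

SOURCE_DIR = Path("shared-drive/claims")   # hypothetical document store
OUTPUT_FILE = Path("claims_corpus.jsonl")  # machine-readable output

with OUTPUT_FILE.open("w", encoding="utf-8") as out:
    for pdf_path in SOURCE_DIR.rglob("*.pdf"):
        try:
            reader = PdfReader(pdf_path)
            text = "\n".join(page.extract_text() or "" for page in reader.pages)
        except Exception as exc:
            # Scanned or corrupt files land here: candidates for OCR later.
            print(f"skipped {pdf_path}: {exc}")
            continue
        out.write(json.dumps({"source": str(pdf_path), "text": text}) + "\n")
```

Twenty lines of glue like this won't win architecture awards, but it moves knowledge from "functionally paper" to "reachable by a model".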
3. Data Governance: The Permission Problem
The third barrier is organisational, not engineering. Even when data exists and is accessible, enterprises often lack clear answers to:
- Who owns this data?
- Who's allowed to use it for AI purposes?
- What consent or privacy constraints apply?
- Where does the AI-processed output go?
These governance questions need answers before any AI deployment. But they don't require perfect answers. They require starting answers that evolve as you learn.
The "Start Where You Are" Approach
Instead of waiting for data nirvana, here's what actually works:
Pick one capability with modest data requirements. Document extraction, for example, works with the documents you already have. Knowledge retrieval works with the knowledge bases you already maintain. Triage and routing works with existing workflow data. You don't need new data. You need to make existing data accessible.
Build the data pipeline as part of the first AI capability. The document processing pipeline you build for claims intelligence will serve fraud detection, customer communication, and compliance, if you build it as shared infrastructure from the start.
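One way to picture "shared infrastructure from the start": a single pipeline publishes a common document record, and each capability subscribes to it. A deliberately simplified sketch; the record fields and consumer names are hypothetical.

```python
# Deliberately simplified: one pipeline output, many capability consumers.
from dataclasses import dataclass
from typing import Callable

@dataclass
class DocumentRecord:
    source: str    # where the document came from
    text: str      # extracted content
    doc_type: str  # e.g. "claim", "invoice", "correspondence"

consumers: list[Callable[[DocumentRecord], None]] = []

def register(consumer: Callable[[DocumentRecord], None]) -> None:
    """Each new capability subscribes instead of building its own pipeline."""
    consumers.append(consumer)

def publish(record: DocumentRecord) -> None:
    for consumer in consumers:
        consumer(record)

# Claims intelligence ships first; fraud screening reuses the same feed.
register(lambda rec: print(f"claims triage received {rec.source}"))
register(lambda rec: print(f"fraud screening received {rec.source}"))
publish(DocumentRecord("claims/0001.pdf", "…extracted text…", "claim"))
```

The design choice that matters is the shared record, not the plumbing: the second capability costs a subscription, not a second pipeline.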
Let the AI initiative drive data improvement. Nothing motivates data cleanup like a working system that gets better as data improves. When the claims team sees that the AI handles structured claims perfectly but struggles with handwritten notes, they have a concrete reason to digitise those notes. The AI creates the business case for data investment.
Measure data readiness per capability, not globally. A global "data readiness assessment" is overwhelming and paralysing. A per-capability assessment is specific and actionable. "Can we access 80% of claims documents from the last two years in a machine-readable format?" is a question you can answer by Friday.
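That Friday question can literally be a script. Here's a sketch of a per-capability readiness check, assuming the claims documents are PDFs on a shared drive and treating "machine-readable" as "has an extractable text layer".

```python
# A rough readiness check, not an audit: what share of recent claims
# documents have an extractable text layer? The paths and the two-year
# window are assumptions to adapt to your environment.
import datetime
from pathlib import Path

from pypdf import PdfReader  # pip install pypdf

CLAIMS_DIR = Path("shared-drive/claims")  # hypothetical document store
CUTOFF = datetime.datetime.now() - datetime.timedelta(days=730)

total = readable = 0
for pdf_path in CLAIMS_DIR.rglob("*.pdf"):
    modified = datetime.datetime.fromtimestamp(pdf_path.stat().st_mtime)
    if modified < CUTOFF:
        continue  # outside the two-year window for this capability
    total += 1
    try:
        first_page = PdfReader(pdf_path).pages[0]
        if (first_page.extract_text() or "").strip():
            readable += 1  # text layer present; no OCR required
    except Exception:
        pass  # unreadable files count against readiness

if total:
    print(f"{readable}/{total} documents machine-readable "
          f"({100 * readable / total:.0f}% against the 80% threshold)")
```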
When Data Quality Actually Matters
To be clear, there are contexts where data quality is non-negotiable:
- Financial decisions: lending, pricing, risk scoring. Garbage in, liability out.
- Clinical and safety contexts: healthcare triage, safety systems. Wrong data means wrong outcomes.
- Compliance and reporting: regulatory submissions, audit trails. Data must be accurate and complete.
For these use cases, invest in data quality first. But even here, the investment should be scoped to the specific data the AI needs, not a company-wide data transformation programme.
For the first generation of enterprise generative AI capabilities (knowledge retrieval, document processing, workflow acceleration), start with what you have and improve as you go.
The 80% Rule
If you can access 80% of the data you need, in a format the AI can process, you have enough to start building. The remaining 20% will become clear once you're in production, and the business case for fixing it will be obvious.
Common Questions

How do we know if our data is "good enough" for a specific AI use case?
Run a rapid assessment: Can you access the data programmatically? Is it in a format an AI model can process (text, structured records, documents)? Does it cover at least 80% of the scenarios the AI needs to handle? If yes to all three, you're ready to build.

Should we invest in a data platform before starting AI?
No, unless you already have a data platform initiative underway. Building a data platform "for AI" without a specific AI use case is a recipe for an expensive, underutilised asset. Build the data infrastructure you need for capability #1, design it to be reusable, and expand from there.

What about data privacy and AI?
Enterprise AI deployments should use private, controlled environments, not consumer tools. Your data never leaves your infrastructure, and access controls apply to AI just like they apply to human users. Start with clear policies, deploy with proper controls, and refine as you learn.

