Data quality for AI: why quality in = quality out (and how to assess what you’re working with)

Date
April 20, 2026
Topics
AI & Tech, How-to Guides
Contributor
Mario Grunitz

The conversation about AI in business tends to start with tools. Which platform, which model, which subscription to approve. In our experience, that’s the second conversation worth having. The first, and the one that most determines whether a project succeeds, is about data: what you actually have, and whether any of it is ready to be useful.

When AI projects disappoint, the post-mortem tends to focus on the model, the vendor, or the implementation timeline. Yet the evidence is consistent, and the pattern in our own client work confirms it: the gap between AI ambition and AI results is almost always a data gap.

A 2025 Gartner report found that 63% of organisations either do not have, or are unsure whether they have, the right data management practices for AI. Gartner also predicts that through 2026, organisations will abandon 60% of AI projects unsupported by AI-ready data. These are not edge cases or early-adopter mistakes. They represent the dominant experience of organisations that moved straight to tools before understanding what their data could actually support.

Why data quality is a business problem, not a technical one

The phrase “quality in, quality out” tends to get filed under technical concerns, something for the data team to sort out before the real work begins. In practice, it’s a business and decision-making challenge long before it becomes a technical one.

When a model produces unreliable outputs, the instinct is to question the model. The more likely explanation is that the data feeding it was incomplete, outdated, inconsistent, or simply not relevant to the task. According to the IBM Institute for Business Value, only 29% of technology leaders are confident their enterprise data meets the quality, accessibility and security standards needed to scale generative AI. A separate IBM study of 1,700 Chief Data Officers found that only 26% believe their data can support new AI-enabled revenue streams, despite the majority having data strategies formally integrated with their technology roadmaps.

The Informatica CDO Insights 2025 survey ranks data quality and readiness as the number-one obstacle to AI success, cited by 43% of organisations, ahead of technical maturity and skills gaps. This tells us something important: the organisations closest to enterprise data, the people whose job it is to manage it, identify readiness as the primary constraint. Managers investing in AI tools deserve the same clarity before they begin.

The three-category sorting exercise

The practical starting point we use with clients comes from our AI in Practice workbook, and it’s deliberately low-tech. Before any scoping, tool evaluation, or use case definition, we ask teams to sort their available data into three categories.

  1. Useful data is complete, current, consistently formatted, and accessible without significant manual effort. It can be fed into a process or model with reasonable confidence in the output.
  2. Partial data exists and has value, but requires work before it’s usable. This might mean incomplete fields, inconsistent formats across systems, or data that’s accessible in theory but siloed in practice.
  3. Unusable data is outdated beyond relevance, so unstructured that processing it is impractical for the use case at hand, or simply not connected to the problem being solved.

The sorting exercise takes an afternoon with the right people in the room. Its value is proportional to how honestly the team engages with it, particularly with the partial and unusable categories, which tend to be more revealing than the useful column. The full exercise, with worked prompts for each category, is in Chapter 3 of the AI in Practice workbook.
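
For teams that want to capture the outcome somewhere more durable than a whiteboard, the result can be recorded as a simple inventory. The sketch below is one illustrative way to do that in Python: the three categories mirror the workbook exercise, while the `DataSource` structure, the example sources, and the notes are hypothetical, not part of the workbook itself.

```python
from dataclasses import dataclass
from enum import Enum

class Category(Enum):
    USEFUL = "useful"      # complete, current, consistent, accessible
    PARTIAL = "partial"    # has value, but needs work before use
    UNUSABLE = "unusable"  # outdated, impractical, or irrelevant

@dataclass
class DataSource:
    name: str
    owner: str
    category: Category
    notes: str = ""

# Hypothetical inventory produced by an afternoon sorting session.
inventory = [
    DataSource("CRM contacts", "Sales ops", Category.USEFUL,
               "Refreshed daily, consistent fields"),
    DataSource("Support tickets", "Customer care", Category.PARTIAL,
               "Free-text heavy; tags applied inconsistently"),
    DataSource("2019 market survey", "Marketing", Category.UNUSABLE,
               "Outdated beyond relevance"),
]

# Summarise the sort so the gaps are visible at a glance.
for category in Category:
    sources = [s for s in inventory if s.category is category]
    print(f"{category.value}: {len(sources)}")
    for s in sources:
        print(f"  - {s.name} ({s.owner}): {s.notes}")
```

The point is not the code but the record: a list like this makes the partial and unusable columns harder to quietly forget once the workshop ends.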

The four questions we ask about every data source

Once the sorting exercise is complete, we go deeper on each data source that lands in the useful or partial categories. These are the four questions that shape every data conversation we have at the start of a new engagement.


| Question | What to look for | Red flag |
| --- | --- | --- |
| Is it up to date? | Data reflects current reality and is refreshed at a cadence that matches the use case | Data is months or years old and hasn't been maintained |
| Is it reliable? | Consistent input processes, clear ownership, low error rate | Multiple teams entering data differently, no single source of truth |
| Is it structured or free-form? | Organised in fields or formats a system can process without heavy manual handling | Dense free text, PDFs, inconsistent naming conventions |
| Is it easily accessible? | Available without significant extraction effort or dependency on another team | Locked in legacy systems, requires manual export, or lives in someone's spreadsheet |

Answering these questions requires honesty and access to the people who actually work with the data day to day. In our experience, the most useful person in the room during a data audit is often an operations manager or a sales administrator, not an analyst, because they know exactly where the gaps are.
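
Some of these questions can be partially sanity-checked in code before that conversation happens, though none can be fully answered that way. The sketch below is a rough illustration, assuming a tabular export with a hypothetical `last_updated` column and a hypothetical `crm_contacts.csv` file; it approximates the freshness and reliability questions with pandas, while structure and accessibility still need human judgment.

```python
import pandas as pd

def quick_audit(df: pd.DataFrame, updated_col: str = "last_updated",
                max_age_days: int = 90) -> None:
    """Rough, automated proxies for two of the four audit questions."""
    # Is it up to date? Flag records older than the use case can tolerate.
    updated = pd.to_datetime(df[updated_col], errors="coerce")
    age = pd.Timestamp.now() - updated
    stale = (age > pd.Timedelta(days=max_age_days)).mean()
    print(f"Stale records (> {max_age_days} days): {stale:.0%}")

    # Is it reliable? Missing values and duplicate rows are crude
    # proxies for error rate and inconsistent input processes.
    missing = df.isna().mean().sort_values(ascending=False)
    print("Most incomplete fields:")
    print(missing.head(5).to_string())
    print(f"Duplicate rows: {df.duplicated().mean():.0%}")

# Example with an illustrative CSV export; names are placeholders.
if __name__ == "__main__":
    df = pd.read_csv("crm_contacts.csv")
    quick_audit(df, updated_col="last_updated", max_age_days=90)
```

Numbers like these are conversation starters, not verdicts: a 40% stale rate means something different for a pricing model than for a quarterly report, which is why the cadence question in the table is tied to the use case.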

A note on sensitive data

Every data audit surfaces some data that requires a different kind of attention: personal records, commercially sensitive information, or data subject to regulatory requirements such as GDPR. The practical reason to identify this early is straightforward. If a proposed use case depends on data you cannot legally or ethically feed into an AI model, that’s a scoping constraint you need to know about before the project is designed around it.

This is worth a focused conversation during the sorting exercise, separate from the quality questions: not a legal review, but a realistic assessment of what's actually available to work with.

Where to go from here

A data audit is a starting point, and the goal is clarity rather than perfection. Most organisations will find a mix of useful data they can act on now, partial data worth investing in, and gaps worth acknowledging before they become surprises mid-project. That picture is far more useful than a tool evaluation.

The full sorting exercise and data quality prompts are in our AI in Practice workbook, available to download free. It’s designed for managers who want to move into AI with a clear view of what they’re actually working with. The first question we ask a new client is what data they’re sitting on and whether it’s ready to do anything useful.

If you’d like to work through that assessment with your team, our AI strategists can help you move from a data audit to a scoped, ready-to-run use case. Get in touch to find out how we work.

Download the free workbook →


Mario Grunitz

Mario is a Strategy Lead and Co-founder of WeAreBrain, bringing over 20 years of rich and diverse experience in the technology sector. His passion for creating meaningful change through technology has positioned him as a thought leader and trusted advisor in the tech community, pushing the boundaries of digital innovation and shaping the future of AI.

Working Machines

An executive’s guide to AI and Intelligent Automation

Working Machines eBook