
Your AI Agents Are Only as Good as Your Data Governance

Enterprises are re-running the RPA hype cycle with agents, and the thing that killed RPA — brittle integrations, dirty data, undocumented exceptions — is exactly what kills agents. The binding constraint is data legibility, not model quality.

By Mehdi · June 18, 2026 · 7 min read

Enterprises are re-running the RPA hype cycle, and most of them do not know it yet. The pitch is identical: point an intelligent automation layer at your existing systems, describe the process in plain terms, and watch the work happen without integration projects or headcount. Robotic process automation made that promise a decade ago and shattered on every schema change, every silent edge case, every field that meant one thing in the CRM and another in billing. Agents will shatter on the same rocks, for the same reason. What determines whether an agent works is not the model. It is whether the state it acts on is legible: visible, trustworthy, and interpretable without a human in the room to explain what the data actually means.

State the claim without hedging. For the large majority of enterprise processes people are trying to automate right now, model quality is not the binding constraint and has not been for a year. The binding constraint is data and state governance. A frontier model pointed at a data swamp does not produce insight. It produces confident nonsense, faster and at scale, with a fluency that makes the nonsense harder to catch.

What actually killed RPA

RPA did not fail because the software couldn't click buttons. It failed because it automated processes on top of legacy systems it did not understand, and those systems changed. A vendor renames a column. A downstream team adds a required field. An upstream form starts accepting a new date format. Each of these is trivial for a human operator, who absorbs the change without noticing they've done anything. Each one breaks a brittle bot that was pattern-matching against a screen or a fixed schema.

The deeper failure was exceptions. Every real business process is a happy path wrapped in a thick shell of undocumented special cases. This customer gets net-90 terms because of a handshake in 2019. That SKU ships from a different warehouse during Ramadan. Refunds above a threshold need a second approval, except for three enterprise accounts where they don't. None of this lives in a schema. It lives in the heads of the people who run the process, and it surfaces only when violated. RPA teams discovered that "automate this process" really meant "first, excavate and encode a decade of tribal knowledge nobody wrote down." The automation was the easy 20%. The excavation was the 80% that killed the budget.
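Encoding that excavated knowledge does not require anything exotic. As a minimal Python sketch, assuming exceptions are kept as plain records next to the code that applies them, two of the examples above (the refund second-approval rule and the net-90 handshake) might look like this. Every name, threshold, and account ID here (RefundPolicy, needs_second_approval, the 5,000 threshold, ACME-001, HOOLI-314) is hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal

# All names, thresholds, and account IDs below are hypothetical examples.


@dataclass(frozen=True)
class RefundPolicy:
    second_approval_threshold: Decimal
    exempt_accounts: frozenset  # accounts that never need a second approval
    rationale: str  # why the exception exists
    owner: str      # who to ask when it needs to change


@dataclass(frozen=True)
class PaymentTermsOverride:
    account_id: str
    net_days: int
    rationale: str
    owner: str


REFUND_POLICY = RefundPolicy(
    second_approval_threshold=Decimal("5000"),
    exempt_accounts=frozenset({"ACME-001", "GLOBEX-007", "INITECH-042"}),
    rationale="Enterprise contracts allow single-approval refunds.",
    owner="finance-ops",
)

# Customer-specific payment terms, keyed by account ID.
TERMS_OVERRIDES = {
    "HOOLI-314": PaymentTermsOverride(
        "HOOLI-314", 90, "Handshake agreement from 2019.", "sales-ops"
    ),
}
DEFAULT_NET_DAYS = 30


def needs_second_approval(account_id: str, amount: Decimal) -> bool:
    """Apply the refund threshold, honoring the documented exemptions."""
    if account_id in REFUND_POLICY.exempt_accounts:
        return False
    return amount > REFUND_POLICY.second_approval_threshold


def payment_terms(account_id: str) -> int:
    """Return net payment days, falling back to the default."""
    override = TERMS_OVERRIDES.get(account_id)
    return override.net_days if override else DEFAULT_NET_DAYS
```

With these records in place, a 12,000 refund for ACME-001 skips the second approval because the account is exempt, while the same amount for an unlisted account requires it. The rationale and owner fields do nothing at runtime; they exist so the next reader, human or agent, can see why the exception exists and whom to ask.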

Agents change the top of that stack and leave the bottom untouched. A large language model is dramatically better than a screen-scraping bot at handling surface variation: it reads a renamed column, tolerates a new date format, interprets a free-text note. That genuinely dissolves the brittleness that plagued RPA at the interface layer. It does nothing about the exceptions, the contradictions, and the missing provenance underneath. If anything, it makes them more dangerous, because the agent papers over a gap with a plausible guess instead of failing loudly the way a broken bot did.

An agent can only act on state it can see, trust, and interpret

Break "legible state" into its three requirements, because each is a distinct failure mode.

See. The agent can only reason over data actually exposed to it — through a tool, an API, a document it can retrieve. The fact you need to resolve a case might live in an email thread, a Slack DM, a PDF attachment, or a senior person's memory. If it isn't in a channel the agent can query, it does not exist as far as the agent is concerned. Humans route around this constantly by asking someone. An agent asks the environment, and the environment answers only with what's been made queryable.
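One practical consequence: a tool should report absence explicitly instead of returning nothing, so the agent can escalate rather than guess. A minimal Python sketch, assuming the agent's entire view is a handful of hypothetical queryable stores (QUERYABLE_SOURCES, lookup, and LookupResult are illustrative names, not a real API):

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical: the only channels this agent can query. The email thread,
# the Slack DM, and the senior colleague's memory are simply not here.
QUERYABLE_SOURCES: Dict[str, Dict[str, str]] = {
    "crm": {"acct-17.owner": "j.rivera"},
    "billing": {"acct-17.status": "past_due"},
}


@dataclass
class LookupResult:
    key: str
    value: Optional[str]
    source: Optional[str]
    checked: List[str]

    @property
    def known(self) -> bool:
        return self.value is not None


def lookup(key: str) -> LookupResult:
    """Search every exposed source in order; report absence explicitly."""
    checked: List[str] = []
    for name, store in QUERYABLE_SOURCES.items():
        checked.append(name)
        if key in store:
            return LookupResult(key, store[key], name, checked)
    # Not found anywhere the agent can see: say so, and list where we looked,
    # so the caller can escalate instead of inventing a value.
    return LookupResult(key, None, None, checked)
```

Here a lookup of acct-17.status finds past_due in billing, while a lookup for an exception that lives only in someone's inbox comes back with known set to False and the list of sources checked. What the agent does with that signal is a policy choice; the point is that the gap is visible.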

Trust. Seeing the data is not enough if the data is wrong or duplicated. Take the single most common enterprise reality: the same customer exists as four records across CRM, billing, support, and the warehouse, with three different addresses, two account statuses, and no field indicating which is authoritative. A human who's been on the team six months knows billing is the source of truth for account status and support is where the current address lives. That knowledge is provenance, and it is almost never encoded. The agent sees four contradictory records and has no basis to prefer one. Whatever it picks, it acts on with full confidence.
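Writing provenance down can be as simple as a map from field to authoritative system. The Python sketch below assumes hypothetical records shaped like the four-system example above and a FIELD_AUTHORITY map matching it (billing owns status, support owns address). Where no authority is declared, it accepts a value only if every system agrees, and otherwise raises instead of picking one.

```python
from typing import Dict, Optional, Tuple

# Hypothetical: one customer as four records, one per system.
RECORDS: Dict[str, Dict[str, Optional[str]]] = {
    "crm":       {"customer_id": "C-881", "address": "12 Old Rd",  "status": "active"},
    "billing":   {"customer_id": "C-881", "address": "12 Old Rd",  "status": "suspended"},
    "support":   {"customer_id": "C-881", "address": "40 New St",  "status": None},
    "warehouse": {"customer_id": "C-881", "address": "9 Depot Ln", "status": None},
}

# The provenance a six-month veteran carries in their head, written down:
# which system is authoritative for which field.
FIELD_AUTHORITY = {"status": "billing", "address": "support"}


class UnresolvedField(Exception):
    """Raised instead of guessing when the records disagree."""


def resolve(records, authority, fields) -> Dict[str, Tuple[str, str]]:
    """Return {field: (value, where_it_came_from)} or raise."""
    canonical: Dict[str, Tuple[str, str]] = {}
    for f in fields:
        source = authority.get(f)
        if source is not None:
            value = records[source].get(f)
            if value is None:
                raise UnresolvedField(f"{f}: authoritative source {source!r} has no value")
            canonical[f] = (value, source)
            continue
        # No declared authority: accept the value only if every system agrees.
        values = {r[f] for r in records.values() if r.get(f) is not None}
        if len(values) != 1:
            raise UnresolvedField(f"{f}: conflicting or missing values {sorted(values)}")
        canonical[f] = (values.pop(), "unanimous")
    return canonical
```

Asked for status, address, and customer_id, resolve returns suspended from billing, 40 New St from support, and C-881 as unanimous. Called with an empty authority map, it raises on address rather than choosing one of the three. The design choice is to fail loudly, the way a broken bot did, instead of letting a capable model smooth the conflict over.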

Interpret. Even clean, trusted data carries meaning the schema doesn't state. A field called status with the value active means something specific that a human learned by absorption. Does active include accounts in a grace period after a failed payment? The column doesn't say; the person who runs collections knows. Unless that rule is machine-readable, the agent is interpreting a symbol whose semantics were never written down, and it fills the gap with the most statistically likely reading, which is not the same as the correct one.
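Making the meaning of active machine-readable means writing the rule down for each consumer of the field. A minimal Python sketch, assuming a hypothetical AccountStatus enum with an explicit GRACE_PERIOD member and a made-up split between product access and collections:

```python
from enum import Enum


class AccountStatus(Enum):
    ACTIVE = "active"
    GRACE_PERIOD = "grace_period"  # payment failed, still inside the retry window
    SUSPENDED = "suspended"
    CLOSED = "closed"


# Hypothetical, written-down semantics: which statuses count as "active"
# depends on who is asking, so each purpose gets an explicit rule.
COUNTS_AS_ACTIVE = {
    "product_access": {AccountStatus.ACTIVE, AccountStatus.GRACE_PERIOD},
    "collections": {AccountStatus.ACTIVE},  # grace-period accounts get chased
}


def is_active(status: AccountStatus, purpose: str) -> bool:
    """Answer "is this account active?" for a named purpose, or refuse."""
    if purpose not in COUNTS_AS_ACTIVE:
        raise ValueError(f"no documented meaning of 'active' for {purpose!r}")
    return status in COUNTS_AS_ACTIVE[purpose]
```

A grace-period account counts as active for product access but not for collections, and asking about an undocumented purpose raises rather than falling back to the statistically likely reading.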

This is why your schema is your strategy. The schema is not a technical artifact downstream of the real decisions. It is the encoding of what your business believes is true: what counts as an entity, what relationships are allowed, what states are legal. An agent inherits exactly that model of reality and no more. If your schema doesn't distinguish a grace-period account from an active one, your agent cannot either, no matter how capable the model behind it.
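A schema can also state which moves between states are legal, so an agent's proposed update is checked against the business's model of reality instead of being inferred. A minimal sketch in Python, with a hypothetical transition table for account status:

```python
from typing import Dict, Set

# Hypothetical schema fragment: the legal account states and the legal
# moves between them. Anything not listed here is rejected.
LEGAL_TRANSITIONS: Dict[str, Set[str]] = {
    "active": {"grace_period", "closed"},
    "grace_period": {"active", "suspended"},
    "suspended": {"active", "closed"},
    "closed": set(),
}


class IllegalTransition(Exception):
    pass


def transition(current: str, target: str) -> str:
    """Return the new state if the schema allows the move; otherwise raise."""
    if current not in LEGAL_TRANSITIONS:
        raise IllegalTransition(f"unknown state {current!r}")
    if target not in LEGAL_TRANSITIONS[current]:
        raise IllegalTransition(f"{current!r} -> {target!r} is not a legal move")
    return target
```

Moving from active to grace_period succeeds; moving straight from active to suspended raises, because this illustrative table requires passing through the grace period first. Whatever the real rules are, the agent inherits exactly what the table says.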

The model makes the failure worse, not better

Here is the part the "just wait for the next model" crowd misses. Increasing model capability, holding data legibility fixed, can degrade a system's real-world reliability. A weaker model faced with contradictory records is more likely to stall, error, or produce something obviously wrong that a reviewer catches. A stronger model synthesizes the contradiction into a smooth, confident, well-formatted answer that reads as correct and is not. The failure moves from visible to invisible.

In computational biology, this is exactly the batch-effect problem. Run your samples across two sequencing batches, let a technical artifact correlate with the variable you care about, and a more powerful statistical method does not save you. It fits the artifact more precisely and hands you a more convincing false result. The sophistication of the estimator amplifies the confound instead of filtering it. Confounded data plus a stronger model equals a more dangerous wrong answer, whether the data is gene expression or duplicated customer records.
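The confound is easy to reproduce in a few lines. This Python sketch uses made-up numbers: the true treatment effect is zero, batch B adds a technical shift of 2.0, and every treated sample ran in batch B.

```python
from statistics import mean

# Hypothetical simulation of a fully confounded design. There is no true
# treatment effect, but batch B adds a technical shift of +2.0, and every
# treated sample happened to run in batch B.
TRUE_EFFECT = 0.0
BATCH_SHIFT = {"A": 0.0, "B": 2.0}
NOISE = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0]

samples = []  # (condition, batch, measured value)
for i, noise in enumerate(NOISE):
    condition = "treated" if i % 2 else "control"
    batch = "B" if condition == "treated" else "A"
    effect = TRUE_EFFECT if condition == "treated" else 0.0
    samples.append((condition, batch, 5.0 + effect + BATCH_SHIFT[batch] + noise))


def group_mean(condition: str) -> float:
    return mean(x for c, _, x in samples if c == condition)


estimated_effect = group_mean("treated") - group_mean("control")

# Condition and batch carry identical information, so no estimator can
# attribute the difference to one rather than the other.
fully_confounded = all((c == "treated") == (b == "B") for c, b, _ in samples)
```

The estimate comes out near 1.97 when the true effect is zero. Adding more samples under the same design would shrink the noise around that number without moving it toward zero; the fix is in the design (spreading conditions across batches), just as the fix for duplicated customer records is in governance, not in a cleverer model.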

I've argued that an agent is only as good as its tools — the planner reads a name, a description, and an input schema, and its whole competence is bounded by how legible those tools make the world it acts in. Data governance is the same claim aimed one layer down. The tools expose the state; the governance determines whether that state is worth exposing. A perfectly described tool that returns duplicated, unprovenanced, semantically ambiguous rows gives the planner a clean window onto a swamp. The interface is legible. The reality behind it is not. Both layers have to be legible, or the agent is reasoning carefully over fiction.
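One way to hold the state layer to the same standard as the interface layer is to check tool output before it reaches the planner. The Python sketch below assumes a hypothetical row contract (one row per customer_id, every other field wrapped with its value and source) and lists the ways a response violates it; whether the tool then refuses or flags the response is left open.

```python
from typing import Any, Dict, List

# Hypothetical contract for rows a tool returns to the planner:
# one row per canonical customer_id, and every other field wrapped as
# {"value": ..., "source": ...} so provenance travels with the data.


def legibility_problems(rows: List[Dict[str, Any]]) -> List[str]:
    """List reasons these rows are not safe to hand to a planner."""
    problems: List[str] = []
    seen = set()
    for i, row in enumerate(rows):
        cid = row.get("customer_id")
        if cid is None:
            problems.append(f"row {i}: no customer_id")
        elif cid in seen:
            problems.append(f"row {i}: duplicate customer_id {cid!r}")
        else:
            seen.add(cid)
        for name, cell in row.items():
            if name == "customer_id":
                continue
            if not (isinstance(cell, dict) and "value" in cell and "source" in cell):
                problems.append(f"row {i}: field {name!r} has no provenance")
    return problems
```

A response containing two rows for the same customer, one of them with a bare status string, produces two problems: the missing provenance on the first row and the duplicate ID on the second. A clean window onto a swamp shows up here as a non-empty list.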

The diagnostic

You don't need a maturity model or a consultant to know whether a process is agent-ready. Use the new-hire test. Take a competent person who just joined, hand them only your documented data and written rules, remove their ability to ask anyone anything, and see if they can finish the task. If a smart new employee cannot do the job from the documentation alone — because the real logic lives in someone's memory, because the systems contradict each other, because the exceptions were never written down — then an agent cannot either. The agent is precisely that new hire with no colleagues to ask, no institutional memory, and no instinct to escalate when something smells wrong. It hits the same gaps. It does not tell you it hit them. It produces output.
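The test can be run as an inventory rather than a thought experiment: list the decisions the process requires and where the rule for each one actually lives. A minimal Python sketch with a hypothetical refund process and a hypothetical set of documented sources:

```python
from dataclasses import dataclass
from typing import List

# Hypothetical: places a new hire could read without asking anyone.
DOCUMENTED = {"schema", "api_docs", "rules_repo", "runbook"}


@dataclass(frozen=True)
class Decision:
    name: str
    knowledge_lives_in: str  # where the rule for this decision actually lives


def new_hire_gaps(decisions: List[Decision]) -> List[Decision]:
    """Decisions that a new hire, or an agent, could not make from documentation."""
    return [d for d in decisions if d.knowledge_lives_in not in DOCUMENTED]


refund_process = [
    Decision("look up order total", "api_docs"),
    Decision("choose the authoritative address", "a senior colleague's memory"),
    Decision("apply the second-approval threshold", "rules_repo"),
    Decision("honor the enterprise exemption", "a Slack thread"),
]
gaps = new_hire_gaps(refund_process)
```

Here the gaps are the address choice and the enterprise exemption: the two places where both a new hire and an agent would hit a wall, and the only one of them that would say so is the new hire.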

Most enterprise processes fail this test badly, and that failure is the actual work. Not model selection, not prompt engineering, not agent frameworks. The unglamorous list is the whole game: canonical entities, so "customer" resolves to one thing; provenance, so the agent knows which system is authoritative for which field; access control, so it sees what it should and nothing it shouldn't; and machine-readable exception rules, so the decade of handshake deals and seasonal quirks lives in a form the agent can read instead of in the heads of people who will be on vacation when the agent runs.
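Access control is the easiest of the four to sketch at the tool boundary. Assuming a hypothetical per-role allowlist, a deny-by-default projection in Python might look like this:

```python
from typing import Any, Dict, Set

# Hypothetical field-level policy: each agent role sees only the fields its
# task needs. Roles not listed see nothing (deny by default).
AGENT_READABLE: Dict[str, Set[str]] = {
    "refund_agent": {"customer_id", "status", "order_total"},
}


def project_for(role: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Strip every field the role is not explicitly allowed to read."""
    allowed = AGENT_READABLE.get(role, set())
    return {k: v for k, v in record.items() if k in allowed}
```

A record that also carries, say, a tax ID reaches the refund_agent role without it, and a role missing from the table receives an empty record. Denying by default means a forgotten entry fails closed rather than open.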

That is why the data-governance function, long treated as a compliance cost center, is about to become the dividing line between companies that get value from agents and companies that generate confident nonsense at scale. The teams that spent years on canonical data models and provenance and clean access boundaries were building the substrate agents require, before anyone called it that. The teams that treated governance as paperwork are about to automate their mess and call it transformation.

Build for the worker that can't ask

Stop asking whether the model is good enough. It is. Frontier models cleared the enterprise-task bar a while ago for most of what people actually want automated. Start asking whether your state is legible enough for a diligent worker who cannot ask a single question. That worker is coming, at volume, and it will act on exactly what you've made visible, trustworthy, and interpretable — no more. Everything else it will confidently invent.

RPA promised to automate the process and forgot the process was 80% undocumented exception. Agents are being sold the same promise, to the same organizations, with the same swamp underneath. The winners this cycle will be the teams that did the boring work first, and every one of them will tell you the boring work was the strategy.

Frequently asked questions

Isn't a frontier model smart enough to reason through messy data the way a human does?

A human handling a contradiction escalates, asks a colleague, or checks a system of record they know to trust. An agent has none of that context unless it is encoded as machine-readable state. Pointed at contradictory records with no provenance, a stronger model does not resolve the ambiguity; it picks a plausible branch and executes confidently. Reasoning capacity cannot substitute for information the environment never made available.

What is the single fastest diagnostic for whether a process is agent-ready?

Could a competent new hire complete the task from your documented data and rules alone, with no tribal knowledge and nobody to ask? If yes, an agent has a real chance. If the new hire would need three Slack messages and a call to the one person who remembers why the exception exists, the agent fails in exactly the same places, just faster, and without flagging that it is guessing.

Does this mean model choice doesn't matter for enterprise agents?

It matters, but it is rarely the binding constraint. Above a capability threshold that current frontier models already clear for most enterprise tasks, the marginal reliability comes from the environment: canonical entities, provenance, access control, and machine-readable exception rules. Spending the next quarter benchmarking models while your customer records sit duplicated across four systems is optimizing the wrong variable.

Filed under Tech & Product. Building things people trust, at the level of the details.
