Tech & Product

Your Schema Is Your Strategy

Your database schema is a frozen set of assumptions about what your business is. Once thousands of features depend on them, they constrain strategy far more than your language or framework ever will.

By Mehdi · May 6, 2026 · 9 min read

On this page

Why schema ossifies while code stays fluid
The three decisions that actually trap you
What a Stripe-shaped schema cannot represent
Price the reversal, not the decision

Your database schema constrains your strategy more than your programming language, your framework, or your cloud provider ever will. It is a set of assumptions about what your business is, written down in a form that thousands of later features will silently agree to depend on — and once they do, the assumptions stop being editable. You can rewrite a service over a weekend. You cannot change what a user is after four hundred features have all quietly agreed on it.

This is the decision that looks cheapest at the moment you make it and turns out to be the most expensive one in the company. A junior engineer defines the core tables in an afternoon, in a migration file, before there are any customers. Nobody reviews it the way they review a funding round or a hire, because it reads as plumbing. Yet that migration file is where you decided whether a customer can have more than one address, whether an order can be partially fulfilled, whether money is a state or an event. Those are not implementation details. They are the boundaries of what your business is allowed to become.

Why schema ossifies while code stays fluid

The usual mental model says code and data are both just things you can refactor. That model is wrong in a way that matters, and the reason is structural, not a question of discipline.

Code has encapsulation. A service sits behind an interface, and as long as the interface holds, you can gut and rewrite everything behind it without the rest of the system noticing. The blast radius of a change is contained by design. That containment is the entire point of an interface: it lets you be wrong locally and fix it locally.

A schema has no such boundary. It is a shared, globally mutable dependency that every part of the system reads and writes directly. Every query, every index, every foreign key, every report, every background job, every line of application logic that touches a row is coupled to the shape of that row. Nothing sits in between absorbing the change. So when you alter the core shape of an entity, the change does not propagate through one seam — it propagates through all of them at once. A schema migration that looks like a one-line DDL change is actually a request to rewrite every call site that ever assumed the old shape.

The asymmetry compounds with time. On day one, three features touch your orders table and the reversal cost is an afternoon. By the time you have product-market fit, three hundred features touch it, and the same change is a multi-quarter migration with dual writes, backfills, and a long tail of subtle bugs in the code paths nobody remembers to update. The schema decision did not get more wrong over time. It got more load-bearing. The cost of being wrong grew while the wrongness stayed constant.

The three decisions that actually trap you

Not all schema decisions are dangerous. Most are genuinely cheap to change, which is exactly why you should not agonize over them. The trap is a small number of decisions that look identical in cost to the cheap ones on day one and diverge violently by month eighteen. Three of them account for most of the damage I have seen.

The cardinality of your core entities. Does a user have exactly one account? One currency? One address? One phone number? One verified identity? Each of these is a one that you are casually asserting is not a many, and each assertion gets welded into a foreign key, a unique constraint, a UI that renders a single field, and a hundred queries that do WHERE user_id = ? and expect one row back. Widening a one to a many after the fact is one of the hardest changes in software, because the assumption of singularity is not stored in one place you can edit. It is smeared across every feature that ever read the field. Western SaaS schemas are full of these welded singularities — one account, one email as identity, one card on file — and they stay invisible until you try to serve a market that violates them.
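
A minimal sketch of a welded singularity, using Python's built-in sqlite3 module. The table names (users_v1, users_v2, addresses), the helper functions, and the sample rows are all hypothetical, invented for illustration. The point is only that the first helper bakes "exactly one address" into its return type, while the widened shape forces every call site to decide which address it means.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Welded singularity: the address lives on the user row, so
    -- "one user, one address" is asserted by the table shape itself.
    CREATE TABLE users_v1 (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        address TEXT
    );

    -- Widened shape: an address is its own row, so a user can have many.
    CREATE TABLE users_v2 (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE addresses (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users_v2(id),
        label   TEXT NOT NULL,
        line    TEXT NOT NULL
    );
""")

# Hypothetical sample data.
conn.execute("INSERT INTO users_v1 VALUES (1, 'Amina', '12 Harbour Rd')")
conn.execute("INSERT INTO users_v2 VALUES (1, 'Amina')")
conn.executemany(
    "INSERT INTO addresses (user_id, label, line) VALUES (?, ?, ?)",
    [(1, "home", "12 Harbour Rd"), (1, "work", "4 Market St")],
)


def shipping_address_v1(user_id):
    """Typical call site: assumes one row back and one address in it."""
    row = conn.execute(
        "SELECT address FROM users_v1 WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0]


def shipping_addresses_v2(user_id):
    """After widening, the caller gets a list and must choose among them."""
    return conn.execute(
        "SELECT label, line FROM addresses WHERE user_id = ? ORDER BY id",
        (user_id,),
    ).fetchall()
```

The DDL difference is a few lines. The migration cost sits in every function shaped like shipping_address_v1, each of which returns a single string that callers, templates, and reports have already come to rely on.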

Whether your identifiers encode meaning. The temptation to make an identifier "smart" — an order ID that embeds a region code and a sequence number, an email address used as a primary key, a composite natural key built from real-world attributes — is that it saves you a join and reads nicely in a log. The cost arrives when the real world changes and the identifier starts lying. The user changes their email; now your primary key is wrong everywhere it was foreign-keyed. You expand into a new country and the region prefix scheme runs out of room. A meaningful identifier is a bet that the meaning will never change, and in a growing company the meaning always changes. Opaque, meaningless keys are boring and they are almost always right, because they encode no assumption that reality can later falsify.
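
The email-as-primary-key case can be sketched with Python's sqlite3 module. This is an illustration under stated assumptions, not a recommended schema: the table names and the helper functions are hypothetical, and foreign key enforcement is switched on explicitly because SQLite leaves it off by default.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    -- Meaningful key: the email IS the identity, and orders point at it.
    CREATE TABLE users_by_email (email TEXT PRIMARY KEY);
    CREATE TABLE orders_by_email (
        id         INTEGER PRIMARY KEY,
        user_email TEXT NOT NULL REFERENCES users_by_email(email)
    );

    -- Opaque key: the email is just an attribute that is free to change.
    CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT NOT NULL UNIQUE);
    CREATE TABLE orders (
        id      INTEGER PRIMARY KEY,
        user_id TEXT NOT NULL REFERENCES users(id)
    );
""")

# Hypothetical sample data.
conn.execute("INSERT INTO users_by_email VALUES ('old@example.com')")
conn.execute("INSERT INTO orders_by_email (user_email) VALUES ('old@example.com')")

user_id = str(uuid.uuid4())  # carries no meaning that reality can falsify
conn.execute("INSERT INTO users VALUES (?, 'old@example.com')", (user_id,))
conn.execute("INSERT INTO orders (user_id) VALUES (?)", (user_id,))


def rename_meaningful_key():
    """Changing the email means changing a key that other rows reference."""
    try:
        conn.execute("UPDATE users_by_email SET email = 'new@example.com'")
        return "ok"
    except sqlite3.IntegrityError:
        return "blocked"


def rename_opaque_key():
    """Changing the email touches one column; the order rows are unaffected."""
    conn.execute(
        "UPDATE users SET email = 'new@example.com' WHERE id = ?", (user_id,)
    )
    return conn.execute(
        "SELECT COUNT(*) FROM orders WHERE user_id = ?", (user_id,)
    ).fetchone()[0]
```

With the default foreign key action, the first update is rejected because an order still references the old email. Declaring ON UPDATE CASCADE would let the database rewrite the referencing rows, but it would not reach copies of the key held in logs, caches, or external systems.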

The sync-versus-async boundary. This is the subtlest and the most strategically loaded. For every operation, you are implicitly deciding whether it is an instantaneous state transition or a process with intermediate, observable states. Modeling a payment as a boolean paid column on the order is a bet that payment is a single fact that either happened or did not. Modeling it as a ledger of attempts and events is a bet that payment is a process. The two schemas cost the same to write. They do not cost the same to live with, and — the part technical founders miss — the choice determines what you can later ask about your own business. A schema that records payment as a boolean can never answer "how many attempts does collection take in this region," because it threw that information away at write time. The data model quietly deleted a class of questions before anyone thought to ask them. That is the same failure I have watched sink longitudinal studies in my research work: bind one methylation profile per patient in the schema, and you can never ask anything about change over time, because the ontology you chose has no slot for it. The shape of the data bounds the hypothesis space. This is not a metaphor; it is the same mechanism, and it is why the causal question hiding inside every business decision is usually foreclosed at the schema layer, long before anyone runs an analysis.
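
To make the boolean-versus-ledger contrast concrete, here is a small sketch using Python's sqlite3 module. The table names (orders_flag, orders, payment_events), the event kinds, and the sample rows are assumptions made up for this example. It shows that the ledger can answer the attempts-per-region question and can still derive the old paid flag, while the flag table has nothing to derive the attempts from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Payment as a single fact: history is discarded at write time.
    CREATE TABLE orders_flag (
        id     INTEGER PRIMARY KEY,
        region TEXT NOT NULL,
        paid   INTEGER NOT NULL DEFAULT 0
    );

    -- Payment as a process: every attempt is its own row.
    CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT NOT NULL);
    CREATE TABLE payment_events (
        id           INTEGER PRIMARY KEY,
        order_id     INTEGER NOT NULL REFERENCES orders(id),
        kind         TEXT NOT NULL CHECK (kind IN ('attempt_failed', 'collected')),
        amount_cents INTEGER NOT NULL DEFAULT 0
    );
""")

# Hypothetical sample data: order 1 needed three attempts, the others one.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "north"), (2, "north"), (3, "south")],
)
conn.executemany(
    "INSERT INTO payment_events (order_id, kind, amount_cents) VALUES (?, ?, ?)",
    [
        (1, "attempt_failed", 0),
        (1, "attempt_failed", 0),
        (1, "collected", 4000),
        (2, "collected", 2500),
        (3, "collected", 1800),
    ],
)


def attempts_by_region():
    """Average number of attempts per collected order, grouped by region."""
    return conn.execute("""
        SELECT o.region, AVG(per_order.attempts)
        FROM (
            SELECT order_id, COUNT(*) AS attempts
            FROM payment_events
            GROUP BY order_id
            HAVING SUM(kind = 'collected') > 0   -- only orders that were paid
        ) AS per_order
        JOIN orders AS o ON o.id = per_order.order_id
        GROUP BY o.region
        ORDER BY o.region
    """).fetchall()


def is_paid(order_id):
    """The old boolean is still derivable from the ledger, never the reverse."""
    row = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM payment_events "
        "WHERE order_id = ? AND kind = 'collected')",
        (order_id,),
    ).fetchone()
    return bool(row[0])
```

On this sample data, attempts_by_region() reports an average of 2.0 attempts in north and 1.0 in south. No query over orders_flag can produce those numbers, because the failed attempts were never written down.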

What a Stripe-shaped schema cannot represent

The clearest lesson I have on this came from building Kommerce, which runs cash-on-delivery commerce in emerging markets. If you model that business with the assumptions baked into a standard Western e-commerce schema, you do not get a slightly awkward fit. You get a set of real businesses that are literally unrepresentable in your data model.

Start with payment. In the Stripe-shaped world, payment is synchronous and settled at checkout: authorize, capture, done, and the order carries a clean paid state before it ever ships. In cash-on-delivery, the payment event happens days later, at the customer's doorstep, in physical currency, from the hand of a delivery agent who may or may not succeed. Payment is not a column. It is a first-class state in a lifecycle, and the lifecycle has branches a checkout-time model has no vocabulary for: the customer is not home on the first attempt, and the parcel goes out again tomorrow. The customer opens the box, takes three of the five items, and pays for three. The customer refuses delivery entirely and the goods come back to the warehouse as inventory that has to be reconciled. None of these are edge cases in that market. They are the median transaction.

A schema that stored payment as paid: boolean and fulfillment as shipped: boolean cannot hold any of this. There is no place to put "second delivery attempt, partial acceptance, collected 60% of the order value." You would discover this not at design time but eighteen months in, when a merchant asks why your reports show a delivered order as unpaid, and the honest answer is that your schema has no state for the thing that actually happened. By then the fix is not a migration. It is re-founding the data model under a live business.
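
One way to picture a model that does have room for that history is an order carrying a list of doorstep attempts. The sketch below is a hypothetical illustration in plain Python, not a description of how Kommerce is actually built: the class names, outcome values, and amounts are all invented.

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    NOT_HOME = "not_home"
    PARTIALLY_ACCEPTED = "partially_accepted"
    FULLY_ACCEPTED = "fully_accepted"
    REFUSED = "refused"


@dataclass(frozen=True)
class DeliveryAttempt:
    outcome: Outcome
    collected_cents: int = 0  # cash handed to the delivery agent, if any


@dataclass
class CodOrder:
    total_cents: int
    attempts: list[DeliveryAttempt] = field(default_factory=list)

    def record(self, outcome: Outcome, collected_cents: int = 0) -> None:
        """Append one doorstep visit; nothing is ever overwritten."""
        self.attempts.append(DeliveryAttempt(outcome, collected_cents))

    @property
    def collected_cents(self) -> int:
        return sum(a.collected_cents for a in self.attempts)

    @property
    def collected_fraction(self) -> float:
        return self.collected_cents / self.total_cents

    @property
    def status(self) -> str:
        """Current state is derived from the latest attempt."""
        if not self.attempts:
            return "awaiting_delivery"
        return self.attempts[-1].outcome.value


# Hypothetical order: five items worth 5000 cents in total.
order = CodOrder(total_cents=5000)
order.record(Outcome.NOT_HOME)                                   # first visit
order.record(Outcome.PARTIALLY_ACCEPTED, collected_cents=3000)   # three of five kept
```

Here the order has two recorded attempts, a status of partially_accepted, and a collected fraction of 0.6. A pair of booleans would have to collapse all of that into paid or unpaid.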

Then there is identity. Western schemas assume one user resolves to one stable, verified identity — a real name, a validated email, a card that proves they are who they say. In markets where a meaningful share of customers order under a nickname, share a phone, and have no card, an identity model that assumes uniqueness and verification does not degrade gracefully. It rejects real customers as malformed data. The schema was not neutral. It encoded a specific society's assumptions about what a person is, and shipping it to a different society silently excluded the customers who did not fit.

The Western schema is not bad. It is a faithful compression of the business it was built for. That is the whole point: a schema is always a faithful compression of some business, and if you inherit one built for a different business, you have inherited a strategy you never chose. Your data model decided what you could sell, to whom, and how they were allowed to pay, and it decided it before your first customer.

Price the reversal, not the decision

Here is the operational rule I use, and it is deliberately narrow, because the failure mode on the other side is just as real. For any schema decision, do not ask "what is the right model." Ask: what would it cost to reverse this in eighteen months, once N features depend on it? That single question sorts every decision into two piles.

For the vast majority of the schema, the reversal cost is low, because few things will ever depend on it. Add the column, pick a reasonable type, move on, and do not hold a design review. Spending scarce design care here is not prudence; it is a way to lose. Over-modeling every table upfront, adding speculative flexibility for markets you may never enter, building the generic system before you have the specific one — this is the pattern that makes teams die of indigestion rather than starvation. The schema that tries to represent every possible business represents none of them well, and it drowns the team in complexity long before scale would have justified it.

The care goes to the short list where reversal cost explodes with N: the cardinality of your core entities, the meaning content of your identifiers, and the sync-versus-async boundaries around money, identity, and fulfillment. On those, and only those, you slow down. You ask what happens when the one becomes a many. You default to opaque keys and event ledgers even when a boolean would ship faster this week, because you are not optimizing this week — you are buying the option to be a different business in two years without a re-founding event.

The framework is fungible. The language is fungible. The schema is where you wrote down, in a form the whole company will come to obey, your theory of what you are. Get that theory wrong in the cheap places and nothing happens. Get it wrong in the load-bearing places and you will spend your best growth years fighting a decision an engineer made, unreviewed, before you had anything to lose.

Frequently asked questions

Isn't this an argument for heavy upfront schema design?

The opposite. It's an argument for concentrating your scarce design care on the two or three decisions whose reversal cost explodes with scale (core entity cardinality, whether identifiers encode meaning, and the sync-versus-async boundary) and moving fast, even sloppily, on everything else. Over-modeling every table upfront is its own failure mode; most schema is cheap to change later precisely because few features depend on it.

Can't you just fix a bad schema decision with a migration?

You can migrate the data. What you can't cheaply migrate is the accumulated application logic. Changing a boolean 'paid' flag into a ledger of payment events is a one-line schema change and a rewrite of every query, index, report, and code path that assumed payment was a single fact. The cost isn't in the DDL; it's in the N features that encoded the old assumption.

How is a schema decision different from any other reversible engineering choice?

Most code has an interface boundary that contains the blast radius of a change: you can rewrite a service behind a stable API. A schema is a shared, globally mutable dependency that every part of the system reads and writes directly. It has no encapsulation, so a change to its core shape propagates everywhere at once. That asymmetry is why schema ossifies while code stays fluid.

Filed under Tech & Product. Building things people trust, at the level of the details.
