Adopting AI is not a procurement decision, and treating it like one is why so many deployments post a clean number in the pilot and quietly rot the system around them within a year. Software you install; an organism you introduce. Software does what you configured it to do. An organism changes the selection pressures acting on everything else in its environment, and the second-order effects, not the first-order gains, are where the outcomes actually live.
I spend my research life on systems that evolve under pressure: microbial communities that reorganize when you change one nutrient, epigenetic regulation where a single perturbation propagates through a network you only partly understand. When I watch companies run "AI strategy," I keep seeing the same category error. They evaluate the tool. They should be modeling the ecosystem. The evaluation asks, "Does this do the job well, and what does it cost?" The ecosystem asks, "What does this reward, and what will therefore grow?" Those are different questions with different answers, and only the second one predicts what your org looks like in eighteen months.
Here is the single idea that does all the work, borrowed intact from population biology: every deployment makes some previously costly behavior cheap, and whatever gets cheap proliferates. That is selection pressure. You do not get to choose what proliferates. You only get to choose what you make cheap, and the rest follows from incentives you no longer control. The ROI spreadsheet measures the thing you intended to make cheap. It is structurally blind to everything else that got cheap at the same moment, which is precisely where the system reorganizes itself around you.
Cheap is a force, not a feature
Start with the mechanism, because the whole argument rests on it. When you lower the cost of a behavior, you are not adding a convenience. You are altering a cost gradient, and populations (of people, teams, actions, incentives) flow down cost gradients the way water does. A capability that makes X cheap does not simply give you more X with everything else held constant. It moves the bottleneck, and the system reorganizes around the new bottleneck whether or not anyone decided that it should.
Take the most-cited win: a coding assistant that makes generating code cheap. The pilot measures lines shipped per engineer and reports a gain. Generating code, though, was never the scarce resource. Reviewing, integrating, and being accountable for code was. Make production cheap and you have not sped up the pipeline; you have shifted the load onto the one stage that did not get cheaper. Pull requests multiply. Review queues lengthen. The senior engineers who were the actual constraint now spend their day adjudicating a larger volume of plausible-looking diffs, some of which are subtly wrong in ways that are expensive to catch precisely because they look right. The first-order metric, code produced, improved. The system-level throughput may not have moved at all, and the failure rate of what ships may have gone up. Nothing in the ROI model saw this, because the model priced the intended cheapening and ignored the induced one.
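The bottleneck shift is easy to see in a toy model. What follows is a minimal sketch, not a measurement: a two-stage pipeline in which pull requests are opened and then reviewed, with every number (40 reviews a week, a doubling of PRs opened, a twelve-week window) invented for illustration. The function name `simulate_pipeline` is likewise hypothetical.

```python
def simulate_pipeline(prs_opened_per_week, review_capacity_per_week, weeks):
    """Two-stage pipeline: PRs are opened, then reviewed and merged.

    Returns (total merged, review backlog at the end).
    All inputs are hypothetical; the point is the shape, not the values.
    """
    backlog = 0
    merged = 0
    for _ in range(weeks):
        backlog += prs_opened_per_week
        # Review is the constrained stage: it clears at most its capacity.
        reviewed = min(backlog, review_capacity_per_week)
        merged += reviewed
        backlog -= reviewed
    return merged, backlog


# Hypothetical team: reviewers can clear 40 PRs a week.
before = simulate_pipeline(prs_opened_per_week=40, review_capacity_per_week=40, weeks=12)
# The assistant doubles PRs opened; review capacity is unchanged.
after = simulate_pipeline(prs_opened_per_week=80, review_capacity_per_week=40, weeks=12)

print("before (merged, backlog):", before)  # (480, 0)
print("after  (merged, backlog):", after)   # (480, 480)
```

Under these assumptions the merged count is identical in both runs, and the only thing the doubling bought is a 480-PR queue in front of the reviewers. The sketch leaves out the quality effect entirely, which in practice would make the second run look worse, not the same.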
This is the general shape. Every AI deployment is a selection event. Three failure modes fall directly out of the biology, and each one is invisible to a first-order evaluation and obvious to an ecological one.
Failure mode one: monoculture fragility
Standardize every team onto one blessed AI workflow and you buy a real, measurable efficiency gain in the current environment, and you destroy the variation you would need to survive a change in that environment. This is the oldest lesson in agriculture, and companies re-learn it at enormous cost roughly once a generation.
The banana in your kitchen is a Cavendish. It replaced the Gros Michel, which was the global commercial banana until a soil fungus, Panama disease, all but wiped it out as a commercial crop by the mid-twentieth century. The Gros Michel was not unlucky; it was a clone. A monoculture is genetically uniform by construction, which means a pathogen that defeats one plant defeats all of them simultaneously. The Cavendish was chosen because it resisted that particular strain, and it is now being killed, worldwide, by a newer variant of the same fungus, for the identical reason. The Irish Lumper potato in the 1840s was the same story with a million deaths attached. Uniformity is maximally efficient in the environment it was optimized for and maximally fragile to any environment it was not.
Biology's defense against this is diversity held as insurance you hope never to use. Your immune system generates an absurd variety of receptor configurations, and the population around you carries an equally absurd variety of MHC molecules, not because the average one is optimal but because somewhere there needs to be an answer to a pathogen that has never been seen. Variation looks like waste on every spreadsheet right up until it is the only thing standing between you and extinction.
Now translate. When you mandate that all twelve teams adopt the same prompt library, the same model, the same generation-and-review cadence, you have created a clonal population. In a stable domain that is the correct trade; efficiency is real and you should take it. If your domain shifts, though, if a competitor changes the game, a regulation lands, or the model provider changes behavior underneath you, every team fails in the same way at the same time, and you have no internal population that solved the problem differently to learn from. The teams you were tempted to shut down for doing it "inefficiently" were running the exploration that would have told you what to do next. Their inefficiency was the cost of your optionality. Monoculture does not just make you fragile; it removes the very signal that would warn you the environment has changed.
The discipline here is to price variation as insurance rather than treat it as slack. Ask what premium you are paying for standardization and what it buys you when, not if, the environment moves. Sometimes the answer is that standardization is right and you take the fragility knowingly. The failure is not standardizing. The failure is standardizing without ever having done that math.
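"That math" can be as small as an expected-cost comparison. The sketch below is one way to set it up, under loudly stated assumptions: the premium is the efficiency you forgo each year by keeping some teams non-standard, the losses are what an environment shift costs with and without that variation, and every figure (in millions, chosen arbitrarily) is hypothetical. Both function names are invented for this example.

```python
def expected_annual_cost(premium, shift_probability, loss_if_shift):
    """Expected yearly cost: what you pay for sure plus the probability-weighted loss."""
    return premium + shift_probability * loss_if_shift


def break_even_shift_probability(premium, loss_standardized, loss_diverse):
    """Yearly shift probability above which keeping variation is the cheaper choice."""
    if loss_standardized <= loss_diverse:
        raise ValueError("variation must reduce the loss for the insurance to be worth pricing")
    return premium / (loss_standardized - loss_diverse)


# Hypothetical figures, in millions per year.
PREMIUM = 0.5             # efficiency forgone by keeping some teams non-standard
LOSS_STANDARDIZED = 10.0  # cost of a shift when every team fails the same way
LOSS_DIVERSE = 2.0        # cost of a shift when some team already has an answer

threshold = break_even_shift_probability(PREMIUM, LOSS_STANDARDIZED, LOSS_DIVERSE)
print(f"break-even shift probability: {threshold:.4f}")  # 0.0625

p = 0.10  # assumed yearly chance that the environment moves
standardized = expected_annual_cost(0.0, p, LOSS_STANDARDIZED)
diverse = expected_annual_cost(PREMIUM, p, LOSS_DIVERSE)
print(f"standardized: {standardized:.2f}  diverse: {diverse:.2f}")  # 1.00 vs 0.70
```

With these made-up inputs, variation pays for itself once the yearly chance of a shift exceeds roughly 6 percent. The real work is estimating the three inputs honestly; the arithmetic only makes it visible that a decision is being made.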
Failure mode two: runaway feedback
The second failure mode appears wherever the output of the system re-enters as its own input, and it is the one that produces the most spectacular, hardest-to-reverse damage. Biology has a precise name for the dynamic: Fisherian runaway, the mechanism behind the peacock's tail.
The tail is a survival liability, metabolically expensive and a handicap against predators. It exists because two things reinforced each other: females preferred large tails, and the genes for large tails traveled alongside the genes for preferring them. Preference selected for the trait; the trait selected for the preference. Once a trait and the appetite for that trait sit in the same feedback loop, selection stops tracking the thing it was originally about, survival, and starts tracking itself, spiraling to an extreme that no external fitness pressure would ever have chosen. Zahavi's handicap principle, a separate and partly competing account, offers one reason such a costly trait can persist: the cost is the signal. But in the runaway picture the direction of travel is set entirely by the closed loop, not by any outside optimum.
Engagement optimization is Fisherian runaway wearing a product-management hat. A recommender's output shapes what users see, which shapes what they click, which becomes the next batch of training data, which shapes the next output. The metric you are optimizing, engagement, is being fed by a process whose input is the metric itself. When the measure becomes the input to the process that generates the measure, you do not get a stable equilibrium. You get drift, and the drift accelerates. The system optimizes toward whatever the loop rewards, which decouples from whatever you actually wanted the moment the loop closed. Content ecosystems that optimized for engagement did not decide to promote outrage; the loop selected for it, because outrage was the peacock's tail: locally rewarded, globally corrosive, and impossible to stop once trait and preference were mutually reinforcing.
The AI-native version is sharper because the loop can close without a human in it at all. Train a model on data that increasingly contains the previous model's output and you get model collapse: variance shrinks, the distribution narrows, the tails of the real world disappear from the training signal, and the system becomes progressively more confident about a progressively less accurate picture. This is not a distant research curiosity. Any pipeline where generated content re-enters the corpus, where AI-scored outputs train the next scorer, where synthetic data supplements the real, is running some fraction of this dynamic right now.
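The narrowing is easy to reproduce in miniature. The following is a toy sketch and not the setup of any particular study: the "model" is a Gaussian refit by maximum likelihood, each generation trains only on samples drawn from the previous generation's fit, and the sample size, generation count, and seed are arbitrary assumptions.

```python
import random
import statistics


def simulate_collapse(generations, samples_per_generation, seed=0):
    """Refit a Gaussian to its own samples, generation after generation.

    Generation 0 stands in for the real world: mean 0, standard deviation 1.
    Every later generation sees only samples drawn from the previous fit.
    Returns the fitted standard deviation at each generation.
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0
    history = [std]
    for _ in range(generations):
        samples = [rng.gauss(mean, std) for _ in range(samples_per_generation)]
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples)  # maximum-likelihood (population) estimate
        history.append(std)
    return history


stds = simulate_collapse(generations=200, samples_per_generation=20)
for generation in (0, 50, 100, 200):
    print(f"generation {generation:3d}: fitted std = {stds[generation]:.4f}")
```

No single step looks alarming, since each refit is a reasonable estimate of the data it was given. The loss compounds because finite samples slightly under-represent the tails and nothing from outside the loop ever puts them back. Mixing a fixed share of generation-0 data into every round is the obvious variation to try.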
The reason first-order metrics mislead so badly here is that the loop breaks the assumption every naive metric makes: that the thing you measured was generated independently of your measuring it. Once output feeds input, correlation and causation come apart at exactly the point where your dashboard is looking. This is the causal-inference problem hiding inside every AI business decision: the number went up, and you genuinely cannot tell from the number whether the underlying good thing went up or whether the loop simply learned to produce the number. Finding where the loop closes is not optional hygiene. It is the difference between a metric that means something and one that has quietly started measuring itself.
Failure mode three: parasitic load
Every productive niche gets colonized by organisms that extract from it without contributing, and the more productive the niche, the faster the colonization. This is not a moral failing of ecosystems; it is a structural feature. Energy concentrated anywhere is an invitation, and evolution is relentlessly good at accepting invitations.
Any capability that lowers the cost of producing something also lowers the cost of producing adversarial versions of it. Cheap generation invites cheap extraction as a matter of course, not as an unlucky edge case. Make it cheap to write plausible text and you have made it cheap to write spam, phishing, fake reviews, and content farms. Open a support channel to an LLM and you have opened it to prompt injection. Reward a behavior with a model in the loop and you have created a target for anyone who profits from gaming that model. The efficiency you introduced is a food source, and the adversary is the thing that evolves to eat it.
The mistake companies make is booking this as an incident: a surprise, a patch, a post-mortem. Biology books it as a fixed cost. Your immune system runs continuously and consumes real metabolic energy every hour of your life whether or not you are currently sick, because the correct response to a world full of parasites is not to react to each one but to pay a standing tax for the capacity to resist all of them. The metabolic cost of immunity is the price of being a productive body in a world that contains other organisms.
So the honest way to budget for a new AI capability is to reserve, up front, the detection, rate-limiting, and review capacity that the capability's own attractiveness will demand, sized to the value of the thing you just made cheap, because that value is exactly what sets the adversary's incentive to attack it. If you cannot afford the immune cost, you cannot afford the capability. Deploying the productive function while treating its parasitic load as a future surprise is like growing a body with no immune system and being astonished when something eats it.
Run this before the ROI math, not after
The reframing is only useful if it changes what you do on Monday, so here is the operational form. Before you build the ROI model for any AI deployment, run four questions borrowed straight from how you would assess introducing a species into an ecosystem. They come first because they determine which terms the model is even allowed to omit.
- What behavior just got cheaper? Not "what did we automate" but what is now cheap to do that used to be expensive, including the things you did not intend. That is the behavior that will proliferate, and it is rarely the one on the slide.
- Where does the feedback loop close? Does the output re-enter as input, directly through data or indirectly through human behavior? If yes, your headline metric is a candidate for runaway, and you should assume it will drift toward whatever the loop rewards rather than what you meant.
- What variation did we just eliminate? What were the "inefficient" teams or methods actually exploring, and what is your plan for surviving an environment shift once you have made everyone identical? Price that variation as insurance before you cancel the policy.
- What adversary does the efficiency attract? Who profits from the new cheap thing, and what does the standing immune cost of holding this capability actually run? Book it as a fixed tax now, not an incident later.
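One way to make the ordering stick is to treat the four questions as a gate in front of the spreadsheet. The sketch below is a hypothetical illustration, not a prescribed process: the class name `EcosystemReview`, its field names, and the rule that a blank answer blocks the ROI model are all assumptions made for the example.

```python
from dataclasses import dataclass, fields


@dataclass
class EcosystemReview:
    """Answers to the four pre-ROI questions; an empty string means unanswered."""
    behavior_made_cheaper: str = ""
    where_loop_closes: str = ""
    variation_eliminated: str = ""
    adversary_attracted: str = ""

    def unanswered(self):
        """Names of the questions that still have no answer, in order."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def ready_for_roi_model(self):
        """True only once every question has some written answer."""
        return not self.unanswered()


review = EcosystemReview(
    behavior_made_cheaper="opening pull requests, including low-effort ones",
    where_loop_closes="generated code re-enters the repo the assistant learns from",
)
print(review.unanswered())           # ['variation_eliminated', 'adversary_attracted']
print(review.ready_for_roi_model())  # False
```

The check is deliberately crude: it cannot tell a good answer from a bad one, only a written one from a missing one. Its whole job is to make skipping a question a visible act.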
None of these are answerable by evaluating the tool. They are answerable only by modeling the system the tool enters, which is the entire point, and the entire thing that "install and measure ROI" is built to skip.
There is a deeper reason companies get this wrong, and it is the same reason startups do: the constraint is almost never the raw capability. It is the organization's capacity to metabolize what the capability produces. Most teams that fail with AI do not fail from a shortage of model quality; they fail because they ingested a new source of cheap output faster than they could digest its second-order effects. It is the same sense in which startups are said to die of indigestion, not starvation. The gain was real. The system could not process what the gain did to it.
Treat your next AI deployment as an introduced organism and the whole risk profile becomes legible. Ask what it makes cheap, where the loop closes, what variation it kills, and what it feeds. The tool is the easy part. The ecosystem is the company, and it was already evolving before you added a new selection pressure and called it a rollout.