In early April 2026, Stella Laurenzo, a technical lead in AMD's AI group, published a detailed technical analysis on GitHub that shook the AI developer community. Her conclusion was simple and brutal: Anthropic's most advanced AI model had become dramatically worse — and she had the data to prove it.
The analysis was based on 6,852 Claude Code sessions, 234,760 tool calls and 17,871 thinking blocks, collected from January through April 2026. The numbers revealed a pattern that was hard to dismiss. The model's reasoning depth, measured as the median length of its thinking blocks in tokens, had dropped by roughly 67 percent. The number of API calls per task had risen dramatically. The model had asked "should I continue?" 173 times in 17 days, a behavior that had been entirely absent before March 8. Self-contradictions had tripled.
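Her full pipeline is not reproduced here, but the headline metric is easy to check against your own logs. A minimal sketch, assuming sessions are exported as JSONL records with a timestamp and a thinking-text field; the field names and format are my assumptions, not Anthropic's actual schema, and character length stands in as a rough proxy for tokens:

```python
import json
from collections import defaultdict
from pathlib import Path
from statistics import median

def median_thinking_length_by_day(log_dir: str) -> dict[str, float]:
    """Median thinking-block length per day, from exported session logs.

    Assumes each *.jsonl file holds one JSON record per line, with an
    ISO-8601 'timestamp' and, for thinking blocks, a 'thinking' text field.
    Both field names are illustrative, not Anthropic's real schema.
    """
    lengths = defaultdict(list)
    for path in Path(log_dir).glob("*.jsonl"):
        for line in path.read_text().splitlines():
            record = json.loads(line)
            text = record.get("thinking")
            if not text:
                continue  # skip tool calls and ordinary messages
            day = record["timestamp"][:10]  # YYYY-MM-DD bucket
            lengths[day].append(len(text))
    return {day: median(vals) for day, vals in sorted(lengths.items())}

if __name__ == "__main__":
    for day, value in median_thinking_length_by_day("./claude_sessions").items():
        print(f"{day}: median thinking length {value:.0f} chars")
```

Plotted over time, a step change in that curve around a known configuration date is exactly the kind of signal Laurenzo reported.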
The report drew more than 2,000 reactions on GitHub and spread rapidly via X, Reddit and tech media. The term that captured the mood was borrowed from the grocery aisle: AI shrinkflation. Same price, same packaging — but less inside.
What actually happened?
Anthropic's response was honest enough to acknowledge two concrete product changes, while disputing that the model itself had been degraded.
On February 9, adaptive thinking was introduced — a mechanism that lets the model decide how much reasoning to spend on each question, instead of using a fixed budget. On March 3, the default effort level was lowered to "medium" (level 85 out of 100). Boris Cherny, head of Claude Code, called it "the best balance of intelligence, latency and cost for most users."
Translated into plain language: Anthropic decided that the model should think less by default, in order to answer faster and cost less to run. Users who wanted full capacity could manually set /effort max — but most did not know the option existed.
The changes in brief
February 9: Adaptive thinking is activated. The model decides its own reasoning depth per question, rather than using a fixed budget.
March 3: The default is lowered to "medium effort" (85/100). Anthropic calls it optimal balance. The developer community calls it a nerf.
March 26: Time-dependent rate limits are introduced — sessions are consumed faster during business hours (weekdays).
Cherny also disputed parts of Laurenzo's methodology. He argued that the observed drop in thinking length was partly due to redacted thinking data no longer being saved locally — a UI decision, not a capacity change. It was a technically correct point. But it did not explain why developers around the world experienced exactly the same thing: a model that felt shallower, less confident and more prone to half-hearted answers.
On Reddit, one user captured the frustration in a comment that drew hundreds of upvotes: for the first time in two years, Claude did not seem to know that it had a built-in Plan Mode. Not a subtle quality shift, but a total feature blackout.
Shrinkflation as a business model
The term shrinkflation comes from consumer economics. When a chocolate bar shrinks from 200 to 180 grams while the price stays the same, that is shrinkflation. The phenomenon works because most consumers don't weigh their purchases. They react to the price, not the contents.
What is fascinating is that the SaaS industry has already normalized this pattern. A study from Vertice showed that more than a quarter of all SaaS contracts were affected by shrinkflation in 2023 — features that disappeared, pricing tiers that were eliminated, usage limits that were tightened, all while the invoice stayed the same or went up. AI services now add a new layer on top: the product isn't degraded by removing features, but by gradually thinning out the intelligence — the only thing you are actually paying for.
There is an important nuance between deliberate degradation and capacity optimization. Anthropic claims the latter. They say they found a "sweet spot" that works for the majority. This is probably true — most users ask simple questions where the difference between 85 percent and 100 percent effort is imperceptible. But for the professional user who pays precisely to access maximum capacity — the developer refactoring a codebase, the consultant building complex analytical models, the researcher who needs deep reasoning — the difference is dramatic.
And those are exactly the users who drive word-of-mouth, file GitHub issues and buy enterprise plans. Optimizing them away is like lowering the quality of business class to subsidize economy. It works exactly until business travelers find another airline.
The underlying tension
What makes this controversy structurally interesting — not just a customer service matter — is that it exposes a fundamental tension in the economics of the AI industry.
Training a frontier model costs hundreds of millions of dollars. That is a one-time cost. But running the model — inference, in industry speak — costs money for every single answer, every chain of thought, every token. The more the model thinks, the more GPU time it consumes, and the higher the variable cost.
Adaptive thinking is Anthropic's attempt to solve this equation. If the model can learn to use 30 percent of its capacity on easy questions and 100 percent only when truly needed, the average cost per answer drops sharply. It is rational. It is even elegant.
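The arithmetic behind that claim is simple to sketch. With a purely illustrative traffic mix, where 80 percent of requests are easy and can be served at 30 percent of full reasoning cost:

```python
# Illustrative numbers only, not Anthropic's actual traffic mix or costs.
full_cost = 1.00                        # normalized cost of a full-effort answer
easy_share, hard_share = 0.80, 0.20     # assumed request mix
easy_effort, hard_effort = 0.30, 1.00   # fraction of full reasoning spent

blended = easy_share * easy_effort * full_cost + hard_share * hard_effort * full_cost
print(f"Blended cost per answer: {blended:.2f}  (a {1 - blended:.0%} saving)")
# -> Blended cost per answer: 0.44  (a 56% saving)
```

Under those assumed numbers the provider more than halves its inference bill without touching the model weights, which is why the incentive is so strong.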
The problem is that the model does not always judge correctly which questions are easy. Laurenzo's data suggests that it systematically under-allocated resources on complex tasks — agentic workflows, multi-step coding, long sessions with dependent context. Exactly the kinds of tasks that professional users pay premium prices for.
On top of this come capacity limits tied not to the model but to the infrastructure. On March 26, Anthropic introduced time-dependent session management: during business hours on weekdays, sessions are consumed faster. In practice that is a price increase for anyone working daytime, hidden behind the same monthly fee. Anthropic confirmed that Team and Enterprise customers were not affected, which implicitly confirms that free users and Pro subscribers got reduced access.
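How large that hidden increase is depends on a consumption multiplier Anthropic has not published, but a rough sketch with assumed numbers shows the shape of it:

```python
# Illustrative only: the actual weekday consumption multiplier is not public.
daytime_multiplier = 1.5   # assume sessions drain 1.5x faster in business hours
daytime_share = 0.80       # assume 80% of your usage happens in business hours

effective_cost = daytime_share * daytime_multiplier + (1 - daytime_share) * 1.0
print(f"Work per subscription unit: {1 / effective_cost:.0%} of before")
print(f"Effective price per unit of work: +{effective_cost - 1:.0%}")
# -> 71% of before, an effective +40% price increase for a daytime user.
```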
History repeats itself
This is not the first time. The AI community has seen exactly the same pattern with GPT-4 (summer 2023) and with earlier versions of Claude. A model launches with impressive capability. Users adapt their workflows. Then, gradually and without explicit communication, the experience shifts. Answers get shorter. Reasoning gets shallower. The model starts to hesitate where it used to be confident.
The pattern is so consistent that it should be treated as a law of the AI industry rather than a series of isolated incidents. There is a structural explanation: AI companies launch models with settings that maximize quality, to win benchmarks and generate positive press. Then, as the user base grows and inference costs explode, the settings are dialed down. Not the model itself — but the configuration around it.
It is like buying a car that tested at 500 horsepower, only to have it software-updated down to 400 three months later. The engine is the same. The specification you paid for no longer applies.
What the BridgeBench controversy reveals
Parallel to Laurenzo's analysis, a benchmark-based accusation surfaced. BridgeMind published data claiming that Claude Opus 4.6's accuracy had fallen from 83.3 percent to 68.3 percent — a drop from rank 2 to rank 10 on their leaderboard.
This specific data point should be treated with great skepticism. Several independent reviewers pointed out that the BridgeBench comparison was methodologically questionable — it compared results across different test configurations, and the observed difference fell within normal statistical variation for non-deterministic AI models. One extra hallucination on a single task in a small sample can move the result substantially.
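A back-of-the-envelope check makes the point concrete. Assuming, hypothetically, a 60-task benchmark (BridgeBench's actual task count is not given here):

```python
import math

# Hypothetical benchmark size; BridgeBench's real task count is an assumption.
n_tasks = 60
accuracy = 0.833

# Each task is worth 1/n of the headline score.
per_task = 100 / n_tasks
# Binomial standard error of the accuracy estimate, in percentage points.
std_err = 100 * math.sqrt(accuracy * (1 - accuracy) / n_tasks)

print(f"One task flips the score by {per_task:.1f} points")
print(f"One standard error is about +/- {std_err:.1f} points")
# With n = 60: one task is ~1.7 points and one standard error ~4.8 points,
# so run-to-run swings of several points need no explanation at all.
```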
But the fact that this weak data point went viral — shared tens of thousands of times, cited by tech journalists, treated as evidence — says something important about the state of the relationship between AI companies and their users. When trust erodes, it takes only a spark to start a fire. And trust is eroding right now.
The price of transparency
The most remarkable aspect of the whole controversy is not that Anthropic made changes. It is that they did not communicate them.
Adaptive thinking rolled out on February 9. Effort level 85 became default on March 3. Time-dependent session handling was introduced on March 26. None of these changes were proactively communicated to users in a form that matched their impact. They appeared in developer documentation and release notes — not in an email to paying customers saying: "We have changed how your AI works. Here is why, and here is how to restore the previous behavior."
This is a communication failure bordering on disrespect. And it is particularly problematic for Anthropic, which builds its brand on responsible AI and transparency. If you market yourself as the ethical alternative in an industry full of cowboys, you also set a higher bar for your own openness.
Compare this to a situation where a manufacturer changes the recipe of a food product. There are legal requirements for the packaging to be updated. There are no equivalent requirements in the AI industry — but there should be, and companies like Anthropic should be leading that development rather than waiting until a technical lead at AMD does their quality control for them on GitHub.
What it means in practice
I have 30 years of experience in industrial optimization and lean manufacturing. I have seen how measurement systems, process control and variation management decide whether a factory delivers world class or mediocrity. The same principles apply here, just to an entirely new type of product.
In manufacturing there is a concept called process capability. It measures whether a process consistently delivers within specified limits. A process with high capability delivers predictably. A process with low capability sometimes delivers well and sometimes poorly, and the customer never knows which one they will get.
AI models have exactly the same problem, but without the measurement infrastructure manufacturing has built up over decades. There is no public process-capability index (Cpk) for Claude Opus 4.6. There is no independent check verifying that the model you get on a Tuesday afternoon delivers the same quality as the one you got on a Sunday morning. And with time-dependent capacity management and adaptive thinking, there is now evidence that it actually does not.
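For readers who have not met the metric: Cpk measures how far a process mean sits from its specification limits, in units of three standard deviations. Here is a minimal sketch of what the same measurement could look like for an AI deliverable, assuming you score daily responses against a fixed prompt suite; all numbers are hypothetical:

```python
from statistics import mean, stdev

def cpk(samples: list[float], lower_spec: float, upper_spec: float) -> float:
    """Process capability index: distance from the mean to the nearest
    spec limit, expressed in units of three standard deviations."""
    mu, sigma = mean(samples), stdev(samples)
    return min((upper_spec - mu) / (3 * sigma), (mu - lower_spec) / (3 * sigma))

# Hypothetical daily quality scores (0-100) from a fixed evaluation suite.
daily_scores = [86, 84, 88, 71, 85, 87, 69, 83, 86, 70]

# Spec limits: the quality band the workflow was designed around.
print(f"Cpk = {cpk(daily_scores, lower_spec=75, upper_spec=100):.2f}")
# A Cpk well below the common ~1.33 threshold means the "process" routinely
# drifts outside spec, which is what intermittent effort throttling looks
# like from the outside.
```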
For anyone integrating AI into professional workflows — and that is an increasing share of us — this is a fundamental risk factor. You build processes that presume a certain capacity level. If that level is lowered without warning, it's not just the single task that collapses — the entire workflow degrades.
The bigger picture
Step back and consider the whole thing through Mo Gawdat's lens of zero marginal cost. His thesis, which I have written about earlier on this site and which my own AI-driven productivity gains bear out, is that AI drives the marginal cost of cognitive services toward zero. That holds at a macro level.
But AI shrinkflation exposes a friction in that thesis. Producing a single AI answer costs almost nothing compared to the equivalent human labor. But producing a good AI answer — one with deep reasoning, a long chain of thought and correct context handling — costs substantially more in compute resources than a shallow answer. The difference can be a factor of 10 or more in GPU time.
Zero marginal cost therefore applies to quantity, but not necessarily to quality. And that creates an economic incentive to deliver "good enough" rather than "the best possible" — exactly the dynamic we see in Anthropic's adaptive thinking.
This is a crucial insight for anyone planning their business around AI tools. The cheapest variant of AI will become almost free. But the best variant — the one that delivers genuine expert level — will remain a premium product with a price that reflects the compute resources it requires. The question is whether AI companies will be transparent about where that line is drawn, or whether they will keep selling premium and delivering standard.
The way forward
Three things need to happen.
First, transparency about configuration changes. Every time an AI provider changes default settings, resource allocation or capacity management in a way that affects output quality, it should be proactively communicated to paying customers. Not in a footnote in an API changelog — in an email, with an impact analysis and instructions for restoring the previous behavior.
Second, independent quality measurement. The AI industry needs an equivalent of manufacturing's quality auditors — independent actors that continuously measure and publish the actual delivery quality of the major models, not just at launch but over time. The BridgeBench fiasco shows that the community cannot do this on its own without rigorous methodology.
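In its simplest form, that monitoring is a fixed, versioned prompt suite run on a schedule, scores logged with timestamps, and an alert when the rolling average drifts from the launch baseline. A minimal sketch; the scoring function, thresholds and model identifier are placeholders, not a proposed standard:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from statistics import mean

@dataclass
class BenchmarkRun:
    timestamp: datetime
    model: str
    score: float  # aggregate score from a fixed, versioned prompt suite

def drift_alert(history: list[BenchmarkRun], baseline: float,
                window: int = 7, tolerance: float = 0.03) -> bool:
    """Flag when the rolling mean of the last `window` runs falls more than
    `tolerance` (relative) below the baseline. Thresholds are illustrative."""
    recent = [run.score for run in history[-window:]]
    return len(recent) == window and mean(recent) < baseline * (1 - tolerance)

# Hypothetical daily runs against the same suite and an assumed model name.
history = [
    BenchmarkRun(datetime(2026, 3, d, tzinfo=timezone.utc), "claude-opus-4-6", s)
    for d, s in enumerate([0.84, 0.83, 0.85, 0.79, 0.78, 0.77, 0.78], start=1)
]

if drift_alert(history, baseline=0.84):
    print("Quality drift detected: publish the data and ask the vendor why.")
```

The point is not the specific thresholds but the discipline: measure continuously, against the same yardstick, and make the time series public.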
Third, an honest pricing conversation. If deep reasoning costs more to run, say so. Offer a pricing plan that explicitly guarantees full capacity, and a cheaper variant with adaptive thinking. Let the user choose — instead of making the choice for them and hoping they don't notice.
AI shrinkflation is not a bug. Nor is it a conspiracy. It is the logical consequence of an industry selling the promise of unlimited intelligence while wrestling with very limited GPU resources. The tension between those two realities will define the coming years — and the companies that solve it through transparency rather than by hoping nobody notices will win the long race.
I have spent my entire professional life in manufacturing. The one lesson that applies everywhere — from foundries to solar panel factories, from composite production to food — is this: always measure. Never trust the supplier to tell you when quality drops. That rule now applies to an entirely new kind of delivery.