
The Token Price That Decided Whether AI Was a Bubble

By Akshay · January 10, 2026

In early 2022, tokens still felt like ammunition.

You optimized prompts, avoided retries, feared long context, and watched dashboards like a hawk.

By 2026, tokens feel like electricity. Cheap, tiered, predictable, and discounted when reused.

You do not ask "can we afford this prompt?" You ask "which pricing lane should this run in?"

That is a psychological shift. And psychological shifts reshape industries.

2022: GPT-3 pricing breaks the scarcity narrative

In 2022, OpenAI quietly detonated the scarcity myth.

Davinci, once priced at $0.06 per 1K tokens, dropped to $0.02 per 1K. That is a fall from roughly $60 to $20 per million tokens.

That single cut signaled something important: intelligence would not stay premium forever.

Soon after, legacy GPT-3 models were deprecated, and developers were pushed toward davinci-002. Its pricing landed at roughly $1 per million input and $1 per million output.

That was not a discount. That was a reset.

2022–2023: GPT-3.5 Turbo makes AI buildable

GPT-3.5 Turbo changed behavior more than benchmarks ever did.

Launching at roughly $2 per million tokens, it told developers something very explicit: stop conserving tokens, start building systems.

Retries became normal. Prompt chains appeared. RAG stopped being academic and became production.

By late 2023, stable pricing around $0.50 input / $1.50 output per million tokens made agent loops and background reasoning economically boring.

And boring economics is how infrastructure is born.
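For a sense of scale at those late-2023 rates, here is a back-of-the-envelope sketch. The loop shape and token counts are invented for illustration; only the $0.50 / $1.50 prices come from above.

```python
# Back-of-the-envelope cost of an agent loop at late-2023 GPT-3.5 Turbo rates.
# Prices come from the post; step count and token counts are invented.
PRICE_IN = 0.50 / 1_000_000   # dollars per input token
PRICE_OUT = 1.50 / 1_000_000  # dollars per output token

def loop_cost(steps: int, input_tokens: int, output_tokens: int) -> float:
    """Cost of a loop that re-sends its prompt and context every step."""
    return steps * (input_tokens * PRICE_IN + output_tokens * PRICE_OUT)

# A 10-step loop with a 4K-token prompt and 500-token replies per step:
print(f"${loop_cost(10, 4_000, 500):.4f}")  # about $0.03 per run
```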

2023: GPT-4 proves pricing is a menu, not a number

GPT-4 launched expensive on purpose.

At $30 per million input and $60 per million output, it reminded everyone that frontier capability still costs money.

But then something new appeared: lanes.

Batch pricing cut those numbers in half. Turbo variants expanded context to 128K.

Same intelligence. Different economics.

This is where the industry learned a permanent lesson: cost is not fixed. It is how you run the model.
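One way to picture the menu is a cost function parameterized by lane. The $30 / $60 base rates and the 50% batch discount are the figures above; the function name and token counts are illustrative.

```python
# The same GPT-4 request priced by lane. The $30/$60 base rates and the
# 50% batch discount come from the post; token counts are illustrative.
GPT4_INPUT, GPT4_OUTPUT = 30.0, 60.0  # dollars per million tokens

def request_cost(input_tokens: int, output_tokens: int, lane: str = "standard") -> float:
    multiplier = {"standard": 1.0, "batch": 0.5}[lane]
    return multiplier * (input_tokens * GPT4_INPUT + output_tokens * GPT4_OUTPUT) / 1_000_000

# 3K tokens in, 1K tokens out: one request, two prices.
print(request_cost(3_000, 1_000, "standard"))  # 0.15
print(request_cost(3_000, 1_000, "batch"))     # 0.075
```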

2024: Claude introduces clean tiered economics

In 2024, Anthropic reframed pricing as product design.

Claude 3 arrived with three clear tiers: Haiku for speed and volume, Sonnet for balance, and Opus for frontier capability. All with large context, all with predictable pricing.

Claude 3.5 Sonnet held a flat $3 input / $15 output per million tokens for months.

That stability mattered. Enterprises do not fear high prices. They fear surprise prices.

2024: GPT-4o turns multimodal into default behavior

GPT-4o was not just a model release. It was an economic statement.

Multimodal reasoning landed at single-digit dollars per million tokens, and pricing diversified aggressively: standard lanes, batch lanes, and cached input discounts.

By 2026, the same GPT-4o request could be "expensive" or "cheap" depending on execution strategy. That is infrastructure thinking, not API thinking.

Then GPT-4o mini landed at $0.15 / $0.60 per million tokens.

At that price, waste is affordable. And waste is how exploration happens.
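To make "waste is affordable" concrete, a quick calculation at those mini rates. The call shape is an assumption; only the $0.15 / $0.60 prices come from above.

```python
# What "cheap enough to waste" means at GPT-4o mini rates.
# Prices ($0.15 in / $0.60 out per million) come from the post;
# the call shape (1K tokens in, 300 out) is an assumption.
calls = 1_000
cost = calls * (1_000 * 0.15 + 300 * 0.60) / 1_000_000
print(f"${cost:.2f}")  # $0.33 for a thousand exploratory calls
```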

Late 2024: o1 prices thinking itself

o1 changed what developers pay for.

For the first time, reasoning depth was explicitly metered. You were no longer paying only for text. You were paying for thinking.

Standard pricing stayed high. Batch pricing made it viable.

That duality unlocked large-scale reasoning workloads without bankrupting teams.

This is when AI stopped pretending to be stateless.
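A rough sketch of what metered thinking does to a bill, assuming reasoning tokens are charged at the output rate, illustrative $15 / $60 per-million prices, and a batch lane that halves them.

```python
# o1-style billing: hidden reasoning tokens are charged at the output rate.
# The $15/$60 per-million prices and all token counts are assumptions.
PRICE_IN, PRICE_OUT = 15.0, 60.0  # dollars per million tokens

def reasoning_cost(input_toks: int, visible_toks: int, reasoning_toks: int, batch: bool = False) -> float:
    cost = (input_toks * PRICE_IN + (visible_toks + reasoning_toks) * PRICE_OUT) / 1_000_000
    return cost / 2 if batch else cost

# A 2K-token question, 500 visible output tokens, 8K hidden reasoning tokens:
print(reasoning_cost(2_000, 500, 8_000))              # 0.54 -- most of it pays for thinking
print(reasoning_cost(2_000, 500, 8_000, batch=True))  # 0.27
```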

2025: GPT-5 turns frontier models into commodities

GPT-5 did something subtle but permanent.

Public pricing still looked premium. But under the hood, Batch and Flex lanes pushed input costs below a dollar per million tokens.

GPT-5.2 pricing confirmed it: frontier capability now lives on a curve, not a cliff.

The message was clear. Frontier models are no longer rare. They are tiered resources.

2025–2026: Claude 4.5 and the economics of 1M context

Claude 4.5 made million-token context practical, but not free.

Crossing 200K tokens pushed workloads into premium lanes. Prompt caching softened the blow.

Cache reads dropped costs by an order of magnitude.

This is not generosity. It is incentive design.

Reuse context. Architect memory. Stop brute forcing prompts.
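Here is a sketch of that incentive, using the $3-per-million Sonnet input rate quoted earlier, a cache read at roughly a tenth of it (the order-of-magnitude discount above), and a small premium on the first cache write. The exact multipliers vary by provider and are assumptions here.

```python
# Re-sending a long context on every call vs. caching it once and reading it back.
# The $3/M base input rate comes from the post; the 0.1x read and 1.25x write
# multipliers are assumptions standing in for a provider's cache pricing.
BASE_INPUT = 3.0 / 1_000_000       # dollars per input token
CACHE_WRITE = 1.25 * BASE_INPUT    # first call pays a premium to populate the cache
CACHE_READ = 0.10 * BASE_INPUT     # later calls read the context at ~1/10th the price

def naive_cost(calls: int, context_tokens: int) -> float:
    return calls * context_tokens * BASE_INPUT

def cached_cost(calls: int, context_tokens: int) -> float:
    return context_tokens * (CACHE_WRITE + (calls - 1) * CACHE_READ)

# Fifty calls over the same 150K-token corpus:
print(f"naive:  ${naive_cost(50, 150_000):.2f}")   # $22.50
print(f"cached: ${cached_cost(50, 150_000):.2f}")  # about $2.77
```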

Pricing table: the quiet collapse (2022–2026)

All figures below are the ones quoted in this post, in dollars per million tokens.

| Year | Model | Input | Output |
| --- | --- | --- | --- |
| 2022 | GPT-3 Davinci (after the cut) | ~$20 | ~$20 |
| 2022 | davinci-002 | ~$1 | ~$1 |
| 2023 | GPT-4 | $30 | $60 |
| Late 2023 | GPT-3.5 Turbo | $0.50 | $1.50 |
| 2024 | Claude 3.5 Sonnet | $3 | $15 |
| 2024 | GPT-4o mini | $0.15 | $0.60 |
| 2025 | GPT-5 (Batch / Flex lanes) | under $1 | lane-dependent |

Where the existential fear went

In 2022, AI felt like a weapon because it was scarce.

Scarcity centralizes power. Centralization breeds fear.

In 2026, AI feels like plumbing.

Plumbing does not scare people. It enables them.

The story is no longer "which model wins."

The story is "which pricing mode makes this workflow viable."

That is why agents exploded. That is why long-context systems appeared. That is why retries, memory, and waste became features.

Final thought

The most important breakthroughs in AI did not happen in papers.

They happened in pricing spreadsheets.

When intelligence became cheap enough to waste, systems became possible.

And once systems exist, fear quietly disappears.

This is not the end of the AI story.

It is the point where it became boring enough to matter.