Agents are still mostly demos

The same six accounts have been rotating the same three screencasts for fourteen months. Browser agent fills out a form. Coding agent rewrites a function. Research agent produces a Notion doc. The screencasts are clean, the music is good, the caption says “this changes everything.” The next week: same screencast, different caption.

This is most of what the agent economy looks like right now.

I want to be precise about what I’m saying, because “agents are hype” is its own lazy take. There are real things happening. x402 is real. Stripe integrated it in February, it got donated to the Linux Foundation, and there are genuinely millions of small agent transactions settling on-chain. The payments rails are maturing faster than the agents that are supposed to use them. The infrastructure layer is quietly getting good. The models are moving too. Claude Opus 4.6 shipped with a 14-hour task horizon and actual multi-agent coordination, which is a real capability jump, not a benchmark shuffle. GPT-5.4 has native computer-use and it mostly works. The benchmarks are saturated and gamed, but the models themselves are meaningfully stronger than they were a year ago.

The part that’s still theater is the deployment gap. There’s a large and largely unacknowledged distance between “it ran in the demo” and “it runs unattended in prod.”

what running in prod actually means

I shipped SAM in September: a Solana agent framework with a tool registry, encrypted key storage, risk checks, the whole boring scaffolding. Then spent four months integrating real things: Hyperliquid perps, x402 payments, Kalshi/Polymarket prediction-market tools. Then wrote OpenSAM in February, a ~5k-line Rust rewrite that runs on a Pi or a cheap VPS, because the original framework turned out to be heavier than the problems I was actually solving.

None of that is impressive as a story. It’s just what it takes to get something to run unattended, touching real wallets, on a real chain, without you babysitting it.

The gap between “demo agent” and “prod agent” is roughly:

Error handling that covers the weird cases, not just the clean path
State that survives the process restarting at 3am
Costs that stay bounded when the model decides to loop
Keys that don’t leak when something throws an unexpected exception
Behavior that degrades gracefully instead of silently doing the wrong thing

None of this shows up in a screencast. Screencasts are always the happy path. The happy path is easy. That’s why everyone’s posting it.
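
To make two of those items concrete, here is a minimal Python sketch of a hard budget on a looping agent plus a checkpoint that survives a restart. It is an illustration under stated assumptions, not SAM or OpenSAM code: `Budget`, `save_state`, `load_state`, `run`, and the flat `cost_per_step` estimate are all hypothetical names and simplifications.

```python
import json
import os
import tempfile


class BudgetExceeded(Exception):
    """Raised when a step would push the agent past a hard limit."""


class Budget:
    """Hypothetical per-process ceiling on steps and spend."""

    def __init__(self, max_steps, max_cost_usd):
        self.max_steps = max_steps
        self.max_cost_usd = max_cost_usd
        self.steps = 0
        self.cost_usd = 0.0

    def charge(self, cost_usd):
        # Check before committing, so a refused step leaves the counters unchanged.
        if self.steps + 1 > self.max_steps:
            raise BudgetExceeded("step limit reached")
        if self.cost_usd + cost_usd > self.max_cost_usd:
            raise BudgetExceeded("spend limit reached")
        self.steps += 1
        self.cost_usd += cost_usd


def save_state(path, state):
    # Write to a temp file in the same directory, then rename over the target.
    # A crash mid-write leaves the previous checkpoint intact.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


def load_state(path, default):
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def run(path, step_fn, budget, cost_per_step):
    # step_fn is an assumed callback: it does one unit of work and returns
    # True when the task is finished.
    state = load_state(path, {"step": 0, "done": False})
    while not state["done"]:
        try:
            budget.charge(cost_per_step)
        except BudgetExceeded:
            break  # stop cleanly; the checkpoint on disk is still valid
        state["done"] = step_fn(state)
        state["step"] += 1
        save_state(path, state)
    return state
```

The budget here lives in memory, so it resets when the process restarts. A real deployment would probably persist spend alongside the checkpoint and use the actual cost of each call rather than a flat estimate.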

the demo incentive structure

There’s a structural reason this keeps happening and it’s not stupidity. A demo that runs once in a notebook takes an afternoon. A system that runs unattended in prod and does something economically meaningful takes months of grinding through failure modes nobody wants to watch. The attention economy rewards the afternoon. So you get a lot of afternoons and not a lot of months.

The openclaw/Moltbook cycle in January was a clean example. Viral open-source agent, social network for AI agents, MOLT token up 1800% in 24 hours, Meta acquires Moltbook six weeks later. All of it happened. Most of it was theater in the narrow sense that matters: what was the actual economic value being produced, by whom, sustainably? The answer in most cases is “none / content creators / no.” The payments rails processed transactions. The agents mostly processed hype.

I’m not dunking on the people involved. The incentive structure produces this. When attention is the resource and shipping a demo gets you more attention than shipping a maintainable prod system, you get a lot of demos.

the real progress underneath

The Solana Foundation said this month that the network has processed around 15 million agent payments. That number is probably mostly wash activity, but it’s not zero, and the direction is right. The payments infrastructure is real. The tooling is real. The models are genuinely capable enough to do economically useful work if you give them the right scaffolding and scope the task down to something they can actually finish without going sideways.

The thing that’s not real yet, at scale, is the deployment layer. The operational side. The part where you stop watching the agent and trust that it’ll do the right thing with someone’s money when you’re not looking.

That gap is closable. It’s just not being closed by the accounts posting screencasts.

where this ends up

My guess: the agent economy gets real in a narrower set of use cases than the narrative suggests, for a longer time than the narrative suggests. Not “autonomous agents running everything” by Q3. More like “agents reliably handling specific, bounded, high-frequency tasks” in two or three years, in the hands of the small number of people who actually did the grunt work on the deployment layer.

The infrastructure is outrunning the deployment skill. That’s the actual state of things in March 2026. x402 is mature enough to use. The models are capable enough to do real work. What’s missing is the boring operational discipline, and the boring operational discipline doesn’t go viral.

I’ve been building in this space long enough to know the difference between the demo and the thing. They feel identical until they don’t.