Prompt engineering was never the point

There was a whole industry for a minute around being good at prompts. Courses, newsletters, job titles. “Prompt engineer” made it onto LinkedIn and everything. The thing is, the people who were actually getting results out of these models weren’t spending time on prompt craft. They were spending time on plumbing.

The prompt is the last ten feet of a very long pipe. What goes in the other end of that pipe is the actual work.

what the model actually needs

A language model doesn’t care how elegantly you phrased your question. It cares what information is sitting in its context window when it tries to answer. Everything else is decoration.

When I’ve built agent tools, the part that takes time isn’t figuring out how to “instruct” the model. It was deciding what to feed it. You want an agent that does something useful with a Solana wallet? The model needs the wallet state, the token balances, the recent transaction history, the current price feeds, maybe some relevant documentation. You assemble all of that before the model touches it, and suddenly a fairly generic prompt works fine. You skip that assembly step and write an elaborate, finely-tuned instruction instead, and the model hallucinates confidently into the void.

This is not a new insight. People who built search-augmented systems figured this out fast. RAG exists because “ask the model to remember everything” doesn’t scale, so you retrieve the relevant chunks at query time and hand them over. The prompt is almost beside the point. What matters is the retrieval quality and what you decided to put in your knowledge base.

The same logic runs through every real agent system. MCP is basically a standardized protocol for “here is how tools hand context to the model.” Claude Code works because it pulls in the right files, the right error output, the right diff before it starts reasoning. GPT-4.1’s extended context window matters because it lets you stuff more relevant state in without truncating. Gemini 2.5’s long context isn’t interesting because you can write longer prompts. It’s interesting because you can hold more of a codebase in view at once. The research keeps showing that what you put in front of the model is the dominant variable.

the actual work

Context architecture, as a practice, is mostly unglamorous. It’s deciding what to index and how. It’s chunking documents in ways that preserve meaning instead of cutting mid-sentence. It’s writing retrieval logic that returns the three most relevant things instead of the ten nearest neighbors. It’s figuring out when to include tool outputs verbatim and when to summarize them first. It’s managing what stays in session state and what gets evicted.

None of that fits on a course landing page. You can’t sell “careful database schema decisions” the same way you can sell “unlock GPT-4 with these 7 prompting secrets.” So the market sold the prompts, and the people quietly building the retrieval layers kept their heads down and shipped working systems.

The agents that actually run in production right now, the ones doing something more than a demo, are all context assembly problems at their core. You have a query. You have a bunch of potentially relevant information across memory, tool results, conversation history, external knowledge. You need to pull the right subset and fit it in the window before the model responds. Getting that assembly right is the engineering problem. Tweaking the system prompt wording is not.

I’ve watched this play out in my own agent experiments. When a tool call returns garbage, it’s almost never because the model misunderstood the instruction. It’s because the context it was working from was incomplete or wrong. Fix the context, the model fixes itself. Spend another hour on the prompt phrasing, and you get the same garbage, slightly more confidently stated.

DeepSeek showed that you could train models that were lean and sharp without being buried in alignment overhead. Grok 4 showed you could get reasoning that actually uses context well rather than pattern-matching on surface form. The direction is models that are better at using whatever you give them, not models that require you to phrase your request in the magic way. That’s the bet that seems to be paying off.

So the job title that’s going to age badly is the one whose whole value prop was phrasing. The job that’s going to matter is the one that understands retrieval, state, tool output design, context window management, what to cache, what to discard. System design with a model in the middle of it. The title was never the point.