Every other week, someone shows me their new “AI agent.” Nine times out of ten, it’s a prompt that calls an API and returns the result. No planning, no state, no failure recovery. Just a glorified function call with an LLM in the middle.
That’s not an agent. That’s generative AI with extra steps.
The distinction matters more than you’d think. Architecture, failure modes, testing strategy, cost profile: they’re all fundamentally different between generative and agentic systems. After building both (shipping MCP servers, debugging agent loops that burned through tokens at 3 AM, and learning when a structured prompt would have been enough), I’ve landed on a framework I wish I’d had from the start.
What Generative AI Actually Is
Generative AI is a request-response system. You send a prompt, you get an output. Text, code, images, structured data. Then the system stops and waits for the next request.
Three properties define generative AI:
- Stateless per interaction. Each request is independent. The system doesn’t remember what it did five minutes ago unless you explicitly pass context back in.
- Reactive. It never initiates. A human decides when to invoke it, what to ask, and what to do with the output.
- Single-turn reasoning. It processes one request at a time. Even multi-turn conversations are just a series of independent completions with accumulated context.
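The statelessness is easy to see in code. Here's a minimal sketch (the `complete` function is a stub standing in for any LLM API call, so the example runs offline): each call carries zero memory, and a multi-turn "conversation" is just the caller re-sending accumulated context.

```python
# Sketch of the generative interaction pattern. `complete` is a stand-in
# for a real model call; it echoes a canned reply so this runs offline.
def complete(prompt: str) -> str:
    return f"response to: {prompt!r}"

# Each call is independent. The "memory" in a multi-turn chat is just
# the caller concatenating prior turns back into the next prompt.
history: list[str] = []

def chat_turn(user_message: str) -> str:
    context = "\n".join(history)
    reply = complete(f"{context}\n{user_message}".strip())
    history.extend([user_message, reply])
    return reply

first = chat_turn("Summarize this report.")
second = chat_turn("Now shorten it.")  # "remembers" only because we resent history
```

Swap the stub for a real API client and nothing about the pattern changes: the model forgets everything between calls, and the caller does all the remembering.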
Here’s the nuance that trips people up: techniques like RAG, chain-of-thought prompting, and structured output are powerful, but they don’t make a system agentic. If a human still decides what happens after every response, it’s generative. A RAG pipeline that retrieves documents and generates an answer is generative. A chain-of-thought prompt that reasons step by step is generative. The sophistication of the reasoning doesn’t change the interaction pattern.
For a deeper comparison of these techniques and when to use each, see RAG vs Fine-Tuning vs Prompting: A Decision Framework.
What Makes a System Actually Agentic
If generative AI is “you ask, it answers,” agentic AI is “you set a goal, it figures out how to get there.” But that’s too vague to be useful for architecture decisions. Here are four concrete markers I use:
1. Goal orientation. The system pursues an objective across multiple steps. It decides what to do next based on where it is relative to the goal, not because a human told it the next step.
2. Autonomous tool use. The system selects and invokes tools on its own. Not “here’s a function, call it when I say so” but “here are your capabilities, figure out which ones you need.” This is the core architectural difference. The model decides when and how to use its tools. For how tool integration actually works under the hood, see MCP vs Function Calling.
3. Multi-step reasoning with state. The system maintains context across steps, uses intermediate results to inform the next action, and can backtrack when a step fails. A generative system processes each request in isolation. An agentic system remembers that step 2 returned an error and adjusts step 3 accordingly.
4. Failure recovery. This is the marker most “agents” fail on. A real agentic system detects when something goes wrong and has a strategy: retry with different parameters, fall back to an alternative approach, or escalate to a human. If your system just returns an error message when an API call fails, it’s not agentic.
I once had an agent loop retry a failing API call indefinitely because the error handler treated a 401 authentication failure as a transient network issue. It burned through tokens for 20 minutes before we killed it. That’s when I realized “retry logic” isn’t a feature. It’s a design responsibility. You need to classify failures, not just catch them. See AI Agent Error Handling Patterns for what real failure recovery looks like in production.
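One way to make that classification concrete is a small decision function. This is a sketch: the status codes are standard HTTP, but the bucketing and retry budget are illustrative policy choices, not a prescription.

```python
# Classify failures before deciding to retry. Treating everything as
# transient is exactly what turns a bad credential into a token burn.
PERMANENT = {400, 401, 403, 404, 422}        # retrying cannot help
TRANSIENT = {408, 429, 500, 502, 503, 504}   # retry with backoff may help

def failure_action(status_code: int, attempt: int, max_retries: int = 3) -> str:
    if status_code in PERMANENT:
        return "escalate"    # surface to a human immediately
    if status_code in TRANSIENT and attempt < max_retries:
        return "retry"       # with exponential backoff
    return "fallback"        # retries exhausted, or an unclassified error
```

With this in place, a 401 escalates on the first attempt instead of looping, and even transient errors hit a hard retry ceiling.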
The Spectrum, Not a Binary
In practice, most systems sit somewhere between purely generative and fully agentic. Think of it as a spectrum:
| | Prompt-response | Tool-augmented + human gate | Autonomous agent loop |
|---|---|---|---|
| State | Stateless | Session state | Durable state |
| Initiative | Reactive | Semi-autonomous | Goal-driven |
| Tools | No tools | Human-gated tools | Autonomous tools |
| Complexity | Simple | Moderate | High complexity |
A chatbot with function calling that waits for the user to confirm each action sits in the middle. A system that autonomously plans, executes, recovers from errors, and reports back when done sits on the right.
The key insight: moving right increases capability but also increases operational complexity. Every step rightward means more failure modes, harder testing, higher cost, and more infrastructure for observability. Don’t move right unless the problem actually demands it.
And the cost isn’t abstract. Agent loops multiply token usage. Each planning step, each tool call, each retry is another model invocation. A task that costs $0.01 as a single prompt can quietly become $0.50 as an agent loop with retries. The autonomy isn’t free.
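Back-of-the-envelope math makes the multiplier visible. The token counts and the per-token rate below are illustrative assumptions, not real pricing; the point is the shape of the curve, not the dollar amounts.

```python
# Rough cost model: an agent loop pays per model invocation, and each
# step re-sends the growing context. All numbers are illustrative.
PRICE_PER_1K_TOKENS = 0.01  # hypothetical blended rate, not a real price

def cost(calls: list[int]) -> float:
    """Total cost given the token count of each model invocation."""
    return sum(calls) / 1000 * PRICE_PER_1K_TOKENS

single_prompt = cost([1_000])                    # one call: $0.01
agent_loop = cost([1_000, 2_000, 3_000, 4_000,   # plan + tool steps with
                   4_000, 4_000])                # growing context, two retries
```

Six invocations at growing context sizes is already an 18x multiplier over the single prompt, before any runaway retry loop enters the picture.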
When You Don’t Need an Agent
Most tasks don’t need an agent. Here are the signals that generative AI is the right call:
- Single reasoning step. The task can be completed in one prompt-response cycle, even if that cycle uses RAG or chain-of-thought.
- Human reviews every output. If someone checks the result before it’s used, you don’t need autonomous execution.
- Narrow, well-defined domain. Summarization, classification, extraction, translation. The input-output mapping is clear.
- Predictability matters more than autonomy. In regulated environments or customer-facing systems, deterministic behavior beats creative problem-solving.
- Your team lacks agent infrastructure. If you don’t have observability, structured error handling, and a testing strategy for non-deterministic systems, an agent will be a liability.
I learned this the hard way. Early on, I built an agentic loop for a task that involved pulling data from an API, transforming it, and generating a report. The agent would decide which endpoints to call, handle pagination, and assemble the final output. It worked, but it was fragile, expensive, and hard to debug. A structured prompt with RAG over pre-fetched data handled the same task with a fraction of the complexity and near-perfect reliability. The “agent” was solving a problem that didn’t need autonomous decision-making.
Before going agentic, read Why AI Agents Fail in Production for a reality check on what can go wrong.
When You Actually Need an Agent
Some tasks genuinely require agentic architecture. The signals:
- Multiple tool calls with branching logic. The system needs to call tool A, check the result, then decide between tool B or tool C. A human can’t gate every decision without destroying throughput.
- Intermediate results change the plan. What the system discovers in step 2 changes what it should do in step 3. This is fundamentally different from a pipeline where every step is predetermined.
- Unattended operation. The system needs to run overnight, handle edge cases, and have work ready by morning. No human in the loop is the whole point.
- Partial failure recovery. If step 4 of 7 fails, the system should retry or work around it, not throw away the results from steps 1 through 3.
- Multi-system workflows. The task spans multiple APIs, databases, or services. Coordinating these interactions requires a system that can maintain state and adapt.
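Partial failure recovery in practice usually means checkpointing: persist each step's result so a failure at step 4 doesn't discard steps 1 through 3. A sketch with an in-memory checkpoint (a real system would persist it to disk or a database, and the flaky step is simulated):

```python
def run_pipeline(steps, checkpoint: dict) -> dict:
    """Run named steps in order, skipping any already checkpointed.
    A failed step raises, but earlier results survive for the retry."""
    for name, fn in steps:
        if name in checkpoint:
            continue                  # already done on a prior attempt
        checkpoint[name] = fn()       # raises on failure; earlier work kept
    return checkpoint

calls = {"count": 0}
def flaky_transform():
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("transient failure")  # fail on first attempt only
    return "transformed"

steps = [("extract", lambda: "raw data"),
         ("transform", flaky_transform),
         ("report", lambda: "final report")]

ckpt: dict = {}
try:
    run_pipeline(steps, ckpt)         # fails at the transform step...
except RuntimeError:
    pass
run_pipeline(steps, ckpt)             # ...retry resumes; extract is not re-run
```

The second attempt skips `extract` entirely and picks up at `transform`. That's the difference between "retry the workflow" and "redo everything, repay every token."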
I’ve seen this play out when building agents that interact with external APIs. The model needs to discover available endpoints, handle authentication, navigate pagination, and adapt when responses don’t match expectations. That’s genuine multi-step reasoning with state. A static prompt can’t handle it because you don’t know the API surface in advance. For what happens when you give an agent that kind of freedom, see I Gave My AI Agent Full API Access. For a practical guide to building one, see I Built My First AI Agent with the Strands Agents SDK.
Decision Framework
When you’re deciding between generative and agentic, run through these questions:
- Can the task complete in one reasoning step? If yes, generative.
- Does a human review every output before it’s used? If yes, generative.
- Does the task require multi-step tool use with branching? If yes, agentic.
- Must the system operate unattended? If yes, agentic.
- Can your team invest in observability, testing, and error handling for a non-deterministic system? If not, stay generative.
- Is the cost and latency of an agent loop justified by the value it delivers? If not, stay generative.
Notice the asymmetry. Four of six questions push you toward generative. That’s deliberate. Agentic architecture should be a conscious decision driven by real requirements, not a default.
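The checklist, asymmetry included, can be encoded directly. This is a sketch that maps one-to-one to the six questions; the parameter names are mine, and the ordering makes the bias explicit: generative short-circuits first, and generative is the default.

```python
def choose_architecture(
    one_reasoning_step: bool,
    human_reviews_every_output: bool,
    multi_step_branching_tools: bool,
    unattended_operation: bool,
    team_can_support_agents: bool,
    agent_cost_justified: bool,
) -> str:
    # Generative short-circuits: a single step or a human gate settles it.
    if one_reasoning_step or human_reviews_every_output:
        return "generative"
    # Readiness gates: without the infrastructure or the ROI, stay generative.
    if not (team_can_support_agents and agent_cost_justified):
        return "generative"
    # Only real agentic requirements push you right.
    if multi_step_branching_tools or unattended_operation:
        return "agentic"
    return "generative"  # default when nothing demands autonomy
```

Four of the six branches resolve to generative, mirroring the asymmetry in the checklist.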
At a Glance
| Dimension | Generative AI | Agentic AI |
|---|---|---|
| Interaction | Request-response | Goal-driven loop |
| Tool use | Optional, human-gated | Autonomous |
| State | Stateless / session | Durable, multi-step |
| Failure handling | Return error | Detect, retry, escalate |
| Testing | Output assertions | Three-layer pyramid |
| Operational cost | Low | High |
| Best for | Content, analysis, single-step tasks | Workflows, automation, multi-system |
Key Takeaways
- Generative = request-response. Agentic = goal-driven autonomy. Most systems sit somewhere between the two, and that’s fine.
- Four markers define agentic systems: goal orientation, autonomous tool use, multi-step state, and failure recovery.
- Don’t go agentic unless the task demands it. The operational overhead is real: harder testing, more failure modes, higher cost.
- If you do go agentic, invest equally in reliability. Observability, error handling, and testing aren’t optional. They’re the difference between a demo and a production system.
- “Agent” is an architecture decision, not a marketing label. Choose it based on what your system needs to do, not what sounds impressive.
Frequently Asked Questions
What is agentic AI?
Agentic AI refers to systems that pursue goals autonomously across multiple steps. Unlike generative AI, which responds to a single prompt and stops, an agentic system decides what to do next, selects and invokes tools on its own, maintains state across steps, and recovers from failures. The key difference is initiative: an agent acts, not just responds.
Is ChatGPT an AI agent?
Base ChatGPT is generative AI. You prompt it, it responds, it waits. However, when ChatGPT uses plugins or browsing to autonomously search the web, read results, and decide whether to search again, it’s exhibiting agentic behavior. The same model can power both patterns. What matters is the interaction pattern, not the model.
What makes an AI system an agent?
Four markers: goal orientation (pursues an objective across steps), autonomous tool use (selects tools without human approval per step), multi-step reasoning with state (remembers previous results and adjusts), and failure recovery (detects errors and has a strategy beyond returning an error message). If your system doesn’t have all four, it’s somewhere on the spectrum between generative and fully agentic.