Jam with AI
When to Use MCP vs API vs Function/Tool Calling in Your AI Agent — A Decision Framework
Shirin Khosravi Jam & Shantanu Ladhwe
Apr 23, 2026
If you've been building AI agents, you've probably hit this exact confusion: should I make a direct API call, use function/tool calling, or stand up an MCP server? The internet treats this as a holy war — some people talk about MCP like it's the second coming, others call it overengineered glue, and a growing 2026 crowd insists CLIs beat both.
The honest answer is: they're all right, and they're all wrong. The choice depends on where you are in your project, how many integrations you need, and what you're optimizing for. There's plenty of content explaining what each pattern does. What's missing — and what this piece tries to provide — is a clear framework for choosing between them based on real constraints: project stage, integration count, team size, compliance needs.
First, Get the Definitions Straight
The three patterns get conflated all the time because they aren't really three separate things. They're layers that build on each other.
Pattern 1: Direct API Calls
The foundation. Your agent (or just your code) makes an HTTP request directly to an external service — REST, GraphQL, gRPC, whatever. You write the endpoint, handle auth, parse the response, manage errors. The LLM has nothing to do with it. Your application code owns the entire flow.
If your backend needs to fetch a user profile from your database, it doesn't need an LLM to decide that. It just calls the endpoint.
Pattern 2: Function Calling / Tool Use
This is where the LLM enters. OpenAI popularized it in mid-2023 with their Chat Completions update. The mental model: tool calling is essentially wrapping an API in a structured schema that an LLM can understand and invoke.
You define a set of functions (tools) as JSON schemas — names, parameter types, descriptions — and send them alongside your prompt. The model decides which function to invoke based on the user's query and outputs structured JSON with the function name and arguments. Your application then executes the actual API call and feeds the result back.
The critical nuance: the LLM does not execute anything. It generates a structured request. Your code still makes the same API call you would have made in Pattern 1. Tool calling just adds an intelligent routing layer on top — the LLM becomes the decision-maker for which API to call and with what parameters.
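To make that concrete, here is a minimal sketch of the loop using the OpenAI Python SDK. The tool schema and the create_issue_via_api stub are illustrative examples, not from any particular product:

```python
import json
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def create_issue_via_api(repo: str, title: str, body: str = "") -> dict:
    """Your Pattern 1 code: the actual HTTP call (stubbed out here)."""
    print(f"POST /repos/{repo}/issues -> {title}")
    return {"ok": True}

# The tool, described as a JSON schema the model can choose to invoke.
tools = [{
    "type": "function",
    "function": {
        "name": "create_github_issue",
        "description": "File a bug report in a GitHub repository.",
        "parameters": {
            "type": "object",
            "properties": {
                "repo": {"type": "string"},
                "title": {"type": "string"},
                "body": {"type": "string"},
            },
            "required": ["repo", "title"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "File a bug for the login crash"}],
    tools=tools,
)

# The model only *proposes* a call; your code still executes it.
call = response.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)
result = create_issue_via_api(**args)
```

In a real agent loop you would append the result back to the message list as a tool-role message and let the model continue from there.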
Every major LLM provider supports this, but each implements it differently. OpenAI, Anthropic, and Google all have their own schemas and invocation patterns. If you want to switch from GPT to Claude, you're rewriting tool definitions. If you want the same tool in two agents on two different providers, you're maintaining two separate implementations.
That fragmentation is exactly why MCP exists.
Pattern 3: MCP (Model Context Protocol)
MCP is the standardization layer on top of tool calling. It doesn't replace function calling — it standardizes it. Just as REST standardized web service communication and Docker standardized deployments, MCP standardizes how any AI model connects to any external tool.
Anthropic introduced MCP in November 2024 as an open standard. It uses a client-server architecture where MCP servers expose tools, resources, and prompts through a consistent interface. Any MCP-compatible client (Claude Desktop, Cursor, ChatGPT, your custom agent) can connect to any MCP server and discover available capabilities at runtime.
The three things MCP adds on top of basic tool calling:
- Dynamic discovery. With function calling, tools are typically hardcoded into your prompt. With MCP, the agent queries the server at runtime to learn what's available. New tools appear without code changes on the client side.
- Model-agnostic portability. Build an MCP server once, and it works with Claude, GPT, Gemini, Llama, or any model that supports MCP. No more separate tool definitions per provider.
- Credential isolation. The server handles auth and execution. The AI agent never touches your API keys — credentials stay behind the server boundary.
A subtle but important point: an MCP server doesn't have to wrap an existing API. It can be a self-contained service that does the work directly — reading files from disk, querying a database, running computations. In that case, the MCP server is the service. There's no REST endpoint underneath. You're building a small, purpose-built microservice that speaks MCP natively.
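A minimal sketch of that idea, using the FastMCP helper from the official MCP Python SDK (the notes tool is a made-up example):

```python
# pip install mcp
from pathlib import Path

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("notes")  # the server name clients see after connecting

@mcp.tool()
def read_note(name: str) -> str:
    """Return the contents of a local note file by name."""
    # The server does the work itself: no REST endpoint underneath.
    return (Path.home() / "notes" / f"{name}.md").read_text()

if __name__ == "__main__":
    mcp.run()  # serves MCP over stdio by default
```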
How They Relate
These three patterns usually layer on top of each other:
- APIs provide the raw capability (an HTTP endpoint that does something).
- Tool calling wraps that capability in a schema so an LLM can decide when and how to invoke it.
- MCP standardizes that schema so any LLM can discover and call any tool, regardless of provider.
This changes the design decision. You're not always picking between "direct API call vs. tool calling vs. MCP." Sometimes the real question is: should I build a traditional API and then wrap it, or should I build an MCP server from the start and skip the API layer entirely?
Bonus Pattern: CLIs
The contrarian take gaining serious traction in 2026: instead of wrapping tools behind MCP or function schemas, just let the agent call CLI tools directly through the shell. gh pr list, docker build, kubectl get pods.
LLMs were trained on enormous amounts of CLI documentation and usage, so they already know how to use these tools without any schema you provide. Andrej Karpathy captured the irony in a February 2026 post: CLIs are exciting precisely because they're a "legacy" technology — that's exactly why AI agents can natively and easily use them.
Peter Steinberger, creator of OpenClaw (250K+ GitHub stars, one of the fastest-growing open-source projects in GitHub history), built his entire agent framework around this idea. OpenClaw doesn't use MCP in its core architecture; it relies on a skills-and-CLI approach instead.
CLIs aren't competing head-on with MCP or function calling. They operate in a narrower lane: tools that already have mature command-line interfaces, where the model's training data is the documentation. That lane is surprisingly wide (git, docker, kubectl, cloud CLIs, database CLIs), but it has edges. Custom internal tools, SaaS platforms without CLIs, anything requiring structured output the CLI can't produce cleanly — those still need the other patterns.
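The shell-out pattern needs almost no glue code. A sketch, assuming the gh CLI is installed and authenticated (the --json field list here is just an example):

```python
import json
import subprocess

# The agent emits a command; your harness executes it and returns the output.
proc = subprocess.run(
    ["gh", "pr", "list", "--json", "number,title,author"],
    capture_output=True, text=True, check=True,
)
pull_requests = json.loads(proc.stdout)  # structured output, no schema written
```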
How These Patterns Work Together in Practice
The framework below gives you guided decision logic, but in practice nobody picks just one pattern. A single agent might shell out to a CLI, call two MCP servers, run internal validation through function calling, and hit a direct API for a status check — all in one user request.
- CLIs for tools the model already knows (git, docker, curl, gh).
- Function calling for app-specific logic, validation gates, and routing decisions inside a single agent.
- MCP servers for reusable integrations shared across multiple agents/products (Slack, databases, ticketing, cloud), or as standalone services that expose custom capabilities directly.
- Direct API calls for deterministic operations that don't need LLM reasoning.
A Concrete Example: Same Task, Three Patterns
Imagine you want your AI agent to create a GitHub issue.
Pattern 1 — Direct API Call. Your code directly calls:
POST https://api.github.com/repos/{owner}/{repo}/issues
with a JSON body. No LLM. You hardcode when this happens (maybe after a test failure). You handle the Bearer token, the headers, the error codes. Simple, predictable, fast.
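In code, Pattern 1 is a few lines with any HTTP client. A sketch using requests (the env var name and payload are our choices):

```python
import os
import requests

resp = requests.post(
    "https://api.github.com/repos/myorg/myapp/issues",
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
    json={"title": "Login crash on iOS", "body": "Repro steps attached."},
    timeout=10,
)
resp.raise_for_status()  # you own auth, errors, and retries end to end
```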
Pattern 2 — Tool Calling. You define a tool schema:
create_github_issue(repo: string, title: string, body: string, labels: list)
You send this schema to the LLM alongside the user's message. When the user says "file a bug for the login crash," the model outputs:
{
  "tool": "create_github_issue",
  "args": {"repo": "myapp", "title": "Login crash on iOS", ...}
}
Your code catches that, makes the same GitHub API call from Pattern 1, and sends the result back to the model. The LLM decided what to do. Your code decided how to do it.
Pattern 3 — MCP. You connect your agent to the GitHub MCP server. The agent discovers that create_issue, list_pulls, search_code, and dozens of other tools exist. The user says "file a bug for the login crash." The agent picks create_issue, the MCP server executes the API call with its own stored GitHub token, and returns the result. You wrote zero GitHub-specific code on the client side. If tomorrow you also want to support GitLab, you connect a GitLab MCP server — zero changes to your agent code.
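Client-side, the discover-then-call flow looks roughly like this with the MCP Python SDK (the server command and tool name are illustrative; check your server's docs for the real ones):

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch the MCP server as a subprocess and speak MCP over stdio.
    server = StdioServerParameters(command="github-mcp-server", args=["stdio"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()  # dynamic discovery at runtime
            result = await session.call_tool(
                "create_issue",
                {"repo": "myapp", "title": "Login crash on iOS"},
            )

asyncio.run(main())
```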
The tradeoff is clear: each pattern adds a layer of abstraction and a layer of capability. The question is whether you need that capability for your specific use case.
The Complexity Spectrum
How these patterns actually stack up across the dimensions that matter in production.
1. Setup Complexity
- Direct API calls — simplest to start. Write an HTTP request, handle the response, done. No protocol overhead, no server, no schema. But the simplicity is front-loaded — maintenance grows linearly with every new integration.
- Function calling — adds a layer. JSON schemas for every tool, code to handle structured output, execute, and feed results back. All in your application code, no external infrastructure. Manageable for 3–5 tools.
- MCP — most upfront investment. You need an MCP server (build or configure one), implement JSON-RPC 2.0 communication, and manage the client-server connection. You're running at least two processes. But once the infrastructure exists, adding new tools becomes trivial — add them to the server once and every connected client picks them up.
2. Maintenance Burden
This is where the calculus flips.
- Direct API calls — highest long-term maintenance. Every API change, every new auth flow, every rate limit update means manual code changes. Ten integrations means ten bespoke clients to maintain.
- Function calling — moderate. You're managing a library of tool schemas plus execution code per tool. Schemas are tightly coupled to your app, and supporting multiple LLM providers means maintaining different schema formats.
- MCP — lowest at scale. Update the MCP server once and every agent using it gets the update. The protocol handles auth, error responses, and discovery in a standardized way. The tradeoff: you now have infrastructure to maintain (the server itself) — but that's one piece of infrastructure versus N custom integrations.
3. Security Model
- Direct API calls — credentials live in your application. Full control, full responsibility.
- Function calling — same as direct API calls. Keys live in your main app as env vars or config. The model never sees them, but they're all in one place.
- MCP — credential isolation is architectural. Credentials live on the MCP server. The agent never sees raw API keys. The server enforces permission checks; the client only receives results. This is a genuine advantage for sensitive data.
4. Performance and Token Cost
- Direct API calls — fastest. No reasoning layer, no protocol overhead. Predictable latency.
- Function calling — adds the LLM reasoning step (model decides what to call), but execution is still direct. Tool definitions consume input tokens, but for a small toolset that's negligible.
- MCP — adds protocol overhead on top of everything. JSON-RPC, tool discovery, session management. For time-sensitive workflows (monitoring stock prices, IoT sensors, real-time analytics), direct API calls are significantly more predictable.
The bigger concern with MCP is the context window tax. Every MCP tool definition gets loaded into the model's context. GitHub's MCP server alone can consume 40,000–55,000 tokens just for its tool definitions. A typical multi-server setup (GitHub + database + filesystem + Slack) can eat 75,000+ tokens in overhead alone. On a 200K context window, that's over a third of your capacity gone before the agent does anything useful.
That's why the emerging best practice is aggressive curation: don't auto-convert your OpenAPI spec into an MCP server. Abstract low-level endpoints into high-level, task-oriented tools. Fewer tools, better descriptions.
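What curation looks like in practice: instead of exposing three endpoint-shaped tools, expose one intent-shaped tool that fans out internally. A sketch, with hypothetical helpers standing in for your real endpoints:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("support")

def fetch_user(email: str) -> dict:  # stand-ins for low-level endpoints
    return {"id": 1, "email": email}

def fetch_orders(user_id: int) -> list[dict]:
    return [{"id": 7, "open": True}]

@mcp.tool()
def summarize_customer(email: str) -> dict:
    """Everything needed to answer: what's going on with this customer?

    Fans out to several low-level calls internally, so the model pays
    for one tool definition instead of three.
    """
    user = fetch_user(email)
    orders = fetch_orders(user["id"])
    return {"user": user, "open_orders": [o for o in orders if o["open"]]}
```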
The Decision Framework: What to Use and When
A guide that maps each pattern to project stage, integration count, and primary constraint.
Stage 1 — Proof of Concept / MVP
Use: Function Calling. You're validating an idea, you have 2–5 tools, you need speed. Don't introduce MCP infrastructure here — you don't yet know which tools your agent actually needs or how users will interact with them. Pay attention to which tools get used heavily during prototyping; that data will inform later architecture decisions.
Stage 2 — Single-Product Agent (Production)
Use: Function Calling + Direct API Calls. You've validated the concept. You're shipping a product with 5–10 tools, a stable integration set, and endpoints you control. Function calling gives you full ownership of auth, execution, and error handling. When something breaks, you copy the curl command, run it, check the response — debugging is straightforward.
Use direct API calls for any deterministic operation where no LLM reasoning is needed: fetching a user profile, checking an order status, running a scheduled export. If your backend can handle it in 5ms, don't route it through an agent.
MCP at this stage is typically overhead. You're one team, one product, one model provider. The N×M problem (every one of N agents needing its own integration to each of M tools) doesn't exist yet.
Stage 3 — Multi-Agent / Multi-Product Platform
Use: MCP + Function Calling (Hybrid). Multiple agents or products need the same integrations. You're supporting multiple LLM providers. The toolset is growing beyond what one team can maintain inline.
Build the Slack integration once as an MCP server, and every agent across your organization can use it. No duplicated credentials, no syncing tool definitions across codebases. Update the schema in one place; every connected client picks it up.
But keep function calling for app-specific logic: routing decisions, validation gates, business rules that only make sense inside one particular agent. MCP servers handle "what can I connect to," function calling handles "what should this specific agent do."
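In code, the hybrid often amounts to assembling one tool list from two sources before each model call. A sketch, reusing an initialized ClientSession like the one shown earlier (approve_refund is made up):

```python
from mcp import ClientSession

# App-specific logic stays as a local function-calling schema...
LOCAL_TOOLS = [{
    "type": "function",
    "function": {
        "name": "approve_refund",
        "description": "Validation gate that only this agent should run.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

async def build_toolset(session: ClientSession) -> list[dict]:
    """Merge local schemas with MCP-discovered tools for one model call."""
    discovered = await session.list_tools()  # ...shared integrations arrive here
    shared = [{
        "type": "function",
        "function": {
            "name": t.name,
            "description": t.description or "",
            "parameters": t.inputSchema,  # MCP tools already carry a JSON schema
        },
    } for t in discovered.tools]
    return LOCAL_TOOLS + shared
```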
At enterprise scale — hundreds of customers each with their own Salesforce, GitHub, or Gmail credentials, plus SOC 2 / GDPR / HIPAA compliance — add an MCP Gateway between your agents and MCP servers. The gateway centralizes authentication, authorization, auditing, and traffic management.
Common Mistakes Teams Make
- Using MCP for everything. It's powerful but not free. Every tool definition costs context tokens; every server adds infrastructure. If your agent only calls two APIs, function calling is simpler, cheaper, and easier to debug.
- Ignoring the context window tax. A typical multi-server MCP setup can consume 75,000+ tokens in tool definitions alone. Be deliberate about which servers you connect. Cursor enforces a hard limit of 40 tools because quality degrades dramatically beyond that threshold.
- Wrapping every API endpoint as a separate tool. Don't auto-convert your OpenAPI spec one-to-one. Curate aggressively — abstract low-level endpoints into high-level, task-oriented capabilities. An MCP server with 93 tools (like GitHub's) is overwhelming for most models. Fewer, smarter tools beat many granular ones.
- Using an LLM when you don't need one. If the operation is deterministic (user profile fetch, order status, cron job), don't route it through an agent. Use a direct API call. The LLM adds latency, cost, and nondeterminism for zero benefit.
- Treating these as mutually exclusive. The title says "vs" but the real answer is "and." The best production systems use all of these patterns, each for what it's good at: function calling for the LLM reasoning loop, MCP for shared reusable integrations, direct API calls for deterministic ops, CLIs for developer tools. It's a toolkit.
- Designing MCP tools like an API for developers. Your API has 80 endpoints? That doesn't mean your MCP server needs 80 tools. Models can't browse your docs, can't learn from past sessions, and can't infer relationships between endpoints that seem obvious to you. Consolidate around user intent, write descriptions that explain when and why to use each tool (not just what it does), and make every response guide the model toward the correct next action. The gap between "technically works" and "works reliably in production" is almost entirely tool design.
The Bottom Line — A TL;DR Decision Tree
Do you need LLM reasoning to decide what action to take?
- No → Use direct API calls. Done.
- Yes → continue.
How many tools/integrations do you have?
- 1–5 → Function calling. Ship fast, iterate.
- 6–20 → Consider MCP for shared integrations + function calling for app logic.
- 20+ → MCP servers (or a unified API platform) + an MCP gateway for governance.
Do multiple agents/products share the same integrations?
- No → Function calling is fine.
- Yes → MCP is the right abstraction.
Are you building a new capability from scratch (no existing API)?
- Yes, and multiple agents will use it → Build it as an MCP server directly. Skip the API layer.
- Yes, but only one agent needs it → Function calling with the logic inline is simpler.
Does a CLI already exist for this tool?
- Yes → Try it first. Especially for git, docker, cloud CLIs, dev tools.
Is the operation time-critical (sub-10ms)?
- Yes → Direct API calls only. No LLM in the loop.
Don't chase a technology because of the hype. Think about your use case. Time to revisit some of your past decisions?