MCP, CLI, and Skills: when to use which

April 9, 2026

Every major SaaS now ships an official MCP server. GitHub (preview April 2025, GA September 2025). Linear (May 2025, expanded February 2026). Slack (GA February 2026). Stripe (28 tools across 11 categories). Sentry. Datadog (GA March 2026). Notion. Atlassian.

Meanwhile, Andrew Ng launched Context Hub in March 2026 as a CLI + Skill — 12,000+ GitHub stars in three weeks. Multiple practitioners have written about this tradeoff: jngiam, cra.mr, manveerc, intuitionlabs. The consensus is converging: they're complementary, not competing. The question is which for what.

Here's the rule of thumb I've landed on after shipping production agents for the past year.

What MCP actually costs

MCP is a protocol, not a product. Three transports: stdio (local subprocess), SSE (legacy), Streamable HTTP (current remote). The important thing is what happens when a client connects to an MCP server.

An Anthropic engineer confirmed in GitHub issue #3406: "I don't believe this is anything new — we always provide local & MCP tools directly to the model." Every tool definition — name, description, input schema, output schema — lands in the model's context window on every turn. Not just the first turn. Every turn.

The math gets brutal at scale. Independent measurements from jdhodges and scottspence show 150-800 tokens per tool definition depending on schema verbosity. Gmail's create_draft tool alone: 820 tokens. In GitHub issue #13717, a developer documented a 6-server Claude Code session where MCP tools consumed 98,700 of 200,000 context tokens — 49.3% of the window — before the user had typed anything. The issue was closed as "not planned." This is the architecture, not a bug.
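To make the arithmetic concrete, here is a back-of-envelope sketch using the figures above (the per-server tool count is illustrative, not a measurement):

```python
# Back-of-envelope: context window share consumed by MCP tool definitions.
# The 150-800 range is from the measurements cited above; the tool count
# passed in below is illustrative.

TOKENS_PER_TOOL = (150, 800)   # observed range per tool definition
CONTEXT_WINDOW = 200_000       # the window size in the issue #13717 report

def overhead(num_tools: int) -> tuple[int, int]:
    """Min/max tokens consumed before the user types anything."""
    lo, hi = TOKENS_PER_TOOL
    return num_tools * lo, num_tools * hi

lo, hi = overhead(125)         # e.g. 6 servers averaging ~21 tools each
print(f"{lo:,}-{hi:,} tokens, up to {hi / CONTEXT_WINDOW:.1%} of the window")
```

At the top of the range, six ordinary servers are enough to burn half the window before the conversation starts.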

GitHub itself acknowledged the problem. In December 2025, the github-mcp-server changed its default config to ship only 5 toolsets instead of all 50+. When the vendor reducing its own surface is the fix, the overhead is real.

Prompt caching mitigates the dollar cost: cache reads are billed at 0.1x the base input price per Anthropic's pricing. But caching does not reduce context window occupancy. What crowds out your agent's reasoning is the raw token count, and no discount hands that space back.
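The split between price and occupancy is easy to sketch (the $3/MTok base price is illustrative; the 0.1x cache-read multiplier is the one cited above):

```python
# Prompt caching cuts the dollar cost of re-reading tool definitions,
# but the tokens still occupy the context window on every turn.

CACHE_READ_MULTIPLIER = 0.1    # cache reads billed at 0.1x base input price

def per_turn_input_cost(tool_tokens: int, price_per_mtok: float, cached: bool) -> float:
    """Cost of shipping the tool definitions on one turn."""
    rate = price_per_mtok * (CACHE_READ_MULTIPLIER if cached else 1.0)
    return tool_tokens / 1_000_000 * rate

tool_tokens = 98_700           # the issue #13717 measurement
cold = per_turn_input_cost(tool_tokens, 3.0, cached=False)
warm = per_turn_input_cost(tool_tokens, 3.0, cached=True)
print(f"cold ${cold:.4f}/turn, cached ${warm:.4f}/turn")
print(f"window occupancy either way: {tool_tokens / 200_000:.1%}")
```

The second print is the point: the occupancy line is identical whether or not the cache is warm.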

What Skills actually cost

Anthropic launched Agent Skills in October 2025. The design is progressive disclosure — a three-level loading system:

Level 1: Metadata. Always loaded at startup. ~100 tokens per skill. Just the name (max 64 characters) and description (max 1,024 characters) from the YAML frontmatter. The agent knows what skills exist and when to use them. Nothing else enters context.

Level 2: Instructions. Loaded when the skill triggers (by relevance match to the user's request). Under 5,000 tokens. The full SKILL.md body with instructions, examples, and guidance. Only the triggered skill loads — the rest stay at Level 1.

Level 3+: Resources. Loaded as needed. Effectively unlimited. Bundled scripts execute via bash; only the output enters context, not the script code itself. The Anthropic engineering blog states explicitly: "the amount of context that can be bundled into a skill is effectively unbounded."
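For reference, the shape of a SKILL.md under this scheme. The frontmatter fields (name, description) are the documented ones; the body wording here is a hypothetical sketch, not a real skill:

```markdown
---
name: get-api-docs
description: Fetch current, version-pinned API documentation for a library
  before writing code against it. Use when the request involves a
  fast-moving SDK.
---

# Get API docs

1. Search the registry for the library name.
2. Fetch the matching doc and read it before writing any code.
3. Prefer the fetched doc over memorized API shapes.

Scripts bundled alongside this file run via bash; only their output
enters context.
```

Only the frontmatter is Level 1; everything below the second `---` waits at Level 2 until the skill triggers.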

The contrast: 50 MCP tools = 20,000-40,000 tokens permanently in context every turn. 50 skills installed = ~5,000 tokens of metadata, with only the 1-2 triggered skills adding their body. The rest are invisible.
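That contrast as a toy model (per-item token counts are the rough estimates above, not measurements of any real client):

```python
# Toy model of the two loading strategies.

MCP_TOKENS_PER_TOOL = 600      # mid-range tool definition
SKILL_METADATA_TOKENS = 100    # Level 1: name + description only
SKILL_BODY_TOKENS = 5_000      # Level 2: SKILL.md body, upper bound

def mcp_context(num_tools: int) -> int:
    # Every definition sits in context on every turn.
    return num_tools * MCP_TOKENS_PER_TOOL

def skills_context(num_installed: int, num_triggered: int) -> int:
    # All metadata, plus the body of only the triggered skills.
    return num_installed * SKILL_METADATA_TOKENS + num_triggered * SKILL_BODY_TOKENS

print(mcp_context(50))         # 30,000 tokens, every turn
print(skills_context(50, 1))   # 10,000 tokens with one skill triggered
```

The gap widens as you install more: the MCP line grows with every tool, the skills line grows 100 tokens at a time.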

There is one constraint worth flagging: skills require a filesystem and code-execution runtime. They work in Claude Code, Claude.ai, the Claude API (with beta headers), and the Agent SDK. They don't work for pure-tool-call agents without shell access. The engineering blog says skills "complement Model Context Protocol servers" — not replace them.

Andrew Ng's Context Hub: the canonical example

In The Batch Issue 343 (March 2026), Andrew Ng wrote:

Coding agents often use outdated APIs, hallucinate parameters, or not even know about the tool they should be using. This happens because AI tools are rapidly evolving, and coding agents were trained on old data that does not reflect the latest tools. Context Hub, which is designed for your coding agent to use (not for you to use!) provides the context it needs.

His example: Claude Opus 4.6 uses the deprecated OpenAI chat.completions.create API instead of the newer responses.create API, even though the newer one has been out for a year. I've seen the same thing; it happened to me while building the code examples for my agent engineering methodology page. The Claude Agent SDK code block I shipped used Agent(config=AgentConfig(...)), two classes that don't exist. The real API is ClaudeAgentOptions + ClaudeSDKClient. It looked plausible. It was wrong. It shipped.

Context Hub (chub) solves this by giving the agent curated, version-pinned markdown docs fetched via a CLI command. The recommended install path is a SKILL.md file that tells Claude Code: search for the library, fetch the doc, use the doc, annotate what you learned, give feedback to the maintainer.

Here's the nuance most coverage misses: Context Hub also ships an MCP server. The package.json has two bin entries — chub (CLI) and chub-mcp (MCP server with 5 tools: chub_search, chub_get, chub_list, chub_annotate, chub_feedback). Ng leads with CLI + Skill in his messaging, but he supports both transports. I don't think he's making a transport-architecture argument. He's making a "your agent needs real docs, not hallucinated APIs" argument — and the transport is whichever one your client speaks.
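The dual entry points would look roughly like this in the package's bin map (the bin names are the ones above; the file paths are illustrative, not copied from the real package.json):

```json
{
  "bin": {
    "chub": "./dist/cli.js",
    "chub-mcp": "./dist/mcp-server.js"
  }
}
```

Same docs, same registry; the only difference is which process shape the client expects.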

The decision framework

After shipping production agents across LangGraph, Claude Agent SDK, and OpenAI Agents SDK, here's the heuristic I use:

Use MCP when:

  • The integration is a third-party SaaS with an official MCP server (Slack, GitHub, Stripe, etc.)
  • Auth is OAuth or multi-tenant — MCP handles this natively
  • Multiple agents or multiple frameworks need the same tool surface
  • The tool has 20+ operations with typed schemas — discovery and validation earn their token cost

Use CLI + Skill when:

  • The tool is something you own (your deploy script, your validation pipeline, your doc-fetcher)
  • The agent has shell access (Claude Code, Codex, Devin)
  • The operation is narrow (1-3 commands) — MCP overhead isn't justified
  • Transparency matters — you want the agent to see the literal command and the literal output, and pipe it through grep and head

Use both when:

  • The ecosystem provides it (Context Hub ships both; GitHub has both gh CLI and github-mcp-server)
  • Different agents in your system need different transports (a LangGraph production agent uses MCP; a Claude Code development agent uses the CLI)
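If it helps to see the branches in one place, here is the framework as a sketch in code. The field names and the 20-operation threshold are my shorthand for the bullets above, not a formal rule:

```python
# The decision framework above as a function. Field names and thresholds
# are shorthand for the bullets, not a spec.

from dataclasses import dataclass

@dataclass
class Integration:
    official_mcp_server: bool   # third-party SaaS with an official server
    oauth_or_multitenant: bool  # auth MCP handles natively
    num_operations: int         # breadth of the tool surface
    you_own_it: bool            # your script, your pipeline
    agent_has_shell: bool       # Claude Code, Codex, Devin, ...

def choose_transport(i: Integration) -> str:
    mcp = i.official_mcp_server or i.oauth_or_multitenant or i.num_operations >= 20
    cli = i.you_own_it and i.agent_has_shell and i.num_operations <= 3
    if mcp and cli:
        return "both"
    if mcp:
        return "mcp"
    if cli:
        return "cli+skill"
    return "judgment call"

# Stripe-style SaaS with OAuth and a wide tool surface:
print(choose_transport(Integration(True, True, 28, False, False)))  # mcp
# Your own deploy script, running in Claude Code:
print(choose_transport(Integration(False, False, 2, True, True)))   # cli+skill
```

The fall-through "judgment call" branch is doing real work; that's where the counterexamples below live.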

The counterexamples

Any heuristic that survives one year of shipping is good enough. This one survives most cases, but not all:

GitHub ships both gh CLI (500+ commands) and github-mcp-server (50+ tools, 28K stars). For project-scoped PR work in Claude Code, I use gh. For a production agent that searches issues across repositories, MCP is the right answer. Both exist because both have real use cases.

AWS has no official MCP server. The aws CLI is the canonical transport for cloud operations in agents. This is a product that ships as a CLI because the CLI is mature, battle-tested, and composable. The heuristic says "products get MCP" — AWS breaks it.

Sentry has an official MCP server that's explicitly "designed for human-in-the-loop coding agents" — a project-scoped use case. But the data is SaaS, multi-tenant, OAuth-scoped, so MCP is the right transport. The heuristic says "projects get CLI" — Sentry breaks it, because what matters is where the data lives, not what the agent is doing with it.

The lesson: it's a heuristic, not a law. "Mostly right" is the bar.

What I actually run

My setup for this portfolio site:

context7 as an MCP server — configured in .mcp.json so any Claude Code session in this repo gets it automatically. It covers the long tail: 2,700+ libraries, bleeding-edge SDKs (claude-agent-sdk, openai-agents, deepeval) that chub's curated registry doesn't have yet. The cost is a few MCP tools in the context window — worth it because I call it frequently enough.
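For context, the shape of that .mcp.json entry. The command and package name here are illustrative, so check context7's own install docs for the real invocation:

```json
{
  "mcpServers": {
    "context7": {
      "command": "npx",
      "args": ["-y", "@upstash/context7-mcp"]
    }
  }
}
```

Because it's checked into the repo, every session picks it up without per-machine setup.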

chub as a CLI + Skill — installed globally via npm, with a get-api-docs Skill at ~/.claude/skills/get-api-docs/SKILL.md. It covers the hot list with curated, version-pinned quality: LangGraph, OpenAI, Anthropic, LangChain. The docs are community-maintained markdown, so I can inspect exactly what the agent reads. Annotations persist across sessions — if the agent discovers a gotcha, it saves it.

research/code-validation/ as the backstop — a uv-managed Python + TypeScript sandbox where every code example that ships to a user-facing page gets executed first. Fetching docs is necessary but not sufficient — I caught hallucinations in code that used the right library name but the wrong class, the wrong method, the wrong parameter names. Running it locally is the only way to be sure.
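The backstop itself can be minimal: execute the snippet in a fresh interpreter and refuse to ship on a nonzero exit. A sketch of the idea (the sandbox layout is hypothetical; the snippets here are inline stand-ins for real page-bound code blocks):

```python
# Minimal "run it before you ship it" backstop: execute a snippet in a
# subprocess and report whether it exited cleanly.

import subprocess
import sys
import tempfile

def validate_snippet(source: str, timeout: int = 30) -> bool:
    """Run a Python snippet in a fresh interpreter; True iff it exits cleanly."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    result = subprocess.run([sys.executable, path], capture_output=True, timeout=timeout)
    return result.returncode == 0

good = "print('hello')"
bad = "from claude_agent_sdk import Agent  # hallucinated class"
print(validate_snippet(good))  # True
print(validate_snippet(bad))   # False
```

The hallucinated import fails at execution time whether the class or the whole package is missing, which is exactly the class of bug doc-fetching alone doesn't catch.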

The transport is a tool. The discipline is the thing.