Stop Dumping Tools Into Context. It Doesn't Scale
Everyone loves MCP in demos. Nobody talks about what happens when you connect five servers and the agent forgets how to think.
Everyone's excited about MCP. Model Context Protocol — the open standard that lets AI agents talk to external tools. GitHub, Slack, databases, file systems. Plug in an MCP server, hand the model a tool menu, and watch the magic happen.
Except here's what nobody shows in the demos: what happens when that tool menu has ninety items on it.
I've been building agentic AI workflows for finance teams — the kind of multi-step automation where a missed parameter means a payment gets applied to the wrong account. So when I say MCP breaks at scale, it's not a theoretical concern. It's a daily one.
The Context Tax Nobody Talks About
MCP's core idea is elegant. You run MCP servers. Each server exposes tools with JSON schemas. The client loads those definitions into the model's context window. The model picks the right tool. Clean, typed, standardized.
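For a sense of what one item on that menu costs, here's roughly the shape of a single tool definition as it lands in context. The name/description/inputSchema structure follows the MCP spec; the specific tool and fields are illustrative:

```json
{
  "name": "create_issue",
  "description": "Create a new issue in a GitHub repository.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "owner": { "type": "string", "description": "Repository owner" },
      "repo": { "type": "string", "description": "Repository name" },
      "title": { "type": "string", "description": "Issue title" },
      "body": { "type": "string", "description": "Issue body in Markdown" },
      "labels": { "type": "array", "items": { "type": "string" } }
    },
    "required": ["owner", "repo", "title"]
  }
}
```

Now imagine ninety of those, each with its own parameter descriptions, loaded whether or not the task needs them.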
In practice, that clean menu becomes a wall of noise.
One developer recently audited their Claude Code setup and found their MCP tools were consuming over 66,000 tokens of context before they even started a conversation. That's a third of Claude's 200k context window — gone. Just on tool definitions nobody asked for.
And it's not just hobbyist setups. GitHub's official MCP server alone defines 93 tools and eats roughly 55,000 tokens. Vercel's independent measurement landed in the same range: roughly 50,000 tokens just to describe what GitHub's server can do. Now stack three or four more servers on top.
The model is trying to solve your actual problem with whatever context budget is left. It's like handing someone a 200-page restaurant menu and wondering why they ordered wrong.
The result: poor tool selection, hallucinated parameters, drift in multi-step workflows, and complete breakdown in long conversations.
Why Accuracy Multiplies Down
Here's the part that makes this an engineering problem, not just a UX annoyance.
Every tool call is a probabilistic decision. The model reads the available tools, picks one, and fills in the parameters. Sometimes it gets it right. Sometimes it doesn't. The question is: what happens when you chain those decisions together?
Say each individual tool call has 90% accuracy. That sounds solid. But in a five-step workflow, you need all five to land correctly. The math is simple multiplication:
0.9 × 0.9 × 0.9 × 0.9 × 0.9 = 0.59
Five steps. 59% reliability. And that 90% per-call number is generous — it drops fast when the model is distracted by dozens of irrelevant tool definitions competing for its attention.
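If you want to see how quickly this compounds, the arithmetic fits in a few lines. The per-call accuracies below are illustrative, not measured:

```python
# Illustrative only: end-to-end reliability when every step must succeed
# and each tool call succeeds independently with probability p.
def chain_reliability(p: float, steps: int) -> float:
    return p ** steps

for p in (0.95, 0.90, 0.85):
    print(f"per-call {p:.0%}: " + ", ".join(
        f"{n} steps -> {chain_reliability(p, n):.0%}" for n in (3, 5, 10)
    ))
# per-call 95%: 3 steps -> 86%, 5 steps -> 77%, 10 steps -> 60%
# per-call 90%: 3 steps -> 73%, 5 steps -> 59%, 10 steps -> 35%
# per-call 85%: 3 steps -> 61%, 5 steps -> 44%, 10 steps -> 20%
```

Drop per-call accuracy to 85% and stretch the workflow to ten steps, and you're flipping a heavily weighted coin against yourself.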
In a demo with two tools and one step? Flawless. In a production workflow where the agent fetches a record, validates it, cross-references another system, flags exceptions, and writes back the result? You're compounding uncertainty at every step.
Context bloat doesn't just waste tokens. It actively degrades the model's ability to reason about the task in front of it.
What Anthropic Actually Did About It
Anthropic didn't patch MCP. They didn't declare it dead. They didn't pretend context bloat was user error.
They quietly shipped something called Skills.
A Skill is deceptively simple: a folder containing a SKILL.md file with YAML frontmatter — name, description, metadata — followed by detailed instructions, optional reference docs, and optional executable scripts.
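Concretely, a skill might look something like this. The frontmatter fields (name, description) follow Anthropic's documented format; the skill itself is a made-up example from the finance-workflow world:

```markdown
---
name: reconcile-payments
description: Match incoming bank payments to open invoices and flag exceptions for review.
---

# Reconciling payments

1. Pull unmatched payments from the ERP via the payments MCP tool.
2. Match each payment on invoice number first, then on amount plus counterparty.
3. Anything matching more than one invoice goes to scripts/flag_exception.py,
   never to an automatic write-back.

See reference/matching-rules.md for the full matching logic.
```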
The key design decision isn't what's in the folder. It's when the model reads it.
At startup, the agent loads only minimal metadata for each skill. Name and a one-line description. Maybe a hundred tokens total. The full instructions, reference docs, and scripts stay on disk until the model decides it needs them.
When a user makes a request, the model scans skill names, identifies which one is relevant, opens just that SKILL.md, and loads additional files only on demand. Instead of dumping the entire tool universe into context upfront, the agent retrieves only what the current task requires.
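Here's a minimal sketch of that loading discipline, assuming skills live in a local skills/ directory with one SKILL.md per folder and nothing fancier than naive frontmatter parsing:

```python
from pathlib import Path

SKILLS_DIR = Path("skills")  # assumed layout: skills/<name>/SKILL.md

def load_skill_index() -> list[dict]:
    """Startup cost: only the frontmatter of each SKILL.md enters context."""
    index = []
    for skill_md in SKILLS_DIR.glob("*/SKILL.md"):
        front = skill_md.read_text().split("---")[1]  # naive YAML frontmatter grab
        meta = dict(
            line.split(":", 1) for line in front.strip().splitlines() if ":" in line
        )
        index.append({
            "name": meta.get("name", skill_md.parent.name).strip(),
            "description": meta.get("description", "").strip(),
            "path": skill_md,
        })
    return index  # roughly a sentence per skill, not the full instructions

def load_skill_body(entry: dict) -> str:
    """On-demand cost: full instructions load only once the model picks this skill."""
    return entry["path"].read_text()
```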
Anthropic's own term for this: progressive disclosure. I'd call it something more specific: RAG for tools.
The Architecture That Actually Works
Here's the mental model:
MCP is the execution layer. It connects your agent to external systems. That part works fine and doesn't need fixing.
Skills are the retrieval and orchestration layer. They sit in front of MCP and decide what the agent needs to know, when it needs to know it, and how to act on it.
The flow becomes:
Retrieve the right skill based on the user's request.
Load only relevant instructions into context.
Run deterministic code when possible — why let the model generate what a script can execute?
Call MCP tools only when necessary, with precise parameters from the skill's instructions.
The model isn't choosing from a hundred tools anymore. It's choosing from a handful of workflow wrappers — each of which already knows which tools to call and how to call them. That's a fundamentally easier decision, and when it lands, the downstream workflow executes with deterministic reliability instead of compounding probability.
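To make the shape of that loop concrete, here's a rough sketch of the orchestration layer. The names are hypothetical throughout: select_skill, run_script, summarize_result, and the mcp_client calls stand in for whatever your agent framework actually provides.

```python
def handle_request(request: str, skill_index: list[dict], mcp_client) -> str:
    # Step 1: retrieval. The model sees only skill names + one-line descriptions.
    skill = select_skill(request, skill_index)          # hypothetical LLM call

    # Step 2: load just that skill's instructions into context.
    instructions = load_skill_body(skill)

    # Step 3: deterministic work stays in code, not in the model.
    record = run_script(skill, "fetch_and_validate.py", request)  # hypothetical

    # Step 4: MCP remains the execution layer, but the call is narrow and the
    # parameters come from the skill's instructions, not a 90-item menu.
    if record.get("needs_writeback"):
        mcp_client.call_tool("update_record", record["payload"])

    return summarize_result(record, instructions)       # hypothetical LLM call
```

The model makes one retrieval decision and, at most, a handful of narrow tool calls; everything that can be a script is a script.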
This matters most in the environments where MCP brute-force falls apart: multi-tenant systems with different tool sets per customer, long-lived sessions where schemas and history fight for the same budget, compliance workflows where hallucinated parameters have real consequences, and multi-agent chains where context bloat compounds across handoffs.
The Quiet Part Out Loud
Anthropic didn't kill MCP. They made it usable by acknowledging three things most agent builders are still ignoring: context is scarce, static tool exposure doesn't scale, and the future isn't about giving agents more tools.
It's about giving them better ways to relate to tools.
