
I Built an MCP Server. Here's What the Docs Don't Tell You.

The MCP documentation shows you how to build a toy server. Running one in production means solving transport reliability, tool schema design, auth patterns, and failure modes that no tutorial covers.

MCP documentation shows you how to build a server that returns the current time. It won't show you what happens when a client drops mid-request and reconnects with a different session ID twenty seconds later. Or when you ship 71 tools and realize agents can't figure out which one to call. Or when your auth token expires silently and every request returns empty results because the middleware failed open instead of closed.

I built CacheBash — an MCP server coordinating 20+ AI agents across Claude Code, Cursor, and VS Code sessions. Seventy-one tools. Twenty-eight thousand lines of TypeScript. Six months in production. The architecture post covered what CacheBash does. This post covers what broke: session identity, tool sprawl, auth rotation, rate limits, and message delivery.

At peak load the fleet runs dozens of concurrent tool calls per minute across multiple repos. Sessions stay alive for hours, and the same agent identity persists across session boundaries. The failure modes below only show up when agents reconnect, retry, race each other, or quietly stop heartbeating — things you never see in a single-session demo.

Transport: HTTP Isn't Stateless When Sessions Matter

MCP supports two transports: stdio (local process) and HTTP (remote server). The docs make HTTP sound simple. It isn't.

The stdio model is session-per-process. You spawn the server, it initializes, you make tool calls, you kill the process. Clean lifecycle. The MCP client (Claude Code, Cursor, etc.) manages one long-lived connection per server. Session IDs are stable. State is implicit.

The HTTP model is stateless. Every request is independent. The server has no idea if this is the first call from a client or the hundredth. Session IDs are whatever the client sends. If the client restarts, you get a new session ID. Your server has to handle that.

The docs don't tell you this matters. It does.

Reconnection Breaks Everything

What actually happens in production:

  1. Claude Code starts a session. Makes 10 tool calls. Everything works.
  2. User closes Claude Code for lunch.
  3. User reopens Claude Code. Different session ID.
  4. Agent tries to read its program state. Gets an empty result because the state was scoped to the old session.
  5. The agent has amnesia. It lost context not because the model forgot, but because your transport layer treated a reconnection as a brand new client.

The fix: Stop keying state to session IDs. Key it to programId (or whatever stable identifier you use for agents). Sessions are ephemeral. Agent identity isn't.

We moved all persistent state out of session-scoped collections. Program state, learned patterns, context summaries, all keyed to programId. Sessions became disposable. Agents reconnect and pick up exactly where they left off.

Obvious in hindsight. Not obvious on day one. Every MCP tutorial uses session-scoped examples because they're simpler. Production systems need identity-scoped state.
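A minimal sketch of identity-scoped state, with hypothetical names (`bindSession`, `stateStore`, and the in-memory maps are illustrative stand-ins, not CacheBash's actual API). State reads resolve the session to a stable `programId` first, so a reconnect with a fresh session ID lands on the same state:

```typescript
// Hypothetical sketch: key durable state to a stable programId,
// never to the ephemeral session ID.
type SessionInfo = { programId: string };

const sessions = new Map<string, SessionInfo>(); // ephemeral, per-connection
const stateStore = new Map<string, unknown>();   // durable, per-agent

// On connect (or reconnect) the client presents its programId; a new
// session ID just becomes another pointer to the same durable state.
function bindSession(sessionId: string, programId: string): void {
  sessions.set(sessionId, { programId });
}

function readState(sessionId: string): unknown {
  const session = sessions.get(sessionId);
  if (!session) throw new Error(`unknown session ${sessionId}`);
  return stateStore.get(session.programId); // survives session churn
}

function writeState(sessionId: string, value: unknown): void {
  const session = sessions.get(sessionId);
  if (!session) throw new Error(`unknown session ${sessionId}`);
  stateStore.set(session.programId, value);
}
```

An agent that writes under session A and reconnects as session B reads the same value, because both sessions bind to the same `programId`.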

Idle Timeouts Are Silent

HTTP transports don't notify you when a connection dies. The client just stops calling tools. Your server has no idea if the agent is thinking or dead.

We run session heartbeats. Every agent calls update_session(status="...") periodically. A monitoring module checks heartbeats and flags sessions as stale after 10 minutes of silence. Sessions that miss 30 minutes of heartbeats get archived.

Their in-progress tasks get returned to the queue. Another agent claims them. The work doesn't get lost.
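The sweep logic is simple enough to sketch. This is an illustrative in-memory version (the `Session` shape and `sweep` function are hypothetical); the thresholds match the ones above — stale after 10 minutes of silence, archived after 30, with in-progress tasks returned to the queue on archive:

```typescript
// Hypothetical heartbeat sweep; thresholds match the post.
const STALE_MS = 10 * 60 * 1000;
const ARCHIVE_MS = 30 * 60 * 1000;

type Session = {
  id: string;
  lastHeartbeat: number; // epoch ms of the last update_session call
  status: 'active' | 'stale' | 'archived';
  taskIds: string[];     // in-progress work claimed by this session
};

function sweep(sessions: Session[], queue: string[], now: number): void {
  for (const s of sessions) {
    const silence = now - s.lastHeartbeat;
    if (s.status !== 'archived' && silence >= ARCHIVE_MS) {
      s.status = 'archived';
      queue.push(...s.taskIds); // return in-progress tasks to the queue
      s.taskIds = [];
    } else if (s.status === 'active' && silence >= STALE_MS) {
      s.status = 'stale';       // flagged, but tasks stay claimed for now
    }
  }
}
```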

The gotcha: Agents forget to send heartbeats. We burned three days debugging "tasks randomly getting reassigned" before realizing a builder agent was going silent mid-task. The session timed out. The orchestrator saw the task unclaimed and dispatched it to someone else. Two agents, same work, racing to completion.

The fix was adding heartbeat enforcement to the agent prompt. Not elegant, but effective. Agents that don't heartbeat get killed. Period.


The hardest problems weren't transport or auth. They were tool design.

Tool Schema Design: You Have Too Many Tools

Seventy-one tools sounded reasonable when we designed the API. It wasn't.

When Claude Code sees 71 tools, it gets confused about which one to use. Not because the model is bad. Because your tool names are ambiguous and your descriptions are too similar.

Tool Naming Is an API Design Problem

We have get_tasks(), get_task_by_id(), batch_claim_tasks(), and claim_task(). Four different tools for task operations. send_message(), get_messages(), get_sent_messages(), query_message_history(). Four more for messaging.

Agents call get_tasks() when they mean get_task_by_id(). Or call claim_task() in a loop instead of using batch_claim_tasks(). The tools work. The naming is misleading.

What I'd do differently: Verb-noun naming with clear domain prefixes.

Dispatch domain:

  • dispatch_create_task
  • dispatch_claim_task
  • dispatch_complete_task

Relay domain:

  • relay_send_message
  • relay_get_messages
  • relay_query_history

State domain:

  • state_get
  • state_update
  • state_recall_memory

Redundant? Yes. Clear? Also yes. Namespacing eliminates the "which get_ do I want?" problem entirely. This is on our backlog.

Parameter Bloat Kills Tool Calls

Our send_message() tool has 13 parameters. Most are optional. The agent only uses 4 of them 90% of the time.

The problem: Claude Code sees 13 parameters and hesitates. It doesn't know which ones are required. It reads the schema, tries to infer intent, and sometimes just gives up and asks the user.

We added send_directive() — an opinionated wrapper that auto-sets message_type and priority. Three required params instead of thirteen.

  • send_message() → 13 params, 4 required (source, target, message_type, message)
  • send_directive() → 3 required params (source, target, message), auto-sets the rest

The simpler the schema, the fewer the failures. If a tool has more than five parameters, it's probably doing too much. Split it or make a focused wrapper.
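The wrapper pattern can be sketched in a few lines. The shapes and default values here are illustrative (the real send_message() has thirteen parameters; only the four required ones from the post appear below), but the idea is exactly this — delegate to the full tool with opinionated defaults:

```typescript
// Illustrative sketch of an opinionated wrapper around a wide tool.
type SendMessageArgs = {
  source: string;
  target: string;
  message_type: string;
  message: string;
  priority?: string;
  // ...plus more optional params in the real tool
};

// Stand-in for the real 13-parameter tool handler.
const sent: SendMessageArgs[] = [];
async function sendMessage(args: SendMessageArgs): Promise<void> {
  sent.push(args);
}

// Three required params; message_type and priority are auto-set.
// 'DIRECTIVE' and 'normal' are assumed defaults for illustration.
async function sendDirective(source: string, target: string, message: string) {
  return sendMessage({
    source,
    target,
    message,
    message_type: 'DIRECTIVE',
    priority: 'normal',
  });
}
```

The agent-facing schema shrinks from thirteen fields to three, and the defaults encode the decision the agent was hesitating over.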

Auth: Bearer Tokens Look Simple Until They Don't

MCP HTTP transport uses the Authorization header. The client sends Bearer <token>, the server validates it, done.

That works for demos. In production, you need:

  • Key rotation. Agents run for months. API keys get compromised. You can't hard-code keys in config files.
  • Grace periods. When you rotate a key, the old key needs to work for 30 seconds so in-flight requests don't fail.
  • Soft revocation. You can't delete keys. You need an audit trail. Soft-delete with a revokedAt timestamp.

We built a key rotation system:

async function rotateKey(currentKeyHash: string, programId: string) {
  const newKey = generateSecureKey();
  const newKeyHash = sha256(newKey);

  // Create the new key record
  await db.collection('apiKeys').doc(newKeyHash).set({
    programId,
    createdAt: now(),
    revokedAt: null
  });

  // Grace-revoke the old key (30s window)
  await db.collection('apiKeys').doc(currentKeyHash).update({
    revokedAt: now() + 30000
  });

  return newKey; // Show once, never stored
}

The auth middleware accepts a key when revokedAt === null or revokedAt > now(). Keys in the grace window still work. After 30 seconds, they're dead.
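The validity check itself is two comparisons. A sketch, using the field names from the rotation snippet (`isKeyValid` is a hypothetical name for illustration):

```typescript
// Field names match the rotation snippet above.
type ApiKeyRecord = {
  programId: string;
  createdAt: number;
  revokedAt: number | null; // null = active; timestamp = grace-revoked
};

// A key is accepted if it was never revoked, or if its grace window
// (revokedAt set 30s in the future at rotation time) hasn't elapsed.
function isKeyValid(record: ApiKeyRecord, nowMs: number): boolean {
  return record.revokedAt === null || record.revokedAt > nowMs;
}
```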

Agents call rotate_key(), get the new key, update their config, and continue. No downtime. No failed requests.

Encryption Keys Derived From API Keys

We needed E2E encryption for sensitive data (user questions, mobile notifications). We didn't want a separate encryption key system.

Solution: derive encryption keys from API keys.

function deriveEncryptionKey(apiKey: string): Buffer {
  // Deterministic salt from the key itself, so rotating the API key
  // rotates the salt too
  const salt = sha256(apiKey).substring(0, 16);
  // Node's crypto.pbkdf2Sync: 100k iterations, 32-byte derived key
  return pbkdf2Sync(apiKey, salt, 100000, 32, 'sha256');
}

The API key validates the request. The same key (via PBKDF2) encrypts the data. One secret, two uses. When the API key rotates, the encryption key rotates automatically.

The gotcha: You can't decrypt old data after a key rotation unless you store a mapping of keyHash → oldKey. We don't. Once a key is rotated, old encrypted data is unreadable.

This is a feature. Encrypted questions are ephemeral. If an agent rotates keys, old questions expire. That's acceptable for our use case. It won't be for everyone.

What Breaks at Scale That Toy Examples Never Hit

Rate Limiting Is Not Optional

MCP has no built-in rate limiting. The protocol assumes trust. Production systems don't get to assume trust.

We run tiered rate limits: 60 requests per minute for standard programs, 300 for orchestrators, 600 for admin. Sliding window per API key. When an agent goes rogue (retry loop, infinite recursion, etc.), it hits the limit in seconds.
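A minimal in-memory sketch of the tiered sliding window (the storage and names are illustrative; the tiers match the numbers above — 60/300/600 requests per minute, keyed per API key):

```typescript
// Illustrative sliding-window limiter; in-memory only for the sketch.
const TIER_LIMITS: Record<string, number> = {
  standard: 60,
  orchestrator: 300,
  admin: 600,
};
const WINDOW_MS = 60_000;

const requestLog = new Map<string, number[]>(); // apiKeyHash -> timestamps

function allowRequest(keyHash: string, tier: string, nowMs: number): boolean {
  const limit = TIER_LIMITS[tier] ?? TIER_LIMITS.standard;
  const log = requestLog.get(keyHash) ?? [];
  // Drop entries that have slid out of the window, then check the count.
  const recent = log.filter((t) => nowMs - t < WINDOW_MS);
  if (recent.length >= limit) {
    requestLog.set(keyHash, recent);
    return false; // caller responds with 429
  }
  recent.push(nowMs);
  requestLog.set(keyHash, recent);
  return true;
}
```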

The first time we saw this, an orchestrator agent got stuck dispatching the same task repeatedly. It burned through the rate limit in 40 seconds. Every subsequent request returned 429. The entire fleet stopped.

The fix: Circuit breakers at the agent level. If an agent fails the same operation twice, it escalates instead of retrying. If it hits a rate limit, it backs off exponentially.

Rate limiting saved the system. Circuit breakers saved the agents.
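The breaker logic described above is small enough to sketch (class and method names are hypothetical, not CacheBash's actual API): two failures on the same operation escalate instead of retrying, and rate-limit hits back off exponentially.

```typescript
// Hypothetical per-agent breaker matching the policy in the post.
class OperationBreaker {
  private failures = new Map<string, number>();

  // First failure: retry. Second failure on the same op: escalate.
  recordFailure(op: string): 'retry' | 'escalate' {
    const count = (this.failures.get(op) ?? 0) + 1;
    this.failures.set(op, count);
    return count >= 2 ? 'escalate' : 'retry';
  }

  recordSuccess(op: string): void {
    this.failures.delete(op); // success resets the breaker
  }

  // Exponential backoff for 429s: 1s, 2s, 4s, ... capped at 60s.
  // The cap is an assumed value for illustration.
  static backoffMs(attempt: number): number {
    return Math.min(1000 * 2 ** attempt, 60_000);
  }
}
```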

Message Delivery Is Not Guaranteed

MCP tools return success or failure. They don't return "message sent but not delivered" or "request timed out, maybe succeeded, maybe didn't."

Our relay system uses fire-and-forget message delivery. You call send_message(), we write to Firestore, we return success. Whether the recipient actually reads it is out of band.

This breaks agent workflows that assume synchronous request-response. An orchestrator sends a directive, assumes the builder got it, and moves on. The builder never saw the message (maybe it was offline, maybe the message expired). The orchestrator is blocked waiting for a result that will never come.

The fix: Explicit ACKs. The recipient calls send_message(message_type="ACK", reply_to=original_msg_id). The sender polls for the ACK. If it doesn't arrive in 30 seconds, assume failure and retry or escalate.

TCP semantics bolted onto Firestore. Not elegant. Works.
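The sender side of the handshake looks roughly like this. A sketch with hypothetical names (`waitForAck` and the in-memory `inbox` stand in for the Firestore-backed relay): poll for an ACK whose reply_to matches the original message ID, give up after the timeout.

```typescript
// Sketch of the sender's ACK poll; the inbox is a stand-in for the
// recipient -> sender channel in the real relay.
type Message = { id: string; message_type: string; reply_to?: string };

const inbox: Message[] = [];

function hasAck(originalMsgId: string): boolean {
  return inbox.some(
    (m) => m.message_type === 'ACK' && m.reply_to === originalMsgId
  );
}

// Poll every intervalMs until the ACK arrives or timeoutMs elapses.
// On false, the caller retries the send or escalates.
async function waitForAck(
  originalMsgId: string,
  timeoutMs = 30_000,
  intervalMs = 1_000
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (hasAck(originalMsgId)) return true;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  return hasAck(originalMsgId);
}
```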

Tool Failures Fail Silently

When a tool call fails, MCP clients (Claude Code, Cursor) show the error to the user. The agent sees the error. The server sees the error.

What doesn't happen: telemetry. You have no idea which tools are failing, how often, or why. Not unless you instrument every tool handler.

We log every tool call:

try {
  const result = await handler(auth, args);
  await logAudit(auth.userId, toolName, 'success', result);
  return result;
} catch (err) {
  await logAudit(auth.userId, toolName, 'failure', err.message);
  throw err;
}

After a week, we had data. batch_claim_tasks was failing 15% of the time due to Firestore transaction conflicts. send_message was failing 5% of the time due to invalid target IDs.

We fixed both. Without audit logs, we'd never have known.

What's Actually Hard

Building an MCP server is easy. Building one that stays up when 20 agents hammer it concurrently, handles reconnections gracefully, doesn't leak sessions, enforces rate limits without false positives, and gives you enough telemetry to debug failures after the fact — that's the part the docs don't cover.

The MCP protocol is solid. The ecosystem is growing. The clients work. But the gap between "hello world" and "production-ready" is bigger than any tutorial shows.

If you're building an MCP server for real work, this is what actually matters:

Seven Lessons From Running MCP in Production

  1. Session IDs are ephemeral. Agent identity isn't.
  2. HTTP transport is stateless. Reconnections are normal.
  3. Tool count is a UX problem. Fewer tools win.
  4. Auth requires rotation, grace periods, and audit trails.
  5. Rate limits protect you from runaway agents.
  6. Message delivery is not guaranteed. Build ACKs.
  7. Instrument everything. You can't debug what you can't measure.

The CacheBash MCP server is open source. The code is there. The gotchas are documented. If you're building something similar, start there and save yourself six months.


Christian Bourlier builds multi-agent systems with CacheBash and writes about what breaks along the way. The code is at github.com/rezzedai.

Christian Bourlier

Principal Architect building AI-assisted development tools. Founder of rezzed.ai and Three Bears Data.