
In the year since the Model Context Protocol (MCP) launched, it has rapidly become the industry standard for connecting AI agents to the world. We’ve seen a massive shift from fragmented, custom integrations to a universal ecosystem where developers “implement once” to unlock thousands of tools.
However, as agents grow more sophisticated, they are hitting a “token wall.” Connecting an agent to hundreds of tools often leads to bloated context windows, high latency, and soaring costs.
The solution? Code Execution. By teaching agents to write code to interact with MCP servers—rather than relying on direct tool calls—we can reduce token usage by over 98% while improving security and reliability.
The Scalability Problem: Token Bloat
Traditionally, MCP clients load every available tool definition directly into the model’s context window. This creates two major bottlenecks:
1. Tool Definition Overload
If an agent has access to dozens of MCP servers, it might be processing hundreds of thousands of tokens just to “understand” its capabilities before it even reads the user’s request.
2. The “Intermediate Result” Tax
In a standard loop, every piece of data must flow through the model. Imagine asking an agent to “Move a 50-page transcript from Google Drive to Salesforce.”
- Step 1: The model calls gdrive.getDocument.
- Step 2: The entire transcript is piped into the context window.
- Step 3: The model must then output that entire transcript again to call salesforce.updateRecord.
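Sketched as TypeScript shapes (illustrative only, not the actual wire format), the two calls look like this:

// Illustrative shape of a direct tool-call loop; field names are hypothetical.
type ToolCall = { tool: string; input: Record<string, unknown> };

// Call 1: the full 50-page transcript is returned INTO the model's context.
const readCall: ToolCall = {
  tool: 'gdrive.getDocument',
  input: { documentId: 'abc123' },
};

// Call 2: the model must re-emit that entire transcript, token by token,
// as an argument to the next tool.
const writeCall: ToolCall = {
  tool: 'salesforce.updateRecord',
  input: {
    recordId: '00Q5f000001abcXYZ',
    data: { Notes: '<the entire transcript, reproduced by the model>' },
  },
};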
This isn’t just expensive; it’s risky. Large documents can exceed context limits or cause the model to hallucinate or truncate data during the “copy-paste” process.
Enter “Code Mode”: MCP as an API
Instead of presenting tools as flat lists, we can present MCP servers as a filesystem-based API. In this paradigm, the agent discovers tools by exploring a directory and only reads the specific files (and definitions) it needs for the task.
Example: TypeScript-based Tool Discovery
servers/
├── google-drive/
│   ├── getDocument.ts
│   └── index.ts
└── salesforce/
    ├── updateRecord.ts
    └── index.ts
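Each tool file can be a thin, typed wrapper around the underlying MCP call. A minimal sketch of what getDocument.ts might contain; callMCPTool and the tool name are hypothetical stand-ins for however the client exposes the call:

// ./servers/google-drive/getDocument.ts (sketch)
import { callMCPTool } from '../../client'; // hypothetical MCP client helper

interface GetDocumentInput {
  documentId: string;
}

interface GetDocumentResponse {
  content: string;
}

export async function getDocument(
  input: GetDocumentInput
): Promise<GetDocumentResponse> {
  // Forwards the request to the MCP server; the tool name is illustrative.
  return callMCPTool<GetDocumentResponse>('google_drive__get_document', input);
}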
When an agent needs to move that meeting transcript, it no longer passes the data through its own “brain.” It writes a script:
// Fetch the document inside the execution environment; the transcript
// never passes through the model's context window.
import * as gdrive from './servers/google-drive';
import * as salesforce from './servers/salesforce';

const transcript = (await gdrive.getDocument({ documentId: 'abc123' })).content;

// Write it straight to Salesforce; only a small success/failure result
// is returned to the model.
await salesforce.updateRecord({
  objectType: 'SalesMeeting',
  recordId: '00Q5f000001abcXYZ',
  data: { Notes: transcript }
});
The Result: Token usage drops from roughly 150,000 tokens to about 2,000, a 98.7% reduction, with corresponding savings in cost and latency.
4 Key Benefits of the Code-First Approach
1. Progressive Disclosure
Models are exceptionally good at navigating filesystems. By using a search_tools function or directory listing, an agent can “on-demand” load only the relevant schemas. This keeps the context window lean and the model focused.
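A minimal sketch of this on-demand discovery using Node's filesystem API, against the directory layout shown earlier:

import * as fs from 'fs/promises';
import * as path from 'path';

// List available servers first; read a schema only when the task needs it.
const servers = await fs.readdir('servers'); // ['google-drive', 'salesforce']
const tools = await fs.readdir('servers/google-drive'); // ['getDocument.ts', 'index.ts']

// Only this one definition enters the context window.
const schema = await fs.readFile(
  path.join('servers', 'google-drive', 'getDocument.ts'),
  'utf8'
);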
2. Context-Efficient Filtering
If an agent needs to find “Pending” orders in a 10,000-row spreadsheet, a traditional agent would ingest all 10,000 rows. A code-exec agent writes a .filter() function in the execution environment and only returns the 5 relevant rows to the context window.
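A sketch of what that looks like, assuming a hypothetical google-sheets server with a getRows tool that returns row objects:

import * as sheets from './servers/google-sheets';

// All 10,000 rows stay inside the execution environment.
const rows = await sheets.getRows({ spreadsheetId: 'abc123' });

// Only the handful of matches is printed back into the context window.
const pending = rows.filter((row) => row.status === 'Pending');
console.log(`Found ${pending.length} pending orders`, pending);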
3. Privacy-Preserving Operations
Code execution allows for PII (Personally Identifiable Information) Stripping. The MCP client can intercept data flowing between tools (e.g., from Sheets to Salesforce) and tokenize sensitive names or emails before the model ever sees them. The data moves securely in the background, while the model only sees placeholders like [EMAIL_1].
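A minimal sketch of such an interceptor inside the MCP client; the helper names are illustrative, and a production system would cover far more PII types than emails:

// Replace real emails with stable placeholders before results reach the model.
const piiMap = new Map<string, string>();
let emailCount = 0;

function stripPII(text: string): string {
  return text.replace(/[\w.+-]+@[\w-]+\.[\w.-]+/g, (email) => {
    if (!piiMap.has(email)) {
      emailCount += 1;
      piiMap.set(email, `[EMAIL_${emailCount}]`);
    }
    return piiMap.get(email)!;
  });
}

function restorePII(text: string): string {
  // Swap the real values back in when data flows out to another tool.
  let result = text;
  for (const [email, placeholder] of piiMap) {
    result = result.split(placeholder).join(email);
  }
  return result;
}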
4. State Persistence and “Skills”
Agents can now maintain a “workspace.” They can write intermediate results to a CSV file to resume work later or save successful code snippets as Skills. Over time, your agent builds its own library of reusable functions, evolving from a simple assistant into a specialized power user.
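A sketch of both patterns, with illustrative paths and contents:

import * as fs from 'fs/promises';

// Intermediate results from an earlier step (values illustrative).
const csvRows = ['orderId,status', '1042,Pending', '1057,Pending'];

// Persist them so a later session can resume without re-fetching.
await fs.mkdir('workspace', { recursive: true });
await fs.writeFile('workspace/pending-orders.csv', csvRows.join('\n'));

// Save a snippet that worked as a reusable "skill" for future tasks.
await fs.mkdir('skills', { recursive: true });
await fs.writeFile(
  'skills/filterPending.ts',
  'export const filterPending = (rows: { status: string }[]) =>\n' +
    "  rows.filter((r) => r.status === 'Pending');\n"
);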
The Trade-off: Security and Sandboxing
It is important to note that code execution isn’t a “free lunch.” Running agent-generated code requires several safeguards, sketched minimally after this list:
- Robust Sandboxing: To prevent the agent from accessing the host system.
- Resource Limits: To prevent infinite loops or memory leaks.
- Monitoring: To audit what the agent is actually executing.
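A minimal sketch of the resource-limit and monitoring pieces using Node's child_process; the limits are illustrative, and a real deployment would add container- or VM-level isolation:

import { execFile } from 'child_process';

// Run the agent-generated script in a separate process with hard limits.
execFile(
  'node',
  ['workspace/agent-script.js'],
  { timeout: 10_000, maxBuffer: 1024 * 1024 }, // 10 s wall clock, 1 MiB output cap
  (error, stdout, stderr) => {
    if (error) {
      // Fires on timeout, crash, or oversized output.
      console.error('Execution failed or was killed:', error.message);
      return;
    }
    // Log everything the agent executed and produced for auditing.
    console.log('Agent output:', stdout);
  }
);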
Summary
The future of efficient AI isn’t just “bigger context windows”—it’s smarter context management. By combining the universal connectivity of MCP with the logic of Code Execution, we can build agents that are faster, cheaper, and more capable of handling complex, data-heavy workflows.