How to Evaluate an MCP Server With an LLM: 17 Bugs Found and Fixed
Your tools work. That doesn't mean an agent can use them. Eight rounds. Seventeen bugs.
June 27, 2026 · 12 min
Cloud Architecture · AI Engineering · Distributed Systems
44 posts · 2025–2026
How agents should navigate documents at production scale, from 26,000+ downloads of one MCP server.
June 24, 2026 · 11 min
Seven Lambdas, two SQS queues, one DynamoDB table. SES for sending. No per-subscriber fee.
June 20, 2026 · 10 min
Why more MCP tools make agents worse, and the pattern that fixes the surface.
June 13, 2026 · 8 min
Take the notes server from localhost STDIO to a secured, containerized HTTP service you can run on any host.
June 09, 2026 · 11 min
Hybrid wins at page grain. BM25 wins at section grain. Granularity decides.
June 06, 2026 · 10 min
Page-mode PDF search costs 2 to 6 extra tool calls per query depending on the document. Section-aware search delivers...
May 30, 2026 · 10 min
Every Claude Desktop session using your MCP server is a free QA pass. You just have to listen to what the LLM is tryi...
May 23, 2026 · 10 min
A prompt change broke my agent silently. Behavioral CI caught 4 regressions before they shipped.
May 16, 2026 · 8 min
10 trace CLI commands that turn re-run-and-guess debugging into actual inspection.
May 09, 2026 · 6 min
My agent burned tokens on outputs it never reread. One context change cut costs 21%.
May 02, 2026 · 8 min
Failures I couldn't reproduce led to a trace-first layer. No dashboard, no infra, just traces.
April 25, 2026 · 9 min