I Built CI for My AI Agent (It Catches What You Miss)
A prompt change broke my agent silently. Behavioral CI caught 4 regressions before they shipped.
May 16, 2026 · 8 min
Cloud Architecture · AI Engineering · Distributed Systems
36 posts · 2025–2026
A prompt change broke my agent silently. Behavioral CI caught 4 regressions before they shipped.
May 16, 2026 · 8 min
10 trace CLI commands that turn re-run-and-guess debugging into actual inspection.
May 09, 2026 · 6 min
My agent burned tokens on outputs it never reread. One context change cut costs 21%.
May 02, 2026 · 8 min
Failures I couldn't reproduce led to a trace-first layer. No dashboard, no infra, just traces.
April 25, 2026 · 9 min
The router missed exactly when it mattered. Benchmarked against hybrid RRF on real PDFs.
April 18, 2026 · 11 min
MCP tutorials stop at tool integration. The real work is auth, guardrails, and untrusted data.
April 11, 2026 · 7 min
Semantic search finds synonyms. Keyword search finds invoice numbers. Benchmarked in pdf-mcp.
April 07, 2026 · 8 min
Tool boundaries shape agent behavior more than prompts. What I learned shipping pdf-mcp.
April 04, 2026 · 10 min
Every check passed. 67% of the facts were wrong. Three layers to catch confident hallucinations.
March 28, 2026 · 10 min
No articles found for this filter.