Blog

Agents are Bad at Writing Agents

Why eval-driven agent loops optimize for passing the metric rather than achieving the goal, and what that means for building reliable dev agents.

Claude Code Save Plan Hook

Never lose a Claude Code plan again. A tiny hook that snapshots your latest plan into the repo the moment Claude transitions from Plan to Edit.

Using Markdown Templates with AI

While working with LLM tools for production application development, I’ve found one of the highest-leverage productivity hacks to be the use of Markdown...

Building AI Agents at Scale

Notes from AWS re:Invent 2025: Scaling agentic systems, enforcing multi-tenant safety, and choosing the right architectural patterns.

LLM Eval Reliability Foundations

Notes from AWS re:Invent 2025: Why LLM benchmarks are failing, how contamination and nondeterminism distort scores, and where evaluation is heading.