simonwillison.net web signal

Willison ships llm-coding-agent 0.1a0, an LLM-library agent

TL;DR

  • Simon Willison released llm-coding-agent 0.1a0 on July 2, 2026, a small alpha coding agent built on his LLM library.
  • The agent ships six tools: file read, write, and edit; shell run with a 600-second cap; glob listing; and regex search.
  • Willison had Claude Code write the spec first and implement it via red/green TDD, calling the first cut 'pretty good'.

Simon Willison shipped an alpha of llm-coding-agent, tagged 0.1a0, on July 2. It is a small coding agent built on his LLM library, which he says has 'evolved into more of an agent framework'. The release is therefore partly a proof that the framework can host something useful and partly the framework's first flagship user.

The setup is deliberately minimal. You invoke it with `uvx --prerelease=allow --with llm-coding-agent llm code`, and the agent exposes six tools: read a file with line numbers and pagination, write a file, edit by string replacement with diff verification, run shell commands with a 600-second cap, list files by glob while respecting .gitignore, and search file contents by regex. There is a `--yolo` mode for running unattended and an `--allow` flag for scoped approvals, so `--allow "pytest*"` lets the agent run pytest without prompting on every command. On the Python side there is a `CodingAgent` class you can drive directly from code, with parameters for model, working directory, and approval requirements.
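
To make the scoped-approval idea concrete, here is a minimal Python sketch of how a glob pattern like `pytest*` could gate a shell tool that also enforces a 600-second cap. It is an illustration under stated assumptions, not llm-coding-agent's actual implementation: the names `is_pre_approved` and `run_shell` are hypothetical, and the use of `fnmatch` for matching is a guess at the semantics of `--allow`.

```python
import fnmatch
import subprocess
from collections.abc import Callable

# The 600-second cap comes from the post; the constant name is hypothetical.
SHELL_TIMEOUT_SECONDS = 600


def is_pre_approved(command: str, allow_patterns: list[str]) -> bool:
    """Return True if the command matches any allow-listed glob pattern.

    Assumption: patterns are matched against the whole command string
    with shell-style globbing. The real tool may match differently.
    """
    return any(fnmatch.fnmatchcase(command, p) for p in allow_patterns)


def run_shell(
    command: str,
    allow_patterns: list[str],
    ask: Callable[[str], str] = input,
    timeout: float = SHELL_TIMEOUT_SECONDS,
) -> str:
    """Run a command, prompting first unless a pattern pre-approves it."""
    if not is_pre_approved(command, allow_patterns):
        answer = ask(f"Run {command!r}? [y/N] ")
        if answer.strip().lower() != "y":
            return "Command denied by user."
    try:
        result = subprocess.run(
            command,
            shell=True,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return f"Command timed out after {timeout} seconds."
    # Return both streams so the model sees errors as well as output.
    return result.stdout + result.stderr
```

One design point the sketch surfaces: whole-string glob matching is permissive, since `pytest*` would also match a command such as `pytest; some-other-command`. Whether the real `--allow` flag has that property is not stated in the post.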

The build story is at least as interesting as the tool. Willison calls it 'another Fable 5 experiment' and describes the workflow as: have Claude Code write the spec, then 'build it using red/green TDD in a series of sensible commits (each with passing tests and updated docs).' He says the result is 'pretty good for a first attempt', and notes at least one behaviour was implemented that he 'didn't ask for but I'm delighted to see'. It is a concrete data point on how far spec-then-TDD can be pushed as a delegation pattern for an agent writing an agent.

The honest caveats are the ones you would expect for a 0.1a0 release. The post includes no benchmarks and no head-to-head comparisons with Claude Code or other established coding agents. The published demo asks the agent to scaffold a SwiftUI CLI ASCII-art clock, a request the agent itself flagged with 'SwiftUI isn't suitable for a true CLI' before proceeding; it is a smoke test rather than a stress test. And the `--yolo` flag is exactly what it sounds like, shipped as a headline convenience rather than buried behind a warning.

Where this matters is as a readable reference. Coding agents built on hosted platforms are opaque; a small Python package with six tools, a scoped-approval model, and a public spec-plus-TDD build history is something you can read end to end, fork, and rewire. If you want to understand what a coding agent's inner loop looks like, or embed one inside a larger Python workflow, this is now one of the shortest paths there.

Shared on Bluesky by 2 AI experts