A capability-bound LLM agent

capa_agent_demo is an LLM agent harness in about 650 lines of Capa across four files. The model can call a small set of tools (read a file, list a directory, fetch a URL with GET only, read the clock). Each tool's authority is statically narrowed by the type system, and the capability manifest serves as the audit contract. This page describes the real repository; every manifest shown here was regenerated from its source with capa --manifest in June 2026.

The problem

Mainstream agent harnesses (LangChain, OpenAI function calling, MCP servers) ship tools as arbitrary functions with no permission system. Whether a prompt injection that says "now delete the home directory" succeeds depends on whether some tool happens to reach subprocess.run or an unrestricted filesystem handle somewhere deep in its body. The blast radius is implicit, dynamic, and auditable only by reading every line of every tool.

Industry guidance on this (the "lethal trifecta" framing, model-vendor security guidelines) ultimately rests on convention: this tool only reads, trust me. The demo replaces the convention with a type-system guarantee: a tool's capability surface is its signature, the agent loop's surface is the union of its tools' signatures, and the compiler refuses to type-check a call to any authority a function did not name.

What the demo is

The demo is a four-tool agent loop against the Anthropic Messages API. The model runs for up to 5 turns: it can issue tool calls, read the results, and emit a final answer. Every tool call is forwarded through one function, dispatch_tool, which is the narrow waist: its signature holds the entire authority union the loop is allowed to exercise.

Tool          Capability   What it does
read_file     ReadOnlyFs   Read a file's contents.
list_dir      ReadOnlyFs   List a directory's entries.
get_url       GetOnlyHttp  HTTP GET against an allow-listed host (no POST, PUT, or DELETE).
current_time  Clock        Return Unix seconds.
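The narrow-waist idea can be sketched outside Capa. The Python below is an analogy, not the demo's source: the tool names and the read_file, list_dir, and fetch methods come from this page, while the now method, the argument shape, and the in-memory stand-ins are assumptions made for illustration. Python checks none of this statically; in the demo the compiler enforces what is only a convention here.

```python
from typing import Protocol


class ReadOnlyFs(Protocol):
    def read_file(self, path: str) -> str: ...
    def list_dir(self, path: str) -> list[str]: ...


class GetOnlyHttp(Protocol):
    def fetch(self, url: str) -> str: ...


class Clock(Protocol):
    def now(self) -> int: ...  # hypothetical method name


def dispatch_tool(name: str, arg: str,
                  fs: ReadOnlyFs, http: GetOnlyHttp, clock: Clock) -> str:
    """Route one model-issued tool call.

    The three handles in the signature are the only authority this
    function can exercise; an unknown tool name yields an error string.
    """
    if name == "read_file":
        return fs.read_file(arg)
    if name == "list_dir":
        return "\n".join(fs.list_dir(arg))
    if name == "get_url":
        return http.fetch(arg)
    if name == "current_time":
        return str(clock.now())
    return f"error: unknown tool {name!r}"


# In-memory stand-ins (hypothetical) so the sketch runs without I/O.
class FakeFs:
    def __init__(self, files: dict[str, str]) -> None:
        self.files = files

    def read_file(self, path: str) -> str:
        return self.files[path]

    def list_dir(self, path: str) -> list[str]:
        prefix = path.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self.files if p.startswith(prefix))


class FakeHttp:
    def fetch(self, url: str) -> str:
        return f"GET {url}"


class FixedClock:
    def __init__(self, seconds: int) -> None:
        self.seconds = seconds

    def now(self) -> int:
        return self.seconds
```

Because every call passes through one function, reviewing that function's parameter list is enough to bound what any tool call can touch.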

ReadOnlyFs and GetOnlyHttp are not built-ins. They are user-defined capabilities in attenuated.capa, implemented as cap-bearing structs: ReadOnlyFsImpl holds the underlying Fs internally and exposes only read_file and list_dir; GetOnlyHttpImpl holds the raw HTTP client plus a host allow-list and exposes only fetch. A holder of ReadOnlyFs cannot call Fs.write because the wrapper never exposes it. The wrapper legitimately holds the raw Fs inside; what the analyzer guarantees is that the implementation stays within the capability surface it declares, and that consumers can never reach past that surface to the authority underneath.
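The attenuation pattern described above can be illustrated with a small Python analogy. It is a sketch under stated assumptions, not the contents of attenuated.capa: RawFs is a hypothetical stand-in for the full-authority Fs, and the signature of make_readonly_fs is guessed. Python also cannot stop a caller from reaching the private _fs attribute, which is exactly the gap the Capa analyzer closes.

```python
import os
import tempfile


class RawFs:
    """Hypothetical stand-in for a full-authority filesystem handle."""

    def read_file(self, path: str) -> str:
        with open(path, encoding="utf-8") as f:
            return f.read()

    def list_dir(self, path: str) -> list[str]:
        return sorted(os.listdir(path))

    def write(self, path: str, data: str) -> None:
        with open(path, "w", encoding="utf-8") as f:
            f.write(data)


class ReadOnlyFsImpl:
    """Holds the raw handle internally and re-exports only the two reads."""

    def __init__(self, fs: RawFs) -> None:
        self._fs = fs  # private by convention only in Python

    def read_file(self, path: str) -> str:
        return self._fs.read_file(path)

    def list_dir(self, path: str) -> list[str]:
        return self._fs.list_dir(path)


def make_readonly_fs(fs: RawFs) -> ReadOnlyFsImpl:
    """Factory called at the wiring point; consumers see only the wrapper."""
    return ReadOnlyFsImpl(fs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        raw = RawFs()
        raw.write(os.path.join(d, "a.txt"), "hi")
        ro = make_readonly_fs(raw)
        print(ro.list_dir(d), ro.read_file(os.path.join(d, "a.txt")))
        print(hasattr(ro, "write"))  # the wrapper defines no write method
```

The wrapper has no write method to call, so a consumer holding only the wrapper has no well-typed path to a write.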

The audit moment

One command answers the question every agent-security review starts with: what is the worst this agent can do?

$ capa --manifest agent.capa

tool_read_file    -> [ReadOnlyFs]
tool_list_dir     -> [ReadOnlyFs]
tool_get_url      -> [GetOnlyHttp]
tool_current_time -> [Clock]
dispatch_tool     -> [ReadOnlyFs, GetOnlyHttp, Clock]
run_agent_loop    -> [Stdio, Logger, LlmClient, ReadOnlyFs, GetOnlyHttp, Clock]
main              -> [Stdio, Env, Clock, Fs, Unsafe]

The real command emits a JSON manifest; the listing above is a condensed function-to-capabilities view of it. The function names and capability sets are taken verbatim from the actual output.

Read the run_agent_loop line and you are done. The model is in control of that function for up to 5 turns, and its blast radius is exactly those six capabilities: no write-capable Fs, no POST-capable HTTP, no Net, no Db, no Unsafe. A prompt injection cannot widen the surface, because the compiler refuses to type-check a call to an authority the function did not declare; there is nothing for the injected text to reach.
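That review step can also be automated, for example as a CI gate. The Python sketch below works on the condensed function-to-capabilities view shown on this page, because the JSON schema emitted by capa --manifest is not reproduced here; the parser, the deny-list, and the function names parse_condensed and forbidden_caps are assumptions for illustration rather than part of the Capa toolchain.

```python
CONDENSED = """\
tool_read_file    -> [ReadOnlyFs]
tool_list_dir     -> [ReadOnlyFs]
tool_get_url      -> [GetOnlyHttp]
tool_current_time -> [Clock]
dispatch_tool     -> [ReadOnlyFs, GetOnlyHttp, Clock]
run_agent_loop    -> [Stdio, Logger, LlmClient, ReadOnlyFs, GetOnlyHttp, Clock]
main              -> [Stdio, Env, Clock, Fs, Unsafe]
"""

# Capabilities the model-controlled loop must never hold (assumed deny-list).
FORBIDDEN_IN_LOOP = frozenset({"Fs", "Net", "Db", "Unsafe"})


def parse_condensed(text: str) -> dict[str, set[str]]:
    """Parse lines of the form `name -> [CapA, CapB]` into a dict of sets."""
    table: dict[str, set[str]] = {}
    for line in text.splitlines():
        if "->" not in line:
            continue
        name, caps = line.split("->", 1)
        inner = caps.strip().strip("[]")
        table[name.strip()] = {c.strip() for c in inner.split(",") if c.strip()}
    return table


def forbidden_caps(table: dict[str, set[str]], fn: str = "run_agent_loop") -> list[str]:
    """Return the forbidden capabilities that `fn` holds, sorted."""
    return sorted(table[fn] & FORBIDDEN_IN_LOOP)


if __name__ == "__main__":
    table = parse_condensed(CONDENSED)
    tools = [name for name in table if name.startswith("tool_")]
    tool_union = set().union(*(table[name] for name in tools))
    print("dispatch_tool equals the union of its tools:", tool_union == table["dispatch_tool"])
    print("forbidden in run_agent_loop:", forbidden_caps(table))
    print("forbidden in main:", forbidden_caps(table, "main"))
```

A gate like this would fail the build only if a forbidden capability appeared on the loop's line, leaving main as the single place where raw authority is expected.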

main is the wiring point. It legitimately holds Unsafe and the raw Fs, passes them through three factories (make_readonly_fs, make_get_only_http, make_anthropic_client), and hands the loop only the attenuated versions. Past main, the unattenuated authorities are unreachable. That split is visible in the manifest itself: the dangerous line is the one line a human has to review, and the function behind it is under twenty lines of wiring.

The manifest as the audit contract

The same source feeds capa --cyclonedx, so the per-function authority bound above ships inside a standard CycloneDX SBOM. A reviewer assessing an agent product built this way reads the agent's full authority boundary statically, from the artifact, without reading the agent's source and without trusting an attestation about its behaviour. The bound holds by construction: the compiler will not produce the artifact if the code exceeds it. The manifest page covers the format; the regulatory page covers where such evidence fits in CRA-style frameworks.

The demo's own supply chain

The harness consumes two Capa seed libraries through capa.toml, both pinned and key-verified:

[dependencies.capa_http]
git = "https://github.com/nelsonduarte/capa_http"
tag = "v0.1.3"
verify_key = "6C1D222D491FB88031E041A536CFB426101AA24B"

[dependencies.capa_log]
git = "https://github.com/nelsonduarte/capa_log"
tag = "v0.1.2"
verify_key = "6C1D222D491FB88031E041A536CFB426101AA24B"

capa install runs up to three independent checks on each dependency. First, the SHA in capa.lock must match the resolved commit (catches a tag that was moved or re-created). Second, git verify-tag must match the pinned GPG fingerprint (catches account compromise). Third, when the dependency declares a verify_key and lives on GitHub, the SLSA L2 build-provenance attestation is verified through Sigstore Rekor via gh attestation verify.
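To make the fingerprint check concrete, here is one way such a comparison could be done by hand in Python. It is not how capa install is implemented, which this page does not show. The sketch relies on git verify-tag --raw, which prints GnuPG status lines to standard error, and on the VALIDSIG status line, whose first field is the signing key's fingerprint and whose last field is the primary key's fingerprint. The function names are hypothetical.

```python
import subprocess


def signer_fingerprints(status: str) -> set[str]:
    """Collect fingerprints from `[GNUPG:] VALIDSIG ...` status lines."""
    found: set[str] = set()
    for line in status.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "[GNUPG:]" and parts[1] == "VALIDSIG":
            found.add(parts[2].upper())       # key that made the signature
            if len(parts) >= 12:
                found.add(parts[-1].upper())  # primary key, if reported
    return found


def status_matches_pin(status: str, pinned: str) -> bool:
    """True if a valid signature was made by the pinned fingerprint."""
    return pinned.replace(" ", "").upper() in signer_fingerprints(status)


def tag_matches_pin(repo: str, tag: str, pinned: str) -> bool:
    """Run git verify-tag in `repo` and compare the signer with the pin."""
    proc = subprocess.run(
        ["git", "-C", repo, "verify-tag", "--raw", tag],
        capture_output=True, text=True,
    )
    # A non-zero exit means git could not verify the tag at all.
    return proc.returncode == 0 and status_matches_pin(proc.stderr, pinned)
```

Comparing against the full 40-character fingerprint, rather than a short key ID, is what makes the pin meaningful: short IDs can collide.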

Run it

You need an Anthropic API key; the tool calls themselves run locally.

$ git clone https://github.com/nelsonduarte/capa_agent_demo
$ cd capa_agent_demo
$ capa install
$ export ANTHROPIC_API_KEY=sk-ant-...
$ capa --run agent.capa -- "What's the weather in Lisbon today? \
    Use get_url with https://wttr.in/Lisbon?format=3"

The get_url allow-list is hard-coded in agent.capa (wttr.in and api.github.com); the weather prompt works because wttr.in is on it. Point the model at any other host and the wrapper refuses before a request is made.
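A host allow-list of this kind can be sketched in a few lines of Python. The two host names come from this page; the exact-match rule, the scheme check, and the names check_url and HostNotAllowed are assumptions for illustration, and the wrapper in the demo may apply different rules.

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = frozenset({"wttr.in", "api.github.com"})


class HostNotAllowed(Exception):
    """Raised before any request is made (hypothetical error type)."""


def check_url(url: str, allowed: frozenset = ALLOWED_HOSTS) -> str:
    """Return the host if the URL may be fetched, else raise HostNotAllowed.

    The parsed hostname is matched exactly, so look-alikes such as
    wttr.in.evil.example or https://wttr.in@evil.example/ are rejected.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if parts.scheme not in ("http", "https"):
        raise HostNotAllowed(f"unsupported scheme: {parts.scheme!r}")
    if host not in allowed:
        raise HostNotAllowed(f"host not on allow-list: {host!r}")
    return host


if __name__ == "__main__":
    print(check_url("https://wttr.in/Lisbon?format=3"))
    try:
        check_url("https://example.com/")
    except HostNotAllowed as err:
        print("refused:", err)
```

Matching on the parsed hostname rather than on a substring of the URL is the design choice that matters here: the string wttr.in can appear in a URL's userinfo, path, or query without the request actually going to that host.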

Honest limits (v0.1)

Where to go next