Language reference
Full specification of the syntax and semantics of the Capa language. For a guided introduction, see the Learn track. For the built-in APIs, see the standard library page.
1. Lexical structure
1.1. Encoding
Source files must be encoded as UTF-8. Identifiers may contain Unicode letters, digits, and _, but must start with a letter or _.
1.2. Comments
// Line comment (runs to the end of the line)
/// Doc comment (attaches to the next declaration)
/** Block doc comment (same role) */
Regular block comments /* ... */ are also accepted by the lexer (and ignored). Only the doc variants are attached to AST nodes.
1.3. Indentation
Capa is indentation-sensitive, à la Python. Implicit INDENT/DEDENT/NEWLINE tokens are produced by the lexer:
- Leading whitespace on a line defines its indentation level
- Increase → INDENT
- Decrease → DEDENT
- End of line → NEWLINE
- Inside (, [, {, NEWLINE is suppressed (implicit line continuation)
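The layout rules above can be modelled in a few lines of Python. This is only an illustrative sketch, not the Capa lexer: the function name `layout_tokens` and the `LINE` placeholder token are hypothetical, indentation is counted in spaces only, and brackets inside string literals or comments are not handled.

```python
def layout_tokens(source: str) -> list[str]:
    """Emit INDENT/DEDENT/NEWLINE markers for an indentation-sensitive source.

    Hypothetical model: the content of each line is reduced to one LINE marker.
    """
    tokens: list[str] = []
    stack = [0]   # indentation levels currently open
    depth = 0     # nesting depth of (, [, {
    for raw in source.splitlines():
        if not raw.strip():
            continue  # blank lines carry no layout information
        if depth == 0:
            # Leading whitespace only matters outside brackets.
            width = len(raw) - len(raw.lstrip(" "))
            if width > stack[-1]:
                stack.append(width)
                tokens.append("INDENT")
            while width < stack[-1]:
                stack.pop()
                tokens.append("DEDENT")
        tokens.append("LINE")
        depth += sum(raw.count(c) for c in "([{")
        depth -= sum(raw.count(c) for c in ")]}")
        if depth == 0:
            tokens.append("NEWLINE")  # suppressed while a bracket is open
    while len(stack) > 1:  # close blocks still open at end of input
        stack.pop()
        tokens.append("DEDENT")
    return tokens
```

Feeding it a call whose argument list spans two lines yields a single NEWLINE for the whole call, which is the implicit line continuation described above.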
1.4. Implicit continuation by leading dot
For multi-line method chaining, a line beginning with . is treated as a continuation of the previous line:
let r = xs
    .filter(...)
    .map(...)
    .fold(...)
1.5. Keywords
fun let var if then elif else match while for in
break continue return import const type trait impl capability
true false and or not consume self Self
async await yield defer where mut
The last row lists reserved-for-future-use keywords. The lexer recognises them; the parser rejects their use.
1.6. Literals
| Type | Examples |
|---|---|
| Integer | 42, -7, 0, 1_000_000, 0xff, 0o755, 0b1010 |
| Float | 3.14, 2.0, 1e10 |
| String | "hello", "a\nb", "x = ${x}" |
| Char | 'a', '\n' |
| Bool | true, false |
| List | [1, 2, 3], [] |
| Tuple | (1, "a"), (x,), () |
| Range | a..b (exclusive), a..=b (inclusive) |
1.7. Interpolated strings
${expr} inside a string literal is parsed as a Capa expression:
let n = 7
"value = ${n * 2}" // "value = 14"
"len = ${xs.length()}"
$$ is the literal-$ escape. Nested string literals inside interpolation are not supported.
2. Type system
2.1. Primitive types
Int, Float, String, Bool, Char, Unit. See the standard library for the methods on each.
2.2. Compound types
| Construct | Syntax |
|---|---|
| List | List<T> |
| Tuple | (T1, T2, ..., Tn) |
| Function | Fun(T1, T2) -> Ret |
| Map | Map<K, V> |
| Set | Set<T> |
| Option | Option<T> |
| Result | Result<T, E> |
2.3. User-defined types
Structs:
type Person { name: String, age: Int }
Sum types (nominal variants):
type Shape =
    Circle(Float)
    Rectangle(Float, Float)
    Square(Float)
Variants may have zero or more payloads. Variants without a payload (type X = A) are constants, used without ().
The variant names Ok, Err, Some, and None are reserved: a user-defined sum type cannot redeclare any of them. They belong to the built-in Result and Option; shadowing them would silently change the meaning of pattern matches across a module.
2.4. Generics
Functions and types can take type parameters delimited by <>:
fun first<T>(xs: List<T>) -> Option<T>
    return xs.first()
type Pair<A, B> { first: A, second: B }
Local inference: the caller rarely needs to supply explicit type arguments. first<Int>([1,2,3]) is equivalent to first([1,2,3]).
2.5. Cross-statement inference
let xs = [] produces List<TyVar>. The first use pins the type parameter:
let xs = []
xs.push(42) // OK, infers List<Int>
xs.push("oops") // error: expects Int, got String
TyVar sharing propagates through aliases (let ys = xs) and into calls to typed functions (process(xs) where process: List<Int> -> ...).
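One way to picture the pinning behaviour is a placeholder object shared by every alias of the list. The Python sketch below is a hypothetical model (the classes `TyVar` and `ListType` and the function `push` are illustrative names, with types represented as plain strings); it is not the checker's actual data structure.

```python
from typing import Optional


class TyVar:
    """Inference placeholder; `bound` stays None until the first use pins it."""

    def __init__(self) -> None:
        self.bound: Optional[str] = None


class ListType:
    """Model of List<T> where T may still be an unresolved TyVar."""

    def __init__(self, elem: TyVar) -> None:
        self.elem = elem


def push(list_type: ListType, value_type: str) -> None:
    """Type-check a push: the first call pins the element type, later calls must agree."""
    var = list_type.elem
    if var.bound is None:
        var.bound = value_type  # first use pins the type parameter
    elif var.bound != value_type:
        raise TypeError(f"expects {var.bound}, got {value_type}")


xs = ListType(TyVar())  # let xs = []
ys = xs                 # let ys = xs (alias shares the same TyVar)
push(xs, "Int")         # OK, element type is now Int
```

Because `ys` refers to the same placeholder, a later `push(ys, "String")` fails in this model with "expects Int, got String", mirroring the alias propagation described above.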
2.6. Compatibility
compatible(expected, actual) is structural with exceptions:
- TyUnknown (an untyped expression) is compatible with any type
- TyVar (inference placeholder) is compatible with any type
3. Statements
3.1. Bindings
let name = "Ana" // immutable, type inferred
let age: Int = 30 // immutable, explicit type
var counter = 0 // mutable
counter = counter + 1 // assignment (only for var)
Pattern matching in bindings:
let (a, b) = pair() // tuple destructuring
let Person { name, age } = p // struct destructuring
3.2. Control flow
// if-statement
if cond
    body1
elif cond2
    body2
else
    body3
// while
while cond
    body
// for
for x in iter
    body
// match (statement)
match scrutinee
    pat1 -> body1
    pat2 -> body2
// match (expression, multi-line)
let r = match scrutinee
    pat1 -> expr1
    pat2 -> expr2
// match (expression, inline single-line)
let r = match scrutinee { pat1 -> expr1, pat2 -> expr2 }
// break / continue (only inside loops)
break
continue
// return
return // returns ()
return expr // returns a value
3.3. Expressions as statements
Any expression can be a statement (value discarded):
stdio.println("hello") // call with side effect
xs.push(42) // mutation
1 + 2 // value discarded (valid but useless)
4. Expressions
4.1. Operators
In decreasing precedence:
| Operator | Description |
|---|---|
| () [] . | Call, index, field access |
| not - | Unary |
| * / % | Multiplicative |
| + - | Additive |
| .. ..= | Range |
| < <= > >= == != | Comparison |
| and | Short-circuit conjunction |
| or | Short-circuit disjunction |
| ? | Try (Err propagation) |
4.2. if as an expression
let cat = if cond then e1 else e2
The then keyword is the discriminator: without it, if is a statement.
4.3. match as an expression
match is the same production whether used as a statement or as an expression: the value is consumed in expression position and discarded in statement position. Two surface forms exist:
// Multi-line (indented arms, expression OR block body)
let r = match scrutinee
    pat1 -> expr1
    pat2 -> expr2
// Inline (single-line, comma-separated, expression body only)
let r = match scrutinee { pat1 -> expr1, pat2 -> expr2 }
Both forms accept guards and or-patterns. All arms must produce compatible types.
The inline form's { ... } opens immediately after the scrutinee. This collides syntactically with the struct-literal heuristic; to force a struct literal as the scrutinee, wrap it in parentheses:
match (Point { x: 1.0, y: 2.0 })
    Point { x, y } -> stdio.println("${x}, ${y}")
4.4. Lambdas
fun (x: Int) -> Int => x * 2 // single-expression
fun (x: Int) -> Int => // block body
    let y = x * 2
    return y + 1
fun () -> Int => 42 // no params
fun (a: Int, b: Int) -> Int => a + b // multiple params
Lambdas capture the lexical environment. If a single-line lambda contains a nested match, the transpiler automatically promotes it to a nested function.
4.5. The ? operator
Propagates Err in functions that return Result:
fun read_two(fs: Fs) -> Result<(String, String), IoError>
    let a = fs.read("a")? // if Err, returns immediately
    let b = fs.read("b")?
    return Ok((a, b))
5. Pattern matching
5.1. Available patterns
| Pattern | Syntax | Matches |
|---|---|---|
| Wildcard | _ | Any value |
| Identifier | x | Binds to x |
| Literal | 42, "x", true | Equality |
| Variant without payload | None | Singleton variant |
| Variant with payload | Some(x), Ok(v) | Match + bind |
| Struct | Person { name, age } | Match + bind fields |
| Tuple | (a, b), (x, _, z) | Tuple of the same arity |
| Or-pattern | a \| b \| c | Any alternative |
5.2. Or-patterns with bindings
Each alternative can bind variables, provided all of them bind the same set of names with compatible types:
match op
    Add(n) | Sub(n) | Mul(n) -> n // n is Int in all
5.3. Guards
match n
    x if x > 0 -> "positive"
    x if x < 0 -> "negative"
    _ -> "zero"
5.4. Exhaustiveness
The checker requires full coverage:
- Sum types: every variant, or a catch-all _
- Bool: both true and false, or a catch-all
- Or-patterns: each alternative counts toward coverage
type Color = Red | Green | Blue
match c
    Red -> "r"
    Green -> "g"
    // error: missing variant Blue
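The coverage rule for sum types can be sketched as a set computation. The Python model below is a simplification under stated assumptions: the name `missing_variants` is hypothetical, each arm is given as the list of its or-pattern alternatives, payload patterns are reduced to the variant name, and guards are not modelled at all.

```python
def missing_variants(variants: list[str], arms: list[list[str]]) -> list[str]:
    """Return the variants not covered by any arm, in declaration order.

    Each arm is a list of alternatives (an or-pattern); "_" is the catch-all.
    """
    covered: set[str] = set()
    for alternatives in arms:
        for alternative in alternatives:
            if alternative == "_":
                return []  # a catch-all covers everything that remains
            covered.add(alternative)
    return [v for v in variants if v not in covered]


color = ["Red", "Green", "Blue"]
print(missing_variants(color, [["Red"], ["Green"]]))  # ['Blue']
```

An arm such as `Red | Green` contributes both names, so `[["Red", "Green"], ["Blue"]]` reports nothing missing.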
5.5. Type-parameter substitution
match m.get(k) where m: Map<String, Int> infers Some(n) with n: Int, not n: T. The owner's type parameters are substituted by the scrutinee's type arguments.
6. Capabilities
6.1. What they are
Capabilities are primitive types representing access to system resources (Stdio, Fs, Env, Net, Clock, Random, Unsafe). They are only accessible via function parameters; there are no global instances.
6.2. The capability discipline (3 layers)
Structural (v1): capabilities cannot appear in struct fields, variant payloads, function return types, constants, let/var bindings, generic args, or tuples. They only flow through parameters. One relaxation: cap-bearing structs that implement a user-defined capability may hold built-in caps as fields.
Flow (v2):
- No aliasing: the same capability cannot occupy two argument slots in a single call
- Mandatory use: capability parameters must be used (or prefixed with _ to silence the warning)
Linearity (v3): the consume keyword indicates ownership transfer:
fun close(consume f: File)
// f cannot be used after this call
"Consumed" variables are tracked across fork/merge in if/elif/else and match. In loops, the analysis uses dry-run + redo to discover consumes in the first iteration.
6.3. Capability in the signature
fun main(stdio: Stdio, fs: Fs) // multiple
fun pure(x: Int) -> Int // no capabilities (pure)
fun with_consume(consume cap: MyCap) // ownership transfer
6.4. Attenuation
Every built-in capability has an attenuator that returns a fresh, narrower instance:
| Capability | Attenuator | Semantics |
|---|---|---|
| Net | restrict_to(host: String) | Allowed host set, monotonic intersection |
| Fs | restrict_to(prefix: String) | Allowed path prefix, monotonic |
| Env | restrict_to_keys(keys: List<String>) | Allowed key set, monotonic intersection |
| Clock | restrict_to_after(t: Float) | Active only after timestamp |
| Random | with_seed(seed: Int) | Deterministic sequence (no denied state) |
Attenuated capabilities are also recorded in the --manifest output via the args_flow field. See the manifest page.
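"Monotonic intersection" means that repeated attenuation can only shrink the allowed set, never widen it. The Python sketch below models that property for a Net-like capability. The class `NetModel` and its `permits` method are hypothetical illustration names, and the real runtime representation may differ.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class NetModel:
    """Hypothetical model of an attenuable capability; None means unrestricted."""

    allowed: Optional[FrozenSet[str]] = None

    def restrict_to(self, host: str) -> "NetModel":
        """Return a fresh, narrower instance; the receiver is left unchanged."""
        narrowed = frozenset({host})
        if self.allowed is not None:
            narrowed &= self.allowed  # intersect with the existing restriction
        return NetModel(narrowed)

    def permits(self, host: str) -> bool:
        return self.allowed is None or host in self.allowed


net = NetModel()
api_only = net.restrict_to("api.example.com")
# Restricting again to a different host cannot re-open access:
nothing = api_only.restrict_to("other.example.com")
```

In this model `nothing.permits(...)` is false for every host, because the intersection of two disjoint restrictions is empty.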
7. Information-flow control
7.1. Security labels
Capa tracks a two-point security lattice over values: @public (the default) sits below @secret. A label attaches to a type expression, so it can appear anywhere a type is written: parameters, let/var bindings, return types, and struct fields.
fun handle(token: @secret String) // labelled parameter
let xs: @secret List<Int> = collect() // labelled binding
type Card { pan: @secret String, brand: String } // labelled field
An unlabelled type is @public. The label is part of the flow analysis, not the runtime representation: a @secret String is an ordinary String at run time.
7.2. Propagation
A value's label is the join of every label flowing into it: if any input is @secret, the result is @secret. The join propagates through:
- Binary and unary operators
- String interpolation: "...${secret}..." is @secret
- Field reads, which inherit the receiver's label
- Indexing
- The ? operator
A function call returns the join of its argument labels: a pure function of a tainted input is itself tainted.
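The join on a two-point lattice is simply a maximum. As a minimal Python sketch (the names `Label`, `join`, and `call_label` are illustrative, not part of Capa):

```python
from enum import IntEnum


class Label(IntEnum):
    PUBLIC = 0  # @public, the default
    SECRET = 1  # @secret, sits above @public


def join(*labels: Label) -> Label:
    """Least upper bound: SECRET as soon as any input is SECRET."""
    return max(labels, default=Label.PUBLIC)


def call_label(argument_labels: list[Label]) -> Label:
    """A call's result carries the join of its argument labels."""
    return join(*argument_labels)


print(join(Label.PUBLIC, Label.SECRET).name)  # SECRET
```

With no inputs at all the result is the default, PUBLIC, which matches the rule that an unlabelled value is @public.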
7.3. Sources and sinks
One source is secret by default: env.get(...) produces a @secret result with no annotation, because environment variables routinely carry credentials. fs.read(...) is intentionally not a source: config and data files are usually public, so annotate the binding @secret yourself when it holds a secret.
Public sinks are the exfiltration points: a @secret value reaching a sink-position argument is an information-flow violation.
| Capability | Sink methods |
|---|---|
| Stdio | print, println, eprintln |
| Net | get, post |
| Fs | write |
| Db | exec, query |
7.4. Enforcement (warn-then-enforce)
By default a violation is a compile-time warning: existing, unlabelled code is unaffected, while labelled code surfaces disclosures without breaking the build. The @strict_ifc() function attribute promotes those warnings into hard errors for that function.
The leak below is caught: the value bound by the match arm inherits the secret label of the scrutinee, then reaches a Stdio sink.
@strict_ifc()
fun dump(stdio: Stdio, env: Env)
    match env.get("API_KEY") // env.get is @secret
        Some(key) -> stdio.println(key) // error: @secret reaches a sink
        None -> stdio.println("no key")
7.5. Declassification
declassify(value, reason: "...") is the single auditable bridge from @secret to @public. It is the identity at run time and relabels its result @public. The reason must be a named string literal, so the SBOM can record it. Declassifying a value that is not @secret is reported as a no-op warning.
@strict_ifc()
fun dump(stdio: Stdio, env: Env)
    match env.get("API_KEY")
        Some(key) -> stdio.println(declassify(mask(key), reason: "show masked key in logs"))
        None -> stdio.println("no key")
Each call is recorded in the SBOM: per-function declassifications and a declassification_sites count in the summary.
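The interplay of sink checks and declassification can be illustrated with a dynamic model in Python. Capa's labels are checked statically and do not exist at run time, so this is only an analogy: the `Labelled` wrapper, the `println_sink` function, and the module-level `declassification_sites` list are hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Labelled:
    """A value paired with its label (dynamic stand-in for a static label)."""

    value: str
    secret: bool = False


declassification_sites = []  # stand-in for the audit trail recorded in the SBOM


def declassify(item: Labelled, reason: str) -> Labelled:
    """Identity on the value; relabels the result public and records the reason."""
    declassification_sites.append(reason)
    return Labelled(item.value, secret=False)


def println_sink(item: Labelled) -> str:
    """Public sink: refuses any value still labelled secret."""
    if item.secret:
        raise PermissionError("@secret reaches a sink")
    return item.value


key = Labelled("abc123", secret=True)
shown = println_sink(declassify(key, reason="show key in demo"))
```

Passing `key` directly to `println_sink` raises in this model, while the declassified copy passes and leaves one entry in the audit list.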
7.6. Pattern binding
A match or let destructure of a secret scrutinee taints the names it binds. After match env.get("K") { Some(key) -> ... }, key is @secret.
7.7. Anti-laundering
Labels cannot be shed by repackaging a value:
- Aggregate literals (struct, list, tuple) carry the join of their element labels
- A for-loop variable inherits the iterable's label
- A secret push/add/set into a mutable List/Set/Map taints the container
7.8. Implicit flow
A sink inside a branch (if/match) guarded by a @secret condition is an implicit flow. It is reported only under @strict_ifc: the default tier focuses on explicit data flows.
7.9. Boundaries
Two limits are by design:
- The analysis is intra-procedural: a secret crossing a function boundary requires an explicit @secret parameter (the explicit-flow model)
- Granularity is whole-aggregate: per-field precision is future work
8. Imports
8.1. Import forms
import util // sibling: ./util.capa
import sinks.csv_sink // nested: ./sinks/csv_sink.capa
import capa_log.log // package dep: <vendor or path>/capa_log/log.capa
import util as U // alias the module name
import util (greet as hi, Color) // selective import with optional rename
After import util, every pub name from util.capa is reachable unqualified (greet(...)) or by qualified call (util.greet(...)). With import util as U, qualified calls take the alias.
8.1.1. Selective import (and renaming)
import foo (a, b as c) brings only the listed pub symbols into scope: a under its own name, b under the alias c. Every other pub item of foo stays hidden. This is the hygienic form, and the way to resolve a symbol collision between two dependencies that export the same pub name:
import capa_csv (parse as csv_parse)
import capa_cli (parse as cli_parse)
fun main(stdio: Stdio)
    stdio.println(csv_parse("a,b"))
    stdio.println(cli_parse("--flag"))
Only one side needs a rename; the other may keep the bare name. Selectors work for functions, types, consts, and capabilities; selecting an unrenamed pub sum type carries its variants along. A selector that names a symbol the target does not declare, or declares without pub, is a load-time error (module 'foo' has no public symbol 'X'). Renaming a sum type via as in a selective import is rejected (its variants would be orphaned); import it without as to bring its constructors. Selective import is strictly additive: import foo and import foo as bar are unchanged.
8.2. Visibility
Top-level items are private by default. Mark a function, constant, type, trait, or capability with pub to expose it to importers; anything without pub stays callable only from inside the same file.
// util.capa
fun helper(x: Int) -> Int // private to util
    return x + 1
pub fun outer(x: Int) -> Int // visible to importers
    return helper(x)
// main.capa
import util
fun main(stdio: Stdio)
    stdio.println("${outer(3)}") // works: 4
    stdio.println("${helper(3)}") // error: undefined name 'helper'
The same rule applies to qualified access: util.outer(...) works because outer is public; util.helper(...) does not. pub on root-file items is accepted but has no effect (root callers see one another regardless).
8.3. Module search paths
The loader resolves import x.y in this order: importing-file directory, CAPA_PATH entries, ./vendor/ (when capa.toml declares a git dep), the parent of every path = "..." entry in capa.toml, ./libraries/, and finally the directory of the root file. See docs/packages.md for the package manager's role.
To pull in modules that live elsewhere on disk (stdlib-style libraries, shared internal modules), set CAPA_PATH to one or more additional roots separated by your platform's path separator (; on Windows, : elsewhere). The importer-relative path always wins when the same module name exists in both places, so the env var never silently shadows a project-local file.
$ export CAPA_PATH=/usr/local/share/capa:./libs
$ capa --run app.capa
# 'import greeter' first tries ./greeter.capa, then
# /usr/local/share/capa/greeter.capa, then ./libs/greeter.capa.
If no candidate exists, the diagnostic lists every path that was tried so the right next step (install the dependency, fix the import, adjust CAPA_PATH) is obvious.
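The first two steps of the search order can be sketched in Python. This is a hedged model, not the loader's code: `candidate_paths` and `resolve` are hypothetical names, CAPA_PATH entries are assumed to be tried left to right, and the later steps (vendor, capa.toml path entries, libraries, root-file directory) are omitted.

```python
import os
from pathlib import Path


def candidate_paths(module: str, importer_dir: Path, capa_path: str) -> list[Path]:
    """List the files tried for `import module`, importer directory first."""
    relative = Path(*module.split(".")).with_suffix(".capa")
    roots = [Path(importer_dir)]
    # os.pathsep is ';' on Windows and ':' elsewhere.
    roots += [Path(entry) for entry in capa_path.split(os.pathsep) if entry]
    return [root / relative for root in roots]


def resolve(module: str, importer_dir: Path, capa_path: str) -> Path:
    """Return the first existing candidate, or report every path that was tried."""
    tried = candidate_paths(module, importer_dir, capa_path)
    for path in tried:
        if path.is_file():
            return path
    listing = "\n  ".join(str(p) for p in tried)
    raise FileNotFoundError(f"module '{module}' not found; tried:\n  {listing}")
```

Because the importer's directory is always the first root, a project-local file wins over a CAPA_PATH entry of the same name, which is the shadowing guarantee stated above.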
8.4. Python interop
For Python interop, use the typed builtins py_import(unsafe, name) and py_invoke(unsafe, callable, args); both require the Unsafe capability. See the standard library page.
9. The main program
The entry point is a function called main that may take one or more capabilities as parameters. The capabilities are instantiated by the runtime at boot:
fun main(stdio: Stdio, fs: Fs, env: Env)
    let argv = env.args()
    stdio.println("received ${argv.length()} arguments")
If main returns Result<(), E>, an Err causes a non-zero exit code.
10. Attributes
Functions can carry static, source-level metadata via attribute syntax. The analyzer rejects unknown names, unknown keys, and duplicates; the schema is fixed. v1 recognises four attributes:
| Attribute | Keys | Role |
|---|---|---|
| @security | cve, cwe, severity, fixed_in, description | Link a function to a known security history. |
| @deprecated | reason, since, use, removed_in | Mark an API as superseded. |
| @audited | date, by, scope, notes | Record a manual security audit. |
| @vex | cve, status, justification, detail | Per-function CycloneDX VEX exploitability claim. Embeds in --cyclonedx output and surfaces in --vex. |
@vex status accepts the CycloneDX VEX vocabulary (not_affected, exploitable, in_triage, resolved, false_positive); justification accepts the CycloneDX justification vocabulary (code_not_reachable, requires_configuration, and so on). See the manifest page for full output examples.
11. Compiler CLI
Flag-order reference. All flags take one or more .capa source files. Output goes to stdout unless noted.
| Flag | Output |
|---|---|
| repl | Start the Capa REPL with every standard capability pre-bound (subcommand, not a flag). |
| test | Discover and run tests/test_*.capa (subcommand). --wasm runs them on the Wasm backend; --both runs both backends and diffs their stdout for cross-backend parity. A test passes on exit 0; a panic fails it. |
| --run | Transpile to Python and execute in-process. |
| --transpile | Transpile to Python and print the generated code to stdout. |
| --watch | Re-run the program every time it (or any of its imported modules) changes on disk. Implies --run. |
| --wasm | Compile through CIR to WebAssembly text (WAT). With --run, assembles and executes on a wasmtime-backed host that provides the Capa capability interfaces. |
| --wasm --component | With --output, wrap the core module in a Component Model component (WIT embedded). Consumable by any Component-Model-aware runtime. |
| --prefer-wasm | With --run: try the Wasm backend first and fall back to the Python pipeline only when CIR or Wasm emission rejects a construct. Also honoured via CAPA_PREFER_WASM=1. |
| --doc | Self-contained HTML documentation: signatures, doc comments, attributes, per-function call lists. |
| --manifest | Capa-native JSON manifest: per-function declared capabilities, attributes, Unsafe crossings, call sites, args_flow. |
| --cyclonedx | CycloneDX 1.5 SBOM (JSON). Capability metadata embedded as properties[] under the capa:* namespace. When any function carries @vex, the VEX block is embedded under vulnerabilities[]. |
| --spdx | SPDX 2.3 SBOM (JSON). Per-function capability metadata exposed via SPDX annotations[]. |
| --vex | CycloneDX VEX (JSON). One vulnerabilities[] entry per @vex attribute, with affects[] pointing at the function's bom-ref. |
| --provenance | SLSA Build L1 provenance attestation: in-toto Statement v1 envelope wrapping a SLSA Provenance v1.0 predicate, subject = SHA-256 of the source. |
Full output examples and per-flag schemas on the manifest page. The examples/sbom_diff.capa and examples/vex_demo.capa auditor programs demonstrate consumption.
12. Differences from Python
Capa transpiles to Python 3.10+, but the semantics differ:
| Capa | Python |
|---|---|
| Capabilities required for I/O | Globals such as print, open |
| Types checked at compile time | Duck typing |
| Exhaustive match checked | match at runtime, no exhaustiveness |
| Or-patterns with consistent bindings | Or-patterns without bindings |
| let x: List<Int> = [] valid | Python equivalent has no checks |
| Mutation only with var or consume | Everything mutable |
| Manifest, SBOM, doc emitted by compiler | Manual via external tools |
13. Known limitations
- String literals cannot span multiple lines (use \n for line breaks)
- Nested string literals inside interpolation ("x ${"inner"} y") are not supported
- Errors inside interpolation report positions counted from the start of the file
- No asynchronous I/O operations
- if/match in block-body lambdas need => before the indented block
- Multi-line match inside parentheses requires the inline { } form
For the full roadmap, see the roadmap page and TODO.md.