Learn · Chapter 14 of 14

From source to SBOM

Everything the previous chapters taught (capabilities in signatures, attenuation, @secret and declassify) exists so the compiler can answer audit questions mechanically. This chapter takes one small program and pulls every audit artefact out of it: the capability manifest, a CycloneDX SBOM, an SPDX SBOM, a VEX document, and a SLSA provenance attestation. All five come from the compiler itself, from the same source the type checker reads.

The program

Create a working directory with a log file to analyse:

$ mkdir report && cd report && mkdir logs
$ printf 'INFO boot\nERROR disk full\nINFO retry\nERROR disk full\nINFO ok\n' > logs/app.log

(PowerShell: "INFO boot`nERROR disk full`nINFO retry`nERROR disk full`nINFO ok`n" | Set-Content -NoNewline logs/app.log.)

Then save this as report.capa. It deliberately exercises the distinctive machinery: a pure helper, a capability-holding function, an attenuated Fs, a secret from Env routed through declassify, and one @vex claim:

// report.capa: read a log file, count error lines, print a summary.

fun count_errors(lines: List<String>) -> Int
    var n = 0
    for line in lines
        if line.contains("ERROR")
            n = n + 1
    return n

@vex(
    cve: "CVE-2021-44228",
    status: "not_affected",
    justification: "code_not_reachable",
    detail: "summarise declares no Net capability; a network-side exploit chain cannot be reached from this function. Statically enforced."
)
fun summarise(stdio: Stdio, fs: Fs, env: Env, path: String) -> Result<Unit, IoError>
    let body = fs.read(path)?
    let lines = body.split("\n")
    let errors = count_errors(lines)
    let token = env.get("REPORT_TOKEN")    // @secret by default
    match token
        Some(t) ->
            let tail = t.substring(t.length() - 4, t.length())
            let masked = declassify(
                "token ending ${tail}",
                reason: "audit trail: last 4 chars identify the reporting token"
            )
            stdio.println("report by ${masked}")
        None -> stdio.println("report (unauthenticated)")
    stdio.println("${errors} error lines in ${lines.length()} total")
    return Ok(())

fun main(stdio: Stdio, fs: Fs, env: Env)
    let logs_fs = fs.restrict_to("logs/")
    match summarise(stdio, logs_fs, env, "logs/app.log")
        Ok(_) -> ()
        Err(e) -> stdio.eprintln("failed: ${e}")

Check that it runs:

$ REPORT_TOKEN=tk_9f3a77c2 capa --run report.capa
report by token ending 77c2
2 error lines in 6 total

(PowerShell: $env:REPORT_TOKEN = "tk_9f3a77c2"; capa --run report.capa.)
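The total of 6 can look off by one, since the printf command wrote five log lines. A likely explanation is that the file ends with a newline, so splitting on "\n" leaves one empty string after the last separator. The Python sketch below mirrors count_errors to show the arithmetic. It assumes Capa's split keeps the trailing empty string the way Python's str.split does, and it is an illustration only, not the code the compiler generates.

```python
# Illustrative Python mirror of count_errors; not the compiler's output.
# Assumption: Capa's split("\n") keeps a trailing empty string, as Python's does.

LOG_BODY = "INFO boot\nERROR disk full\nINFO retry\nERROR disk full\nINFO ok\n"


def count_errors(lines: list[str]) -> int:
    """Count the lines that contain the substring ERROR."""
    return sum(1 for line in lines if "ERROR" in line)


lines = LOG_BODY.split("\n")  # the final newline yields a last element of ""
errors = count_errors(lines)

print(f"{errors} error lines in {len(lines)} total")
# prints: 2 error lines in 6 total
```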

A working program. Now stop being its developer and become its auditor.

Artefact 1: the capability manifest

capa --manifest emits Capa-native JSON describing every function's authority. Here is the entry for summarise, shortened with ellipses but otherwise verbatim:

$ capa --manifest report.capa
{
  "capa_version": "1.2.0",
  "schema_version": 1,
  "filename": "report.capa",
  "functions": [
    ...
    {
      "name": "summarise",
      "pos": "report.capa:10:1",
      "declared_capabilities": ["Stdio", "Fs", "Env"],
      "transitively_reachable_capabilities": ["Env", "Fs", "Stdio"],
      "provably_excluded_capabilities": [
        "Clock", "Db", "Net", "Proc", "Random", "Unsafe"
      ],
      "has_unsafe": false,
      ...
      "declassifications": [
        {
          "reason": "audit trail: last 4 chars identify the reporting token",
          "value": "\"token ending ${...}\"",
          "pos": "24:26"
        }
      ]
    },
    ...
  ],
  "summary": {
    "total_functions": 3,
    "functions_with_capabilities": 2,
    "functions_with_attributes": 1,
    "functions_crossing_unsafe": 0,
    "declassification_sites": 1,
    "protocol_states": 0
  }
}

Three fields carry the audit weight. declared_capabilities is the signature's claim. provably_excluded_capabilities is the type system's counter-claim: summarise can never touch Net or Proc, because those names are not in scope and the analyzer rejected every path to them. And declassifications lists every sanctioned secret disclosure with its stated reason, the record chapter 13 promised.
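To make the three fields concrete, here is a minimal Python sketch of how an auditor's script might query them. The JSON is an abbreviated copy of the entry shown above; the helper names find_function and excludes are hypothetical and not part of any Capa tooling.

```python
import json

# Abbreviated sample shaped like the summarise entry shown above.
MANIFEST_JSON = """
{
  "functions": [
    {
      "name": "summarise",
      "declared_capabilities": ["Stdio", "Fs", "Env"],
      "provably_excluded_capabilities": [
        "Clock", "Db", "Net", "Proc", "Random", "Unsafe"
      ],
      "declassifications": [
        {
          "reason": "audit trail: last 4 chars identify the reporting token",
          "pos": "24:26"
        }
      ]
    }
  ]
}
"""


def find_function(manifest: dict, name: str) -> dict:
    """Return the manifest entry for the named function, or raise KeyError."""
    for entry in manifest["functions"]:
        if entry["name"] == name:
            return entry
    raise KeyError(name)


def excludes(entry: dict, capability: str) -> bool:
    """True if the manifest lists the capability as provably excluded."""
    return capability in entry.get("provably_excluded_capabilities", [])


manifest = json.loads(MANIFEST_JSON)
summarise = find_function(manifest, "summarise")

print("declared:", summarise["declared_capabilities"])
print("Net excluded:", excludes(summarise, "Net"))
for site in summarise.get("declassifications", []):
    print(f"declassify at {site['pos']}: {site['reason']}")
```

In practice the script would load the output of capa --manifest from a file instead of an inline string.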

The manifest also captures attenuation. Inside main's entry, the call to summarise records that the Fs argument was narrowed before being handed down:

"args_flow": [
  null,
  {
    "name": "logs_fs",
    "attenuations": [
      { "method": "restrict_to", "args": ["\"logs/\""] }
    ]
  },
  null,
  null
]

An auditor reads this as: the filesystem authority that reaches summarise is not the program's full Fs; it is Fs restricted to logs/. The compiler extracted that fact from the source, not from a questionnaire.
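A short Python sketch of that reading, working on the args_flow fragment transcribed as a Python literal (JSON null becomes None). It assumes a null slot means no attenuation was recorded for that argument, which the fragment suggests but this chapter does not state; describe_attenuations is a hypothetical helper name.

```python
# The args_flow fragment from main's entry, transcribed as a Python literal.
ARGS_FLOW = [
    None,
    {
        "name": "logs_fs",
        "attenuations": [
            {"method": "restrict_to", "args": ['"logs/"']},
        ],
    },
    None,
    None,
]


def describe_attenuations(args_flow: list) -> list[str]:
    """List every narrowing step recorded for the arguments of one call."""
    findings = []
    for position, arg in enumerate(args_flow):
        if arg is None:  # assumption: null means nothing was recorded
            continue
        for step in arg.get("attenuations", []):
            call = f"{step['method']}({', '.join(step['args'])})"
            findings.append(
                f"argument {position} ({arg['name']}): narrowed by {call}"
            )
    return findings


for finding in describe_attenuations(ARGS_FLOW):
    print(finding)
# prints: argument 1 (logs_fs): narrowed by restrict_to("logs/")
```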

Artefact 2: the CycloneDX SBOM

capa --cyclonedx wraps the same information in CycloneDX 1.5, the SBOM format most compliance tooling ingests. Each function becomes a component, and the capability data rides in standard properties[] under a capa: namespace:

$ capa --cyclonedx report.capa
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  ...
  "components": [
    ...
    {
      "bom-ref": "capa:fn:report.capa:summarise",
      "type": "library",
      "name": "summarise",
      "properties": [
        { "name": "capa:kind", "value": "function" },
        { "name": "capa:pos", "value": "report.capa:10:1" },
        { "name": "capa:declared_capability", "value": "Stdio" },
        { "name": "capa:declared_capability", "value": "Fs" },
        { "name": "capa:declared_capability", "value": "Env" },
        { "name": "capa:provably_excluded_capability", "value": "Net" },
        ...
        { "name": "capa:attribute:vex:cve", "value": "CVE-2021-44228" },
        { "name": "capa:attribute:vex:status", "value": "not_affected" },
        ...
      ]
    }
  ]
}

The point of the wrapper: any tool that already understands CycloneDX (Dependency-Track dashboards, policy engines, the sbom-watch program from the home page) can consume this without knowing anything about Capa. The capability claims are just properties on components. A scanner-produced SBOM tells you which packages are present; this one also tells you, per function, what each piece is allowed to do.
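Because the claims are ordinary properties, a generic consumer needs only a few lines to collect them. The Python sketch below groups the capa: properties of one component by name; a property name can repeat (one entry per capability), so each name maps to a list. The component is an abbreviated copy of the one shown above, and capa_properties is a hypothetical helper.

```python
from collections import defaultdict

# Abbreviated copy of the summarise component shown above.
COMPONENT = {
    "bom-ref": "capa:fn:report.capa:summarise",
    "type": "library",
    "name": "summarise",
    "properties": [
        {"name": "capa:kind", "value": "function"},
        {"name": "capa:pos", "value": "report.capa:10:1"},
        {"name": "capa:declared_capability", "value": "Stdio"},
        {"name": "capa:declared_capability", "value": "Fs"},
        {"name": "capa:declared_capability", "value": "Env"},
        {"name": "capa:provably_excluded_capability", "value": "Net"},
        {"name": "capa:attribute:vex:cve", "value": "CVE-2021-44228"},
        {"name": "capa:attribute:vex:status", "value": "not_affected"},
    ],
}


def capa_properties(component: dict) -> dict[str, list[str]]:
    """Group a component's capa:* properties by name, keeping document order."""
    grouped = defaultdict(list)
    for prop in component.get("properties", []):
        if prop["name"].startswith("capa:"):
            grouped[prop["name"]].append(prop["value"])
    return dict(grouped)


props = capa_properties(COMPONENT)
print(props["capa:declared_capability"])
# prints: ['Stdio', 'Fs', 'Env']
```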

Artefact 3: the SPDX companion

capa --spdx emits the same content in SPDX 2.3, the Linux Foundation's format. Pick whichever your downstream consumer standardises on; the information is identical. Functions become packages related to the program by CONTAINS, capabilities by DEPENDS_ON, and the per-function metadata rides in annotations[]:

$ capa --spdx report.capa
{
  "spdxVersion": "SPDX-2.3",
  "name": "report.capa",
  ...
  "packages": [ "report.capa", "Env", "Fs", "Stdio",
                "count_errors", "summarise", "main" ],
  "relationships": [
    { "spdxElementId": "SPDXRef-Package-report.capa",
      "relationshipType": "DEPENDS_ON",
      "relatedSpdxElement": "SPDXRef-Builtin-report.capa-Fs" },
    { "spdxElementId": "SPDXRef-Package-report.capa",
      "relationshipType": "CONTAINS",
      "relatedSpdxElement": "SPDXRef-Fn-report.capa-summarise" },
    ...
  ]
}

(The packages array is abbreviated to names here; each entry is a full SPDX package object with annotations[] carrying the capa:* key-value pairs.)
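The relationship list can be read the same way. This Python sketch groups the two relationships shown above by type, so the program's contained functions and its capability dependencies fall out as separate lists. The function name related_by_type is hypothetical, and only the fields visible in the excerpt are used.

```python
from collections import defaultdict

# The two relationships visible in the excerpt above.
RELATIONSHIPS = [
    {
        "spdxElementId": "SPDXRef-Package-report.capa",
        "relationshipType": "DEPENDS_ON",
        "relatedSpdxElement": "SPDXRef-Builtin-report.capa-Fs",
    },
    {
        "spdxElementId": "SPDXRef-Package-report.capa",
        "relationshipType": "CONTAINS",
        "relatedSpdxElement": "SPDXRef-Fn-report.capa-summarise",
    },
]


def related_by_type(relationships: list, source_id: str) -> dict[str, list[str]]:
    """Map each relationship type to the elements it links from source_id."""
    grouped = defaultdict(list)
    for rel in relationships:
        if rel["spdxElementId"] == source_id:
            grouped[rel["relationshipType"]].append(rel["relatedSpdxElement"])
    return dict(grouped)


links = related_by_type(RELATIONSHIPS, "SPDXRef-Package-report.capa")
print("functions:", links.get("CONTAINS", []))
print("capabilities:", links.get("DEPENDS_ON", []))
```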

Artefact 4: VEX, the exploitability claim

An SBOM says what is in the box. VEX (Vulnerability Exploitability eXchange) says whether a known CVE actually affects the box, and why. The @vex attribute you wrote on summarise becomes a CycloneDX VEX document:

$ capa --vex report.capa
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  ...
  "vulnerabilities": [
    {
      "bom-ref": "capa:vex:report.capa:summarise:CVE-2021-44228",
      "id": "CVE-2021-44228",
      "source": { "name": "NVD",
                  "url": "https://nvd.nist.gov/vuln/detail/CVE-2021-44228" },
      "analysis": {
        "state": "not_affected",
        "justification": "code_not_reachable",
        "detail": "summarise declares no Net capability; a network-side
                   exploit chain cannot be reached from this function.
                   Statically enforced."
      },
      "affects": [ { "ref": "capa:fn:report.capa:summarise" } ]
    }
  ]
}

(The detail string is wrapped across three lines here for readability; the compiler emits it as a single line.)

Standard VEX is per-package: "our product bundles a vulnerable library, but we are not affected". Capa refines the claim to per-function, and gives it a machine-verifiable basis: the manifest's provably_excluded_capabilities for summarise includes Net, so the "code not reachable" justification is grounded in a type-system fact. If a later edit added net: Net to the signature, the claim's basis would visibly disappear from the next manifest diff.
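That grounding can be checked by a script. The Python sketch below cross-references each not_affected claim against the manifest and reports any claim whose basis is missing. Two things are assumptions for illustration: the REQUIRED_CAPABILITY table, which is hypothetical auditor-supplied data naming the capability an exploit needs, and the rule that the function name is the last colon-separated part of the affects ref. The edited manifest is likewise a guess at what the output would look like after adding net: Net.

```python
import copy

# Abbreviated manifest and VEX data shaped like the excerpts in this chapter.
MANIFEST = {
    "functions": [
        {
            "name": "summarise",
            "provably_excluded_capabilities": [
                "Clock", "Db", "Net", "Proc", "Random", "Unsafe"
            ],
        }
    ]
}
VEX = {
    "vulnerabilities": [
        {
            "id": "CVE-2021-44228",
            "analysis": {
                "state": "not_affected",
                "justification": "code_not_reachable",
            },
            "affects": [{"ref": "capa:fn:report.capa:summarise"}],
        }
    ]
}

# Hypothetical auditor-supplied table: the capability each exploit needs.
REQUIRED_CAPABILITY = {"CVE-2021-44228": "Net"}


def unsupported_claims(manifest: dict, vex: dict, required: dict) -> list:
    """Return (cve, function) pairs whose not_affected claim lacks a basis."""
    excluded = {
        fn["name"]: set(fn.get("provably_excluded_capabilities", []))
        for fn in manifest["functions"]
    }
    problems = []
    for vuln in vex["vulnerabilities"]:
        if vuln.get("analysis", {}).get("state") != "not_affected":
            continue
        needed = required.get(vuln["id"])
        for target in vuln.get("affects", []):
            # Assumption: the ref ends with ":<function name>".
            fn_name = target["ref"].rsplit(":", 1)[-1]
            if needed is None or needed not in excluded.get(fn_name, set()):
                problems.append((vuln["id"], fn_name))
    return problems


print(unsupported_claims(MANIFEST, VEX, REQUIRED_CAPABILITY))
# prints: []

# Simulate a later edit that adds net: Net to summarise's signature.
edited = copy.deepcopy(MANIFEST)
edited["functions"][0]["provably_excluded_capabilities"].remove("Net")
print(unsupported_claims(edited, VEX, REQUIRED_CAPABILITY))
# prints: [('CVE-2021-44228', 'summarise')]
```

Run in CI against each new manifest, a check like this would turn a silently stale VEX claim into a failing build.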

Artefact 5: provenance

The last artefact answers "where did this come from?". capa --provenance emits a SLSA Build L1 attestation, an in-toto Statement v1 binding the artefact to the SHA-256 of its source:

$ capa --provenance report.capa
{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [
    {
      "name": "report.capa",
      "digest": { "sha256": "f0878f4a6bdb56dbe9657a77c21b491a..." }
    }
  ],
  "predicateType": "https://slsa.dev/provenance/v1",
  "predicate": {
    "buildDefinition": {
      "buildType": "https://capa-language.com/build/transpile-to-python/v1",
      "externalParameters": { "source": "report.capa" },
      "internalParameters": { "capaVersion": "1.2.0",
                              "target": "python>=3.10" },
      ...
    },
    ...
  }
}

(The digest, externalParameters, and internalParameters objects are compacted here for readability; the compiler emits each across multiple lines.)

Anyone holding the source can recompute the digest and confirm this attestation describes exactly this file, no more, no less. Build L1 is the tier of the SLSA ladder that requires provenance to exist but not to be signed; signing and a hardened builder are deliberately left to CI infrastructure, where they belong.
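Recomputing the digest takes only the standard library. The Python sketch below assumes the digest covers the raw bytes of the source file, which the attestation excerpt implies but does not spell out. It uses stand-in bytes and a statement built in the script, because the real digest above is truncated; matches_attestation is a hypothetical helper.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_attestation(statement: dict, name: str, source: bytes) -> bool:
    """True if a subject with this name carries the digest of these bytes."""
    actual = sha256_hex(source)
    return any(
        subject["name"] == name and subject["digest"].get("sha256") == actual
        for subject in statement["subject"]
    )


# Stand-in bytes; a real check would read report.capa from disk.
source = b"stand-in for the contents of report.capa\n"

# A statement shaped like the excerpt above, built for the stand-in bytes.
statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [
        {"name": "report.capa", "digest": {"sha256": sha256_hex(source)}}
    ],
}

print(matches_attestation(statement, "report.capa", source))
# prints: True
print(matches_attestation(statement, "report.capa", source + b"\n"))
# prints: False
```

For a real audit, replace the stand-ins with pathlib.Path("report.capa").read_bytes() and the JSON saved from capa --provenance.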

What an auditor gets, in one table

| Artefact | The question it answers |
| --- | --- |
| Manifest | Which functions hold which authorities, what is provably out of reach, where secrets are disclosed and why |
| CycloneDX SBOM | The same, in the format compliance pipelines already ingest |
| SPDX SBOM | The same, for SPDX-standardised consumers |
| VEX | Per-function exploitability of known CVEs, grounded in excluded capabilities |
| Provenance | Which exact source produced this artefact, by digest |

None of these required a scanner, an agent, or a form. The discipline you learned in chapters 8 through 13 put the information in the type system; the five flags serialise it.

Where you go next

That is the whole tour, source to evidence. To see the same artefacts generated for a real codebase, read capa_paymentguard: its conformity/ directory ships the manifest, both SBOMs, the VEX, the provenance, and a CONFORMITY.md that walks an auditor through them. To start a project with that layout from day one, capa_cra_template is the scaffold. And the clause-by-clause mapping of these artefacts onto CRA, NIS2, DORA, NIST SSDF, and OWASP SCVS lives on the regulatory page.

If you build something with Capa, send it. Real use is the best feedback.