Syft doesn’t find vulnerabilities itself. It builds a catalog of everything inside your artifact, and a vulnerability scanner such as Grype then cross-references that catalog against known vulnerabilities.
Let’s see it in action. Imagine you’ve just built a Docker image for a simple Go application.
FROM golang:1.20-alpine AS builder
WORKDIR /app
COPY . .
RUN go build -o myapp
FROM alpine:latest
COPY --from=builder /app/myapp /myapp
EXPOSE 8080
CMD ["/myapp"]
You build this image: docker build -t myapp:latest .
Now, you want to scan it for vulnerabilities. The first step is an inventory: Syft catalogs the software inside the image by inspecting the image’s filesystem, including package manager databases and compiled binaries, rather than relying on a declared list of packages.
Here’s how you’d run the scan:
syft myapp:latest -o json=syft-report.json
This command tells Syft to analyze the myapp:latest Docker image and output the results in JSON format to a file named syft-report.json.
Once Syft has done its work, syft-report.json contains a list of every package it detected, with versions, package types, and the file locations where each was found. It does not contain vulnerability data: Syft produces the inventory, and a scanner such as Grype matches that inventory against known vulnerabilities. The JSON looks something like this (trimmed to two artifacts and a handful of fields; the values are illustrative and assume the Go module is named myapp):
{
  "artifacts": [
    {
      "id": "a1b2c3d4e5f6...",
      "name": "alpine-baselayout",
      "version": "3.4-r0",
      "type": "apk",
      "foundBy": "apkdb-cataloger",
      "locations": [
        {
          "path": "/lib/apk/db/installed"
        }
      ],
      "language": ""
    },
    {
      "id": "f7e8d9c0b1a2...",
      "name": "myapp",
      "version": "(devel)",
      "type": "go-module",
      "foundBy": "go-module-binary-cataloger",
      "locations": [
        {
          "path": "/myapp"
        }
      ],
      "language": "go"
    }
  ],
  "source": {
    "type": "image",
    "target": {
      "userInput": "myapp:latest"
    }
  },
  "descriptor": {
    "name": "syft",
    "version": "0.80.0"
  }
}
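A report like this is straightforward to consume from a script. The following is a minimal sketch, assuming only that the report has a top-level "artifacts" array whose entries carry "name", "version", and "type"; the embedded sample is hypothetical and stands in for the real syft-report.json, which you would load with json.load instead.

```python
import json
from collections import Counter
from typing import List, Tuple

# Hypothetical sample standing in for syft-report.json.
SAMPLE_REPORT = """
{
  "artifacts": [
    {"name": "myapp", "version": "(devel)", "type": "go-module"},
    {"name": "alpine-baselayout", "version": "3.4-r0", "type": "apk"}
  ]
}
"""


def list_packages(report: dict) -> List[Tuple[str, str, str]]:
    """Return (name, version, type) for every artifact, sorted by name."""
    rows = [
        (a.get("name", ""), a.get("version", ""), a.get("type", ""))
        for a in report.get("artifacts", [])
    ]
    return sorted(rows)


def count_by_type(report: dict) -> Counter:
    """Count artifacts per package type (apk, go-module, ...)."""
    return Counter(a.get("type", "unknown") for a in report.get("artifacts", []))


report = json.loads(SAMPLE_REPORT)
for name, version, pkg_type in list_packages(report):
    print(f"{pkg_type:10} {name} {version}")
```

Using .get with defaults keeps the script tolerant of fields that a given Syft version or output format may omit.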
The core problem Syft solves is the lack of visibility into the software supply chain. Before tools like Syft, knowing precisely what software was running in your containers, and whether it was vulnerable, was a manual, error-prone process. Syft automates this by creating a Software Bill of Materials (SBOM) for your artifacts.
Internally, Syft uses a cataloging system. It has specific "catalogers" for OS package managers (apk for Alpine, dpkg for Debian/Ubuntu, rpm for CentOS/Fedora) and for language ecosystems (Go modules, npm packages, Python packages). It also has a cataloger that classifies well-known standalone binaries. When you run syft myapp:latest, Syft selects the catalogers appropriate for a container image, runs each one against the image’s filesystem, and merges their results into the list of artifacts.
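The cataloger idea can be illustrated with a toy model. This is a sketch of the pattern, not Syft’s implementation: the Package class, the dict-based filesystem, and the function names are all hypothetical. The two file formats it parses are real, though: Alpine records installed packages in /lib/apk/db/installed using P: (name) and V: (version) lines, and Debian uses Package: and Version: fields in /var/lib/dpkg/status.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Package:
    name: str
    version: str
    type: str
    found_by: str


# Hypothetical stand-in for an image filesystem: path -> file contents.
FileSystem = Dict[str, str]
Cataloger = Callable[[FileSystem], List[Package]]


def parse_records(text: str) -> List[Dict[str, str]]:
    """Split a 'Key: value' database into one dict per blank-line-separated record."""
    records = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if line[:1] in (" ", "\t"):  # continuation line, not a new field
                continue
            key, sep, value = line.partition(":")
            if sep:
                fields[key] = value.strip()
        if fields:
            records.append(fields)
    return records


def apk_cataloger(fs: FileSystem) -> List[Package]:
    text = fs.get("/lib/apk/db/installed", "")
    return [Package(r["P"], r["V"], "apk", "apk_cataloger")
            for r in parse_records(text) if "P" in r and "V" in r]


def dpkg_cataloger(fs: FileSystem) -> List[Package]:
    text = fs.get("/var/lib/dpkg/status", "")
    return [Package(r["Package"], r["Version"], "deb", "dpkg_cataloger")
            for r in parse_records(text) if "Package" in r and "Version" in r]


def run_catalogers(fs: FileSystem, catalogers: List[Cataloger]) -> List[Package]:
    """Apply every cataloger to the filesystem and merge the results."""
    packages: List[Package] = []
    for cataloger in catalogers:
        packages.extend(cataloger(fs))
    return packages


# Hypothetical Alpine filesystem with two installed packages.
ALPINE_FS = {
    "/lib/apk/db/installed": "P:alpine-baselayout\nV:3.4-r0\n\nP:busybox\nV:1.36.1-r0\n",
}

for pkg in run_catalogers(ALPINE_FS, [apk_cataloger, dpkg_cataloger]):
    print(pkg)
```

Each cataloger returns an empty list when its database file is absent, so the same cataloger set can be run against any image without knowing the distribution in advance.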
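Once the artifact list is complete, Syft’s job is done. Vulnerability matching is handled by a separate scanner, typically Grype, Syft’s companion tool from Anchore. Grype takes the package names, versions, and types from the SBOM (for example, grype sbom:./syft-report.json) and matches them against its vulnerability database, which draws on sources such as the National Vulnerability Database (NVD) and GitHub Security Advisories. Each match is reported alongside the affected package.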
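Conceptually, the correlation step is a join between the package list and a set of advisories. The sketch below shows that idea under heavy simplification: the advisory records, the EXAMPLE-0001 identifier, and the examplelib package are hypothetical, and the version comparison is deliberately naive (it compares the numeric parts only). Real scanners use ecosystem-specific version semantics.

```python
import re
from typing import Dict, List, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Naive comparison key: the numeric components of the version, in order."""
    return tuple(int(n) for n in re.findall(r"\d+", version))


def match_advisories(packages: List[Dict[str, str]],
                     advisories: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Report packages whose version is older than an advisory's fixed version."""
    findings = []
    for pkg in packages:
        for adv in advisories:
            same_package = adv["package"] == pkg["name"] and adv["type"] == pkg["type"]
            if same_package and version_key(pkg["version"]) < version_key(adv["fixed_in"]):
                findings.append({
                    "id": adv["id"],
                    "severity": adv["severity"],
                    "package": pkg["name"],
                    "installed": pkg["version"],
                    "fixed_in": adv["fixed_in"],
                })
    return findings


# Hypothetical data for illustration only.
PACKAGES = [
    {"name": "examplelib", "version": "1.2.0", "type": "apk"},
    {"name": "otherlib", "version": "2.0.0", "type": "apk"},
]
ADVISORIES = [
    {"id": "EXAMPLE-0001", "package": "examplelib", "type": "apk",
     "severity": "High", "fixed_in": "1.2.3"},
]

for finding in match_advisories(PACKAGES, ADVISORIES):
    print(finding)
```

Matching on both name and package type matters because the same name can refer to unrelated software in different ecosystems.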
The most surprising thing is how Syft handles "unknown" binaries or custom applications. While it excels at package managers, it also has a "binary" cataloger that can sometimes infer language and version information from executable files, or at least identify them as distinct components that might need further investigation. This means even custom-built applications within your images get inventoried, allowing you to track their dependencies and potential vulnerabilities, even if they aren’t managed by a standard package manager.
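One way to infer information from an executable is to search its raw bytes for recognizable version strings. The sketch below illustrates that idea for the Go toolchain version, which Go binaries embed as a string such as go1.20.5. It is an assumption-laden toy, not Syft’s actual classifier: the pattern and function name are hypothetical, and the sample bytes are made up rather than taken from a real binary.

```python
import re
from typing import Optional

# Hypothetical pattern: a Go toolchain version string such as "go1.20.5".
GO_VERSION_PATTERN = re.compile(rb"go(1\.\d+(?:\.\d+)?)")


def guess_go_version(data: bytes) -> Optional[str]:
    """Return the first Go toolchain version found in the bytes, or None."""
    match = GO_VERSION_PATTERN.search(data)
    return match.group(1).decode("ascii") if match else None


# Made-up bytes standing in for the contents of /myapp.
FAKE_BINARY = b"\x7fELF\x02\x01\x01\x00runtime.main\x00go1.20.5\x00more bytes"

print(guess_go_version(FAKE_BINARY))
```

For a real Go binary, the go version -m command prints the embedded toolchain version and module list, which is a quick way to check what a cataloger has to work with.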
The next step after generating an SBOM is often integrating it into a CI/CD pipeline to automatically fail builds if critical vulnerabilities are detected.
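A pipeline gate of this kind reduces to a severity threshold check. The following is a minimal sketch, assuming the scanner’s findings have already been loaded into a list of dicts with "id" and "severity" keys; that shape and the EXAMPLE identifiers are hypothetical, so adapt the field access to your scanner’s actual output.

```python
from typing import Dict, List

# Severity levels from least to most serious.
SEVERITY_ORDER = ["negligible", "low", "medium", "high", "critical"]


def blocking_findings(findings: List[Dict[str, str]],
                      fail_on: str = "critical") -> List[Dict[str, str]]:
    """Return findings whose severity is at or above the fail_on threshold."""
    threshold = SEVERITY_ORDER.index(fail_on.lower())
    blockers = []
    for finding in findings:
        severity = finding.get("severity", "").lower()
        if severity in SEVERITY_ORDER and SEVERITY_ORDER.index(severity) >= threshold:
            blockers.append(finding)
    return blockers


def gate(findings: List[Dict[str, str]], fail_on: str = "critical") -> int:
    """Print blocking findings and return a process exit code (0 = pass, 1 = fail)."""
    blockers = blocking_findings(findings, fail_on)
    for finding in blockers:
        print(f"BLOCKING {finding['id']} ({finding['severity']})")
    return 1 if blockers else 0


# Hypothetical findings for illustration only.
FINDINGS = [
    {"id": "EXAMPLE-0001", "severity": "High"},
    {"id": "EXAMPLE-0002", "severity": "Low"},
]

print("exit code:", gate(FINDINGS, fail_on="critical"))
```

In a real pipeline step you would pass the return value of gate to sys.exit so the build fails on a non-zero code. Grype also offers a built-in --fail-on flag that does the same job without a custom script.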