This critical role would not be possible without funding from the Alpha-Omega project.
TLDR: Skip intro, take me to the visualization!
I'm working on improving measurability of Python packages by allowing Software Bill-of-Materials documents (SBOM) to be included in Python packages so that projects and build tools can record information about a package for downstream use.
This is a cross-functional project where I need input from Python projects, Python packaging tools (build backends+tools and installers), but also from folks completely outside the Python community like SBOM tooling maintainers. With projects like this, it can be difficult to "see the forest through the trees". When you're reviewing the packaging PEP, it can be difficult to imagine how or who is using the new standard. This article is to help visualize the end-to-end data flow.
In short, the proposal is:
pyproject.toml with [project].sbom-files = ["..."]Sbom-File field.There are two Python packages being shown, Package A on the left and Package B on the right.
Package A depends on Package B. Package A is a pure-Python package with no bundled dependencies.
Package B uses binary extensions and uses auditwheel to bundle shared libraries.
Stage 1: If the Python project bundles third-party software in their own source code then the project may specify one or more SBOM documents through project.sbom-files in pyproject.toml. Build backends copy these documents into source distributions and wheels.
Stage 2: If the Python build-backend pulls dependencies (like Maturin and Cargo) while building a wheel those dependencies can be recorded in another SBOM document in the wheel.
Stage 3: If a tool that modifies wheels by adding dependencies is used (like auditwheel) then that tool can record modifications in an SBOM document. At this point there are three separate SBOM documents included in the Package B archive.
Stage 4: Archives are uploaded to an index like PyPI. The index can do some validation of included SBOM documents, if any.
Stage 5: Installers download and install the Python package archives. The SBOM files are placed into the .dist-info/sboms/ directory in the Python environment and referenced in package metadata.
Stage 6: SBOM generation tools scan the Python environment and using existing Python package metadata and new SBOM documents with per-package data stitch together an Operational SBOM (OBOM) detailing the Python environment.
The plan is to allow each "actor" in the system adding SBOM data to a Python package to create their own SBOM document inside the Python package.
This means they can choose any SBOM standard (although we'll recommend sticking to a well-known one like CycloneDX and SPDX) and that intermediate tools won't need to "merge" SBOM data together. Avoiding this merging is extremely important, because cross-standard SBOM data merges are a very hard problem. This problem is deferred to SBOM generation tools which already need to support multiple SBOM standards.
pyproject.toml.
Keeping this up-to-date is a non-zero amount of work, but I am hoping that by providing this PEP it will enable these types of contributions.
I'm also hoping to provide a lightweight pre-commit hook to help keeping these SBOM documents up-to-date, similar to what CPython already uses.My hope is that the most difficult part of this work (manually annotating a package if automatic tools can't) will enable a new type of contribution from users of Python packages to provide SBOM data. Previously there was no standardized method to have SBOM data propagate through Python packages, thus discouraged this type of contribution.
If you're interested in having your use-case covered or you have concerns about the approach, please open a GitHub issue on the project tracker.
That's all for this post! 👋 If you're interested in more you can read the last report.
Wow, you made it to the end!
- Share your thoughts with me on Mastodon, email, or Bluesky.
- Browse this blog’s archive of 170 entries.
- Check out this list of cool stuff I found on the internet.
- Follow this blog on RSS or the email newsletter.
- Go outside (best option)