-
Notifications
You must be signed in to change notification settings - Fork 206
HIP for OCI support #136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIP for OCI support #136
Conversation
Signed-off-by: jdolitsky <393494+jdolitsky@users.noreply.github.com>
bacongobbler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
One comment from the dev call: Will we support download by digest? (i.e. This doesn't fit into this proposal very well, but its an important topic. Perhaps we allow this on |
|
Could the "pull a digest" case work with I'm curious what would be the need for a |
|
Since we just had the discussion about provenance files, it seems that I wouldn't use it with
|
That would work. As @TBBle mentioned, I think better UX might be: The UX gets weird, as "@" is used when trying to get a manifest by SHA (vs ":"). Ultimately I think people want to copy-paste a full reference, whether by tag or digest.
Manifests can be uploaded by tag or digest. See https://github.com/opencontainers/distribution-spec/blob/6a6ee23bbab1e5d07b45a10547265bc5249c3c4d/spec.md#put-manifest So one could upload by digest, then download by digest. The version could be determined by unpacking the config blob (which is Chart.yaml JSONified). But this is essentially the same as supporting arbitrary tags... |
|
Access by digest is different from supporting arbitrary tags, because a given chart object can only appear at one digest (the digest of the blob), and one tag (the version contained in Chart.yaml), both determined by the content of the blob. Arbitrary tags allow putting charts in places that cannot be tied back to the content of the Chart. Whether you apply a tag or not, an OCI Registry is a content-addressable store, and so pulling by digest is naturally supported, as long as we support it in the tools. In Helm's case, we aren't a content-addressable store locally, so we need to determine the correct address (version) from the content we pull, whether it's by tag or by digest. Effectively, pushing by digest is pushing untagged, which makes sense for Docker images, but I don't think it's awesome for Helm. The only use-cases I can see is if you're either pushing multiple instances of a Chart with the same version and don't want the latest push to win (bad idea! SemVer has Side-note: From memory, the Docker syntax allows Another place where pulling by digest might be handy is if you have the version->digest mapping from elsewhere, it saves having to mangle the version to account for |
proposals/oci-support-updates.md
Outdated
|
|
||
| Support for this method of signature validation should be carried over into OCI storage. As the format of the OCI manifest is custom to Helm, Helm can also choose to modify the resulting manifest on upload when a chart is being signed (e.g. `helm push --sign`). | ||
|
|
||
| As far as the low-level details, the `.prov` file will simply be stored as a second layer on the manifest. If running `helm pull --verify oci://...`, the second layer (layer at index 1) will be assumed to be the provenance file. The first layer is always the chart `.tgz` itself. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we supported pushing the provenance file as the manifest.config (with a different mediaType from the current)? That would simplify access to it, without needing to assume anything about the layers, and (unless I'm mistaken) it's a superset of the Chart.yaml data used as the config now. Although the config is current in JSON (I guess for historical reasons), while the current .prov is a pair of YAML documents with an (undemarcated) signature appended to the second.
This would let a Helm-aware registry more-easily validate and display the presence and state of a signature in a browse UI, for example.
Side-note: I haven't checked, but do we actually validate the config when the chart is pulled?
|
I just noticed that we use |
proposals/oci-support-updates.md
Outdated
| For example, given a chart with the name `pepper` and the version `1.2.3`, users may run a command such as the following: | ||
|
|
||
| ``` | ||
| $ helm push pepper/ oci://r.myreg.io/mycharts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at helm/helm#7032, I think this is incorrect or needs more care. This is doing two things: helm package, and helm <upload>. We probably should split those so helm push pushes a package you've already built, otherwise helm push needs to accept all the parameters helm package accepts.
For the on-liner use case we could perhaps add something like helm package --push=<registry> which is similar to how modern OCI Image builders work, e.g., BuildKit (--output type=image,name=docker.io/username/image,push=true) and Kaniko (--destination). They don't also maintain a local image registry for execution, so their build commands are either filling a local cache, or actually end in an output to file or push to registry etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the flip side, helm install ./ is a convenience method for helm package and helm install. We just don't provide additional feature flags to manipulate helm package. Those wishing to perform more advanced package methods (i.e. --sign) can use helm package instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realise this is slightly off-topic, but I'm unaware of a helm push issue for discussion. Would it be reasonable to (later) add support to helm push for non-OCI uploads, e.g. given a http:// URL instead of an oci:// URL, it trivially performs a HTTP PUT? The use-case here is for products like Artifactory which auto-generate the Helm Repository metadata when a chart is uploaded, to save carrying an extra tool around, i.e. helm package && helm push would be valid for both OCI Registries and ("smart") Chart Repositories with only a difference in the output URL.
|
Hello, everyone! Nice to see Helm consider signing/verification. Unfortunately, the current "provenance" (a misleading term, considering that it actually does not consider the whole software supply chain) model seems to broken. This is because it looks like attackers who control the OCI registry can replace PGP/GPG signatures with any other signature they like, since there is no trusted information about which GPG public keys to trust. Securing software supply chains is not a trivial problem, and I strongly recommend that the Helm community looks at how CNAB—which can include Helm charts—solved this problem in a much more secure manner. |
|
Hi @trishankatdatadog. Thanks for your response. As an FYI, a few Helm core maintainers were part of the early discussions in CNAB and are quite aware of the spec - @radu-matei and I are co-workers, and we worked together on the initial draft for the CNAB repository specification, which was loosely based on the OCI distribution protcol. cnabio/cnab-spec@4515571 Good call-out on the security issue regarding PGP/GPG signatures being replaced on the server side. However, I do believe you may be mis-reading Helm's security model on chart provenance. To clarify, the current security model handles cases where the server has been compromised. Public keys distributed by the chart authors are distributed separately from the OCI/chart repository API workflow, and are added to the user's local keychain. Therefore, if an attacker were to replace the PGP signatures and the blobs in the OCI registry, they would not have access to the private key that was used to originally sign those packages, and therefore package verification would fail as the user's local keyring does not trust this new key. To be totally clear here, we've had one security audit performed by an external third party who was sponsored by the CNCF. You may read up on the audit here (have a look under subsection 2: signing and verification code), but the auditing team did not find any glaring loopholes in this process. We have another security audit ongoing at this very moment, and we've asked them to look over this part of the codebase as well. In any case, the plan is to port over the current provenance process, as it is a known, audited, working process for signing and verifying charts. We have discussed moving to an external key server like Notary v2 once that becomes available, which can be introduced at a later time. That might align closer to CNAB's goals with TUF and in-toto. Let us know what you think. |
I apologize if I seemed hostile, and also for claiming that the PGP model of Helm provenance is "broken." It is not: I read the security audit [1], and also verified that the code does indeed check signatures against the local keyring by default. While the PGP model has some security issues [2] that are better fixed with something like TUF and in-toto [3], I agree that this is a good start. Again, please accept my apologies. [1] By the same people who audited Notary-v1 BTW IIRC. |
|
@trishankatdatadog We understand issues with PGP and Notary v2 would be great to use. Notary v1 has some shortcomings. When Notary v2 is ready for use we will definitely evaluate it. On the PGP side, I would suggest people take the verification public key and pass that in using the |
Do you have a reference of which ones in particular got in the way? a usecase as simple as gpg signing seems like notary should have had no issues to provide. |
proposals/oci-support-updates.md
Outdated
| "config": { | ||
| "mediaType": "application/vnd.cncf.helm.config.v1+json", | ||
| "digest": "sha256:8ec7c0f2f6860037c19b54c3cfbab48d9b4b21b485a93d87b64690fdb68c2111", | ||
| "size": 117 | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I missed this elsewhere, but what is the content of a application/vnd.cncf.helm.config.v1+json layer? A Chart.yaml serialized to json?
This should be documented so that other tools know what to do with this media type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not a layer, that's the content type of this object. Oddly enough, it's mentioned as an example in the OCI Artifact Author's Guide although as application/vnd.cncf.helm.chart.config.v1+json rather than application/vnd.cncf.helm.config.v1+json.
Edit: I misunderstood. You want to know what the actual data in sha256:8ec7c0f2f6860037c19b54c3cfbab48d9b4b21b485a93d87b64690fdb68c2111 looks like. Sorry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is indeed the serialised Chart.yaml.
And on that note, perhaps application/vnd.cncf.helm.config.v1+json is not the correct mediaType for an APIVersion 2 Chart.yaml? Should that be v2, to track the APIVersion? Or does that introduce complexity without benefit, given that any tool operating on this can just check the APIVersion of the resulting object during or after deserialisation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggested earlier that in the provenance case, perhaps we should have the provenance file as the config instead. And thinking about the tooling and ecosystem, why is this JSON, not YAML? Does JSONification present any concerns for round-tripping or hash-checking etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The actual data is Chart.yaml as minified JSON. Will document. JSON chosen since it is 1) smaller and 2) more ubiquitous to parse.
Most implementations can ignore the contents of this blob, its for registry providers to parse and extract metadata about the chart.
@TBBle v1 is correct I believe, since this is version 1 of Helm's OCI integration. Honestly not sure best approach, or what to consider if APIVersion reaches v3..
I imagine most people will not use provenance (unfortunately), so I do not think it makes sense in config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey folks,
A few notes:
And on that note, perhaps
application/vnd.cncf.helm.config.v1+jsonis not the correctmediaTypefor an APIVersion 2 Chart.yaml? Should that bev2, to track the APIVersion? Or does that introduce complexity without benefit, given that any tool operating on this can just check the APIVersion of the resulting object during or after deserialisation.
Using v1 or v3 as it aligns with Helm 3 is a good discussion. Having lived through .NET versioning issues, keeping version numbers aligned, as much as possible = simplicity. Functionally, it's a detail that just requires someone to grok. It's there to enable client tools to know the difference if serialization changes. Such as serializing as .json or .yaml, or some other additions over time.
That's not a layer, that's the content type of this object. Oddly enough, it's mentioned as an example in the OCI Artifact Author's Guide although as
application/vnd.cncf.helm.chart.config.v1+jsonrather thanapplication/vnd.cncf.helm.config.v1+json.
The OCI artifact doc was written before the Helm work was complete. I'll update to whatever the Helm team decides, including the versioning question above.
As for the config object vs. a layer:
manifest.config: More details here, but the config object is used to pivot the artifact type. Like a file extension. The value of the config object is optional. More detail here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SteveLasker The chart YAML format also has an APIVersion, it was v1 for Helm v2, and v2 for Helm v3. So that's another version numbering system in the mix. Given that Helm v3 can both generate and process both APIVersion v1 and APIVersion v2 Charts, I would lean more towards tracking the chart APIVersion here than the Helm release version.
I would personally lean towards the OCI mediaType version tracking the APIVersion of the contained Chart. After all, there's no reason I shouldn't be able to upload an APIVersion v1 chart to OCI, and a hosting repository indexer that only knows APIVersion v2 should be able to know it it can't index it.
proposals/oci-support-updates.md
Outdated
| }, | ||
| "layers": [ | ||
| { | ||
| "mediaType": "application/tar+gzip", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per https://github.com/opencontainers/artifacts/blob/master/artifact-authors.md#defining-layermediatypes, media types that are not vendor|org|entity specific should be registered with IANA.
I don't see application/tar+gzip in this list; is there an existing generic media type that would be appropriate to use here, or should we come up with a new non-generic type? https://www.iana.org/assignments/media-types/media-types.xhtml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The +gzip suffix is registered, but there doesn't appear to be a registered media type for tar.
Does it make sense to use a Helm vendor-defined type for the layer, since that means we can assume more about the archive than 'tarball', e.g. presence of Chart.yaml.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. This is an unfortunate mistake. cc @SteveLasker
I'm also reading this: https://superuser.com/a/960710
I'm not sure I agree it should be a custom type, but this may be what we have to do if tar is really not an official type..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For they layer types, this was another active conversation. Should each type define their own layer.mediaType. When it's as simple as a .tar file, there's not a ton of benefit for defining something unique. As noted, application/tar, nor application/tar+gzip have yet to be registered with iana.org. It was a surprise to all of us.
The reality is the layer is less of a concern. Artifact types must be unique at the manifest.config and hopefully soon index.config layer as tooling must know if it can operate on the artifact. As a security scanner iterates through the artifacts in a registry, if and how will they scan the artifact?
The layer is a secondary object that's determined by the specific artifact tool.
If someone wants to register application/tar with iana.org, it would be much appreciated, and you'll get your name in history - a reason to do, or not to do it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jdolitsky!
As #140 has now been approved and accepted, would you mind following the process described in HIP 1 and rewrite this as a "feature" or an "informational" HIP?
Let me know if you have any questions about the process. I'd be happy to answer them.
Thank you!
|
@bacongobbler yes, can do! |
|
FYI - this proposal has been cleaned up and converted to the new HIP format. |
|
The first PR addressing this proposal is here: helm/helm#8843. It implements |
bacongobbler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Josh. This HIP is heading in the right direction and looks very close to a final draft. I made a few suggestions on how to improve this document to provide more information to users.
This has been approved as HIP 0006. Please re-name the HIP accordingly. Thanks!
|
|
||
| ### 3. Chart versions == OCI reference tags | ||
|
|
||
| To keep things simple, the version of a chart will be 1-to-1 with the tag used on registry references. Arbitrary tags will not be supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be a problem, some projects provides in the same repository an image and a chart that share the same namespace and version (and registry).
With a chart tag that doen't match the chart version this allow us to host the chart as ${myserver}/${myproject}:${myversion}-chart.
Other options would be to host on a different project name or registry, which is sad as I like to see the chart going along the image.
Another solution could be to have a version number that's different of the image, which is also not good to me.
(Maybe it's accidental, but in my case, but push the chart with the same reference as an image removed my image from the registry [ACR]).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we will get there, but I'm not sure its necessary for the first iteration. There is technically nothing stopping from adding this as a later feature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think --base-namespace "myproject" --tag "1.2.3-chart"
Signed-off-by: jdolitsky <393494+jdolitsky@users.noreply.github.com>
Signed-off-by: jdolitsky <393494+jdolitsky@users.noreply.github.com>
Signed-off-by: jdolitsky <393494+jdolitsky@users.noreply.github.com>
Signed-off-by: jdolitsky <393494+jdolitsky@users.noreply.github.com>
|
@bacongobbler - thanks for detailed review. I've just made a bunch of updates to address your comments, and also modified to HIP 0006. Please let me know if any other changes necessary. |
|
|
||
| ### 3. Chart versions == OCI reference tags | ||
|
|
||
| To keep things simple, the version of a chart will be 1-to-1 with the tag used on registry references. Arbitrary tags will not be supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to clarify that 1:1 here is "a 1:1 mapping exists between tag and version", not "the tag is string-identical to the version"? Chart versions are SemVer, which allows the + character, but OCI registries do not allow + in their tags, see distribution/distribution#1201 for history. So we will probably need to map + in the version into some other character in the OCI tag. Probably _, which is allowed in the OCI tag, but not SemVer.
I'm sure this has been discussed before, but I can't remember where. It might have been on one of the issues or PRs relating to this work.
|
I don't see it mentioned in this PR so either it's obvious to those involved, confusion on my part, or it's missing: how will helm search an OCI registry for charts, if at all? Based on what I've read I assume it won't and that the plan to rely on an external hub? (artifact hub and the like). I saw that the |
|
Hi @absoludity,
I've been trying to catch up on the latest Helm OCI support, but this happened to catch my eye, so figured I'd drop a quick note. Steve |
|
Thanks Steve, that certainly fills in some gaps on what's planned (or at least, hoped for) for the future with regards to search support in tho OCI distribution spec (eg. the examples included at tool-specific searches). But support for such searches in the OCI distribution spec appear to be further away even than a stable OCI support in helm. That helps clarify my question a little: Will the OCI support in helm only be considered stable when such searches are possible with an OCI registry, or will it be considered stable when you can install OCI-based charts from the artifact hub? (I realise you're not in a position to answer that, but I do think it's relevant for this PR) |
|
I don't think we can hold back this feature for future OCI Distribution features, so we should be able to get it stabilised when Helm can pull and push charts by name, and (I guess) once tag-listing (where a Registry supports it) is used for fetching all known versions of a chart (single repository), so SemVer matching works. That will be enough for charts registered on Helm Hub, and for charts referenced as dependencies of other charts in Charts.yaml. Any more-generic searching (across multiple repositories, partial match, etc) should be able to be added later, and would not change how the existing pull/push/SemVer works. |
|
Completely agree that folks should prioritize what's available today, vs. what we're hoping to get done tomorrow. Having a prioritized execution helps make continued progress. The thing to be aware of is this: I would also endorse the client/server request model for a list of versions of content. The But, we also realize the registry must provide better capabilities than tag listing, which gets us back to the search scenario. But, what do we need now? Just suggesting there's a subset of capabilities internal teams need, and they need it lifted from the experimental flag, and would be super happy. We continue to hear the tension that large customers can't use features "in preview". OCI Artifacts is GA supported in ACR and other cloud providers. We just need to lift a subset of helm features so projects like Terraform can support it. So, what is the minimum scoped features that could be added, with more to come later? |
|
Search was intentionally left out of this HIP. I think it's safer to limit the scope of this for now, and add more features later. Otherwise we are perpetually caught in this loop of "what about X". For now, specific chart versions must be specified, search and automatic semver resolution will not be supported. |
|
Custom tags were ruled out and added to the "Rejected Ideas" section. But last I recall when we discussed this in September, the system was designed to handle automatic semver resolution. That's why we decided to implement an OCI Getter. The tag listing API allows us to fetch all available tags, which will allow us to determine which is "latest" similar to the chart repository API. We can then check the We even ruled out the idea that docker images and Helm charts are to be stored in the same "repository", so I'm not sure why this is being brought up again. What changed between September and today to rule out automatic semver lookup support? |
bacongobbler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Let me know if there's something else we need to address here before merging.
bacongobbler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, found one section that needs an update before merging. Once that's done we can go and merge this.
Signed-off-by: jdolitsky <393494+jdolitsky@users.noreply.github.com>
|
@bacongobbler resolved. Re: tag listing - this was part of the original discussion/PR but never put into this HIP. There is still some experimentation to do here, coupled with the fact that docker/distribution (backbone of most registries) still does not support pagination. See distribution/distribution#3143. I think the search/semver aspect can come in a subsequent HIP. |
FWIW, I wasn't suggesting search should be included, just that IMO the position should be clear in this HIP, stating that any search or browser functionality is not supported by this proposal and so the helm client will be able to install only an explicitly provided chart and tag from a specific repository, if that's the case. And possibly what people's other options are for finding the explicit chart and tag (ie. mentioned above, internal systems you might know the exact tag, other public systems might want to populate some external searchable directory, such as a hub - just guessing with my own thoughts for the last point). |
It's up to the user to segment the content they push to a repo. Just calling out that it's not something helm can limit, and it just needs to parse the results. Great to hear the logic is built-in. Really great to see the progress happening. And, we're looking for any asks needed in the underlying distribution spec/oci artifacts spec that we should add to enable additional functionality. |
No description provided.