-
Notifications
You must be signed in to change notification settings - Fork 7.5k
Add support for creating repository indexes in JSON format #12245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
When an index is in a JSON format, the `sigs.k8s.io/yaml` package uses an inefficient approach to unmarshaling the data, as it does an unnecessary roundtrip on the data to transform the YAML to valid JSON. To prevent this from happening, detect if the bytes which we attempt to load contain valid JSON, and unmarshal them directly using `json.Unmarshal` instead. Signed-off-by: Hidde Beydals <hidde@hhh.computer>
This adds support for generating the repository index file in JSON format using the `--json` flag. The index itself is still written to `index.yaml`, which is fully backwards compatible as YAML is a superset of JSON. For big indexes (think multiple megabytes), this approach is however more efficient in combination with the changes to the load logic, as it prevents a YAML -> JSON roundtrip during decoding. Signed-off-by: Hidde Beydals <hidde@hhh.computer>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty good, it will make a massive difference processing large helm indexes.
|
This is a really good enhancement and verified that the enhancement does function correctly. However, the use of It might be interesting to consider two potential files that could contain repository metadata: |
In an iteration before creating this PR, I actually had an approach like this where Besides this approach causing potentially lots of HTTP 404s for the server when a JSON isn't served, after taking a closer look at the caching logic (more precisely the various Another upside of the approach taken is that it's backwards compatible without having to serve (and thus store) two different file types. A Helm repository administrator can simply decide to start serving JSON, and any version of Helm will happily be able to download and work with the new index format (albeit only the Hem version which would include this change with memory consumption improvements). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... the use of index.yaml for both JSON and YAML content is confusing ...
I disagree. Since json is a subset of yaml, it seems perfectly valid to me.
The rest of hiddeco's rational is also very compelling.
I tested this on my use cases and it works for me.
|
@hiddeco @joejulian My concern with mixing JSON and YAML content is centered around two primary areas:
|
|
Other way around. Yaml is a superset of json. Anything in json is 100% compatible with yaml. Yaml has features that are not in json. So any json content in a yaml file is guaranteed compatible. |
The actual parser Helm uses converts YAML to JSON during unmarshalling. In addition, from the YAML 1.2 spec (2009, revised in 2021):
As noted in my previous comment, the exact opposite is true. As introducing an
|
sabre1041
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hiddeco @joejulian Reevaluated code changes, comments and LGTM
|
@hiddeco @sabre1041 this is amazing! Should it not be mentioned in the docs? |
|
They are partly automatically documented: https://github.com/helm/helm-www#updating-the-helm-cli-reference-docs |
|
Note, while writing the release notes I discovered this doesn't fully work. I filed a bug at #12422. |
helm makes this a thing in helm/helm#12245
|
@hiddeco sorry, I don't understand, how this PR should help with #9931 I have helm repository in artifactory and large (~50 Mb) index.yaml file and my "helm repo update" command consumes a lot of memory (~900 Mb). And as far as I know artifactory doesn't support json format for index.yaml file So, could you explain does this PR help with "helm repo update" or not ? |
|
It will not help if you are not in control of the published format of the repository index (which appears to be the case based on what you describe). |
What this PR does / why we need it:
This adds support for creating an
index.yamlin JSON format usinghelm repo index [...] --json.Fixes #10542
Supersedes #12151
Special notes for your reviewer:
Given that it does not create unnecessary HTTP requests to discover
index.jsonand/orindex.yamlfiles hosted by a repository server, nor requires rigorous changes to the caching logic, and is fully backwards compatible due to YAML being a superset of JSON. This appears to be the least invasive way to add support for the JSON format.Benchmark using Bitnami's infamous "full index" archive:
If applicable: