Fix caching behavior to always use hashed URLs for the cache filenames#503
Merged
Fix caching behavior to always use hashed URLs for the cache filenames#503
Conversation
The two cache dirs are currently "refs" and "downloads". Make these always explicit, rather than `"downloads"` being an internal default baked into the downloader.
1. When caching schemas in the `downloads/` dir, hash their URLs, just as is done in the `refs/` cache. This fixes buggy behaviors due to cache confusion on matching filenames. 2. Deprecate (but do not, at this time, remove) the `--cache-filename` option, making it non-functional. This also removes the `cache_filename` internal arg from several components. Some minimal test and doc cleanup is needed, to appropriately handle these changes.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This change fixes a significant bug in caching behavior.
Cross-link: resolves GHSA-q6mv-284r-mp36
Why
Under the pre-existing default behavior, two different remote (HTTPS) schemas with the same filename would be cached under the same local filename. e.g.,
https://example.com/foo/schema.jsonandhttps://example.com/bar/schema.jsonboth cache asschema.json.As a result, cache confusion / poisoning issues would easily occur for users interacting with these schemas.
Prior to this change, users would either have to disable caching or use
--cache-filenameto get correct caching behavior.What
To fix, caching is changed here to always use URL hashes, as was developed for the
refs/cache.This behavior is refined and relocated to be used by both caching paths in the codebase.
The
--cache-filenameoption is no longer important or useful to support, so it is made non-functional and deprecated in this release.It will be removed in a future release; timeline TBD.
Additional Notes
Both caching codepaths -- initial schema and refs -- now use sha256 hashing. This results in a mild (read: negligible) slowdown in exchange for driving the risk of a collision even lower, to effectively 0.
No effort is made to "clean up" stale files in the cache. As documented, users can always blow the whole cache, as in
rm -r ~/.cache/check_jsonschema, if they want to remove these files.Cache cleanup would also play poorly with users who have different tool versions installed simultaneously in different projects.
I tried some implementation options which required a more significant refactor of the caching layer, but discarded them as
which evaluate negatively against the criterion of "ship a bugfix confidently and expeditiously so that users can upgrade".