
Unstable MD5 Due to Git Archive + Versioneer #7937

@citibeth


Placeholder templates in a GitHub repo, filled in automatically by git archive, can result in unstable tarball hashes.

While installing, I noticed that the hash of one package, https://github.com/SciTools/cf_units, had changed compared to a year ago.

These things must always be investigated. So I downloaded the tarball using GitHub's archive feature (https://github.com/SciTools/cf_units/archive/v1.1.3.tar.gz) and compared it to the old tarball I had lying around.
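To reproduce the hash check itself, here is a minimal Python sketch that computes a tarball's MD5 in chunks with hashlib. The file names in the usage comment are hypothetical placeholders for the old and freshly downloaded tarballs.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (hypothetical file names):
# md5_of_file("cf_units-1.1.3-old.tar.gz") == md5_of_file("cf_units-1.1.3-new.tar.gz")
```

A differing digest only says that some byte changed; unpacking both tarballs and running diff -r, as below, is what localizes the change.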

Expected Result

I expected the two to be the same.

Actual Result

I got the diff:

```
$ diff -r cf/cf_units-1.1.3 py/cf_units-1.1.3
diff -r cf/cf_units-1.1.3/cf_units/_version.py py/cf_units-1.1.3/cf_units/_version.py
26c26
<     git_refnames = " (tag: v1.1.3)"
---
>     git_refnames = " (HEAD -> master, tag: v1.1.3)"
```

Digging Deeper

I looked at _version.py in the repo (https://github.com/SciTools/cf_units/blob/master/cf_units/_version.py) and found the following:

```python
def get_keywords():
    """Get the keywords needed to look up the version information."""
    # these strings will be replaced by git during git-archive.
    # setup.py/versioneer.py will grep for the variable names, so they must
    # each be defined on a line of their own. _version.py will just call
    # get_keywords().
    git_refnames = "$Format:%d$"
    git_full = "$Format:%H$"
    keywords = {"refnames": git_refnames, "full": git_full}
    return keywords
```
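A minimal sketch of what the substitution does to the git_refnames line. The expand_refnames helper is hypothetical: git performs this expansion internally, and this is not a git or versioneer API. The two decorations are the ones from the diff above.

```python
import hashlib

# The git_refnames line as committed, before git archive expands it.
TEMPLATE = '    git_refnames = "$Format:%d$"\n'


def expand_refnames(template: str, decoration: str) -> str:
    """Hypothetical stand-in for git's expansion of the %d placeholder.

    git replaces "$Format:%d$" with the ref decoration of the archived
    commit, e.g. " (tag: v1.1.3)". Only %d is handled in this sketch.
    """
    return template.replace("$Format:%d$", decoration)


old = expand_refnames(TEMPLATE, " (tag: v1.1.3)")
new = expand_refnames(TEMPLATE, " (HEAD -> master, tag: v1.1.3)")

# Same template, same commit, different refs pointing at it: different bytes.
print(hashlib.md5(old.encode()).hexdigest() == hashlib.md5(new.encode()).hexdigest())  # False
```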

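Apparently, the placeholders in this file are filled in when git archive is run: git expands $Format:...$ strings in files marked with the export-subst attribute in .gitattributes, and GitHub runs git archive when a special archive URL like the one above is used. The %d placeholder expands to the names of all refs pointing at the archived commit at the moment the archive is generated. So even though the tag, and therefore the version, is stable, adding or removing another branch or tag on the same commit changes git_refnames, and thus the MD5. In this case, master apparently pointed at the v1.1.3 commit when I downloaded the new tarball, but not when the old one was made.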
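To see why the version stays the same while the bytes change, here is a simplified sketch of how a versioneer-style reader could pull tag names out of git_refnames. It is modeled loosely on versioneer's keyword parsing, not copied from it; tags_from_refnames is a hypothetical name.

```python
from __future__ import annotations


def tags_from_refnames(refnames: str) -> set[str]:
    """Extract tag names from a %d decoration like " (HEAD -> master, tag: v1.1.3)".

    Hypothetical helper, modeled loosely on versioneer's keyword parsing.
    """
    refnames = refnames.strip()
    if refnames.startswith("$Format"):
        # The placeholder was never expanded, so this is not a git-archive tarball.
        raise ValueError("unexpanded $Format$ keyword")
    # Drop the surrounding parentheses, then split the comma-separated refs.
    refs = {r.strip() for r in refnames.strip("()").split(",")}
    prefix = "tag: "
    return {r[len(prefix):] for r in refs if r.startswith(prefix)}


print(tags_from_refnames(" (tag: v1.1.3)"))                  # {'v1.1.3'}
print(tags_from_refnames(" (HEAD -> master, tag: v1.1.3)"))  # {'v1.1.3'}
```

Both decorations yield the same tag, so the computed version is identical; only the raw string stored in _version.py, and hence the tarball hash, differs.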

Proposed Solution

Encourage upstream authors to avoid git_refnames (above) if they don't need it, and to make sure that tagged release commits don't have any additional tags or branches pointing at them.

This won't be an issue for the vast majority of upstream authors, who don't use git archive to substitute in version strings.

Has anyone else encountered this?
