14

When I zip the same file on Linux (Zip 2.31), I get a different checksum every time. How can I get the same md5sum as last time? I'm using the latest zip package from yum.

3
  • The most likely reason is that the file you're compressing keeps changing. Commented Oct 22, 2013 at 16:15
  • the file is the same, same creation date, same size, same checksum Commented Oct 22, 2013 at 16:19
  • My advice: (1) Ask on a site where the question is on-topic (e.g. superuser.com). (2) Include a complete, reproducible shell session that demonstrates the behaviour. Commented Oct 22, 2013 at 16:27

4 Answers

30

The generated archive contains not only the compressed file data but also "extra file attributes" (as the zip documentation calls them), such as file timestamps and other file attributes.

If this metadata differs between runs, you will never get the same checksum, because the changed metadata is included in the archive.

You can use zip's -X option (or its long form, --no-extra) to avoid including the file's extra attributes in the archive:

zip -X foo.zip foo-file

Successive runs of this command, with no changes to the file in between, should produce an archive with the same hash.
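One way to check this is to build the archive twice and compare digests. The sketch below is only an illustration: the helper name same_checksum and the archive names in the comments are made up here, and sha256sum is assumed to be installed (md5sum can be substituted).

```shell
# Hypothetical helper: succeeds only if both files exist and have the
# same SHA-256 digest.
same_checksum() {
    [ -f "$1" ] && [ -f "$2" ] || return 2
    sum_a=$(sha256sum < "$1" | cut -d' ' -f1)
    sum_b=$(sha256sum < "$2" | cut -d' ' -f1)
    [ "$sum_a" = "$sum_b" ]
}

# Example use (assumes zip is installed and foo-file exists):
#   zip -X first.zip foo-file
#   zip -X second.zip foo-file
#   same_checksum first.zip second.zip && echo "archives match"
```

Writing to two separate archive names matters here, because running zip against an existing archive updates it instead of creating a fresh one.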


7 Comments

The source file has the same checksum every time. It's weird.
Yes, but when you add the file to the zip, you also add its metadata (file modification date and time). So the zip is different, and so are the checksums.
The zip command has a --no-extra parameter to control file attributes. I don't have a copy at hand to try it. If this doesn't work, you can try using the touch command to set the file date/time before zipping.
Thanks MC, using the -X flag works: -X eXclude eXtra file attributes. Thanks for the tip that led me to this.
The -X flag doesn't work for me on OS X, probably because it only stops zip from saving extended file attributes, which I guess are distinct from the modification time.
5

Adding the -X flag, as suggested in @mc-nd's answer, worked fine for me on a single-file zip.

But when I compressed a directory (node_modules in my case), I got a different hash each time I reinstalled node_modules.

The fix was to also add the -D flag:

-D
--no-dir-entries
       Do not create entries in the zip archive for directories.
       Directory entries are created by default so that their attributes can
       be saved in the zip archive.

1 Comment

Which OS are you running on? On both macOS and Debian-flavoured Linux, if I use the long option --no-extra I get zip error: Invalid command arguments (long option 'no-extra' not supported) and the short option -X doesn't appear to do anything (if I extract the file again it has the timestamp of the original file)...
4

Neither -X nor -D worked for me. It looks like zip still sets timestamps within the archive, causing mismatched hashes for identical content.

I've fixed the issue by manually setting file timestamps using:

touch -t 202001010000 file
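The same approach extends to a directory tree. A minimal sketch, assuming a POSIX touch and find; the function name normalize_mtimes and its default timestamp are illustrative choices, not anything zip requires:

```shell
# Hypothetical helper: give every file and directory under "$1" the same
# access and modification time. The optional second argument uses the
# touch -t format [[CC]YY]MMDDhhmm.
normalize_mtimes() {
    dir=$1
    stamp=${2:-202001010000}
    find "$dir" -exec touch -t "$stamp" {} +
}

# Example use:
#   normalize_mtimes ./node_modules
#   normalize_mtimes ./node_modules 201501010000
```

touch -t interprets the timestamp in the local time zone, so setting TZ to a fixed value may be needed if the archive has to match across machines.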

1 Comment

I use it like this: find . -exec touch -d @0000000000 {} +
4

In order to make a deterministic archive, one that can be rebuilt and verified using a hash, several things are required:

Timestamps of all files must have predictable values

Set the timestamps of all files to a specific value, e.g.

find . -exec touch -d '1985-10-21 09:00:00' {} \;

As an aside, the earliest date the zip format supports is 01/01/1980, so timestamping all files to the Unix epoch (01/01/1970) won't have the desired effect.

If making a zip from a Git checkout, you could use the Git commit timestamp of the last change to each file (inspired by this Stack Overflow answer).

git ls-files | xargs -I {} sh -c 'chmod 644 "{}"; touch -m -t "$(git log --pretty=format:%cd -n 1 --date=iso "{}" | sed "s/-//g;s/ //;s/://;s/:/\./;s/ .*//")" "{}"'

Permissions of all files must have predictable values

Explicitly set permissions, say to 644, like this:

find . -type f -exec chmod 644 {} \;

Don't rely on the permissions applied by git clone, because these depend on the environment's umask value and are therefore unpredictable.

Present files to zip in a specific order

The order in which files are added to a zip matters. Don't rely on recursion or globbing, which follow the order in which files are stored in directories; that order is filesystem-dependent and unpredictable. Instead, use something like find and sort the list to get a predictable order.
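A sketch of such a listing is shown here. The output of sort itself varies with the locale, so this version pins LC_ALL=C; that is an addition made for this example, and the function name list_files_sorted is made up.

```shell
# Hypothetical helper: print the regular files under "$1" in byte order,
# independent of the filesystem's directory order and the user's locale.
# Paths are printed relative to "$1", with a leading "./".
list_files_sorted() {
    (cd "$1" && find . -type f | LC_ALL=C sort)
}

# Example use:
#   list_files_sorted ./project
```

This line-based approach assumes file names do not contain newlines.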

Disable the zip "extra attributes" feature

This ensures that non-deterministic data, such as archive modification timestamps and user names, is not written to the archive. Use the -X option to do this.

Example:

find . -type f | sort | TZ=UTC zip -qX myfile.zip -@ 

The time zone is also forced to UTC here, so that the timestamps stored in the archive do not depend on the machine's local time zone.

Such a zip should be deterministic and verifiable using md5sum, sha256sum, etc.
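Put together, the steps could look like the sketch below. It is one possible arrangement under stated assumptions, not a definitive recipe: the function name make_deterministic_zip is hypothetical, Info-ZIP's zip is assumed to be installed, and pinning LC_ALL=C for sort and TZ=UTC for touch are additions made for this example.

```shell
# Hypothetical helper: build a reproducible zip of the regular files in
# directory "$1" and write it to "$2", which should be an absolute path
# outside "$1". Note: this changes the timestamps and permissions of the
# source files.
make_deterministic_zip() {
    src=$1
    out=$2
    rm -f "$out"    # zip would otherwise update an existing archive
    (
        cd "$src" || exit 1
        TZ=UTC
        export TZ
        # Fixed timestamp (later than 1980) on everything
        find . -exec touch -t 198510210900 {} +
        # Fixed permissions on regular files
        find . -type f -exec chmod 644 {} +
        # Stable file order, no extra attributes
        find . -type f | LC_ALL=C sort | zip -qX "$out" -@
    )
}

# Example use:
#   make_deterministic_zip "$PWD/project" /tmp/project.zip
#   sha256sum /tmp/project.zip
```

Running it twice with different output paths and comparing the two files, for example with cmp, is a quick way to confirm the result is stable on a given system.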
