110

In my project I need to know what a zlib header looks like. I've heard it's rather simple but I cannot find any description of the zlib header.

For example, does it contain a magic number?

6 Answers

163

zlib magic headers

78 01 - No Compression/low
78 5E - Fast Compression
78 9C - Default Compression
78 DA - Best Compression 
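To recognise these headers programmatically, a minimal sketch along these lines may help. The helper name `looks_like_zlib` is made up for illustration. It applies the RFC 1950 rules (CM = 8, CINFO at most 7, the 16-bit value a multiple of 31) instead of matching the four byte pairs literally, so it also accepts the less common valid headers.

```python
def looks_like_zlib(data: bytes) -> bool:
    """Heuristic check: do the first two bytes form a valid RFC 1950 header?"""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    return (
        (cmf & 0x0F) == 8                   # CM = 8 (deflate)
        and (cmf >> 4) <= 7                 # CINFO: window no larger than 32K
        and (cmf * 256 + flg) % 31 == 0     # FCHECK rule
    )


for head in (b"\x78\x01", b"\x78\x5e", b"\x78\x9c", b"\x78\xda", b"\x1f\x8b"):
    print(head.hex(), looks_like_zlib(head))
```

A two-byte match is only a hint: unrelated data can pass by chance, so a real decompression attempt is the stronger test.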

4 Comments

This helped me figure out what type of compression I was dealing with. I knew the file was compressed, but was doing searches for some header bytes and this came up. Thanks!
When using the Java Inflater (which uses zlib), I'm seeing header values of 120, -100 as signed bytes. This equates to 78 9C, which backs up what you said above.
That is only three of the 64 possible zlib headers. See mwfearnley's answer.
There are four possible 78 answers - this answer omits 78 5E - which falls between none/default. 28 other values are less likely to be seen, but still valid.
117

The header is two bytes.

From RFC 1950: ZLIB Compressed Data Format Specification version 3.3

0   1
+---+---+
|CMF|FLG|
+---+---+

CMF byte

CMF (Compression Method and flags)
This byte is divided into a 4-bit compression method and a 4-bit information field depending on the compression method.

bits 0 to 3  CM     Compression method
bits 4 to 7  CINFO  Compression info

bits 0 to 3: CM

CM (Compression method)
This identifies the compression method used in the file. CM = 8 denotes the "deflate" compression method with a window size up to 32K. This is the method used by gzip and PNG and almost everything else. CM = 15 is reserved.

bits 4 to 7: CINFO

CINFO (Compression info)
For CM = 8, CINFO is the base-2 logarithm of the LZ77 window size, minus eight (CINFO=7 indicates a 32K window size). Values of CINFO above 7 are not allowed in this version of the specification. CINFO is not defined in this specification for CM not equal to 8.

In practice, this means the first byte is almost always 78 (hex): CM = 8 in the low nibble and CINFO = 7 in the high nibble.

FLG byte

FLG (FLaGs)
This flag byte is divided as follows:

bits 0 to 4  FCHECK  (check bits for CMF and FLG)
bit  5       FDICT   (preset dictionary)
bits 6 to 7  FLEVEL  (compression level)

bits 0 to 4: FCHECK

The FCHECK value must be such that CMF and FLG, when viewed as a 16-bit unsigned integer stored in MSB order (CMF*256 + FLG), is a multiple of 31.

bit 5: FDICT

bits 6 to 7: FLEVEL

FLEVEL (Compression level)
These flags are available for use by specific compression methods. The "deflate" method (CM = 8) sets these flags as follows:

        0 - compressor used fastest algorithm
        1 - compressor used fast algorithm
        2 - compressor used default algorithm
        3 - compressor used maximum compression, slowest algorithm
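The field layout quoted above can be decoded with a few shifts and masks. This is a rough sketch, not code from the RFC or from zlib; the `ZlibHeader` type and `parse_zlib_header` function are names invented here.

```python
from typing import NamedTuple


class ZlibHeader(NamedTuple):
    cm: int           # compression method (8 = deflate)
    cinfo: int        # log2(window size) - 8, defined for cm == 8
    fcheck: int       # check bits
    fdict: bool       # preset dictionary flag
    flevel: int       # compression level hint, 0..3
    window_size: int  # derived from cinfo; only meaningful when cm == 8


def parse_zlib_header(data: bytes) -> ZlibHeader:
    if len(data) < 2:
        raise ValueError("need at least two bytes")
    cmf, flg = data[0], data[1]
    if (cmf * 256 + flg) % 31 != 0:
        raise ValueError("CMF*256 + FLG is not a multiple of 31")
    cinfo = cmf >> 4              # bits 4 to 7 of CMF
    return ZlibHeader(
        cm=cmf & 0x0F,            # bits 0 to 3 of CMF
        cinfo=cinfo,
        fcheck=flg & 0x1F,        # bits 0 to 4 of FLG
        fdict=bool(flg & 0x20),   # bit 5 of FLG
        flevel=flg >> 6,          # bits 6 to 7 of FLG
        window_size=1 << (cinfo + 8),
    )


print(parse_zlib_header(bytes.fromhex("789c")))
```

For 78 9C this gives CM = 8, CINFO = 7 (a 32768-byte window), FCHECK = 28, FDICT unset and FLEVEL = 2.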

2 Comments

Alright, so a lower bit number here refers to a less significant bit, not the MSB. I was pretty confused when you said “bit #0.”
For this CMF byte, the bit indices 0-7 run from right to left: 0 is on the right, 7 is on the left. It took me some time to get that.
40

The ZLIB header (as defined in RFC1950) is a 16-bit, big-endian value - in other words, it is two bytes long, with the higher bits in the first byte and the lower bits in the second.

It contains these bitfields from most to least significant:

                                FDICT
 CINFO       CM           FLEVEL|  FCHECK
 v--v--v--v  v--v--v--v   v--v  v  v--v--v--v--v
15 14 13 12 11 10  9  8   7  6  5  4  3  2  1  0
 0  x  x  x  1  0  0  0   x  x  0  x  x  x  x  x

  • CINFO (bits 12-15, first byte)
    Indicates the window size as a power of two, from 0 (256 bytes) to 7 (32768 bytes). This will usually be 7. Higher values are not allowed.

  • CM (bits 8-11)
    The compression method. Only Deflate (8) is allowed.


  • FLEVEL (bits 6-7, second byte)
    Roughly indicates the compression level, from 0 (fast/low) to 3 (slow/high)

  • FDICT (bit 5)
    Indicates whether a preset dictionary is used. This is usually 0. (1 is technically allowed, but I don't know of any Deflate formats that define preset dictionaries.)

  • FCHECK (bits 0-4)
    A checksum (5 bits, 0..31), whose value is chosen so that the entire 16-bit value is divisible by 31 with no remainder.*


Typically, only the CINFO and FLEVEL fields can be freely changed, and FCHECK must be calculated based on the final value. Assuming no preset dictionary, there is no choice in what the other fields contain, so a total of 32 possible headers are valid. Here they are:

      FLEVEL: 0       1       2       3
CINFO:
     0      08 1D   08 5B   08 99   08 D7
     1      18 19   18 57   18 95   18 D3
     2      28 15   28 53   28 91   28 CF
     3      38 11   38 4F   38 8D   38 CB
     4      48 0D   48 4B   48 89   48 C7
     5      58 09   58 47   58 85   58 C3
     6      68 05   68 43   68 81   68 DE
     7      78 01   78 5E   78 9C   78 DA

The CINFO field is rarely, if ever, set by compressors to be anything other than 7 (indicating the maximum 32KB window), so the only values you are likely to see in the wild are the four in the bottom row (beginning with 78).
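The table can be reproduced by fixing CM = 8 and FDICT = 0, looping over CINFO and FLEVEL, and solving for FCHECK. The following is a small sketch of that calculation; `header_for` is a hypothetical helper, not part of any library.

```python
def header_for(cinfo: int, flevel: int, fdict: int = 0) -> bytes:
    """Build the two header bytes for CM = 8 with the given fields."""
    cmf = (cinfo << 4) | 8
    flg = (flevel << 6) | (fdict << 5)
    # Smallest FCHECK (0..30) that makes CMF*256 + FLG a multiple of 31.
    fcheck = -(cmf * 256 + flg) % 31
    return bytes([cmf, flg | fcheck])


print("CINFO   " + "   ".join(f"FL={f} " for f in range(4)))
for cinfo in range(8):
    cells = []
    for flevel in range(4):
        b = header_for(cinfo, flevel)
        cells.append(f"{b[0]:02X} {b[1]:02X}")
    print(f"{cinfo:5}   " + "   ".join(cells))
```

The bottom row comes out as 78 01, 78 5E, 78 9C and 78 DA, matching the values most people encounter.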

* (You might wonder if there's a small amount of leeway on the value of FCHECK - could it be set to either of 0 or 31 if both pass the checksum? In practice though, this can only occur if FDICT=1, so it doesn't feature in the above table.)
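The footnote's claim can be checked by brute force. This sketch walks every 16-bit multiple of 31, keeps those with CM = 8 and CINFO at most 7, and looks for field combinations that admit more than one FCHECK value. The variable names are arbitrary.

```python
from collections import defaultdict

# Key: (CMF, top three bits of FLG, i.e. FLEVEL and FDICT).
# Value: every FCHECK that makes the header valid.
groups = defaultdict(list)
for value in range(0, 0x10000, 31):
    cmf, flg = value >> 8, value & 0xFF
    if (cmf & 0x0F) == 8 and (cmf >> 4) <= 7:
        groups[(cmf, flg >> 5)].append(flg & 0x1F)

ambiguous = {key: fchecks for key, fchecks in groups.items() if len(fchecks) > 1}
for (cmf, top), fchecks in sorted(ambiguous.items()):
    print(f"CMF={cmf:02X} FLEVEL={top >> 1} FDICT={top & 1} FCHECK options={fchecks}")
```

Every combination it reports has FDICT = 1 (CINFO=6 with FLEVEL=2, and CINFO=7 with FLEVEL=0), which agrees with the footnote.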

3 Comments

Thanks for the exhaustive information. This should be the accepted answer to this question.
Bonus fact: when FLEVEL is 1 and CINFO is at least 2, both bytes are ASCII-printable, so (S, 8O, HK, XG, hC, x^ are valid headers. PNGOUT uses x^ (78 5E) when compressing images.
Thanks, 0x789C means CINFO=7 CM=8, FCHECK=28, FDICT=0, FLEVEL=2. (bit order from LSB to MSB is the only correct way to decode).
32

ZLIB/GZIP headers

Level | ZLIB  | GZIP 
  1   | 78 01 | 1F 8B 
  2   | 78 5E | 1F 8B 
  3   | 78 5E | 1F 8B 
  4   | 78 5E | 1F 8B 
  5   | 78 5E | 1F 8B 
  6   | 78 9C | 1F 8B 
  7   | 78 DA | 1F 8B 
  8   | 78 DA | 1F 8B 
  9   | 78 DA | 1F 8B 

Deflate doesn't have common headers
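One way to check the zlib column is to compress a sample at every level and print the first two bytes. This sketch uses Python's standard `zlib` module; the exact FLEVEL bits depend on the zlib build behind it, so treat the output as an observation about your installation, not a guarantee.

```python
import zlib

payload = b"hello zlib " * 100  # arbitrary sample data

# Map each compression level (0..9) to the two header bytes it produces.
headers = {level: zlib.compress(payload, level)[:2] for level in range(10)}

for level, head in headers.items():
    print(f"level {level}: {head[0]:02X} {head[1]:02X}")
```

Whatever the level, the pair should still satisfy the multiple-of-31 rule.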

1 Comment

The first three bytes of a gzip stream are constant: 1f 8b 08, for stronger discrimination. Also you are only showing four of the 64 possible zlib headers. See mwfearnley's answer.
22

The following is the zlib compressed data format.

 +---+---+
 |CMF|FLG| (2 bytes - Defines the compression mode - More details below)
 +---+---+
 +---+---+---+---+
 |     DICTID    | (4 bytes. Present only when FLG.FDICT is set.) - Mostly not set
 +---+---+---+---+
 +=====================+
 |...compressed data...| (variable size of data)
 +=====================+
 +---+---+---+---+
 |     ADLER32   |  (4 bytes of checksum)
 +---+---+---+---+

Mostly, FLG.FDICT (the dictionary flag) is not set. In such cases the DICTID is simply not present, so the total header is just 2 bytes.
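The layout can be seen by taking a stream apart by hand. This is a sketch using Python's `zlib` and `struct` modules. It assumes a single complete stream held in memory, and the sample payload is arbitrary.

```python
import struct
import zlib

data = b"example payload for the zlib container"
stream = zlib.compress(data)

cmf, flg = stream[0], stream[1]
has_dict = bool(flg & 0x20)               # FLG.FDICT
body_start = 6 if has_dict else 2         # skip the 4-byte DICTID if present
body = stream[body_start:-4]              # raw deflate data
(adler,) = struct.unpack(">I", stream[-4:])   # ADLER32 trailer, big-endian

print(f"header: {cmf:02X} {flg:02X}, dictionary: {has_dict}")
print("Adler-32 matches:", adler == zlib.adler32(data))

# A negative wbits value tells zlib to expect raw deflate with no header.
restored = zlib.decompress(body, wbits=-15)
print("body round-trips:", restored == data)
```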

The header values (CMF and FLG) with no dictionary are defined as follows.

 CMF |  FLG
0x78 | 0x01 - No Compression/low
0x78 | 0x9C - Default Compression
0x78 | 0xDA - Best Compression 

CMF and FLG byte order and bit order:

 7      0 7      0 
+--------+--------+
|CINFO|CM|FL|FD|FC|
+--------+--------+
   byte 0  byte 1

bits 6 to 7 FL=FLEVEL
bits 5      FD=FDICT
bits 0 to 4 FC=FCHECK

More at ZLIB RFC

1 Comment

This summary shows only three of the 64 possible zlib headers. See mwfearnley's answer.
7

All the answers here are most probably correct. However, if you want to manipulate the compressed stream directly and it was produced by the gz_open, gzwrite and gzclose functions, there is an extra 10-byte leading header before the compressed stream. It is written by gz_open and looks like this:

    fprintf(s->file, "%c%c%c%c%c%c%c%c%c%c", gz_magic[0], gz_magic[1],
         Z_DEFLATED, 0 /*flags*/, 0,0,0,0 /*time*/, 0 /*xflags*/, OS_CODE);

This results in the following hex dump: 1F 8B 08 00 00 00 00 00 00 0B, followed by the compressed stream.

There are also 8 trailing bytes: a uLong CRC over the whole file, then a uLong uncompressed file size. Look for the following bytes at the end of the stream:

    putLong (s->file, s->crc);
    putLong (s->file, (uLong)(s->in & 0xffffffff));
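To see the same header and trailer without reading the C source, here is a sketch using Python's `gzip`, `struct` and `zlib` modules instead of the gz* functions above. It assumes the stream carries no optional header fields (flags byte 0), as in the hex dump, and only slices out the body when that holds.

```python
import gzip
import struct
import zlib

data = b"example payload for the gzip container"
blob = gzip.compress(data)

# 10-byte header: magic (2), method, flags, mtime (4, little-endian), xflags, OS code.
id1, id2, method, flags, mtime, xflags, os_code = struct.unpack("<BBBBIBB", blob[:10])
# 8-byte trailer: CRC-32 of the uncompressed data, then its length mod 2**32.
crc, isize = struct.unpack("<II", blob[-8:])

print(f"magic: {id1:02X} {id2:02X}, method: {method}, flags: {flags}")
print("CRC-32 matches:", crc == zlib.crc32(data))
print("size matches:", isize == len(data) & 0xFFFFFFFF)

if flags == 0:
    # With no optional fields, everything between header and trailer is raw deflate.
    body = blob[10:-8]
    print("body round-trips:", zlib.decompress(body, wbits=-15) == data)
```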

2 Comments

Everything is explained here : tools.ietf.org/html/rfc1952
Those are gzip streams, not zlib streams. Two different things.
