GZIP seed corpus
Thin header wrapper around raw Deflate — the real attack surface is the Deflate decompressor, not the 10-byte gzip header.
A GZIP file consists of a 10-byte fixed header, optional header fields (FEXTRA, FNAME, FCOMMENT, FHCRC) selected by bits in the FLG byte, a Deflate-compressed data stream, and an 8-byte trailer containing the CRC32 of the uncompressed data and its size modulo 2^32 (ISIZE). The header is small enough that purely random mutation reaches most header-parsing code paths quickly. The interesting attack surface is the Deflate decompressor: it must handle three block types (uncompressed, fixed Huffman, dynamic Huffman), back-references into a 32 KB sliding window, and end-of-block codes.
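To make the layout concrete, here is a minimal Python sketch that splits a single gzip member into its header fields, raw Deflate bytes, and trailer. It follows the field order in RFC 1952; the function name `split_gzip_member` and the returned dictionary keys are illustrative choices, and the sketch assumes one well-formed member with no trailing data.

```python
import gzip
import struct
import zlib

# FLG bits defined by RFC 1952
FHCRC, FEXTRA, FNAME, FCOMMENT = 0x02, 0x04, 0x08, 0x10


def split_gzip_member(data: bytes) -> dict:
    """Split one gzip member into header fields, Deflate bytes, and trailer."""
    # Fixed 10-byte header: ID1 ID2, CM, FLG, MTIME (4 bytes), XFL, OS
    magic, method, flags, mtime, xfl, os_id = struct.unpack_from("<2sBBIBB", data, 0)
    if magic != b"\x1f\x8b" or method != 8:
        raise ValueError("not a Deflate-based gzip member")
    pos = 10
    fields = {}
    if flags & FEXTRA:  # 2-byte little-endian length, then that many bytes
        (xlen,) = struct.unpack_from("<H", data, pos)
        fields["extra"] = data[pos + 2 : pos + 2 + xlen]
        pos += 2 + xlen
    for bit, name in ((FNAME, "name"), (FCOMMENT, "comment")):
        if flags & bit:  # NUL-terminated string
            end = data.index(b"\x00", pos)
            fields[name] = data[pos:end]
            pos = end + 1
    if flags & FHCRC:  # low 16 bits of the CRC32 of the header so far
        (fields["hcrc"],) = struct.unpack_from("<H", data, pos)
        pos += 2
    # 8-byte trailer: CRC32 of the uncompressed data, then ISIZE
    crc32, isize = struct.unpack_from("<II", data, len(data) - 8)
    return {
        "flags": flags,
        "fields": fields,
        "deflate": data[pos:-8],
        "crc32": crc32,
        "isize": isize,
    }


member = gzip.compress(b"hello gzip")
parts = split_gzip_member(member)
print(parts["flags"], len(parts["deflate"]), hex(parts["crc32"]), parts["isize"])
```

Everything between the end of the header and the trailer is handed to the Deflate decompressor, which is why most of the fuzzing effort belongs there.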
zlib's inflate() function is one of the most widely deployed pieces of C code in the world and has been heavily fuzzed. Despite this, edge cases continue to surface, particularly in the interaction between flush modes (Z_SYNC_FLUSH, Z_FULL_FLUSH) and stream concatenation. libdeflate, a faster alternative to zlib, is a younger codebase with less fuzzing history and is therefore a higher-value target. pigz (parallel gzip) layers threading on top of Deflate: compression is parallelized, and decompression, although it inflates in a single thread, uses helper threads for reading, writing, and check calculation.
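Seeds that exercise flush points can be produced with Python's `zlib` module. The sketch below is one possible approach: `wbits=31` asks zlib for a gzip wrapper, and each chunk is followed by a flush in the chosen mode. The helper name `gzip_with_flushes`, the chunk contents, and the seed file names are illustrative assumptions.

```python
import gzip
import zlib


def gzip_with_flushes(chunks, flush_mode):
    """Compress chunks into one gzip member, flushing after every chunk."""
    comp = zlib.compressobj(level=6, wbits=31)  # wbits=31 selects the gzip wrapper
    out = bytearray()
    for chunk in chunks:
        out += comp.compress(chunk)
        out += comp.flush(flush_mode)  # byte-aligns the output at a flush point
    out += comp.flush(zlib.Z_FINISH)  # final block plus CRC32/ISIZE trailer
    return bytes(out)


chunks = [b"alpha " * 50, b"beta " * 50, b"gamma " * 50]
seeds = {
    "sync_flush.gz": gzip_with_flushes(chunks, zlib.Z_SYNC_FLUSH),
    "full_flush.gz": gzip_with_flushes(chunks, zlib.Z_FULL_FLUSH),
}
# Two members back to back: flush points combined with stream concatenation.
seeds["flush_then_concat.gz"] = seeds["sync_flush.gz"] + seeds["full_flush.gz"]
```

Each flush leaves an empty stored block in the stream (the bytes 00 00 FF FF), so these seeds give a mutator byte-aligned positions inside the Deflate data to work from.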
A gzip corpus should include files with all three Deflate block types, files with the optional header fields populated (FEXTRA, FNAME, FCOMMENT, FHCRC), multi-member gzip files (concatenated gzip members, which many decompressors are expected to decode as one continuous output), and a few crafted invalid files: the reserved block type (BTYPE = 3), over-subscribed Huffman code sets, and back-references that reach before the start of the output or use the reserved distance codes 30 and 31.
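"Over-subscribed" has a precise meaning: the code lengths claim more of the code space than exists, so their Kraft sum (the sum of 2^-length over all used symbols) exceeds 1. The following Python sketch classifies a set of code lengths on that basis. The function names are illustrative, and decoders differ in which incomplete sets they tolerate, so treat the labels as a description of the table, not a prediction of any one decoder's behavior.

```python
from fractions import Fraction

MAX_BITS = 15  # longest Huffman code length Deflate allows


def kraft_sum(lengths):
    """Sum of 2**-n over all used symbols (length 0 means the symbol is unused)."""
    return sum(Fraction(1, 1 << n) for n in lengths if n > 0)


def classify_code_lengths(lengths):
    """Label a code length set as complete, incomplete, or over-subscribed."""
    if any(n < 0 or n > MAX_BITS for n in lengths):
        raise ValueError("Deflate code lengths must be in 0..15")
    total = kraft_sum(lengths)
    if total > 1:
        return "over-subscribed"  # more codes than the prefix tree can hold
    if total < 1:
        return "incomplete"  # some bit patterns decode to no symbol
    return "complete"


examples = {
    "balanced": [2, 2, 2, 2],
    "single code": [1],
    "too many short codes": [1, 1, 1],
    "maximum length pair": [15, 15],
}
for label, lengths in examples.items():
    print(label, classify_code_lengths(lengths), kraft_sum(lengths))
```

A seed generator can use this check to confirm that a crafted table is invalid in the intended way before it is packed into a dynamic Huffman block.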
Where to grab a starter corpus
Building + curating your corpus
- →Generate corpus entries using all three Deflate block types explicitly — many gzip files in the wild use only type 2 (dynamic Huffman), leaving types 0 and 1 undercovered.
- →Include multi-stream gzip files (two valid gzip members concatenated) to test decompressors that are expected to handle concatenated streams.
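- →Add files with optional header fields populated: FEXTRA (length-prefixed extra data), FNAME (NUL-terminated original filename), FCOMMENT (NUL-terminated comment), and FHCRC (header CRC16, the low 16 bits of the CRC32 of the header) each have separate parsing code.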
- →Craft a gzip file where the CRC32 in the trailer is correct but the ISIZE (uncompressed size mod 2^32) is wrong — some decompressors validate only one of the two trailer fields.
- →Keep uncompressed sizes small (< 4 KB) to avoid slow decompression of large Deflate streams during fuzzing.
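The checklist above can be sketched as a small Python generator. It assumes Python's `zlib` module: level 0 yields stored blocks, the `Z_FIXED` strategy forces fixed Huffman codes on compressible input, and the default strategy usually picks dynamic Huffman for a skewed payload like the one here (that choice is zlib's, so verify it with `first_block_type`). The helper names, file names, and payload are illustrative assumptions.

```python
import gzip
import pathlib
import random
import struct
import zlib

FHCRC, FNAME, FCOMMENT = 0x02, 0x08, 0x10  # FLG bits from RFC 1952


def raw_deflate(data, level=9, strategy=zlib.Z_DEFAULT_STRATEGY):
    """Compress to a raw Deflate stream (wbits=-15: no zlib or gzip wrapper)."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15, 8, strategy)
    return comp.compress(data) + comp.flush()


def first_block_type(deflate):
    """BTYPE of the first block: 0 stored, 1 fixed, 2 dynamic, 3 reserved."""
    return (deflate[0] >> 1) & 0b11


def gzip_member(payload, deflate, name=None, comment=None, hcrc=False, isize_delta=0):
    """Wrap a raw Deflate stream in a hand-built gzip header and trailer."""
    flags = 0
    flags |= FNAME if name is not None else 0
    flags |= FCOMMENT if comment is not None else 0
    flags |= FHCRC if hcrc else 0
    # MTIME = 0, XFL = 0, OS = 255 (unknown)
    header = struct.pack("<2sBBIBB", b"\x1f\x8b", 8, flags, 0, 0, 255)
    if name is not None:
        header += name + b"\x00"
    if comment is not None:
        header += comment + b"\x00"
    if hcrc:
        header += struct.pack("<H", zlib.crc32(header) & 0xFFFF)
    # isize_delta != 0 keeps the CRC32 correct but corrupts ISIZE on purpose.
    isize = (len(payload) + isize_delta) & 0xFFFFFFFF
    return header + deflate + struct.pack("<II", zlib.crc32(payload), isize)


def build_seeds():
    rng = random.Random(0)
    payload = bytes(rng.choice(b"aaaaabbbcd") for _ in range(2000))  # under 4 KB
    stored = raw_deflate(payload, level=0)
    fixed = raw_deflate(payload, strategy=zlib.Z_FIXED)
    dynamic = raw_deflate(payload)
    seeds = {
        "stored.gz": gzip_member(payload, stored),
        "fixed.gz": gzip_member(payload, fixed),
        "dynamic.gz": gzip_member(payload, dynamic),
        "fields.gz": gzip_member(payload, dynamic, name=b"seed.txt",
                                 comment=b"fuzz seed", hcrc=True),
        "bad_isize.gz": gzip_member(payload, dynamic, isize_delta=1),
    }
    seeds["multi.gz"] = seeds["stored.gz"] + seeds["fixed.gz"]
    return payload, seeds


def write_seeds(directory):
    """Write every seed into the given corpus directory."""
    out = pathlib.Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    _, seeds = build_seeds()
    for name, blob in seeds.items():
        (out / name).write_bytes(blob)
```

Calling `write_seeds("corpus/gzip")` (a hypothetical path) would produce six small files. Building the header by hand, instead of using the `gzip` module, is what allows the trailer fields to be set independently.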
Mutator hints
- →Use a gzip dictionary (such as AFL++ gzip.dict, passed with -x) to inject the gzip magic bytes (\x1f\x8b), the compression method (\x08), and OS bytes as interesting short tokens.
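- →Directly mutate the Deflate block type bits (BTYPE, bits 1-2 of the first byte of the Deflate stream; bit 0 is BFINAL) to switch between uncompressed, fixed, and dynamic Huffman block types. The Deflate stream starts after the full gzip header, including any optional fields.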
- →For dynamic Huffman fuzzing, write a custom mutator that generates syntactically valid Huffman code length tables with degenerate distributions (single code, maximum code length).
- →CMPLOG mode helps AFL++ learn the \x1f\x8b magic signature and the compression method byte (\x08 = Deflate) checked at the start of inflate().
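The block-type mutation can be sketched in a few lines of Python. Per RFC 1951, the first Deflate byte holds BFINAL in bit 0 and BTYPE in bits 1-2. This sketch assumes a gzip member with FLG == 0, so the Deflate stream starts at offset 10; the function name `set_first_btype` is illustrative, and wiring it into a fuzzer's custom mutator interface is left out.

```python
import zlib

DEFLATE_OFFSET = 10  # valid only when FLG == 0 (no optional header fields)


def set_first_btype(member: bytes, btype: int) -> bytes:
    """Return a copy of the member with the first block's BTYPE replaced."""
    if member[:3] != b"\x1f\x8b\x08" or member[3] != 0:
        raise ValueError("sketch only handles Deflate gzip members with FLG == 0")
    if not 0 <= btype <= 3:
        raise ValueError("BTYPE is a 2-bit field")
    out = bytearray(member)
    first = out[DEFLATE_OFFSET]
    # Clear bits 1-2, keep BFINAL (bit 0) and bits 3-7, then insert the new BTYPE.
    out[DEFLATE_OFFSET] = (first & 0b11111001) | (btype << 1)
    return bytes(out)


# Build a small member with zlib's gzip wrapper (wbits=31 produces FLG == 0).
comp = zlib.compressobj(level=9, wbits=31)
member = comp.compress(b"block type fuzzing " * 20) + comp.flush()

variants = {btype: set_first_btype(member, btype) for btype in range(4)}
```

Only one variant matches the original block type; the other three force the decompressor to interpret the same bits under a different block layout, and BTYPE 3 is the reserved value that a conforming decoder should reject.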
Recommended fuzzers
- → AFL++
- → libFuzzer
- → Honggfuzz
- → Centipede
Libraries that consume GZIP