GZIP seed corpus

Thin header wrapper around raw Deflate — the real attack surface is the Deflate decompressor, not the 10-byte gzip header.

A GZIP file consists of a 10-byte fixed header, optional fields selected by the FLG byte (FEXTRA, FNAME, FCOMMENT, FHCRC), a Deflate-compressed data stream, and an 8-byte trailer containing the CRC32 of the uncompressed data and the uncompressed size modulo 2^32 (ISIZE). The header is small enough that purely random mutation reaches most header-parsing code paths quickly. The interesting attack surface is the Deflate decompressor: it must handle three block types (uncompressed, fixed Huffman, dynamic Huffman), back-references into a 32 KB sliding window, and end-of-block codes.
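
To make the layout concrete, here is a minimal Python sketch that splits a single gzip member into its header fields, the offset where the Deflate stream starts, and the trailer. The function name `parse_gzip_member` and the returned dictionary keys are illustrative choices, not part of any library; the field order (FEXTRA, FNAME, FCOMMENT, FHCRC) follows RFC 1952. It assumes the buffer holds exactly one member.

```python
import struct
import zlib

# FLG bits from RFC 1952
FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT = 0x01, 0x02, 0x04, 0x08, 0x10

def parse_gzip_member(buf: bytes) -> dict:
    """Split one gzip member into header fields, Deflate offset, and trailer."""
    if len(buf) < 18:  # 10-byte header + 8-byte trailer
        raise ValueError("too short for a gzip member")
    magic, cm, flg, mtime, xfl, os_id = struct.unpack_from("<HBBIBB", buf, 0)
    if magic != 0x8B1F or cm != 8:  # bytes 1f 8b, method 8 = Deflate
        raise ValueError("not a Deflate-compressed gzip member")
    pos = 10
    fields = {}
    if flg & FEXTRA:  # 2-byte little-endian length, then that many bytes
        (xlen,) = struct.unpack_from("<H", buf, pos)
        fields["extra"] = buf[pos + 2 : pos + 2 + xlen]
        pos += 2 + xlen
    for bit, key in ((FNAME, "name"), (FCOMMENT, "comment")):
        if flg & bit:  # NUL-terminated string
            end = buf.index(b"\x00", pos)
            fields[key] = buf[pos:end]
            pos = end + 1
    if flg & FHCRC:  # low 16 bits of the CRC32 over the header so far
        (fields["hcrc"],) = struct.unpack_from("<H", buf, pos)
        pos += 2
    crc32, isize = struct.unpack_from("<II", buf, len(buf) - 8)
    return {
        "flags": flg,
        "fields": fields,
        "deflate_offset": pos,  # first byte of the raw Deflate stream
        "crc32": crc32,
        "isize": isize,
    }
```

The `deflate_offset` value is the useful part for fuzzing: it tells a structure-aware mutator where header bytes end and Deflate bits begin.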

zlib's inflate() function is one of the most widely deployed pieces of C code in the world and has been heavily fuzzed. Despite this, edge cases continue to surface, particularly in the interaction between flush modes (Z_SYNC_FLUSH, Z_FULL_FLUSH) and stream concatenation. libdeflate, a faster alternative to zlib, is a younger codebase with less fuzzing history and is therefore a higher-value target. pigz (parallel gzip) parallelizes compression; its decompression still runs the Deflate work on a single thread but adds helper threads for reading, writing, and check calculation, so the threading complexity sits around the decompressor rather than inside it.

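A gzip corpus should include files with all three Deflate block types, files with the optional header fields populated (FEXTRA, FNAME, FCOMMENT, FHCRC), multi-stream gzip files (concatenated gzip members, which some decompressors must handle as a single stream), and a few crafted files with the reserved block type (BTYPE 3), over-subscribed Huffman codes, and invalid back-references. Deflate cannot encode a distance of zero (the smallest distance is 1), so the invalid cases are distances that reach before the start of the output and the reserved distance codes 30 and 31.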
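
One way to get all three block types into the corpus is to steer zlib through Python's `zlib` module, as in the sketch below. The helper names `gzip_with_block_type` and `first_btype` and the sample data are hypothetical. It relies on zlib behavior rather than a guarantee: level 0 emits stored blocks, the `Z_FIXED` strategy forbids dynamic trees, and the default strategy picks dynamic Huffman when that is smaller, which is expected for the skewed sample input used here. Check the result with `first_btype` instead of trusting the settings.

```python
import random
import zlib

def gzip_with_block_type(data: bytes, btype: int) -> bytes:
    """Compress data into one gzip member, steering zlib toward a given BTYPE."""
    if btype == 0:    # stored: level 0 emits uncompressed blocks
        level, strategy = 0, zlib.Z_DEFAULT_STRATEGY
    elif btype == 1:  # fixed Huffman: Z_FIXED (value 4 in zlib.h) forbids dynamic trees
        level, strategy = 9, getattr(zlib, "Z_FIXED", 4)
    elif btype == 2:  # dynamic Huffman: chosen by zlib when it is the smaller encoding
        level, strategy = 9, zlib.Z_DEFAULT_STRATEGY
    else:
        raise ValueError("BTYPE 3 is reserved; craft it by editing bits directly")
    # wbits=31 selects the gzip wrapper with a 32 KB window
    comp = zlib.compressobj(level, zlib.DEFLATED, 31, 9, strategy)
    return comp.compress(data) + comp.flush()

def first_btype(member: bytes) -> int:
    """BTYPE of the first block, assuming a bare 10-byte header (FLG == 0)."""
    return (member[10] >> 1) & 0b11

# Skewed 2 KB sample so that dynamic Huffman clearly beats fixed Huffman.
sample = bytes(random.Random(0).choices(b"abcd", weights=[8, 4, 2, 1], k=2048))
seeds = {t: gzip_with_block_type(sample, t) for t in (0, 1, 2)}
```

Writing each value of `seeds` to its own file gives one small seed per block type, all well under 4 KB of uncompressed data.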

Building + curating your corpus

  • Generate corpus entries using all three Deflate block types explicitly — many gzip files in the wild use only type 2 (dynamic Huffman), leaving types 0 and 1 undercovered.
  • Include multi-stream gzip files (two valid gzip members concatenated) to test decompressors that are expected to handle concatenated streams.
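  • Add files with optional header fields populated: FEXTRA (length-prefixed extra field), FNAME (NUL-terminated original filename), FCOMMENT (NUL-terminated comment), and FHCRC (the CRC16 header check, which is the low 16 bits of the header's CRC32) each have separate parsing code.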
  • Craft a gzip file where the CRC32 in the trailer is correct but the ISIZE (uncompressed size mod 2^32) is wrong — some decompressors validate only one of the two trailer fields.
  • Keep uncompressed sizes small (< 4 KB) to avoid slow decompression of large Deflate streams during fuzzing.
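
The header-field, multi-stream, and trailer items above can be sketched with a hand-rolled member builder. Everything named here (`build_member`, `with_bad_isize`, `make_seeds`, the seed names and payloads) is illustrative; the header layout and the FHCRC rule (low 16 bits of the CRC32 over the preceding header bytes) follow RFC 1952. The extra-field bytes are an arbitrary example subfield.

```python
import gzip
import struct
import zlib

def build_member(data, extra=None, name=None, comment=None, hcrc=False):
    """Build one gzip member by hand so every optional header field can be set."""
    flg, opt = 0, b""
    if extra is not None:    # FEXTRA: 2-byte length + payload
        flg |= 0x04
        opt += struct.pack("<H", len(extra)) + extra
    if name is not None:     # FNAME: NUL-terminated
        flg |= 0x08
        opt += name + b"\x00"
    if comment is not None:  # FCOMMENT: NUL-terminated
        flg |= 0x10
        opt += comment + b"\x00"
    if hcrc:
        flg |= 0x02
    # magic 1f 8b, CM=8, FLG, MTIME=0, XFL=0, OS=255 (unknown)
    header = struct.pack("<HBBIBB", 0x8B1F, 8, flg, 0, 0, 255) + opt
    if hcrc:                 # FHCRC covers all header bytes before it
        header += struct.pack("<H", zlib.crc32(header) & 0xFFFF)
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)  # negative wbits = raw Deflate
    body = comp.compress(data) + comp.flush()
    trailer = struct.pack("<II", zlib.crc32(data), len(data) & 0xFFFFFFFF)
    return header + body + trailer

def with_bad_isize(member: bytes) -> bytes:
    """Keep the trailer CRC32 intact but make ISIZE off by one."""
    crc, isize = struct.unpack("<II", member[-8:])
    return member[:-8] + struct.pack("<II", crc, (isize + 1) & 0xFFFFFFFF)

def make_seeds() -> dict:
    return {
        "all_fields.gz": build_member(b"payload", extra=b"AB\x02\x00hi",
                                      name=b"seed.txt", comment=b"c", hcrc=True),
        "multi.gz": build_member(b"first ") + build_member(b"second"),
        "bad_isize.gz": with_bad_isize(build_member(b"payload")),
    }
```

Running the resulting seeds through a reference decoder first (for example `gzip.decompress` or `zlib.decompress(data, 31)`) is a quick way to confirm which ones are valid and which are intentionally broken before they go into the corpus.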

Mutator hints

  • Use an AFL++ dictionary such as gzip.dict (loaded with -x) to inject the gzip magic bytes (\x1f\x8b), compression method (\x08), and OS bytes as short tokens.
  • Directly mutate the Deflate block type bits (BTYPE, bits 1-2 of the first byte after the gzip header, including any optional fields; bit 0 is BFINAL) to switch between uncompressed, fixed Huffman, and dynamic Huffman block types, or to the reserved value 3.
  • For dynamic Huffman fuzzing, write a custom mutator that generates syntactically valid Huffman code length tables with degenerate distributions (single code, maximum code length).
  • CMPLOG mode helps AFL++ learn the \x1f\x8b magic signature and the compression method byte (\x08 = Deflate) checked at the start of inflate().
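
Two of the hints above can be sketched in a few lines of Python. The first helper rewrites BTYPE in the first Deflate block; per RFC 1951, bit 0 of that byte is BFINAL and bits 1-2 hold BTYPE. The second part produces degenerate code-length tables and classifies them with the Kraft sum. All names (`set_first_btype`, `kraft_sum`, `degenerate_length_tables`) are hypothetical, the BTYPE helper assumes a bare 10-byte header (FLG == 0), and serialising the length tables into a real dynamic block header is deliberately left out.

```python
from fractions import Fraction
import zlib

def set_first_btype(member: bytes, btype: int) -> bytes:
    """Overwrite BTYPE (bits 1-2) of the first Deflate byte, keeping BFINAL (bit 0)."""
    if member[3] != 0:
        raise ValueError("optional header fields present; find the Deflate offset first")
    out = bytearray(member)
    out[10] = (out[10] & 0b11111001) | ((btype & 0b11) << 1)
    return bytes(out)

def kraft_sum(lengths) -> Fraction:
    """Sum of 2**-len over used symbols: 1 complete, >1 over-subscribed, <1 incomplete."""
    return sum(Fraction(1, 1 << n) for n in lengths if n > 0)

def degenerate_length_tables(max_bits: int = 15) -> dict:
    """Code-length lists at the edges of what a Deflate decoder must accept or reject."""
    return {
        # one symbol with a 1-bit code: incomplete, but a classic special case
        "single_code": [1],
        # lengths 1, 2, ..., 15, 15: a complete code that reaches the maximum length
        "max_length_chain": list(range(1, max_bits + 1)) + [max_bits],
        # three 1-bit codes: over-subscribed, must be rejected
        "over_subscribed": [1, 1, 1],
    }
```

Setting BTYPE to 3 on a valid member is a cheap way to produce the reserved-block-type seed, and the Kraft sum gives a custom mutator a quick oracle for whether a generated table should be accepted by the decoder under test.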

Recommended fuzzers

  • AFL++
  • libFuzzer
  • Honggfuzz
  • Centipede