PNG seed corpus

A chunk-based format with a CRC32 on every chunk, well suited to dictionary-assisted and structure-aware mutation.

PNG encodes image data as an eight-byte signature followed by a sequence of typed chunks (IHDR, IDAT, PLTE, tEXt, etc.), each with a four-byte length, a four-byte type code, and a CRC32 checksum computed over the type and data. The IDAT chunk stream is zlib-compressed, so a PNG fuzzing corpus must stress both the chunk-level parser and the zlib decompressor underneath. Ancillary chunks (gAMA, cHRM, iCCP, tEXt, zTXt, iTXt) are individually optional but widely implemented, and each handler is a potential source of out-of-bounds access.
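
The chunk layout described above can be sketched in a few lines of Python. This is a minimal illustration, not libpng's parser: the helper names iter_chunks and make_chunk are hypothetical, and the 1x1 grayscale image is built in memory only so the example runs on its own.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def iter_chunks(data: bytes):
    """Yield (type, payload, crc_ok) for each chunk in a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("missing PNG signature")
    pos = 8
    while pos + 12 <= len(data):
        # Each chunk starts with a big-endian length and a 4-byte type code.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 8 + length
        if end + 4 > len(data):
            raise ValueError("chunk length runs past end of file")
        payload = data[pos + 8:end]
        (stored_crc,) = struct.unpack(">I", data[end:end + 4])
        # The CRC covers the type code and the payload, not the length field.
        crc_ok = (zlib.crc32(ctype + payload) & 0xFFFFFFFF) == stored_crc
        yield ctype, payload, crc_ok
        pos = end + 4

def make_chunk(ctype: bytes, payload: bytes) -> bytes:
    """Serialise one chunk with a correct CRC (hypothetical helper)."""
    crc = zlib.crc32(ctype + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)

# A 1x1, 8-bit grayscale, non-interlaced image built in memory.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
idat = zlib.compress(b"\x00\x00")  # filter byte 0 followed by one pixel
png = (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
       + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))

chunks = list(iter_chunks(png))
```

Running iter_chunks over every file in a corpus is a quick way to list which chunk types the seeds actually contain before a campaign starts.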

libpng is the canonical reference implementation and ships in virtually every OS. Its progressive read API, used by browsers, is particularly complex: the parser maintains a state machine across multiple IDAT chunks, and state transitions at unexpected positions have historically caused use-after-free and heap corruption bugs. Interlaced PNG (Adam7) adds further complexity: the decoder must reassemble seven interleaved passes of scan lines, each with independent filter application.
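
To make the Adam7 bookkeeping concrete, the sketch below computes the width and height of each of the seven passes for a given image size. The start offsets and step sizes follow the PNG specification; the function name adam7_pass_sizes is hypothetical and the code is illustrative rather than taken from any decoder.

```python
# (x_start, y_start, x_step, y_step) for the seven Adam7 passes, in order.
ADAM7_PASSES = [
    (0, 0, 8, 8),
    (4, 0, 8, 8),
    (0, 4, 4, 8),
    (2, 0, 4, 4),
    (0, 2, 2, 4),
    (1, 0, 2, 2),
    (0, 1, 1, 2),
]

def adam7_pass_sizes(width: int, height: int):
    """Return a list of (pass_width, pass_height), one entry per pass."""
    sizes = []
    for x0, y0, dx, dy in ADAM7_PASSES:
        # Count the pixels at x0, x0 + dx, ... that fall inside the image.
        w = (width - x0 + dx - 1) // dx if width > x0 else 0
        h = (height - y0 + dy - 1) // dy if height > y0 else 0
        sizes.append((w, h))
    return sizes

sizes_8x8 = adam7_pass_sizes(8, 8)
sizes_1x1 = adam7_pass_sizes(1, 1)
```

For very small images most passes come out with zero width or zero height, which is why tiny interlaced files make useful seeds: they reach the empty-pass handling that larger images never touch.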

Because PNG chunk types are four ASCII letters, AFL++ dictionary mode is very effective: tokens such as 'IHDR', 'IDAT', 'PLTE', and 'IEND' appear verbatim in the file, and the fuzzer can splice them in as replacement values. Pair this with a corpus that includes valid interlaced, non-interlaced, and 16-bit-depth images to maximise code coverage. Progressive reading is a decoder API mode rather than a property of the file, so it is selected in the harness, not in the corpus.
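
As an illustration of what such dictionary tokens look like, the sketch below formats a few of them in the name="value" syntax with \xNN escapes used by AFL-style dictionary files. The token names and the helper afl_dict_line are hypothetical; the dictionary shipped with AFL++ may name and select its entries differently.

```python
def afl_dict_line(name: str, token: bytes) -> str:
    """Format one token as name="value", escaping non-printable bytes."""
    out = []
    for b in token:
        # Keep printable ASCII as-is, except the quote and the backslash.
        if 0x20 <= b < 0x7F and b not in (0x22, 0x5C):
            out.append(chr(b))
        else:
            out.append("\\x%02x" % b)
    return '%s="%s"' % (name, "".join(out))

# Illustrative token set: the file signature plus a few chunk type codes.
TOKENS = {
    "header_png": b"\x89PNG\r\n\x1a\n",
    "chunk_IHDR": b"IHDR",
    "chunk_IDAT": b"IDAT",
    "chunk_PLTE": b"PLTE",
    "chunk_IEND": b"IEND",
}

dict_text = "\n".join(afl_dict_line(n, t) for n, t in TOKENS.items()) + "\n"
```

The resulting text can be saved to a file and passed to the fuzzer's dictionary option (-x for afl-fuzz, -dict= for libFuzzer).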

Building + curating your corpus

  • Start with the PngSuite corpus (161 files). It systematically covers all colour types (grayscale, RGB, palette, grayscale with alpha, RGBA), bit depths (1, 2, 4, 8, and 16), and interlace modes.
  • Add a handful of intentionally corrupt files: wrong chunk CRCs, IDAT zlib streams with bad Adler-32 checksums, and IHDR chunks with zero width or height. These exercise error-recovery paths.
  • Use afl-cmin to reduce a large crawled corpus to unique-coverage entries — PNG files are compact but many web-crawled images are near-duplicates structurally.
  • Include 16-bit-per-channel images and images with embedded ICC profiles (iCCP chunks) — these exercise separate code paths from common 8-bit sRGB files.
  • Keep corpus files under 50 KB for throughput; strip large IDAT payloads while preserving chunk structure if needed.
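
The intentionally corrupt seeds mentioned in the list above can be generated rather than collected. The following is a minimal sketch under stated assumptions: the helper names (chunk, ihdr, build, write_seeds) and the output file names are hypothetical, and the base image is a 1x1 grayscale PNG chosen only to keep the files tiny.

```python
import pathlib
import struct
import zlib

SIG = b"\x89PNG\r\n\x1a\n"

def chunk(ctype, payload, crc=None):
    """Serialise a chunk; pass crc explicitly to store a wrong value."""
    if crc is None:
        crc = zlib.crc32(ctype + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)

def ihdr(width, height, bit_depth=8, colour_type=0, interlace=0):
    return struct.pack(">IIBBBBB", width, height, bit_depth, colour_type, 0, 0, interlace)

def build(width=1, height=1, bad_ihdr_crc=False, bad_adler=False):
    z = zlib.compress(b"\x00\x00")  # one scanline: filter byte + one pixel
    if bad_adler:
        # The last four bytes of a zlib stream hold the Adler-32 checksum.
        z = z[:-1] + bytes([z[-1] ^ 0xFF])
    hdr = ihdr(width, height)
    hdr_crc = None
    if bad_ihdr_crc:
        hdr_crc = (zlib.crc32(b"IHDR" + hdr) ^ 1) & 0xFFFFFFFF
    return SIG + chunk(b"IHDR", hdr, hdr_crc) + chunk(b"IDAT", z) + chunk(b"IEND", b"")

seeds = {
    "valid.png": build(),
    "bad_ihdr_crc.png": build(bad_ihdr_crc=True),
    "bad_adler32.png": build(bad_adler=True),
    "zero_width.png": build(width=0),
}

def write_seeds(directory):
    """Write every seed into the given directory (not called here)."""
    out = pathlib.Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    for name, data in seeds.items():
        (out / name).write_bytes(data)
```

Each variant breaks exactly one thing, so a crash or new coverage edge found from one of these seeds is easy to attribute to a specific error path.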

Mutator hints

  • Use AFL++ png.dict to inject chunk type codes (IHDR, IDAT, PLTE, iCCP, gAMA, tEXt, zTXt, cHRM) as mutation tokens.
  • Write a custom AFL++ mutator that recomputes CRC32 after byte-level mutations — prevents the decoder from rejecting inputs at chunk validation and drives coverage deeper.
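  • For IDAT fuzzing, pre-compress a range of interesting pixel patterns and use them as replacement zlib payloads. Because these streams inflate cleanly, the mutated pixel data gets past the decompressor and reaches the filter-reversal and de-interlacing code behind it; mutate the compressed bytes directly when the decompressor itself is the target.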
  • Inject oversized or undersized chunk length fields (mismatch between declared length and actual data) to stress bounds-checking in chunk readers.
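
The CRC fix-up idea from the list above can be sketched as a pure function that walks the chunks of a mutated buffer and rewrites each stored CRC. The name fix_png_crcs is hypothetical, and wiring it into AFL++ is left out: the custom mutator interface offers a post-processing hook for this kind of repair, but its exact signature should be checked against the AFL++ documentation.

```python
import struct
import zlib

SIG = b"\x89PNG\r\n\x1a\n"

def fix_png_crcs(data: bytes) -> bytes:
    """Return data with the CRC of every parseable chunk recomputed."""
    if not data.startswith(SIG):
        return data  # not a PNG any more; leave it for the parser to reject
    out = bytearray(data)
    pos = len(SIG)
    while pos + 12 <= len(out):
        (length,) = struct.unpack_from(">I", out, pos)
        crc_pos = pos + 8 + length
        if crc_pos + 4 > len(out):
            break  # declared length overruns the buffer; keep the tail as-is
        # The CRC covers the type code and payload, starting 4 bytes in.
        crc = zlib.crc32(bytes(out[pos + 4:crc_pos])) & 0xFFFFFFFF
        struct.pack_into(">I", out, crc_pos, crc)
        pos = crc_pos + 4
    return bytes(out)

def _chunk(ctype, payload):
    crc = zlib.crc32(ctype + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)

# Demo: flip the bit-depth byte of a valid 1x1 image, then repair the CRC.
png = (SIG + _chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
       + _chunk(b"IDAT", zlib.compress(b"\x00\x00")) + _chunk(b"IEND", b""))
mutated = bytearray(png)
mutated[24] = 16  # offset 24 is the IHDR bit-depth field
repaired = fix_png_crcs(bytes(mutated))
```

The function deliberately stops at the first chunk whose declared length overruns the buffer, so length-mismatch mutations survive the repair and still reach the chunk reader's bounds checks.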

Recommended fuzzers

  • AFL++
  • libFuzzer
  • Honggfuzz