PNG seed corpus
Chunk-driven with mandatory CRC validation — perfect for dictionary-assisted and structure-aware mutation.
PNG encodes image data as a sequence of typed chunks (IHDR, IDAT, PLTE, tEXt, etc.), each framed by a four-byte length, a four-byte type code, and a trailing CRC-32 checksum computed over the type and data. The IDAT chunk stream is zlib-compressed, so a PNG fuzzing corpus must stress both the chunk-level parser and the zlib decompressor underneath. Ancillary chunks (gAMA, cHRM, iCCP, tEXt, zTXt, iTXt) are individually optional but widely implemented, and each handler is a potential source of out-of-bounds access.
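The chunk framing described above can be sketched in a few lines of Python. This is a minimal illustration, not libpng's parser: `make_chunk`, `iter_chunks`, and `minimal_png` are hypothetical helper names, and the 1x1 grayscale image exists only to have something to walk.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def make_chunk(ctype: bytes, data: bytes) -> bytes:
    """Frame one chunk: 4-byte length, 4-byte type, data, 4-byte CRC-32."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF  # CRC covers type + data, not length
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)

def iter_chunks(png: bytes):
    """Yield (type, data, crc_ok) for every chunk after the signature."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("missing PNG signature")
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        if pos + 12 > len(png):
            raise ValueError("truncated chunk header")
        (length,) = struct.unpack_from(">I", png, pos)
        end = pos + 8 + length
        if end + 4 > len(png):
            raise ValueError("declared length runs past end of file")
        ctype = png[pos + 4:pos + 8]
        data = png[pos + 8:end]
        (stored,) = struct.unpack_from(">I", png, end)
        yield ctype, data, stored == (zlib.crc32(ctype + data) & 0xFFFFFFFF)
        pos = end + 4

def minimal_png() -> bytes:
    """A 1x1, 8-bit grayscale image: IHDR, one IDAT, IEND."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte 0 followed by one pixel
    return (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))
```

Running `iter_chunks` over seed files is a quick way to check which chunk types a corpus actually contains before a fuzzing run.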
libpng is the canonical reference implementation and ships in virtually every OS. Its progressive read API, used by browsers, is particularly complex: the parser maintains a state machine across multiple IDAT chunks, and state transitions at unexpected positions have historically caused use-after-free and heap corruption bugs. Interlaced PNG (Adam7) adds further complexity: the decoder must reassemble seven interleaved passes of scan lines, each with independent filter application.
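To make the Adam7 reassembly concrete, the sketch below computes the dimensions of each of the seven passes from the standard starting offsets and strides. The function name `adam7_pass_sizes` is hypothetical. Very small images produce empty passes, which is one reason tiny interlaced seeds are worth keeping in a corpus.

```python
# (x_start, y_start, x_step, y_step) for Adam7 passes 1..7
ADAM7_PASSES = [
    (0, 0, 8, 8),
    (4, 0, 8, 8),
    (0, 4, 4, 8),
    (2, 0, 4, 4),
    (0, 2, 2, 4),
    (1, 0, 2, 2),
    (0, 1, 1, 2),
]

def adam7_pass_sizes(width: int, height: int):
    """Return a list of (pass_width, pass_height) for the seven passes."""
    sizes = []
    for x0, y0, dx, dy in ADAM7_PASSES:
        # Ceiling division over the pixels at or after the pass's starting offset.
        w = (width - x0 + dx - 1) // dx if width > x0 else 0
        h = (height - y0 + dy - 1) // dy if height > y0 else 0
        sizes.append((w, h))
    return sizes

def pixels_covered(width: int, height: int) -> int:
    """Total pixels across all passes; equals width * height for a correct table."""
    return sum(w * h for w, h in adam7_pass_sizes(width, height))
```

For an 8x8 image the passes come out as 1x1, 1x1, 2x1, 2x2, 4x2, 4x4 and 8x4. For a 1x1 image only the first pass contains a pixel.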
Because PNG chunk types are four printable ASCII characters, AFL++ dictionary mode is very effective: tokens like 'IHDR', 'IDAT', 'PLTE', 'IEND' appear verbatim in the binary and can be injected by the fuzzer as interesting replacement values. Pair this with a corpus that includes valid interlaced and 16-bit-depth images to maximise code coverage. (PNG has no separate "progressive" file variant; progressive display comes from Adam7 interlacing.)
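As an illustration of what such a dictionary contains, the sketch below generates entries in the `name="value"` form that AFL-style dictionaries use, with non-printable bytes written as `\xNN` escapes. The entry names and the token selection are assumptions for illustration. AFL++ already ships its own png.dict, which is normally the better starting point.

```python
CHUNK_TYPES = [b"IHDR", b"IDAT", b"PLTE", b"IEND", b"iCCP",
               b"gAMA", b"tEXt", b"zTXt", b"iTXt", b"cHRM"]
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def dict_entry(name: str, token: bytes) -> str:
    """Format one dictionary line, escaping quotes, backslashes and non-printables."""
    parts = []
    for b in token:
        if 0x20 <= b < 0x7F and b not in (0x22, 0x5C):  # printable, not '"' or '\'
            parts.append(chr(b))
        else:
            parts.append(f"\\x{b:02x}")
    return f'{name}="{"".join(parts)}"'

def build_dictionary() -> str:
    """Return the dictionary file contents as a single string."""
    lines = [dict_entry("png_signature", PNG_SIGNATURE)]
    lines += [dict_entry(f"chunk_{t.decode('ascii')}", t) for t in CHUNK_TYPES]
    return "\n".join(lines) + "\n"
```

Writing the result to a file and passing it with `afl-fuzz -x <file>` makes the tokens available to the mutator.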
Building + curating your corpus
- →Start with the PngSuite corpus (161 files): it systematically covers all colour types (grayscale, grayscale with alpha, RGB, palette, RGBA), bit depths (1–16), and interlace modes.
- →Add a handful of intentionally corrupt files: wrong CRCs, IDAT zlib streams with bad checksums, and IHDR with zero dimensions exercise error-recovery paths.
- →Use afl-cmin to reduce a large crawled corpus to unique-coverage entries — PNG files are compact but many web-crawled images are near-duplicates structurally.
- →Include 16-bit-per-channel images and images with embedded ICC profiles (iCCP chunks) — these exercise separate code paths from common 8-bit sRGB files.
- →Keep corpus files under 50 KB for throughput; strip large IDAT payloads while preserving chunk structure if needed.
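The intentionally corrupt seeds mentioned in the list above can be generated rather than collected. The following is a minimal sketch under stated assumptions: the helper names and file names are hypothetical, and the base image is a 1x1 8-bit grayscale PNG chosen only because it is the smallest convenient starting point.

```python
import struct
import zlib

SIG = b"\x89PNG\r\n\x1a\n"

def chunk(ctype: bytes, data: bytes, break_crc: bool = False) -> bytes:
    """Frame a chunk; optionally store a deliberately wrong CRC."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    if break_crc:
        crc ^= 0xFFFFFFFF  # guaranteed to differ from the correct value
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)

def build_png(width=1, height=1, idat=None, break_ihdr_crc=False) -> bytes:
    """Assemble IHDR + IDAT + IEND for an 8-bit grayscale image."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    if idat is None:
        idat = zlib.compress(b"\x00\x00")  # filter byte 0 plus one pixel
    return (SIG + chunk(b"IHDR", ihdr, break_ihdr_crc)
            + chunk(b"IDAT", idat) + chunk(b"IEND", b""))

def corrupt_seeds() -> dict:
    """Map hypothetical file names to seed contents."""
    good = zlib.compress(b"\x00\x00")
    # The last four bytes of a zlib stream are its Adler-32; flip bits in the final byte.
    bad_adler = good[:-1] + bytes([good[-1] ^ 0xFF])
    return {
        "valid_1x1.png": build_png(),
        "bad_ihdr_crc.png": build_png(break_ihdr_crc=True),
        "zero_width.png": build_png(width=0),
        "bad_adler32.png": build_png(idat=bad_adler),
    }
```

Each value can be written to the seed directory under its key. Note that the zero-width and bad-Adler-32 variants keep valid chunk CRCs, so they get past chunk validation and fail later in the decoder.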
Mutator hints
- →Use AFL++ png.dict to inject chunk type codes (IHDR, IDAT, PLTE, iCCP, gAMA, tEXt, zTXt, cHRM) as mutation tokens.
- →Write a custom AFL++ mutator that recomputes CRC32 after byte-level mutations — prevents the decoder from rejecting inputs at chunk validation and drives coverage deeper.
- →For IDAT fuzzing, pre-compress a range of interesting pixel patterns and use them as replacement zlib payloads, so the fuzzer does not have to produce a valid deflate stream by chance and can reach bugs in and behind the decompressor.
- →Inject oversized or undersized chunk length fields (mismatch between declared length and actual data) to stress bounds-checking in chunk readers.
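A CRC fix-up pass along the lines of the custom-mutator hint above might look like the sketch below. `fix_png_crcs` is a hypothetical helper. The `post_process(buf)` wrapper assumes the hook name and signature of AFL++'s Python custom mutator interface, so check the AFL++ custom mutator documentation for the exact set of functions your version expects.

```python
import struct
import zlib

SIG = b"\x89PNG\r\n\x1a\n"

def fix_png_crcs(buf: bytes) -> bytes:
    """Recompute the CRC-32 of every chunk whose declared length fits in buf."""
    if not buf.startswith(SIG):
        return bytes(buf)  # not a PNG any more; pass through unchanged
    out = bytearray(buf)
    pos = len(SIG)
    while pos + 8 <= len(out):
        (length,) = struct.unpack_from(">I", out, pos)
        crc_pos = pos + 8 + length
        if crc_pos + 4 > len(out):
            break  # declared length overruns the input: keep the mismatch as-is
        crc = zlib.crc32(out[pos + 4:crc_pos]) & 0xFFFFFFFF  # type + data
        struct.pack_into(">I", out, crc_pos, crc)
        pos = crc_pos + 4
    return bytes(out)

def post_process(buf):
    """Assumed AFL++ Python hook: called on each test case before execution."""
    return bytearray(fix_png_crcs(bytes(buf)))
```

The walk trusts the declared length fields, so a mutated length simply shifts where the next chunk is assumed to start. Inputs whose length overruns the file are left untouched from that point on, which preserves the length-mismatch cases for the chunk reader's bounds checks.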
Recommended fuzzers
- → AFL++
- → libFuzzer
- → Honggfuzz
Libraries that consume PNG