Reference
Seed corpora for fuzzing
Where to find starter inputs for 12+ common file formats. Each page has upstream sources, harness scaffolds, and mutator hints for AFL++ and libFuzzer campaigns.
Document
PDF
The broadest attack surface in document parsing — 700+ spec pages and decades of implementation debt.
Read guide
DOC/DOCX
Binary OLE2 for .doc and ZIP-packaged XML for .docx: two completely different parser stacks behind near-identical file extensions.
Read guide
RTF
Plain-text container with binary payloads — RTF's escape-everything encoding hides parser complexity behind readable syntax.
Read guide
Image
PNG
Chunk-based with a CRC-32 on every chunk: blind mutations tend to fail the checksum, which makes PNG a good fit for dictionary-assisted and structure-aware mutation.
Read guide
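For the PNG entry, one common way to keep mutated inputs from being rejected at the checksum is to recompute each chunk's CRC-32 after mutation. The sketch below is illustrative only and is not taken from the guide: `fix_png_crcs` is a hypothetical helper name, and it assumes the standard chunk layout (4-byte big-endian length, 4-byte type, data, then a 4-byte CRC computed over type plus data).

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def fix_png_crcs(data: bytes) -> bytes:
    """Recompute the CRC of every parseable chunk; copy anything else unchanged."""
    if not data.startswith(PNG_SIGNATURE):
        return data  # not a PNG, nothing to repair
    out = bytearray(PNG_SIGNATURE)
    pos = len(PNG_SIGNATURE)
    # A chunk needs at least 12 bytes: length (4) + type (4) + CRC (4).
    while pos + 12 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        end = pos + 8 + length  # end of chunk type + chunk data
        if end + 4 > len(data):
            break  # declared length overruns the buffer; keep the tail as-is
        body = data[pos + 4:end]  # the CRC covers type + data, not the length
        crc = zlib.crc32(body) & 0xFFFFFFFF
        out += data[pos:pos + 4] + body + struct.pack(">I", crc)
        pos = end + 4
    out += data[pos:]  # trailing or unparseable bytes are preserved
    return bytes(out)
```

A helper like this would typically run as a post-processing step after mutation. The alternative is to build the target with checksum verification disabled, which avoids the fix-up but tests a slightly different binary.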
JPEG
Marker-segmented with Huffman entropy coding — entropy streams make pure random mutation less effective than marker-aware strategies.
Read guide
WebP
RIFF container with three distinct layouts: lossy VP8, lossless VP8L, and the extended VP8X format that carries alpha, animation, and metadata chunks. Each needs separate corpus coverage.
Read guide
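For the WebP entry, a quick way to check that a seed corpus covers all three layouts is to bucket files by the FourCC of the first chunk after the RIFF/WEBP header. This is a minimal sketch: `webp_kind` and its return labels are hypothetical names chosen for illustration, and it only inspects the first 16 bytes rather than validating the file.

```python
from typing import Optional

# FourCC of the first chunk -> label for the layout it implies.
# Note the trailing space in the lossy FourCC.
_WEBP_KINDS = {
    b"VP8 ": "lossy",
    b"VP8L": "lossless",
    b"VP8X": "extended",
}

def webp_kind(data: bytes) -> Optional[str]:
    """Classify a WebP file by its first chunk, or return None if unrecognized."""
    if len(data) < 16 or data[:4] != b"RIFF" or data[8:12] != b"WEBP":
        return None
    return _WEBP_KINDS.get(data[12:16])
```

Counting the labels across a corpus directory shows at a glance whether one layout is missing or underrepresented.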
Archive
ZIP
Dual-directory format with overlapping metadata fields — inconsistencies between local and central directory headers are a classic bug class.
Read guide
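For the ZIP entry, the local-versus-central inconsistency can be made concrete by reading each entry's central directory record with Python's `zipfile` module and comparing it with the raw local file header at the recorded offset. This is a rough sketch under stated assumptions: `header_mismatches` is a hypothetical helper, it compares only three fields, and it ignores ZIP64 and data-descriptor entries, where the local size fields legitimately differ.

```python
import io
import struct
import zipfile

# Fixed 30-byte part of a local file header (little-endian):
# signature, version, flags, method, time, date, crc, csize, usize,
# name length, extra length.
LOCAL_HEADER = struct.Struct("<4s5H3I2H")

def header_mismatches(blob):
    """Return (name, field, local_value, central_value) for disagreeing fields."""
    mismatches = []
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        for info in zf.infolist():  # entries as the central directory describes them
            start = info.header_offset
            raw = blob[start:start + LOCAL_HEADER.size]
            if len(raw) < LOCAL_HEADER.size:
                mismatches.append((info.filename, "header", None, "truncated"))
                continue
            (sig, _version, _flags, method, _time, _date,
             _crc, csize, usize, _name_len, _extra_len) = LOCAL_HEADER.unpack(raw)
            if sig != b"PK\x03\x04":
                mismatches.append((info.filename, "signature", sig, b"PK\x03\x04"))
                continue
            for field, local, central in (
                ("method", method, info.compress_type),
                ("compressed_size", csize, info.compress_size),
                ("uncompressed_size", usize, info.file_size),
            ):
                if local != central:
                    mismatches.append((info.filename, field, local, central))
    return mismatches
```

Seeds that this check flags are deliberately inconsistent, which makes them useful starting points for parsers that trust one directory over the other.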
TAR
Sequential header-data blocks with octal numeric fields — path traversal and header field overflow are the two dominant bug classes.
Read guide
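For the TAR entry, the octal numeric fields are ASCII digits terminated by NUL or space, so a strict parser is only a few lines. The sketch below is illustrative: `parse_octal_field` is a hypothetical helper, the size field position (bytes 124 to 135) follows the ustar header layout, and the GNU base-256 extension for large values is deliberately not handled.

```python
import tarfile

SIZE_FIELD = slice(124, 136)  # 12-byte size field in a ustar header

def parse_octal_field(field: bytes) -> int:
    """Parse a NUL- or space-terminated octal ASCII field from a tar header."""
    text = field.split(b"\x00", 1)[0].strip(b" ")
    if not text:
        return 0  # an all-NUL or all-space field is treated as zero here
    if any(byte not in b"01234567" for byte in text):
        # Also rejects the GNU base-256 encoding, which sets the high bit.
        raise ValueError("non-octal byte in numeric field: %r" % field)
    return int(text.decode("ascii"), 8)

# Build one 512-byte header with the standard library and read its size back.
info = tarfile.TarInfo("a.txt")
info.size = 10
header = info.tobuf(format=tarfile.USTAR_FORMAT)
size = parse_octal_field(header[SIZE_FIELD])
```

Inputs that make a strict parser like this raise, such as digits 8 and 9, missing terminators, or a field filled to its full width, are reasonable dictionary candidates for the header field overflow class.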
GZIP
Thin header wrapper around raw Deflate: the real attack surface is the Deflate decompressor, not the gzip header, whose fixed part is only 10 bytes.
Read guide
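For the GZIP entry, the point that the Deflate stream is the real target can be illustrated by stripping the wrapper from an existing gzip seed and feeding the payload to a raw-inflate harness. This is a sketch under stated assumptions: `gzip_to_raw_deflate` is a hypothetical helper, it handles a single gzip member only, and it follows the field order of the gzip format (optional extra field, name, comment, header CRC, then an 8-byte trailer).

```python
import gzip
import struct
import zlib

def gzip_to_raw_deflate(blob: bytes) -> bytes:
    """Strip the gzip header and 8-byte trailer from a single-member file."""
    if len(blob) < 18 or blob[:2] != b"\x1f\x8b" or blob[2] != 8:
        raise ValueError("not a gzip member using Deflate")
    flags = blob[3]
    pos = 10  # fixed part of the header
    if flags & 0x04:  # FEXTRA: 2-byte little-endian length, then that many bytes
        (xlen,) = struct.unpack("<H", blob[pos:pos + 2])
        pos += 2 + xlen
    if flags & 0x08:  # FNAME: NUL-terminated original file name
        pos = blob.index(b"\x00", pos) + 1
    if flags & 0x10:  # FCOMMENT: NUL-terminated comment
        pos = blob.index(b"\x00", pos) + 1
    if flags & 0x02:  # FHCRC: 2-byte header checksum
        pos += 2
    return blob[pos:-8]  # drop the CRC-32 and ISIZE trailer

member = gzip.compress(b"seed data")
raw = gzip_to_raw_deflate(member)
# A negative wbits value tells zlib to expect a raw Deflate stream.
restored = zlib.decompress(raw, -15)
```

Seeds extracted this way can serve a harness that calls the inflate routine directly, while the original files continue to exercise the header parsing path.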