PE/COFF seed corpus
The Windows executable format — parsed by every AV engine, loader, and reverse engineering tool on the planet.
The Portable Executable (PE) format used by Windows executables and DLLs is built on the older COFF (Common Object File Format). A PE file begins with a DOS header (carrying the MZ signature) and a DOS stub, followed by the PE signature, the COFF File Header, the Optional Header (which ends with the Data Directory table), and the Section Table. The Data Directory entries point to import tables, export tables, resource trees, debug directories, TLS callbacks, and .NET metadata. Each of these is handled by a semi-independent parser with its own bug history.
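The header chain described above can be walked in a few lines. The following is a minimal sketch, not a full parser: the function name `parse_pe_headers` and the returned dictionary keys are illustrative choices, while the offsets (`e_lfanew` at 0x3C, the 20-byte COFF File Header, `NumberOfRvaAndSizes` at offset 92 for PE32 and 108 for PE32+) follow the published PE layout.

```python
import struct

PE32_MAGIC = 0x10B       # Optional Header magic for PE32
PE32_PLUS_MAGIC = 0x20B  # Optional Header magic for PE32+


def parse_pe_headers(data: bytes) -> dict:
    """Walk DOS header -> PE signature -> COFF header -> Optional Header."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (file offset of the PE signature) lives at 0x3C in the DOS header.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if e_lfanew + 24 > len(data) or data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # COFF File Header: 20 bytes right after the 4-byte signature.
    (machine, num_sections, _timestamp, _symtab, _nsyms,
     opt_size, characteristics) = struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)
    opt_off = e_lfanew + 24
    if opt_off + 2 > len(data):
        raise ValueError("truncated Optional Header")
    (magic,) = struct.unpack_from("<H", data, opt_off)
    if magic == PE32_MAGIC:
        fmt, dirs_off = "PE32", opt_off + 96
    elif magic == PE32_PLUS_MAGIC:
        fmt, dirs_off = "PE32+", opt_off + 112
    else:
        raise ValueError("unknown Optional Header magic")
    if dirs_off > len(data):
        raise ValueError("truncated Optional Header")
    # NumberOfRvaAndSizes sits in the 4 bytes just before the directory table.
    (count,) = struct.unpack_from("<I", data, dirs_off - 4)
    directories = []
    for i in range(min(count, 16)):
        entry_off = dirs_off + 8 * i
        if entry_off + 8 > len(data):
            break  # truncated table: stop instead of reading out of bounds
        directories.append(struct.unpack_from("<II", data, entry_off))  # (RVA, size)
    return {
        "format": fmt,
        "machine": machine,
        "number_of_sections": num_sections,
        "characteristics": characteristics,
        "data_directories": directories,
        # The Section Table follows the Optional Header, whose size the COFF header declares.
        "section_table_offset": opt_off + opt_size,
    }
```

Every bounds check in this sketch corresponds to a place where a real parser can go wrong, which is why these fields are worth targeting with seeds and mutations.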
PE files are parsed by the Windows kernel loader, by every antivirus and EDR product, by reverse engineering tools (Ghidra, Binary Ninja, IDA Pro), and by malware analysis sandboxes. The variety of parsers and the range of architectures supported (x86, x64, ARM, ARM64) make PE parsing a high-value fuzzing target. Historical bugs include integer overflows in resource tree traversal, heap overflows in import table parsing, infinite loops from circular export chains, and out-of-bounds reads from malformed .NET metadata streams.
A strong PE corpus covers 32-bit (PE32) and 64-bit (PE32+) formats, DLLs with export tables, executables with rich import tables, .NET managed executables (which carry a CLR header, IMAGE_COR20_HEADER), and files with deliberately overlapping or out-of-bounds section RVAs. OSS-Fuzz does not directly fuzz closed-source PE parsers, so community-maintained corpora and test vectors from the PE format specification are the primary upstream sources.
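One way to check a corpus against this coverage list is to tag each seed by the features its headers declare. The sketch below is an assumption about how such an audit might look: `classify_seed` and `audit` are hypothetical helper names, and a Data Directory entry is treated as present when its RVA is nonzero, which is a simplification.

```python
import struct
from collections import Counter

IMAGE_FILE_DLL = 0x2000  # COFF Characteristics flag
# Data Directory indices from the PE specification.
DIR_EXPORT, DIR_IMPORT, DIR_COM_DESCRIPTOR = 0, 1, 14


def classify_seed(data: bytes) -> set:
    """Return coverage tags for one seed, or {"unparsed"} if the headers are broken."""
    try:
        if data[:2] != b"MZ":
            return {"unparsed"}
        (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
        if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
            return {"unparsed"}
        # Characteristics is the last 2-byte field of the 20-byte COFF header.
        (characteristics,) = struct.unpack_from("<H", data, e_lfanew + 22)
        opt = e_lfanew + 24
        (magic,) = struct.unpack_from("<H", data, opt)
        if magic == 0x10B:
            tags, dirs = {"PE32"}, opt + 96
        elif magic == 0x20B:
            tags, dirs = {"PE32+"}, opt + 112
        else:
            return {"unparsed"}
        (count,) = struct.unpack_from("<I", data, dirs - 4)

        def present(index: int) -> bool:
            if index >= count:
                return False
            rva, _size = struct.unpack_from("<II", data, dirs + 8 * index)
            return rva != 0

        if characteristics & IMAGE_FILE_DLL:
            tags.add("dll")
        if present(DIR_EXPORT):
            tags.add("exports")
        if present(DIR_IMPORT):
            tags.add("imports")
        if present(DIR_COM_DESCRIPTOR):
            tags.add("dotnet")
        return tags
    except struct.error:
        # A read past the end of the file means the headers are truncated.
        return {"unparsed"}


def audit(seeds) -> Counter:
    """Count how many seeds carry each tag; `seeds` is an iterable of bytes."""
    counts = Counter()
    for blob in seeds:
        counts.update(classify_seed(blob))
    return counts
```

A tag that never appears in the `audit` result points to a gap in the corpus. Seeds tagged `unparsed` are not necessarily useless, since deliberately malformed files are part of the goal, but they deserve a manual look.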
Building + curating your corpus
- →The Corkami PE corpus is the best single source for format edge cases: it covers hundreds of structural quirks that real PE files in the wild don't exercise.
- →Include both PE32 (32-bit) and PE32+ (64-bit) files separately — the Optional Header layout and pointer sizes differ and exercise different parser branches.
- →Add .NET managed executables (PE files with IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR) if your target parses managed metadata.
- →Include DLLs with export tables (forwarded exports, export-by-ordinal) alongside executables with rich import descriptors.
- →Craft PE files where a section RVA + virtual size overflows a 32-bit integer to stress VA-to-offset mapping arithmetic.
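The overflow case in the last item can be illustrated with a short sketch. It assumes a hypothetical parser that validates a section with the C-style check `va + vsize <= size_of_image` in 32-bit arithmetic; the function names are illustrative, and the 40-byte layout packed by `pack_section` follows IMAGE_SECTION_HEADER.

```python
import struct

U32 = 0xFFFFFFFF


def pack_section(name: bytes, virtual_address: int, virtual_size: int,
                 raw_size: int, raw_pointer: int) -> bytes:
    """Pack a 40-byte IMAGE_SECTION_HEADER; unused fields are zeroed."""
    return struct.pack(
        "<8sIIIIIIHHI",
        name,             # Name (padded with NULs to 8 bytes)
        virtual_size,     # VirtualSize
        virtual_address,  # VirtualAddress (RVA)
        raw_size,         # SizeOfRawData
        raw_pointer,      # PointerToRawData
        0, 0, 0, 0,       # relocation and line-number fields
        0,                # Characteristics
    )


def naive_section_in_image(va: int, vsize: int, size_of_image: int) -> bool:
    """Mimics a 32-bit C check where `va + vsize` silently wraps around."""
    return ((va + vsize) & U32) <= size_of_image


def safe_section_in_image(va: int, vsize: int, size_of_image: int) -> bool:
    """Overflow-safe form: never computes the sum that could wrap."""
    return va <= size_of_image and vsize <= size_of_image - va


# A section whose RVA + virtual size wraps past 2**32 down to 0x1000.
hostile = pack_section(b".ovf", 0xFFFFF000, 0x2000, 0x200, 0x400)
```

With `va = 0xFFFFF000` and `vsize = 0x2000`, the wrapped sum is `0x1000`, so the naive check accepts the section for any realistic image size while the safe check rejects it. Splicing a header like `hostile` into an otherwise valid seed gives the fuzzer a starting point near this bug class.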
Mutator hints
- →Use a PE dictionary (for example a pe.dict file passed to AFL++ with -x) to inject PE signatures (MZ, PE\0\0), machine type values, and Data Directory index constants as mutation tokens.
- →Mutate the NumberOfSections field in the COFF header to values larger than the actual section table allows — a common source of out-of-bounds reads.
- →Write a custom mutator that adjusts the section PointerToRawData (raw file offset) and SizeOfRawData fields both together and independently, to create consistent as well as inconsistent section layouts.
- →CMPLOG mode helps AFL++ learn the PE magic values (MZ header, PE signature, Optional Header magic 0x10B/0x20B) that parsers compare against at the very start of parsing.
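The NumberOfSections and raw-data hints above can be combined into one structure-aware mutator. The following is a sketch using the AFL++ Python custom mutator interface (`init`, `fuzz`, `deinit`); the module name `pe_mutator`, the choice of boundary values, and the 0x200 alignment step are assumptions made for illustration, not tuned settings.

```python
import random
import struct

_rng = random.Random()
SECTION_HEADER_SIZE = 40
# Boundary values for the "independent" mutation (an illustrative selection).
INTERESTING = (0, 1, 0x1FF, 0x200, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF)


def init(seed):
    """Called once by AFL++ with a seed for reproducible mutations."""
    _rng.seed(seed)


def deinit():
    pass


def _locate(buf):
    """Return (coff_offset, section_table_offset, declared_count) or None."""
    if len(buf) < 0x40 or buf[:2] != b"MZ":
        return None
    (e_lfanew,) = struct.unpack_from("<I", buf, 0x3C)
    if e_lfanew + 24 > len(buf) or buf[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        return None
    coff = e_lfanew + 4
    (count,) = struct.unpack_from("<H", buf, coff + 2)
    (opt_size,) = struct.unpack_from("<H", buf, coff + 16)
    return coff, coff + 20 + opt_size, count


def fuzz(buf, add_buf, max_size):
    """Mutate section bookkeeping fields; unparseable inputs pass through unchanged."""
    out = bytearray(buf)
    located = _locate(out)
    if located is None:
        return out[:max_size]
    coff, table, count = located
    # Number of section headers that physically fit between the table and EOF.
    fit = max(0, (len(out) - table) // SECTION_HEADER_SIZE)
    usable = min(count, fit)
    choice = _rng.randrange(3)
    if choice == 0 or usable == 0:
        # Declare more sections than the table can hold.
        bogus = _rng.choice((min(fit + 1, 0xFFFF), 0x7FFF, 0xFFFF))
        struct.pack_into("<H", out, coff + 2, bogus)
    else:
        hdr = table + SECTION_HEADER_SIZE * _rng.randrange(usable)
        # SizeOfRawData is at +16 and PointerToRawData at +20 in the section header.
        size, _ptr = struct.unpack_from("<II", out, hdr + 16)
        if choice == 1:
            # Together: move the raw-data window but keep it inside the file.
            new_ptr = _rng.randrange(0, len(out), 0x200)
            new_size = min(size, len(out) - new_ptr)
            struct.pack_into("<II", out, hdr + 16, new_size, new_ptr)
        else:
            # Independently: overwrite one field with a boundary value.
            field = hdr + _rng.choice((16, 20))
            struct.pack_into("<I", out, field, _rng.choice(INTERESTING))
    return out[:max_size]
```

Saved as `pe_mutator.py`, a module like this is loaded by setting `AFL_PYTHON_MODULE=pe_mutator` with the file's directory on `PYTHONPATH`, provided AFL++ was built with Python support. Leaving the default mutators enabled alongside it keeps byte-level coverage of the fields this sketch does not touch.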
Recommended fuzzers
- → AFL++
- → libFuzzer
- → Honggfuzz