ELF seed corpus
The universal Linux binary format. ELF parsers run in the kernel, in dynamic loaders, and in analysis tools that are often privileged, so parser bugs are high-impact.
The Executable and Linkable Format (ELF) encodes object files, shared libraries, and executables as a collection of sections (for linking) and segments (for loading). The ELF header identifies architecture, ABI, entry point, and the locations of the section header table (SHT) and program header table (PHT). Bugs in ELF parsers are high-impact because the format is parsed by the kernel's ELF loader, by the userspace dynamic linker (ld-linux), by debuggers (GDB, LLDB), by binary analysis tools (binutils readelf and objdump), and by security tooling such as pwntools. Many of these run with elevated privileges or in security-critical contexts.
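To make the header fields above concrete, here is a minimal sketch of reading them with Python's `struct` module. It assumes a little-endian ELF64 file; the function name `parse_elf64_header` is a placeholder for illustration, not part of any library.

```python
import struct

# Elf64_Ehdr fields that follow the 16-byte e_ident array (little-endian):
# e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
# e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
ELF64_HDR = struct.Struct("<HHIQQQIHHHHHH")
FIELD_NAMES = (
    "e_type", "e_machine", "e_version", "e_entry", "e_phoff", "e_shoff",
    "e_flags", "e_ehsize", "e_phentsize", "e_phnum", "e_shentsize",
    "e_shnum", "e_shstrndx",
)

def parse_elf64_header(data: bytes) -> dict:
    """Decode the fixed-size ELF64 header (hypothetical helper)."""
    if len(data) < 16 + ELF64_HDR.size:
        raise ValueError("truncated ELF header")
    if data[:4] != b"\x7fELF":
        raise ValueError("bad ELF magic")
    # e_ident[4] is the class (2 = ELF64), e_ident[5] the byte order (1 = LSB)
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")
    return dict(zip(FIELD_NAMES, ELF64_HDR.unpack_from(data, 16)))
```

Here `e_phoff` and `e_shoff` are the file offsets of the PHT and SHT, and `e_machine` selects the architecture. A real parser would also need the ELF32 layout and big-endian variants.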
Common bug classes in ELF parsers include integer overflow in section size calculations, out-of-bounds reads from malformed symbol table entries, heap overflows in DWARF debug information parsers (particularly for the .debug_info, .debug_line, and .debug_aranges sections), and infinite loops from circular or self-referential section chains. DWARF parsing is a particularly rich attack surface: the format is deeply recursive (DIE trees), versioned (DWARF 2 through 5), and supports vendor extensions.
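The integer-overflow class mentioned above can be illustrated with a small sketch. It emulates, in Python, what a C bounds check of the form `sh_offset + sh_size <= file_size` does with 64-bit unsigned arithmetic, next to an overflow-free alternative. Both function names are illustrative only and are not taken from any particular parser.

```python
MASK64 = (1 << 64) - 1

def naive_in_bounds(sh_offset: int, sh_size: int, file_size: int) -> bool:
    # Emulates `sh_offset + sh_size <= file_size` on uint64_t operands:
    # the sum silently wraps around modulo 2**64.
    return ((sh_offset + sh_size) & MASK64) <= file_size

def safe_in_bounds(sh_offset: int, sh_size: int, file_size: int) -> bool:
    # Never adds two attacker-controlled values, so nothing can wrap.
    return sh_offset <= file_size and sh_size <= file_size - sh_offset

# A section whose size is chosen so that offset + size wraps to a small value.
print(naive_in_bounds(0x40, 0xFFFFFFFFFFFFFFF0, 4096))  # True: check bypassed
print(safe_in_bounds(0x40, 0xFFFFFFFFFFFFFFF0, 4096))   # False: rejected
```

Seeds with `sh_size` values close to 2**64 or 2**32 are therefore worth including, since they are the inputs that separate the two checks.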
A high-quality ELF corpus should include ELF32 and ELF64 files, files for multiple architectures (x86-64, ARM, MIPS, RISC-V), object files (.o), shared libraries (.so), and static executables. Files with deliberately malformed section headers — overlapping sections, sections with sizes that extend past the file boundary, and sections with invalid type values — are essential for reaching error-recovery code paths.
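One of the malformed cases described above, a section whose size extends past the file boundary, can be produced from any valid seed. The following is a rough sketch that assumes a little-endian ELF64 input with a standard 64-byte section header entry size; `make_section_overrun` is a hypothetical helper name.

```python
import struct

# Elf64_Shdr: sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size,
# sh_link, sh_info, sh_addralign, sh_entsize (64 bytes, little-endian)
SHDR64 = struct.Struct("<IIQQQQIIQQ")

def make_section_overrun(elf: bytes, index: int, extra: int = 0x1000) -> bytes:
    """Return a copy where section `index` claims to end `extra` bytes
    past the end of the file. The file length itself is unchanged."""
    if len(elf) < 64 or elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    (e_shoff,) = struct.unpack_from("<Q", elf, 0x28)
    e_shentsize, e_shnum = struct.unpack_from("<HH", elf, 0x3A)
    if e_shentsize != SHDR64.size or index >= e_shnum:
        raise ValueError("unexpected section header layout or bad index")
    pos = e_shoff + index * e_shentsize
    if pos + SHDR64.size > len(elf):
        raise ValueError("section header table lies outside the file")
    fields = list(SHDR64.unpack_from(elf, pos))
    sh_offset = fields[4]
    # New sh_size: whatever remains of the file after sh_offset, plus `extra`.
    fields[5] = max(len(elf) - sh_offset, 0) + extra
    out = bytearray(elf)
    SHDR64.pack_into(out, pos, *fields)
    return bytes(out)
```

Running this over several section indices of one seed gives a small family of files that differ only in which section overruns, which helps when triaging crashes later.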
Where to grab a starter corpus
Building + curating your corpus
- → Include ELF32 and ELF64 variants and files for at least three architectures (x86-64, ARM64, MIPS), because parsers frequently have architecture-specific code paths.
- → Add ELF files with DWARF debug sections (.debug_info, .debug_line, .debug_aranges). DWARF parsing is the largest and most bug-prone part of most ELF analysis tools.
- → Include object files (.o), shared libraries (.so with PLT/GOT tables), and fully linked executables. Each exercises different parts of the ELF parser.
- → Craft ELF files where e_shstrndx points to a non-existent section to stress string table lookup error handling.
- → Use afl-cmin on a corpus built from /usr/lib objects and debug packages. Strip it down to entries with unique coverage to avoid wasting time on near-identical system libraries.
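The e_shstrndx item in the list above can be sketched as a small patcher. This is a minimal sketch, assuming a little-endian ELF64 seed that does not use extended section numbering; `break_shstrndx` is a hypothetical name.

```python
import struct

SHN_LORESERVE = 0xFF00  # section indices at or above this value are reserved

def break_shstrndx(elf: bytes) -> bytes:
    """Return a copy whose e_shstrndx equals e_shnum, which is one past
    the last valid section index."""
    if len(elf) < 64 or elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    (e_shnum,) = struct.unpack_from("<H", elf, 0x3C)
    if e_shnum == 0 or e_shnum >= SHN_LORESERVE:
        raise ValueError("no section table, or extended numbering in use")
    out = bytearray(elf)
    struct.pack_into("<H", out, 0x3E, e_shnum)  # e_shstrndx is at offset 0x3E
    return bytes(out)
```

Any tool that resolves section names then has to notice that the string table index is out of range before it dereferences the corresponding header.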
Mutator hints
- → Use an ELF dictionary (elf.dict) with AFL++ to inject ELF magic bytes (\x7fELF), class bytes, and section type (sh_type) values as mutation tokens.
- → Write a section-header-aware custom mutator that modifies sh_offset and sh_size fields independently to create overlapping or out-of-file sections.
- → Mutate DWARF abbreviation tables (in .debug_abbrev) to create mismatches between the abbreviation code and the actual DIE attribute sequence, a common source of heap overflows in DWARF readers.
- → CMPLOG mode in AFL++ is highly effective for ELF because parsers do many comparisons on architecture values (e_machine), section types (sh_type), and program header types (p_type).
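The section-header-aware mutator from the list above could look roughly like the following. It is a sketch written against the AFL++ Python custom mutator interface (module-level `init` and `fuzz` functions) and assumes little-endian ELF64 inputs. The choice of candidate values is an illustrative assumption, not a tuned strategy.

```python
import random
import struct

SHDR_SIZE = 64                    # sizeof(Elf64_Shdr)
SH_OFFSET, SH_SIZE = 0x18, 0x20   # field offsets inside Elf64_Shdr

def init(seed):
    # Called once by AFL++ with a seed for reproducible mutations.
    random.seed(seed)

def fuzz(buf, add_buf, max_size):
    out = bytearray(buf)
    # Pass through anything that is not a little-endian ELF64 file.
    if len(out) < 64 or out[:4] != b"\x7fELF" or out[4] != 2 or out[5] != 1:
        return out[:max_size]
    (e_shoff,) = struct.unpack_from("<Q", out, 0x28)
    e_shentsize, e_shnum = struct.unpack_from("<HH", out, 0x3A)
    if e_shentsize != SHDR_SIZE or e_shnum == 0:
        return out[:max_size]
    if e_shoff + e_shnum * SHDR_SIZE > len(out):
        return out[:max_size]
    # Pick one section header and one of the two fields, so that offset
    # and size are mutated independently of each other.
    entry = e_shoff + random.randrange(e_shnum) * SHDR_SIZE
    field = random.choice((SH_OFFSET, SH_SIZE))
    n = len(out)
    # Boundary values around the file size, wraparound values, and a
    # random in-file value that tends to create overlapping sections.
    candidates = [0, n - 1, n, n + 1, 2**63, 2**64 - 1, random.randrange(n)]
    struct.pack_into("<Q", out, entry + field, random.choice(candidates))
    return out[:max_size]
```

With AFL++ built with Python support, a module like this is typically loaded by pointing `PYTHONPATH` at its directory and setting `AFL_PYTHON_MODULE` to the module name. Changing only one field per call keeps the rest of the file valid, so the mutated input is more likely to get past early header checks.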
Recommended fuzzers
- → AFL++
- → libFuzzer
- → Honggfuzz