WebP seed corpus
RIFF container with three distinct decoding paths: lossy VP8, lossless VP8L, and extended VP8X (animation and metadata) each need separate corpus coverage.
WebP wraps its image data in a RIFF container and supports three bitstream layouts: VP8 (lossy, DCT-based), VP8L (lossless, Huffman + LZ77), and VP8X (an extended header that adds animation, ICC profiles, and EXIF/XMP metadata around VP8 or VP8L image data). Each is handled by different code paths in libwebp, and the RIFF chunk parser that dispatches between them is itself a source of integer overflow and out-of-bounds read vulnerabilities. CVE-2023-4863 (2023), a critical heap buffer overflow in libwebp's VP8L Huffman table builder that also affected Chrome, illustrates the severity of bugs reachable through this format.
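The dispatch decision is made on the FourCC of the first chunk after the 12-byte `RIFF....WEBP` header. You can therefore sort an existing corpus by decoding path with a few byte comparisons. The following is a minimal sketch of that idea. `classify_webp` and its category names are hypothetical helpers, not a libwebp API. The sketch assumes the standard container layout and reads the animation bit (0x02) from the first VP8X payload byte, as described in the WebP container specification.

```python
# VP8X feature flag for animation (bit 1 of the first VP8X payload byte).
ANIMATION_FLAG = 0x02


def classify_webp(data: bytes) -> str:
    """Return 'vp8', 'vp8l', 'vp8x', 'vp8x-anim', or 'invalid' (hypothetical helper)."""
    # 12-byte container header plus an 8-byte chunk header at minimum.
    if len(data) < 20 or data[0:4] != b"RIFF" or data[8:12] != b"WEBP":
        return "invalid"
    fourcc = data[12:16]  # FourCC of the first chunk drives the dispatch
    if fourcc == b"VP8 ":  # note the trailing space in the lossy FourCC
        return "vp8"
    if fourcc == b"VP8L":
        return "vp8l"
    if fourcc == b"VP8X":
        if len(data) < 21:  # need at least the flags byte of the VP8X payload
            return "invalid"
        return "vp8x-anim" if data[20] & ANIMATION_FLAG else "vp8x"
    return "invalid"
```

The returned string can be used directly as a corpus subdirectory name, which keeps the three paths (and animated files) separated.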
A strong WebP corpus must include files exercising all three codec variants: lossy files at various quality levels, lossless files with different colour cache sizes, and animated WebP files with multiple frames and loop counts. The VP8L lossless format is particularly complex — its Huffman group system, colour transform, and predictor transform interact in ways that produce non-obvious code paths even with small input files.
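One way to produce lossy and lossless seeds from a set of source PNG or JPEG images is to use the `cwebp` tool that ships with libwebp. The following sketch assumes `cwebp` is on `PATH`. The helper names and output file naming are hypothetical. It uses `-q` for lossy quality, `-lossless` with the `-z` preset level, and `-metadata all` to carry ICC/EXIF/XMP from the source into a VP8X seed. As far as we know, `cwebp` does not expose the colour cache size as an option because the encoder chooses it itself. Lossless variety therefore has to come from diverse source images. Animated seeds need a separate tool such as `gif2webp` or `img2webp`, which this sketch does not cover.

```python
import subprocess
from pathlib import Path

LOSSY_QUALITIES = (10, 50, 90)  # spread of DCT coefficient distributions
LOSSLESS_PRESETS = (0, 9)       # fastest and slowest lossless preset levels


def cwebp_commands(src: Path, out_dir: Path) -> list[list[str]]:
    """Build (but do not run) cwebp argv lists for one source image."""
    stem = src.stem
    cmds = []
    for q in LOSSY_QUALITIES:
        out = out_dir / f"{stem}_q{q}.webp"
        cmds.append(["cwebp", "-q", str(q), str(src), "-o", str(out)])
    for level in LOSSLESS_PRESETS:
        out = out_dir / f"{stem}_ll_z{level}.webp"
        cmds.append(["cwebp", "-lossless", "-z", str(level), str(src), "-o", str(out)])
    # Copy any ICC/EXIF/XMP from the source so the seed uses the VP8X layout.
    out = out_dir / f"{stem}_meta.webp"
    cmds.append(["cwebp", "-q", "75", "-metadata", "all", str(src), "-o", str(out)])
    return cmds


def generate_seeds(src: Path, out_dir: Path) -> None:
    """Run every command; raises CalledProcessError if cwebp fails."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for cmd in cwebp_commands(src, out_dir):
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes the seed matrix easy to inspect or extend before anything is written to disk.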
libwebp is shipped in Chrome, Android, iOS, and virtually every image-processing pipeline that handles user-uploaded content. Continuous fuzzing by OSS-Fuzz has found dozens of bugs here. The OSS-Fuzz seed corpus is the recommended starting point and is updated as new code paths are discovered.
Where to grab a starter corpus
Building + curating your corpus
- →Seed separately for VP8, VP8L, and VP8X files. The VP8 and VP8L decoders share almost no code, and VP8X adds its own chunk-parsing layer on top of them, so each path must be covered independently.
- →Include animated WebP files (VP8X with ANIM/ANMF chunks) as a separate corpus subdirectory; animation parsing is a distinct and often undertested code path.
- →Add WebP files with embedded ICC profiles and EXIF metadata to reach the metadata parsing paths that precede image decoding.
- →Keep lossy files at various quality levels (10, 50, 90) to exercise different DCT coefficient distributions in the VP8 decoder.
- →Corrupt RIFF chunk sizes deliberately — mismatches between the RIFF container size and the VP8/VP8L chunk size are a known source of integer overflow.
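Deliberate size corruption can be generated mechanically from valid seeds. The following sketch takes one valid WebP file and emits copies in which either the RIFF size (offset 4) or the first chunk's size (offset 16) is replaced by an off-by-one or boundary value. `size_mismatch_variants` and the specific edge values are our own illustrative choices, not taken from libwebp or OSS-Fuzz.

```python
import struct

# Boundary values: empty, tiny, around the signed 32-bit limit, and values
# that wrap an unsigned 32-bit integer once an 8-byte header is added.
EDGE_VALUES = (0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFF8, 0xFFFFFFFF)

RIFF_SIZE_OFFSET = 4         # little-endian uint32 after "RIFF"
FIRST_CHUNK_SIZE_OFFSET = 16  # little-endian uint32 after the first FourCC


def size_mismatch_variants(data: bytes) -> list[bytes]:
    """Return copies of a valid WebP with one size field corrupted each."""
    if len(data) < 20 or data[0:4] != b"RIFF" or data[8:12] != b"WEBP":
        raise ValueError("not a RIFF/WEBP file")
    variants = []
    for offset in (RIFF_SIZE_OFFSET, FIRST_CHUNK_SIZE_OFFSET):
        (true_value,) = struct.unpack_from("<I", data, offset)
        candidates = {true_value - 1, true_value + 1, *EDGE_VALUES}
        for value in sorted(candidates):
            # Skip values that do not fit a uint32 or would leave the file valid.
            if value < 0 or value > 0xFFFFFFFF or value == true_value:
                continue
            out = bytearray(data)
            struct.pack_into("<I", out, offset, value)
            variants.append(bytes(out))
    return variants
```

Writing each variant next to its source seed (for example with a `_sizeNN` suffix) keeps the corrupted cases traceable to the valid file they came from.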
Mutator hints
- →Use the AFL++ webp.dict dictionary to inject RIFF chunk FourCC codes ('WEBP', 'VP8 ', 'VP8L', 'VP8X', 'ICCP', 'EXIF', 'XMP ', 'ANIM', 'ANMF'; note the trailing spaces in 'VP8 ' and 'XMP ') as mutation tokens.
- →A custom mutator that recomputes RIFF chunk sizes after mutations prevents early rejection at the container layer and drives coverage into codec internals.
- →For VP8L lossless files, mutate the Huffman group count and colour cache size fields specifically — these control major structural branches in the lossless decoder.
- →CMPLOG mode is effective for catching magic-value comparisons in the RIFF dispatcher ('RIFF', 'WEBP' FourCC checks).
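The size-recomputation step of such a custom mutator can be sketched on its own. The function below walks the chunk list after a mutation and clamps any chunk size that runs past the end of the buffer. It then rewrites the RIFF size to match the actual file length. `fix_riff_sizes` is a hypothetical name. This sketch does not show how to wire it into a fuzzer, for example through AFL++'s Python custom-mutator interface. Treat that integration as an assumption to check against the AFL++ documentation.

```python
import struct


def fix_riff_sizes(buf: bytes) -> bytes:
    """Clamp chunk sizes to the buffer and recompute the RIFF size field."""
    data = bytearray(buf)
    # Leave non-WebP input untouched; the container check would reject it anyway.
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WEBP":
        return bytes(data)
    pos = 12  # first chunk header follows "RIFF" + size + "WEBP"
    while pos + 8 <= len(data):
        (size,) = struct.unpack_from("<I", data, pos + 4)
        remaining = len(data) - (pos + 8)
        if size > remaining:
            # Shrink an over-long chunk so it ends exactly at the buffer end.
            size = remaining
            struct.pack_into("<I", data, pos + 4, size)
        pos += 8 + size + (size & 1)  # RIFF chunks are padded to even length
    # RIFF size counts everything after the 8-byte "RIFF" + size header.
    struct.pack_into("<I", data, 4, len(data) - 8)
    return bytes(data)
```

Because this repairs exactly the container and chunk-size mismatches that the corpus deliberately includes, it is probably best applied to most mutations rather than all of them, so that some size-mismatch inputs still reach the RIFF parser unrepaired.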
Recommended fuzzers
- →AFL++
- →libFuzzer
- →Honggfuzz