Compression · C

How to fuzz bzip2

bzip2 is aging and under-fuzzed: its BWT decoding path has no publicly documented full audit.

bzip2 combines a Burrows-Wheeler transform, move-to-front encoding, and Huffman coding. Each is a separate decoding stage with its own state, and all of them must agree on block boundaries. The codebase is over two decades old with limited recent security review, and .bz2 files appear in package managers, backup tools, and source distributions, so a decoder bug is reachable from untrusted input in many places.
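Block boundaries in a .bz2 stream are marked by fixed magic numbers: the stream opens with "BZh" and a level digit from '1' to '9', each block starts with the 48-bit value 0x314159265359, and the stream ends with 0x177245385090. Only the first marker after the header is guaranteed to be byte-aligned. The sketch below checks just that prefix, which can help when sorting seed files or crash inputs. The names sniff_result and bz2_sniff_header are hypothetical and not part of libbz2.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for corpus triage; not part of libbz2. */
typedef enum {
  SNIFF_NOT_BZ2 = 0, /* header or first marker does not match */
  SNIFF_BLOCK = 1,   /* header followed by a block marker */
  SNIFF_EMPTY = 2    /* header followed directly by the end-of-stream marker */
} sniff_result;

static sniff_result bz2_sniff_header(const uint8_t *buf, size_t len) {
  static const uint8_t block_magic[6] = {0x31, 0x41, 0x59, 0x26, 0x53, 0x59};
  static const uint8_t eos_magic[6] = {0x17, 0x72, 0x45, 0x38, 0x50, 0x90};

  /* 4 header bytes plus one 6-byte marker. */
  if (len < 10) return SNIFF_NOT_BZ2;
  if (memcmp(buf, "BZh", 3) != 0) return SNIFF_NOT_BZ2;
  /* The level digit gives the block size in units of 100 kB. */
  if (buf[3] < '1' || buf[3] > '9') return SNIFF_NOT_BZ2;

  if (memcmp(buf + 4, block_magic, 6) == 0) return SNIFF_BLOCK;
  if (memcmp(buf + 4, eos_magic, 6) == 0) return SNIFF_EMPTY;
  return SNIFF_NOT_BZ2;
}
```

Later block markers can start at any bit offset, so this check says nothing about the rest of the stream. Inputs it rejects are still worth fuzzing, since the decoder's error paths are part of the attack surface.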

Common bug classes

  • Heap buffer overflow in BWT inverse transform symbol table
  • Integer overflow in block CRC32 validation arithmetic
  • Out-of-bounds read in Huffman selector table decode
  • Infinite loop on malformed end-of-stream magic bytes
  • Null dereference on zero-symbol Huffman tree

Recommended setup

Fuzzers

  • AFL++
  • libFuzzer
  • Honggfuzz

Sanitizers

  • ASan
  • UBSan

Harness scaffold

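#include <limits.h>
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>
#include <bzlib.h>

/* libFuzzer looks for an unmangled symbol, so keep C linkage under C++. */
#ifdef __cplusplus
extern "C"
#endif
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  /* bzlib takes an unsigned int length; skip inputs that would truncate. */
  if (size > UINT_MAX) return 0;
  unsigned int out_sz = 1024 * 1024;  /* cap decompressed output at 1 MiB */
  char *out = (char *)malloc(out_sz);
  if (!out) return 0;
  /* The result is ignored on purpose: BZ_DATA_ERROR and BZ_OUTBUFF_FULL
     are expected outcomes for fuzz input. */
  (void)BZ2_bzBuffToBuffDecompress(out, &out_sz,
                                   (char *)data, (unsigned int)size,
                                   0 /* small */, 0 /* verbosity */);
  free(out);
  return 0;
}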

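Save this as fuzz_target.c, build it with clang and your sanitizer flags (for libFuzzer with ASan and UBSan, -fsanitize=fuzzer,address,undefined), link against libbz2 with -lbz2, and you have a working starting point.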

Notable CVEs found by fuzzing

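  • CVE-2019-12900: out-of-bounds write in BZ2_decompress (decompress.c) when a stream declares too many selectors; affects bzip2 through 1.0.6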
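
CVE-2019-12900 is a textbook case of a count field that can describe more entries than the table it indexes. The toy model below is not bzip2 source. It assumes, for illustration, a selector count that arrives in a field wide enough to hold 32767 and a table of 18002 entries, and it shows the kind of range check whose absence allows a write past the end of the table. All names (toy_decoder, toy_load_selectors, TOY_MAX_SELECTORS) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model only. The table size is an assumption chosen to resemble the
   selector table in bzip2; it is not taken from the library headers. */
#define TOY_MAX_SELECTORS 18002

typedef struct {
  uint8_t selector_mtf[TOY_MAX_SELECTORS];
  int n_selectors;
} toy_decoder;

/* Copies `declared` selector values into the table.
   Returns 0 on success, -1 if the declared count cannot be valid. */
static int toy_load_selectors(toy_decoder *d, int declared,
                              const uint8_t *values) {
  /* Without this check, a declared count above the table size makes the
     loop below write past the end of selector_mtf. */
  if (declared < 1 || declared > TOY_MAX_SELECTORS) return -1;

  for (int i = 0; i < declared; i++) {
    d->selector_mtf[i] = values[i];
  }
  d->n_selectors = declared;
  return 0;
}
```

With ASan enabled, the unchecked version of such a loop would report a buffer overflow on the first out-of-range index, which is why the sanitizers listed above are worth the slowdown.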