Parser · C

How to fuzz libxml2

One of the most widely deployed XML parsers: a bug here can reach almost every Linux system.

libxml2 ships in every major Linux distribution and underpins XML handling in GNOME, PHP, Python's lxml bindings, and countless servers. Its namespace, XPath, and entity-expansion logic form a large stateful attack surface that has historically yielded exploitable memory corruption bugs.

Common bug classes

  • Heap buffer overflow in XML namespace prefix resolution
  • Use-after-free in XPath node-set evaluation
  • Integer overflow in entity expansion depth counter
  • Out-of-bounds read in UTF-8 encoding validation
  • Null dereference on malformed DTD attribute default value

Recommended setup

Fuzzers

  • AFL++
  • libFuzzer
  • Honggfuzz

Sanitizers

  • ASan
  • UBSan

Harness scaffold

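#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

/* libFuzzer entry point: called once per generated input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  /* xmlReadMemory takes an int length, so skip inputs that would truncate. */
  if (size > INT_MAX) return 0;

  /* Silence diagnostics and block network fetches to keep runs fast. */
  xmlDocPtr doc = xmlReadMemory((const char *)data, (int)size,
                                "fuzz.xml", NULL,
                                XML_PARSE_NOERROR | XML_PARSE_NOWARNING |
                                XML_PARSE_NONET);
  if (doc) xmlFreeDoc(doc);

  /* No xmlCleanupParser() here: it is meant for process exit, not per input. */
  return 0;
}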

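Save this as fuzz_target.c (the harness is plain C; compiled as C++ it would need extern "C" on the entry point), build it with your compiler's fuzzer and sanitizer flags (with clang and libFuzzer, for example, -fsanitize=fuzzer,address,undefined) while linking against libxml2, and you have a working starting point.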
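Once the fuzzer reports a crash, it helps to replay the saved input outside the fuzzing engine, for example under a debugger. The following is a rough sketch of such a replay driver, not part of libxml2 or any fuzzer: replay_file, fuzz_target_fn, and the REPLAY_MAIN macro are hypothetical names introduced here for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Same signature as the libFuzzer entry point. */
typedef int (*fuzz_target_fn)(const uint8_t *data, size_t size);

/* Reads the whole file at path and passes it to target exactly once.
 * Returns 0 if the input was delivered, -1 on any I/O or allocation error. */
int replay_file(const char *path, fuzz_target_fn target) {
  FILE *f = fopen(path, "rb");
  if (f == NULL) return -1;

  if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return -1; }
  long len = ftell(f);
  if (len < 0) { fclose(f); return -1; }
  rewind(f);

  /* Allocate at least one byte so an empty file still yields a valid pointer. */
  uint8_t *buf = malloc((size_t)len + 1);
  if (buf == NULL) { fclose(f); return -1; }

  size_t got = fread(buf, 1, (size_t)len, f);
  fclose(f);
  if (got != (size_t)len) { free(buf); return -1; }

  target(buf, got);
  free(buf);
  return 0;
}

#ifdef REPLAY_MAIN
/* Defined in the harness file that this driver is linked with. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);

int main(int argc, char **argv) {
  int failed = 0;
  for (int i = 1; i < argc; i++) {
    if (replay_file(argv[i], LLVMFuzzerTestOneInput) != 0) {
      fprintf(stderr, "could not replay %s\n", argv[i]);
      failed = 1;
    }
  }
  return failed;
}
#endif
```

Compiled with -DREPLAY_MAIN and linked against the harness (with sanitizers but without the fuzzer runtime), this runs each file named on the command line through the target once, so a sanitizer report should appear for any input that still triggers the bug.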

Notable CVEs found by fuzzing

  • CVE-2022-29824
  • CVE-2023-39615