Parser · C
How to fuzz libxml2
One of the most widely used XML parsers: a bug here can reach nearly every Linux system.
libxml2 ships in every major Linux distribution and is the XML backbone of GNOME, PHP, Python (via lxml), and countless servers. Its namespace, XPath, and entity-expansion logic form a large, stateful attack surface that has repeatedly yielded exploitable memory-corruption bugs.
Common bug classes
- Heap buffer overflow in XML namespace prefix resolution
- Use-after-free in XPath node-set evaluation
- Integer overflow in entity expansion depth counter
- Out-of-bounds read in UTF-8 encoding validation
- Null dereference on malformed DTD attribute default value
Recommended setup
Fuzzers
- → AFL++
- → libFuzzer
- → Honggfuzz
Sanitizers
- → ASan
- → UBSan
Harness scaffold
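#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

/* libFuzzer entry point: parse one input as an in-memory XML document. */
#ifdef __cplusplus
extern "C" /* libFuzzer looks the symbol up with C linkage */
#endif
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    /* xmlReadMemory takes an int length, so skip oversized inputs. */
    if (size > INT_MAX) return 0;

    xmlDocPtr doc = xmlReadMemory((const char *)data, (int)size,
                                  "fuzz.xml", NULL,
                                  XML_PARSE_NOERROR | XML_PARSE_NOWARNING);
    if (doc) xmlFreeDoc(doc);
    /* No xmlCleanupParser() here: it belongs at process exit, not per input. */
    return 0;
}
Save this as fuzz_target.cc, build it with your compiler and sanitizer flags, and you have a working starting point.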
Notable CVEs found by fuzzing
- → CVE-2022-29824
- → CVE-2023-39615