Parser · C
How to fuzz libyaml
YAML's indentation sensitivity makes its scanner a rich target for inputs that exhaust the stack.
libyaml is the C backend for PyYAML and many other YAML bindings. YAML's context-sensitive grammar and anchor/alias references introduce recursive code paths that are difficult to bound statically. Deeply nested anchor/alias structures and exponentially expanding payloads (YAML bombs) can surface denial-of-service and memory-corruption paths.
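Alias-heavy inputs of this kind are easy to generate for a seed corpus. The sketch below is a hypothetical helper, not part of libyaml; `make_alias_seed` and its parameters are names invented for illustration. It writes `levels` mapping entries, each anchoring a flow sequence of `fanout` items that alias the previous level, so a consumer that fully expands aliases would see the last entry grow to roughly `fanout` to the power `levels` scalars. libyaml's event API reports aliases as events rather than expanding them, so the expansion cost mostly matters for bindings layered on top.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not part of libyaml): writes a layered anchor/alias
 * document into out and returns its length, or 0 if it does not fit in cap. */
size_t make_alias_seed(char *out, size_t cap, int levels, int fanout) {
    size_t used = 0;
    if (levels < 1 || fanout < 1) return 0;
    for (int level = 0; level < levels; level++) {
        /* Each line defines anchor a<level> on a flow sequence. */
        int n = snprintf(out + used, cap - used, "a%d: &a%d [", level, level);
        if (n < 0 || (size_t)n >= cap - used) return 0;
        used += (size_t)n;
        for (int i = 0; i < fanout; i++) {
            /* Level 0 holds plain scalars; later levels alias the previous one. */
            if (level == 0)
                n = snprintf(out + used, cap - used, "%sx", i ? ", " : "");
            else
                n = snprintf(out + used, cap - used, "%s*a%d",
                             i ? ", " : "", level - 1);
            if (n < 0 || (size_t)n >= cap - used) return 0;
            used += (size_t)n;
        }
        n = snprintf(out + used, cap - used, "]\n");
        if (n < 0 || (size_t)n >= cap - used) return 0;
        used += (size_t)n;
    }
    return used;
}
```

Small values such as `levels = 4` and `fanout = 4` keep the seed short enough for the fuzzer to mutate efficiently while still exercising anchor registration and alias lookup.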
Common bug classes
- Heap buffer overflow in scalar token buffer reallocation
- Stack exhaustion via deeply nested YAML anchors
- Integer overflow in token queue size arithmetic
- Out-of-bounds read in UTF-8/UTF-16 BOM detection
- Null dereference on unexpected STREAM-END token
Recommended setup
Fuzzers
- AFL++
- libFuzzer
Sanitizers
- ASan
- UBSan
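The two fuzzers deliver inputs differently: libFuzzer calls an in-process entry point named `LLVMFuzzerTestOneInput`, whereas AFL++ in its default mode hands the target a file path or stdin. One way to serve both, and to replay crashing inputs outside any fuzzer, is a small file-reading driver. The sketch below is an illustration only; `read_input_file` and the `STANDALONE_DRIVER` macro are hypothetical names, not part of either fuzzer.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole file at path into a malloc'd buffer.
 * Returns NULL on failure; on success stores the byte count in *size. */
uint8_t *read_input_file(const char *path, size_t *size) {
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    size_t cap = 4096, len = 0;
    uint8_t *buf = malloc(cap);
    if (!buf) { fclose(f); return NULL; }
    for (;;) {
        len += fread(buf + len, 1, cap - len, f);
        if (len < cap) break;                 /* short read: EOF or error */
        uint8_t *grown = realloc(buf, cap * 2);
        if (!grown) { free(buf); fclose(f); return NULL; }
        buf = grown;
        cap *= 2;
    }
    int failed = ferror(f);
    fclose(f);
    if (failed) { free(buf); return NULL; }
    *size = len;
    return buf;
}

#ifdef STANDALONE_DRIVER
/* Defined by the fuzz harness this driver is linked with. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);

int main(int argc, char **argv) {
    /* Run the harness once per file named on the command line. */
    for (int i = 1; i < argc; i++) {
        size_t size = 0;
        uint8_t *data = read_input_file(argv[i], &size);
        if (!data) { fprintf(stderr, "could not read %s\n", argv[i]); continue; }
        LLVMFuzzerTestOneInput(data, size);
        free(data);
    }
    return 0;
}
#endif
```

Built with `-DSTANDALONE_DRIVER` and linked against a harness, the resulting binary takes input paths as arguments, which matches the `@@` placeholder that afl-fuzz replaces with the current test case path.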
Harness scaffold
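#include <stdint.h>
#include <stddef.h>
#include <yaml.h>

/* libFuzzer expects this symbol with C linkage; the guard matters when the file is built as C++. */
#ifdef __cplusplus
extern "C"
#endif
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    yaml_parser_t parser;
    yaml_event_t event;

    /* Initialization allocates internal buffers and returns 0 on failure. */
    if (!yaml_parser_initialize(&parser)) return 0;
    yaml_parser_set_input_string(&parser, data, size);

    /* Pull events until the stream ends or the parser reports an error. */
    while (yaml_parser_parse(&parser, &event)) {
        int done = (event.type == YAML_STREAM_END_EVENT);
        yaml_event_delete(&event);
        if (done) break;
    }

    yaml_parser_delete(&parser);
    return 0;
}

Save this as fuzz_target.cc, build with your compiler and sanitizer flags, link against libyaml, and you have a working starting point.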