Parser · C

How to fuzz libyaml

YAML's indentation sensitivity makes its scanner a rich target for deeply nested, stack-exhausting inputs.

libyaml is the C backend for PyYAML and many other YAML bindings. YAML's context-sensitive grammar and anchor/alias references introduce recursive code paths that are difficult to bound statically. Deeply nested aliases and exponential-expansion payloads (YAML bombs) can expose denial-of-service and memory-corruption paths.
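To make the anchor/alias payloads described above concrete, here is a minimal sketch of a seed generator. The helper name make_alias_seed and the l0, l1, ... key names are hypothetical and not part of libyaml. It only builds a small layered document in which each level aliases the previous one, which gives a fuzzer anchor and alias tokens to mutate.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not part of libyaml): writes a layered anchor/alias
 * document into buf. Level 0 is a plain flow sequence; every later level is
 * a sequence of `fanout` aliases to the level before it.
 * Returns the number of bytes written (excluding the NUL terminator),
 * or 0 if buf is too small. */
size_t make_alias_seed(char *buf, size_t cap, int levels, int fanout) {
  size_t used = 0;
  for (int level = 0; level < levels; level++) {
    /* Open the mapping entry and anchor it, e.g. "l1: &l1 [". */
    int n = snprintf(buf + used, cap - used, "l%d: &l%d [", level, level);
    if (n < 0 || (size_t)n >= cap - used) return 0;
    used += (size_t)n;

    for (int i = 0; i < fanout; i++) {
      const char *sep = i ? ", " : "";
      if (level == 0)
        n = snprintf(buf + used, cap - used, "%sx", sep);
      else
        n = snprintf(buf + used, cap - used, "%s*l%d", sep, level - 1);
      if (n < 0 || (size_t)n >= cap - used) return 0;
      used += (size_t)n;
    }

    n = snprintf(buf + used, cap - used, "]\n");
    if (n < 0 || (size_t)n >= cap - used) return 0;
    used += (size_t)n;
  }
  return used;
}
```

With levels = 2 and fanout = 2 the buffer holds two lines, "l0: &l0 [x, x]" and "l1: &l1 [*l0, *l0]". Keeping seeds this small is a deliberate choice: the fuzzer can grow depth and fanout by mutation, while a large hand-written bomb mostly slows down each execution.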

Common bug classes

  • Heap buffer overflow in scalar token buffer reallocation
  • Stack exhaustion via deeply nested YAML anchors
  • Integer overflow in token queue size arithmetic
  • Out-of-bounds read in UTF-8/UTF-16 BOM detection
  • Null dereference on unexpected STREAM-END token
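As an illustration of the BOM-detection bug class listed above, the following sketch shows the shape of a bounds-checked check. It is not libyaml's code: the bom_encoding type and detect_bom function are hypothetical names invented for this example. The point is that every byte comparison is preceded by a length test, which is exactly what an out-of-bounds read in this kind of code omits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types and names for illustration; not libyaml's API. */
typedef enum { ENC_UNKNOWN, ENC_UTF8, ENC_UTF16LE, ENC_UTF16BE } bom_encoding;

/* Reports the encoding implied by a byte order mark at the start of buf and
 * stores the mark's length in *bom_len. Each comparison is guarded by a
 * length check, so short inputs are never read past their end. */
bom_encoding detect_bom(const uint8_t *buf, size_t len, size_t *bom_len) {
  *bom_len = 0;
  if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
    *bom_len = 3;
    return ENC_UTF8;
  }
  if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
    *bom_len = 2;
    return ENC_UTF16LE;
  }
  if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
    *bom_len = 2;
    return ENC_UTF16BE;
  }
  return ENC_UNKNOWN;
}
```

One-, two-, and three-byte inputs that start with a partial BOM are worth adding to the seed corpus, since they sit right on the boundary these checks protect.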

Recommended setup

Fuzzers

  • AFL++
  • libFuzzer

Sanitizers

  • ASan
  • UBSan

Harness scaffold

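#include <stdint.h>
#include <stddef.h>
#include <yaml.h>

/* libFuzzer entry point: feeds one input through libyaml's event parser. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  yaml_parser_t parser;
  yaml_event_t event;

  /* yaml_parser_initialize returns 0 on failure; skip the input in that case. */
  if (!yaml_parser_initialize(&parser)) return 0;
  yaml_parser_set_input_string(&parser, data, size);

  /* yaml_parser_parse returns 0 on a parse error, which ends the loop. */
  while (yaml_parser_parse(&parser, &event)) {
    int done = (event.type == YAML_STREAM_END_EVENT);
    yaml_event_delete(&event);
    if (done) break;
  }

  yaml_parser_delete(&parser);
  return 0;
}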

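Save this as fuzz_target.c (the harness is plain C; in a .cc file the entry point would need to be declared extern "C"), build it with your fuzzer's compiler and sanitizer flags, link it against libyaml, and you have a working starting point.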
