Reading a Coverage Report — What Branch and Edge Coverage Actually Mean

A fuzzing campaign produces two artifacts: crashes and a corpus. The corpus tells you what code you have exercised; the coverage report translates that into something you can read and act on. Understanding how to read a coverage report — and what the numbers actually measure — is the difference between a campaign that runs until the disk fills and one that systematically discovers the uncovered paths that likely contain bugs.

This post covers three coverage reporting workflows: AFL++'s built-in edge bitmap, afl-cov with LCOV/genhtml for human-readable HTML, and Clang's source-based coverage for per-branch and per-region data. The underlying concepts — line coverage, branch coverage, and edge coverage — differ mathematically and each answers a different question.

Line Coverage vs Branch Coverage vs Edge Coverage

Line coverage (or statement coverage) records which source lines executed at least once. It is the simplest metric and the most misleading: a line with an if statement is reported as covered if the line executed — regardless of whether both the true and false branches were taken.

Branch coverage refines this by tracking both outcomes of every conditional. An if (x > 0) statement has two branches: the true path and the false path. Branch coverage is 100% only when both have been exercised. This is more useful than line coverage for finding inputs that exercise error paths and boundary conditions.

Edge coverage goes further still. In a control-flow graph, edges are the transitions between basic blocks: the directed arcs from one block to each block that can execute next. An if/else if/else chain typically compiles to two conditional blocks with two outgoing edges each, plus one edge from each of the three bodies to the continuation block. Edge coverage is what coverage-guided fuzzers actually optimize for, because reaching the same block from two different predecessor blocks counts as two distinct edges.

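The relationship is one of subsumption rather than a numeric inequality: covering every edge implies covering every branch outcome, which in turn implies covering every reachable line, but not the reverse. Reported percentages therefore tend to run the other way, with line coverage highest and edge coverage lowest for the same corpus. A corpus at 95% branch coverage may still have uncovered edges, because AFL-style (previous block, current block) pairs also record where execution came from, such as the call site from which a function was entered.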
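To make the three metrics concrete, here is a minimal sketch that models an if without an else as a three-block control-flow graph. The block names and the run helper are illustrative only and are not taken from any tool; block coverage stands in for line coverage.

```python
# Toy control-flow graph for:   if (x > 0) { then-block }   join-block
BLOCKS = {"entry", "then", "join"}
EDGES = {("entry", "then"), ("entry", "join"), ("then", "join")}
# The two outcomes of the conditional that ends the "entry" block.
BRANCH_OUTCOMES = {("entry", "then"), ("entry", "join")}


def run(x):
    """Return the sequence of blocks executed for input x."""
    path = ["entry"]
    if x > 0:
        path.append("then")
    path.append("join")
    return path


def coverage(inputs):
    """Compute block, branch, and edge coverage fractions for a set of inputs."""
    seen_blocks, seen_edges = set(), set()
    for x in inputs:
        path = run(x)
        seen_blocks.update(path)
        seen_edges.update(zip(path, path[1:]))
    return {
        "block": len(seen_blocks) / len(BLOCKS),
        "branch": len(seen_edges & BRANCH_OUTCOMES) / len(BRANCH_OUTCOMES),
        "edge": len(seen_edges & EDGES) / len(EDGES),
    }


if __name__ == "__main__":
    print(coverage([5]))      # one input that takes the true side only
    print(coverage([5, -1]))  # add an input that skips the then-block
```

With the single input 5, every block runs, so block coverage is 100%, yet only one of the two branch outcomes and two of the three edges are covered. Adding -1 completes all three metrics.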

How AFL++ Tracks Edge Coverage: The 64 KB Bitmap

AFL++ tracks coverage using a 64 KB shared-memory bitmap. Each byte in the bitmap represents an edge (a source block → destination block pair). When the instrumented target executes a branch, it sets or increments the bitmap byte at the index corresponding to that edge. The index is derived from a hash of the source and destination block IDs — which means two distinct edges can map to the same byte (a collision).
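The classic AFL scheme for computing that index, as described in AFL's technical notes, XORs the current block ID with a right-shifted copy of the previous one. The sketch below models that scheme in Python. The block IDs are made up, the byte counters simply wrap in this model, and other AFL++ instrumentation modes may assign indices differently.

```python
MAP_SIZE = 1 << 16  # 65,536 one-byte counters (64 KB)


def trace(block_ids, map_size=MAP_SIZE):
    """Replay a sequence of block IDs and return the resulting coverage map."""
    coverage_map = bytearray(map_size)
    prev_loc = 0
    for cur_loc in block_ids:
        index = (cur_loc ^ prev_loc) % map_size
        # Counters are single bytes, so they wrap at 256 in this model.
        coverage_map[index] = (coverage_map[index] + 1) & 0xFF
        # The shift keeps A->B distinct from B->A, and A->A from B->B.
        prev_loc = cur_loc >> 1
    return coverage_map


def hit_indices(coverage_map):
    """Return the set of map indices with a non-zero counter."""
    return {i for i, count in enumerate(coverage_map) if count}


if __name__ == "__main__":
    a, b = 0x1234, 0x4321  # hypothetical compile-time block IDs
    print(sorted(hit_indices(trace([a, b]))))
    print(sorted(hit_indices(trace([b, a]))))
```

The two printed index sets differ even though both traces visit the same two blocks, which is the direction sensitivity the shift provides.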

The 64 KB bitmap holds 65,536 entries. For large targets with tens of thousands of edges, the collision probability becomes significant. When two edges share a byte, covering one looks (to the fuzzer) like covering both, so an input that reaches a genuinely new edge can be discarded as uninteresting. AFL++ addresses this in LTO mode (afl-clang-lto), which assigns collision-free edge IDs at link time. For targets where the map routinely fills above 60–70% (shown on AFL++'s status screen as map density), switch to LTO mode or increase the map size with AFL_MAP_SIZE.
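A rough way to see when collisions start to matter is a balls-into-bins estimate. The sketch below assumes every edge is hashed uniformly and independently into the map, which is an idealization of the real hash; the function name is illustrative.

```python
def expected_collision_fraction(num_edges, map_size=1 << 16):
    """Expected fraction of edges that land on a slot another edge already uses,
    assuming each edge is hashed uniformly and independently into the map."""
    if num_edges == 0:
        return 0.0
    # Expected number of distinct slots occupied after num_edges insertions.
    expected_occupied = map_size * (1.0 - (1.0 - 1.0 / map_size) ** num_edges)
    return (num_edges - expected_occupied) / num_edges


if __name__ == "__main__":
    for n in (1_000, 10_000, 50_000):
        print(n, f"{expected_collision_fraction(n):.1%}")
```

Under this model, roughly 7% of edges collide at 10,000 edges in a 65,536-entry map, and the fraction climbs quickly as the edge count approaches the map size.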

AFL++ also distinguishes edge frequency: the bitmap byte is a hit count (bucketed into power-of-two ranges: 1, 2, 3, 4–7, 8–15, 16–31, 32–127, 128+). An edge that executes 5 times in one input is treated as different from an edge that executes 3 times — so the corpus captures not just which edges are reachable but which execution counts are reachable. This is the greybox fuzzing feedback loop in practice.
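The bucketing described above can be sketched as a small lookup. The bucket boundaries come from the list in the text; the function names are illustrative.

```python
# (inclusive upper bound, bucket label) pairs, in ascending order.
BUCKETS = [
    (1, "1"), (2, "2"), (3, "3"), (7, "4-7"),
    (15, "8-15"), (31, "16-31"), (127, "32-127"), (255, "128+"),
]


def bucket(hit_count):
    """Map a raw hit count to its bucket label ("0" means never hit)."""
    if hit_count <= 0:
        return "0"
    for upper, label in BUCKETS:
        if hit_count <= upper:
            return label
    return "128+"


def is_new_behavior(old_count, new_count):
    """True if the new hit count falls into a different bucket than the old one."""
    return bucket(old_count) != bucket(new_count)


if __name__ == "__main__":
    print(is_new_behavior(3, 5))  # different buckets: "3" vs "4-7"
    print(is_new_behavior(5, 6))  # same bucket: "4-7"
```

Bucketing keeps small loop-count changes from flooding the corpus while still rewarding an input that moves an edge into a new order of magnitude.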

The afl-cov Workflow

AFL++'s internal bitmap is a fast runtime metric, not a human-readable report. The third-party afl-cov tool (Mike Rash's project — separate from AFL++ itself, available from the afl-cov GitHub repository) bridges the gap: it replays your corpus through a separately gcov-instrumented binary, accumulates the .gcda data, and drives lcov / genhtml to render an HTML report. Do not confuse it with afl-showmap, which is a first-party AFL++ tool that reads the AFL++ edge bitmap directly — they answer different questions (source-line coverage vs edge-bitmap coverage).

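# Prerequisites: build a SEPARATE non-AFL binary with gcov instrumentation.
# afl-cov re-runs each corpus input through this gcov binary and reads .gcda files;
# the binary does not need AFL instrumentation (afl-cov drives it directly).
# (--coverage implies -fprofile-arcs -ftest-coverage; no need to pass both.)
clang --coverage -g -O0 -o my_parser_cov my_parser.c
# Note: lcov invokes gcov to read the .gcda files, and the gcov version must
# match the compiler. If lcov rejects Clang's output, build with gcc --coverage
# instead, or point lcov at a Clang-compatible gcov such as "llvm-cov gcov".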

# Run afl-cov against your AFL++ findings directory.
# AFL_FILE is a literal placeholder afl-cov substitutes per corpus input.
afl-cov -d findings/ --coverage-cmd "./my_parser_cov AFL_FILE" \
  --code-dir /path/to/source/ --overwrite

# Output: findings/cov/web/index.html (afl-cov drives genhtml internally),
# plus the raw LCOV traces under findings/cov/lcov/. The merged trace is
# expected to be trace.lcov_info_final; list the directory if yours is named
# differently.

# To regenerate HTML manually from the LCOV trace:
genhtml findings/cov/lcov/trace.lcov_info_final \
  --output-directory findings/cov/html/

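The HTML output shows each source file as a table with line execution counts in the gutter. Lines that were never executed are highlighted in red; lines executed at least once are in blue/green. Branch data appears only if it was enabled when the trace was collected (afl-cov has an --enable-branch-coverage option, and genhtml a --branch-coverage option); when present, each conditional line carries a bracketed marker per branch outcome, with + for taken and - for not taken. The branch coverage column in the per-file summary shows the number of taken/total branch outcomes.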

Reading genhtml HTML Output

Open index.html and look at the directory-level summary first. Files with low line coverage that process input are your first targets for corpus work. A file with 20% line coverage means 80% of the code in that file was never executed by any input in your corpus.

When you drill into a file, the branch summary is more informative than the line summary. Scan for:

Red lines in error-handling code. Uncovered error paths in input parsing are the highest-value targets — they often contain the least-tested code and the most likely memory-management bugs.
Partial branch coverage on size/length checks. A check like if (len > MAX_SIZE) that is always false means your corpus never supplied an input large enough to hit the overflow guard — and therefore never tested whether the guard is correct.
Unreached functions. Functions with 0 calls that are reachable from the input path are not dead code; your current seed corpus has simply never reached them. Add a seed that explicitly exercises that code path.
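The same triage can be done on the raw LCOV trace instead of the HTML. The sketch below parses the standard SF, DA, BRDA, and end_of_record records and ranks files from least to most covered. The sample trace and path are invented for illustration, and the function names are not part of any tool.

```python
from collections import defaultdict

SAMPLE = """\
SF:/src/parser.c
DA:10,14
DA:11,14
DA:12,0
BRDA:11,0,0,14
BRDA:11,0,1,0
end_of_record
"""


def parse_lcov(text):
    """Parse LCOV tracefile text into per-file line counts and branch records."""
    files = defaultdict(lambda: {"lines": {}, "branches": []})
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("SF:"):
            current = files[line[3:]]
        elif line == "end_of_record":
            current = None
        elif current is None:
            continue
        elif line.startswith("DA:"):
            fields = line[3:].split(",")
            lineno, count = int(fields[0]), int(fields[1])
            current["lines"][lineno] = current["lines"].get(lineno, 0) + count
        elif line.startswith("BRDA:"):
            fields = line[5:].split(",")
            # A taken value of "-" means the enclosing block never executed.
            taken = 0 if fields[-1] == "-" else int(fields[-1])
            current["branches"].append((int(fields[0]), taken))
    return files


def report(files):
    """Return (path, line fraction, uncovered lines, lines with an untaken branch),
    sorted so the least-covered file comes first."""
    rows = []
    for path, rec in files.items():
        total = len(rec["lines"])
        hit = sum(1 for count in rec["lines"].values() if count > 0)
        uncovered = sorted(n for n, count in rec["lines"].items() if count == 0)
        untaken = sorted({n for n, taken in rec["branches"] if taken == 0})
        rows.append((path, hit / total if total else 0.0, uncovered, untaken))
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for row in report(parse_lcov(SAMPLE)):
        print(row)
```

To run it on a real campaign, read the trace file afl-cov wrote and pass its contents to parse_lcov in place of SAMPLE.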

Source-Based Coverage with Clang

Clang's source-based coverage (-fprofile-instr-generate -fcoverage-mapping) is a different system from GCC-style gcov. The compiler front end places counters using the AST and emits a coverage mapping that ties each counter to a source region, so coverage is reported at sub-statement granularity: individual expressions within a line can be reported as covered or uncovered. This makes it more precise than gcov-derived LCOV data for understanding exactly which sub-expressions within a compound conditional were evaluated.

# Build with source-based coverage instrumentation (Clang only)
# This is distinct from GCC gcov-style coverage — it produces .profraw files
clang -fprofile-instr-generate -fcoverage-mapping \
  -o my_parser_profcov my_parser.c

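# Replay every corpus input, writing one .profraw per run.
# LLVM_PROFILE_FILE sets where the instrumented binary writes its profile.
mkdir -p profraw
for f in corpus/*; do
  LLVM_PROFILE_FILE="profraw/$(basename "$f").profraw" ./my_parser_profcov "$f"
done

# Merge profiling data from all corpus runs
llvm-profdata merge -sparse profraw/*.profraw -o merged.profdata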

# Generate HTML coverage report
llvm-cov show ./my_parser_profcov \
  -instr-profile=merged.profdata \
  -format=html \
  -output-dir=coverage_html/ \
  -show-branches=count \
  -show-line-counts-or-regions

# Per-file summary table to stdout
llvm-cov report ./my_parser_profcov -instr-profile=merged.profdata

The llvm-cov show report annotates each line with execution counts and, with -show-branches=count, lists the true and false counts for every branch condition. A branch annotated [True: 14, False: 0] means the condition was true 14 times and never false, so the false outcome is a gap in your corpus.

llvm-cov report (without show) prints a tabular summary suitable for CI gates: lines hit/total, functions hit/total, regions hit/total, branches hit/total, per file. This is what you would check in a CI step to enforce a minimum coverage threshold after adding new corpus entries.
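A CI gate along those lines could read the JSON that llvm-cov export -summary-only emits rather than scraping the table. The sketch below assumes the export's top-level layout of data[0].totals.<metric>.percent; the threshold values and the sample numbers are invented for illustration.

```python
import json


def load_export(path):
    """Load the JSON written by `llvm-cov export -summary-only ... > path`."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def check_thresholds(export_json, minimums):
    """Return a list of failure messages; an empty list means the gate passes."""
    totals = export_json["data"][0]["totals"]
    failures = []
    for metric, minimum in minimums.items():
        percent = totals[metric]["percent"]
        if percent < minimum:
            failures.append(f"{metric}: {percent:.1f}% < {minimum:.1f}%")
    return failures


if __name__ == "__main__":
    # Hypothetical summary; real data would come from load_export("summary.json").
    sample = {"data": [{"totals": {
        "lines": {"percent": 82.5},
        "branches": {"percent": 61.0},
    }}]}
    print(check_thresholds(sample, {"lines": 80.0, "branches": 70.0}))
```

In a pipeline, exit non-zero when the returned list is non-empty so that a corpus change which reduces coverage fails the build.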

Spotting Cold Paths That Need Explicit Corpus Entries

Not all uncovered paths are reachable from random mutations of your existing corpus. Some require a specific structure that the fuzzer is unlikely to generate without guidance. Patterns that consistently show up as cold in coverage reports:

Alternative codec/format branches. A parser that handles multiple file format versions behind a version-byte switch will only exercise the version present in your seeds. Add one seed per version.
Error recovery paths. Many parsers have a "resync after error" code path that is only reachable after a parse error occurs in a specific location. Craft a seed that deliberately triggers the error and then continues with valid data, so the parser has something to resynchronize on.
Conditional features gated on build-time flags. If a feature is only compiled in when FEATURE_X=1 is set, its code paths appear in coverage data only for builds that enable that flag. Build one instrumented binary per significant compile-time configuration.
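For the version-switch case, seeds can be generated rather than collected by hand. The sketch below invents a format purely for illustration: a 4-byte magic value, one version byte, and a zeroed payload. Substitute the real header layout of your target.

```python
from pathlib import Path

MAGIC = b"DEMO"       # hypothetical magic bytes, not a real format
VERSIONS = (1, 2, 3)  # hypothetical version numbers the parser switches on


def build_seed(version, payload=b"\x00" * 8):
    """Build one minimal input for the given format version."""
    return MAGIC + bytes([version]) + payload


def write_seeds(out_dir):
    """Write one seed file per version and return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for version in VERSIONS:
        path = out / f"seed_v{version}.bin"
        path.write_bytes(build_seed(version))
        paths.append(path)
    return paths
```

Calling write_seeds("seeds/") before starting the campaign gives the fuzzer one entry point into each version branch, which it can then mutate from.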

When Coverage Plateaus

Coverage growth eventually stalls — this is normal and expected. After the initial rapid growth in the first hours of a campaign, the marginal coverage gain per CPU-hour drops steadily. A plateau does not mean there are no more bugs; it means the mutational space reachable from your current corpus has been reasonably exhausted.
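One simple way to decide that a plateau has been reached is to check whether the edge count grew at all during a trailing time window. The sketch below works on (elapsed seconds, edges found) samples; how you collect those samples is up to you, and the numbers in the example are invented.

```python
def has_plateaued(samples, window_seconds):
    """True if the edge count did not grow during the final window_seconds.

    samples is a time-ordered list of (elapsed_seconds, edges_found) pairs.
    """
    if not samples:
        return False
    end_time, end_edges = samples[-1]
    if end_time - samples[0][0] < window_seconds:
        return False  # not enough history to judge
    baseline = None
    for elapsed, edges in samples:
        if elapsed <= end_time - window_seconds:
            baseline = edges  # last sample at or before the window start
        else:
            break
    return end_edges <= baseline


if __name__ == "__main__":
    history = [(0, 100), (3600, 500), (7200, 520), (10800, 520), (14400, 520)]
    print(has_plateaued(history, 7200))   # no growth in the last two hours
    print(has_plateaued(history, 14400))  # growth over the whole run
```

The window length is a judgment call: too short and normal pauses between discoveries look like a plateau, too long and CPU time is wasted before you intervene.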

Approaches when coverage stagnates:

Add domain-specific seeds. If the target parses a binary format, obtain real-world examples from public datasets or deliberately craft inputs that exercise specific code paths you identified in the coverage report.
Enable dictionary-assisted fuzzing. Provide AFL++ (-x) or libFuzzer (-dict=) with a dictionary of format-specific tokens (magic bytes, field names, known lengths). Both fuzzers splice dictionary tokens into inputs during mutation, which helps them construct inputs that pass shallow validation guards.
Switch to structure-aware fuzzing. If the format is highly structured (protobuf, ASN.1, XML), random byte mutations will hit validation checks early and fail to explore message-handling logic. Structure-aware fuzzing mutates a parsed representation and re-serializes it, so generated inputs are valid by construction and reach the logic behind the validity checks.
Compare two corpora. Run afl-showmap on your current corpus and on a corpus from a different fuzzer or a different starting seed. The set difference (edges in one but not the other) shows you what each campaign is finding that the other is not. Merge the two corpora and re-run afl-cmin to produce a deduplicated minimal corpus that covers both edge sets.
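For the dictionary approach, the file itself is a list of name="value" lines in which non-printable bytes are written as \xNN escapes. The sketch below generates such lines; the token names and values are examples, not a recommended dictionary for any particular target.

```python
def dict_entry(name, token):
    """Format one token as name="value", hex-escaping quotes, backslashes,
    and any byte outside printable ASCII."""
    escaped = "".join(
        chr(b) if 0x20 <= b < 0x7F and b not in (0x22, 0x5C) else f"\\x{b:02x}"
        for b in token
    )
    return f'{name}="{escaped}"'


def build_dictionary(tokens):
    """Render a {name: bytes} mapping as dictionary file contents."""
    return "\n".join(dict_entry(name, value) for name, value in tokens.items()) + "\n"


if __name__ == "__main__":
    example_tokens = {  # hypothetical tokens for an imaginary format
        "magic": b"\x89PNG",
        "chunk_header": b"IHDR",
        "quote": b'"',
    }
    print(build_dictionary(example_tokens), end="")
```

Write the result to a file and pass it with the fuzzer's dictionary option; tokens harvested from the target's own string constants are usually a good starting set.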

Comparing Two Corpora's Coverage

A practical workflow for verifying that a new corpus represents genuine progress over an old one:

# Summarize edge coverage for the whole queue.
# With -i, -C makes afl-showmap accumulate coverage across all inputs, write
# the combined map to the -o file, and print a summary.
afl-showmap -C -i findings/default/queue/ -o /dev/null -- ./my_parser @@

# Produce one cumulative map per corpus. Without -C, -o must be a directory
# and afl-showmap writes one trace per input instead.
afl-showmap -C -i corpus_a/ -o map_a.txt -- ./my_parser @@
afl-showmap -C -i corpus_b/ -o map_b.txt -- ./my_parser @@

# The default output is text: one "edge_id:hit_count" line per edge that was hit.
# Count distinct edges per corpus:
wc -l map_a.txt map_b.txt

# Edges reached only by corpus B (edge IDs only, hit counts ignored; bash syntax):
comm -13 <(cut -d: -f1 map_a.txt | sort) <(cut -d: -f1 map_b.txt | sort)

# afl-cmin produces the minimal corpus that covers the same edge set
afl-cmin -i findings/default/queue/ -o corpus_min/ -- ./my_parser @@

Each map file lists the edges hit across that corpus, one edge_id:hit_count pair per line, so the line count is the number of distinct edges covered. The edge IDs are only comparable when both maps come from the same instrumented binary. A corpus with more lines covers more edges. afl-showmap -C also prints a coverage summary during the run, which is convenient when you do not want to post-process the map yourself. If corpus B has 15% more edges than corpus A after the same wall-clock fuzzing time, that is meaningful progress: more edges typically means a broader instrumented surface area explored and a higher probability of reaching a latent bug.
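The comparison can also be scripted. The sketch below assumes the two maps are in afl-showmap's text form, one edge_id:hit_count pair per line, and that both were produced with the same instrumented binary; the sample contents are invented.

```python
def parse_showmap(text):
    """Parse 'edge_id:hit_count' lines into a dict of edge ID -> count."""
    edges = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        edge_id, _, count = line.partition(":")
        edges[int(edge_id)] = int(count)
    return edges


def compare(map_a, map_b):
    """Summarize which edges each map covers that the other does not."""
    a, b = set(map_a), set(map_b)
    return {
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "shared": len(a & b),
    }


if __name__ == "__main__":
    corpus_a = parse_showmap("000012:1\n000040:3\n000777:1\n")
    corpus_b = parse_showmap("000012:2\n000777:1\n001024:1\n002048:4\n")
    print(compare(corpus_a, corpus_b))
```

Edges listed under only_b are what the second campaign found that the first did not; if only_a is also non-empty, merging the corpora is worth more than picking a winner.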

Coverage metrics are tools, not goals. A fuzzer at 95% branch coverage that has been running for three days is not necessarily better at finding bugs than a fuzzer at 70% coverage that is still exploring new paths. The actionable signal is: are new edges being discovered? If yes, keep running. If no, change something — the corpus, the dictionary, the fuzzer, or the target decomposition.