Benchmarking Go allocations: what go test -bench doesn't show
2026-02-28 · Yuki Tanaka
go test -bench is useful, but it shows you at most three numbers per benchmark: ns/op, plus B/op and allocs/op when you pass -benchmem. Here's what it doesn't show and why those omissions matter.
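Those numbers come from the standard testing package, and you can reproduce them programmatically with testing.Benchmark, which returns a BenchmarkResult carrying the same figures. A minimal sketch (the sink variable and allocBench are illustrative names, not anything from a real codebase):

```go
package main

import (
	"fmt"
	"testing"
)

var sink []byte // package-level sink so the allocation escapes to the heap

// allocBench runs a tiny allocating benchmark programmatically via
// testing.Benchmark, which produces the same numbers `go test -bench` prints.
func allocBench() testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs() // per-benchmark equivalent of the -benchmem flag
		for i := 0; i < b.N; i++ {
			sink = make([]byte, 64) // escapes via sink: one 64-byte heap allocation per op
		}
	})
}

func main() {
	res := allocBench()
	fmt.Printf("%d ns/op  %d B/op  %d allocs/op\n",
		res.NsPerOp(), res.AllocedBytesPerOp(), res.AllocsPerOp())
}
```

Running benchmarks this way is also a convenient hook for wrapping them in your own statistics, which is where the rest of this post goes.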
Variance across runs
Those numbers come from a single run: by default, -count is 1. On a loaded machine, run-to-run variance for operations in the microsecond range can be 20-30%, enough both to mask real regressions and to produce false positives.
PeakFord runs each benchmark multiple times and reports the trimmed mean and standard deviation. An 8% improvement with a 15% standard deviation is not an improvement; it's noise.
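PeakFord's internals aren't shown here, but the statistics it reports are straightforward to compute. A sketch of a trimmed mean and sample standard deviation over per-run ns/op samples (the helper names and sample data are illustrative, not PeakFord's code):

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// trimmedMean drops the lowest and highest `trim` fraction of samples
// before averaging, which blunts the effect of outlier runs.
func trimmedMean(samples []float64, trim float64) float64 {
	s := append([]float64(nil), samples...)
	sort.Float64s(s)
	k := int(trim * float64(len(s)))
	s = s[k : len(s)-k]
	var sum float64
	for _, v := range s {
		sum += v
	}
	return sum / float64(len(s))
}

// stddev is the sample standard deviation (n-1 denominator).
func stddev(samples []float64) float64 {
	var sum float64
	for _, v := range samples {
		sum += v
	}
	mean := sum / float64(len(samples))
	var ss float64
	for _, v := range samples {
		ss += (v - mean) * (v - mean)
	}
	return math.Sqrt(ss / float64(len(samples)-1))
}

func main() {
	// Eight hypothetical runs of the same benchmark; one was hit by machine load.
	nsPerOp := []float64{102, 98, 101, 250, 99, 100, 97, 103}
	fmt.Printf("trimmed mean: %.1f ns/op, stddev: %.1f\n",
		trimmedMean(nsPerOp, 0.125), stddev(nsPerOp))
}
```

Trimming one sample from each end discards the 250 ns/op outlier, so the trimmed mean stays near 100 ns/op while the raw standard deviation still exposes how noisy the run set was.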
Goroutine and GC interactions
Allocation-heavy benchmarks can trigger GC cycles whose cost isn't broken out in the reported ns/op; it's folded in, inflating the number unpredictably. PeakFord records GC pause time separately and flags benchmarks where GC overhead exceeds a threshold.
For allocation-heavy benchmarks, the most informative number isn't ns/op but (ns - gc_pause)/op alongside allocs/op. Separating these shows whether an optimisation reduced compute time, allocation count, or both.
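One rough way to approximate that split with only the standard library is to read runtime.MemStats before and after a timed loop and subtract the accumulated pause time. A sketch, not how PeakFord (or any particular tool) implements it; benchWithGC and sink are hypothetical names, and note that PauseTotalNs only counts stop-the-world pauses, so this undercounts the cost of Go's mostly concurrent GC:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

var sink []byte // forces the allocation in the benchmark body to escape

// benchWithGC times fn over n iterations and separates out the total
// stop-the-world GC pause accumulated during the run, giving a rough
// (ns - gc_pause)/op split.
func benchWithGC(n int, fn func()) (nsPerOp, gcNsPerOp float64) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	start := time.Now()
	for i := 0; i < n; i++ {
		fn()
	}
	elapsed := time.Since(start)
	runtime.ReadMemStats(&after)
	gcPause := after.PauseTotalNs - before.PauseTotalNs
	return float64(elapsed.Nanoseconds()) / float64(n),
		float64(gcPause) / float64(n)
}

func main() {
	ns, gc := benchWithGC(1_000_000, func() { sink = make([]byte, 1024) })
	fmt.Printf("%.1f ns/op total, %.1f ns/op GC pause, %.1f ns/op compute\n",
		ns, gc, ns-gc)
}
```

A million 1 KiB allocations is enough churn to force several GC cycles, so the pause term is non-trivial; for a benchmark that allocates nothing, gcNsPerOp stays at zero and the split is a no-op.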
The regression threshold problem
Choosing a regression threshold is a calibration exercise. Too tight (2%) and you're chasing noise. Too loose (20%) and you miss real problems. PeakFord's default is 5%, which is intentionally conservative; you can tighten it per-benchmark for operations where you have high confidence in the baseline stability.
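The check itself is trivial once the threshold is chosen; the calibration is the hard part. A sketch of a per-benchmark threshold comparison (regressed is a hypothetical helper, not PeakFord's implementation):

```go
package main

import "fmt"

// regressed reports whether the current measurement exceeds the baseline
// by more than threshold (e.g. 0.05 for a 5% regression threshold).
func regressed(baseline, current, threshold float64) bool {
	return (current-baseline)/baseline > threshold
}

func main() {
	fmt.Println(regressed(100, 104, 0.05)) // 4% slower: within a 5% threshold
	fmt.Println(regressed(100, 112, 0.05)) // 12% slower: flagged
}
```

In practice the threshold argument is what you'd tighten per-benchmark: a stable, high-confidence baseline can justify 2-3%, while a noisy microsecond-range benchmark should stay at 5% or looser.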