Bench: Statistical Benchmarking for Go
A small, statistical benchmarking library for Go, designed for robust, repeatable, and insightful performance analysis using state-of-the-art BCa bootstrap inference.
- Analyze performance with BCa (Bias-Corrected and Accelerated) bootstrap for rigorous statistical significance
- Persist results incrementally (Gob or JSON) for resilience and tracking
- Compare runs and reference implementations with confidence intervals
- Format output in clean, customizable tables
- Configurable thresholds, sampling and other options for precise control
This library applies the bias-corrected and accelerated (BCa) bootstrap to every set of timings. It resamples the raw measurements with replacement 10,000 times (by default), then adjusts the percentile endpoints using the estimated bias and acceleration. BCa is non-parametric and second-order accurate, so its confidence intervals keep close to nominal coverage without assuming any particular distribution and remain stable under moderate skew.
Good practice is still to collect 25+ independent timings: smaller samples inflate the acceleration estimate and can widen the intervals. Likewise, very heavy-tailed timing data can erode coverage and may call for trimming or more samples.
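For intuition, the sketch below shows the simpler percentile bootstrap in plain Go: resample the timings with replacement, recompute the mean of every resample, and read the interval endpoints off the sorted means. The percentileCI helper and its timing values are purely illustrative, not part of this library's API, and the bias and acceleration corrections that turn this into a BCa interval are omitted for brevity.

```go
package main

import (
    "fmt"
    "math/rand"
    "sort"
    "time"
)

// percentileCI computes a plain percentile bootstrap interval for the mean:
// resample the timings with replacement, record the mean of each resample,
// and take the endpoints from the sorted means. A BCa interval, as used by
// the library, additionally shifts these endpoints by bias and acceleration
// estimates; that correction is left out here for brevity.
func percentileCI(samples []time.Duration, resamples int, confidence float64) (lo, hi time.Duration) {
    rng := rand.New(rand.NewSource(1))
    means := make([]float64, resamples)
    for i := range means {
        var sum float64
        for range samples {
            sum += float64(samples[rng.Intn(len(samples))])
        }
        means[i] = sum / float64(len(samples))
    }
    sort.Float64s(means)

    tail := int((1 - confidence/100) / 2 * float64(resamples))
    return time.Duration(means[tail]), time.Duration(means[resamples-1-tail])
}

func main() {
    // Hypothetical timings for one benchmark; in practice collect 25+ samples.
    timings := []time.Duration{
        480 * time.Microsecond, 495 * time.Microsecond, 470 * time.Microsecond,
        510 * time.Microsecond, 505 * time.Microsecond, 476 * time.Microsecond,
        488 * time.Microsecond, 492 * time.Microsecond, 467 * time.Microsecond,
    }
    lo, hi := percentileCI(timings, 10000, 95)
    fmt.Printf("95%% bootstrap CI for the mean: [%v, %v]\n", lo, hi)
}
```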
Use When
- ✅ You want statistically rigorous performance comparisons between Go implementations
- ✅ You need publication-quality statistical analysis with confidence intervals
- ✅ You need incremental, resilient result saving (e.g., for CI or long runs)
- ✅ You want to compare against previous or reference runs with clear significance
- ✅ You prefer clean, readable output and easy filtering
- ✅ You need to assert benchmarks in CI to avoid performance regressions
Not For
- ❌ Micro-benchmarks where Go's built-in testing.B is sufficient
- ❌ Long-term, distributed, or multi-process benchmarking
- ❌ Detailed memory/CPU profiling (use pprof for that)
Example Output
name                 time/op      ops/s        allocs/op    vs prev
-------------------- ------------ ------------ ------------ ------------------
find                 479.7 µs     2.1K         0            ✅ +65% [-33%,-24%]
sort                 47.4 ns      21.1M        1            🟰 similar
Quick Start
package main

import "github.com/kelindar/bench"

func main() {
    bench.Run(func(b *bench.B) {
        // Simple benchmark
        b.Run("benchmark name", func(i int) {
            // code to benchmark
        })

        // Benchmark with reference comparison
        b.Run("benchmark vs ref",
            func(i int) { /* our implementation */ },
            func(i int) { /* reference implementation */ })
    },
        bench.WithFile("results.json"),  // optional: set results file
        bench.WithFilter("set"),         // optional: only run benchmarks starting with "set"
        bench.WithConfidence(95.0),      // optional: set confidence level (default 99.9%)
        // add more options as needed
    )
}
Asserting Benchmarks in CI
Use bench.Assert inside your tests to automatically fail when a benchmark regresses compared to the previously recorded results. Assertions run in dry-run mode by default and are skipped when tests are executed with the -short flag.
func TestPerformance(t *testing.T) {
    bench.Assert(t, func(b *bench.B) {
        b.Run("my-bench", func(i int) {
            // code to benchmark
        })
    }, bench.WithFile("baseline.json"))
}
Options
The benchmark runner can be customized with a set of option functions. The table below explains what each option does and how you might use it.
| Option | Description |
|--------|-------------|
| WithFile | Picks the file where benchmark results are stored. When the filename ends with .gob, the data is written in a compact binary format; otherwise JSON is used. Saving results lets you track performance over time or share it between machines. |
| WithFilter | Runs only the benchmarks whose names start with the provided prefix. Handy when your suite has many benchmarks and you want to focus on a subset without changing your code. |
| WithSamples | Sets how many samples are collected for each benchmark. More samples give more stable statistics but also make the run take longer, so adjust the number to the precision you need. |
| WithDuration | Controls how long each sample runs. Increase the duration when the code under test is very fast or when you want less variation between runs. |
| WithReference | Enables the reference comparison column in the output. Provide a reference implementation when calling b.Run and Bench will show how your code performs against that reference, making regressions easy to spot. |
| WithDryRun | Prevents the library from writing results to disk. Useful for quick experiments or CI jobs where you just want to see the formatted output without updating any files. |
| WithConfidence | Sets the confidence level (in percent) for significance testing. Higher values make it harder for a difference to be considered statistically significant. |
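As a rough illustration of combining options (using only the calls shown earlier in this README; the benchmark names and file name are placeholders), the sketch below stores results in the compact binary format by choosing a .gob extension and narrows the run to a subset of benchmarks:

```go
package main

import "github.com/kelindar/bench"

func main() {
    bench.Run(func(b *bench.B) {
        b.Run("sort-ints", func(i int) { /* code to benchmark */ })
        b.Run("find-ints", func(i int) { /* skipped by the filter below */ })
    },
        bench.WithFile("results.gob"),  // .gob extension selects the compact binary encoding
        bench.WithFilter("sort"),       // only benchmarks whose names start with "sort" run
        bench.WithConfidence(99.0),     // stricter bar before a change is flagged as significant
    )
}
```

Because the filename ends in .gob, the results are written in the binary encoding; use a .json name to keep them as plain JSON instead.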
About
Bench is MIT licensed and maintained by @kelindar. PRs and issues welcome!