Deadlock in loadWAL() #9169

@bboreham

After #7438, loadWAL() can hang when trying to process a duplicate series record.

Evidence:

goroutine 8 [sleep]:
time.Sleep(0xf4240)
        /usr/local/go/src/runtime/time.go:193 +0xd2
github.com/prometheus/prometheus/tsdb.(*Head).loadWAL(0xc0000f0800, 0xc076338000, 0xc0000d7e08, 0xc0005905a0, 0x0, 0x0)
        /home/vagrant/src/github.com/prometheus/prometheus/tsdb/head_wal.go:225 +0x11f7
github.com/prometheus/prometheus/tsdb.(*Head).Init(0xc0000f0800, 0x0, 0x0, 0x0)
        /home/vagrant/src/github.com/prometheus/prometheus/tsdb/head.go:513 +0xa25
github.com/prometheus/prometheus/tsdb.BenchmarkLoadFileWAL(0xc000022240)
        /home/vagrant/src/github.com/prometheus/prometheus/tsdb/loadwal_test.go:21 +0x148
testing.(*B).runN(0xc000022240, 0x1)
        /usr/local/go/src/testing/benchmark.go:192 +0xeb
testing.(*B).run1.func1(0xc000022240)
        /usr/local/go/src/testing/benchmark.go:232 +0x57
created by testing.(*B).run1
        /usr/local/go/src/testing/benchmark.go:225 +0x7f

goroutine 50 [select, 2 minutes]:
github.com/prometheus/prometheus/tsdb/wal.(*WAL).run(0xc000060000)
        /home/vagrant/src/github.com/prometheus/prometheus/tsdb/wal/wal.go:332 +0xbc
created by github.com/prometheus/prometheus/tsdb/wal.NewSize
        /home/vagrant/src/github.com/prometheus/prometheus/tsdb/wal/wal.go:301 +0x325

goroutine 86 [chan receive, 2 minutes]:
github.com/prometheus/prometheus/tsdb.(*Head).processWALSamples(0xc0000f0800, 0x0, 0xc047fb95c0, 0xc047fb9560, 0x0)
        /home/vagrant/src/github.com/prometheus/prometheus/tsdb/head_wal.go:366 +0x12f
github.com/prometheus/prometheus/tsdb.(*Head).loadWAL.func6(0xc0000f0800, 0xc075e9ad60, 0xc075e9ad70, 0xc047fb95c0, 0xc047fb9560)
        /home/vagrant/src/github.com/prometheus/prometheus/tsdb/head_wal.go:97 +0x48
created by github.com/prometheus/prometheus/tsdb.(*Head).loadWAL
        /home/vagrant/src/github.com/prometheus/prometheus/tsdb/head_wal.go:96 +0x405

goroutine 91 [chan send, 2 minutes]:
github.com/prometheus/prometheus/tsdb.(*Head).loadWAL.func8(0xc047fb9500, 0xc076338000, 0xee9cb0, 0xc075c38c30, 0xc00675f040, 0xc0069a2000, 0xc075c38c60, 0xc075c38c90, 0xc075c38cc0)
        /home/vagrant/src/github.com/prometheus/prometheus/tsdb/head_wal.go:154 +0x1f6
created by github.com/prometheus/prometheus/tsdb.(*Head).loadWAL
        /home/vagrant/src/github.com/prometheus/prometheus/tsdb/head_wal.go:126 +0x609

goroutine 88 [chan send, 2 minutes]:
github.com/prometheus/prometheus/tsdb.(*Head).processWALSamples(0xc0000f0800, 0x0, 0xc047fb9740, 0xc047fb96e0, 0x0)
        /home/vagrant/src/github.com/prometheus/prometheus/tsdb/head_wal.go:390 +0xfd
github.com/prometheus/prometheus/tsdb.(*Head).loadWAL.func6(0xc0000f0800, 0xc075e9ad60, 0xc075e9ad70, 0xc047fb9740, 0xc047fb96e0)
        /home/vagrant/src/github.com/prometheus/prometheus/tsdb/head_wal.go:97 +0x48
created by github.com/prometheus/prometheus/tsdb.(*Head).loadWAL
        /home/vagrant/src/github.com/prometheus/prometheus/tsdb/head_wal.go:96 +0x405

goroutine 87 [chan send, 2 minutes]:
github.com/prometheus/prometheus/tsdb.(*Head).processWALSamples(0xc0000f0800, 0x0, 0xc047fb9680, 0xc047fb9620, 0x0)
        /home/vagrant/src/github.com/prometheus/prometheus/tsdb/head_wal.go:390 +0xfd
github.com/prometheus/prometheus/tsdb.(*Head).loadWAL.func6(0xc0000f0800, 0xc075e9ad60, 0xc075e9ad70, 0xc047fb9680, 0xc047fb9620)
        /home/vagrant/src/github.com/prometheus/prometheus/tsdb/head_wal.go:97 +0x48
created by github.com/prometheus/prometheus/tsdb.(*Head).loadWAL
        /home/vagrant/src/github.com/prometheus/prometheus/tsdb/head_wal.go:96 +0x405

goroutine 89 [chan receive, 2 minutes]:
github.com/prometheus/prometheus/tsdb.(*Head).processWALSamples(0xc0000f0800, 0x0, 0xc047fb9800, 0xc047fb97a0, 0x0)
        /home/vagrant/src/github.com/prometheus/prometheus/tsdb/head_wal.go:366 +0x12f
github.com/prometheus/prometheus/tsdb.(*Head).loadWAL.func6(0xc0000f0800, 0xc075e9ad60, 0xc075e9ad70, 0xc047fb9800, 0xc047fb97a0)
        /home/vagrant/src/github.com/prometheus/prometheus/tsdb/head_wal.go:97 +0x48
created by github.com/prometheus/prometheus/tsdb.(*Head).loadWAL
        /home/vagrant/src/github.com/prometheus/prometheus/tsdb/head_wal.go:96 +0x405

goroutine 90 [chan receive, 2 minutes]:
github.com/prometheus/prometheus/tsdb.(*Head).loadWAL.func7(0xc075e9ad70, 0xc0000f0800, 0xc075e9ad68, 0xc00675f040, 0xc047fb9860)
        /home/vagrant/src/github.com/prometheus/prometheus/tsdb/head_wal.go:107 +0x9f
created by github.com/prometheus/prometheus/tsdb.(*Head).loadWAL
        /home/vagrant/src/github.com/prometheus/prometheus/tsdb/head_wal.go:105 +0x57d

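The main goroutine (goroutine 8, at head_wal.go:225) is polling in this loop, sleeping until the input channel has been drained:

for len(inputs[idx]) != 0 {

Meanwhile the consumer, processWALSamples(), is blocked at head_wal.go:390 (goroutines 87 and 88), trying to send a buffer back for re-use:

output <- samples

Neither side can make progress as long as nothing receives from output.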
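The interaction can be reproduced in isolation. The following is a minimal sketch of the pattern, not the Prometheus code: `consume`, `waitDrained`, and `simulate` are hypothetical names, the channel capacities are arbitrary, and a timeout is added so the program reports the stall instead of hanging.

```go
package main

import (
	"fmt"
	"time"
)

// consume stands in for a worker like processWALSamples (hypothetical
// simplification): it takes a batch from input and hands the buffer back
// on output for re-use. The send blocks once output is full.
func consume(input <-chan []int, output chan<- []int) {
	for batch := range input {
		output <- batch[:0]
	}
}

// waitDrained polls until input is empty, the same shape as the loop in
// the issue, but gives up after timeout. It never receives from output.
func waitDrained(input chan []int, timeout time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for len(input) != 0 {
		if time.Now().After(deadline) {
			return false // the consumer is stuck; input will never drain
		}
		time.Sleep(time.Millisecond)
	}
	return true
}

// simulate queues four batches and reports whether input drained.
func simulate(outputCap int) bool {
	input := make(chan []int, 8)
	output := make(chan []int, outputCap)
	go consume(input, output)
	for i := 0; i < 4; i++ {
		input <- []int{i}
	}
	close(input)
	return waitDrained(input, 200*time.Millisecond)
}

func main() {
	fmt.Println("output cap 1 drained:", simulate(1)) // false: consumer blocks on send
	fmt.Println("output cap 8 drained:", simulate(8)) // true: output has room
}
```

With a small output channel the consumer fills it, blocks on the next send, and stops reading input, so the polling loop can only exit via the timeout. Any fix has to break that cycle, for example by having the waiting side also receive from output, or by making the buffer return non-blocking.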
