tsdb.Open fails with invalid magic number 0 when running with reverted previously mmaped chunks  #7397

@bwplotka

Hi,

When starting TSDB f4dd456 (the initial version of mmap chunks), as used by Thanos Receive, we got the following error:

level=error ts=2020-06-15T13:34:38.5403894Z caller=multitsdb.go:271 component=receive tenant=FB870BF3-9F3A-44FF-9BF7-D7A047A52F43 msg="failed to open tsdb" err="invalid magic number 0"
level=warn ts=2020-06-15T13:34:38.540465482Z caller=intrumentation.go:54 component=receive msg="changing probe status" status=not-ready reason="opening storage: invalid magic number 0"
level=info ts=2020-06-15T13:34:38.540508553Z caller=http.go:81 component=receive service=http/server component=receive msg="internal server shutdown" err="opening storage: invalid magic number 0"
level=info ts=2020-06-15T13:34:38.540523593Z caller=intrumentation.go:66 component=receive msg="changing probe status" status=not-healthy reason="opening storage: invalid magic number 0"
level=error ts=2020-06-15T13:34:38.540633727Z caller=main.go:211 err="invalid magic number 0\nopening storage\nmain.runReceive.func1\n\t/go/src/github.com/thanos-io/thanos/cmd/thanos/receive.go:316\ngithub.com/oklog/run.(*Group).Run.func1\n\t/go/pkg/mod/github.com/oklog/run@v1.1.0/group.go:38\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1373\nreceive command failed\nmain.main\n\t/go/src/github.com/thanos-io/thanos/cmd/thanos/main.go:211\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1373"

Repro:

  1. Deploy receive master-2020-05-25-c733564d (TSDB cd73b3d, without the mmap chunks feature).
  2. Upgrade and deploy receive master-2020-06-03-20004510, which corresponds to a TSDB upgrade to 3268eac (mainly adds the mmap chunks feature + fixes).
  3. Revert to Thanos master-2020-05-25-c733564d (so back to a TSDB with no mmap chunks).
  4. Upgrade and deploy master-2020-05-28-e7d431d3 (TSDB f4dd456, with the initial mmap feature).
  5. See the crash on startup.

I think we hit either a compatibility gap between these TSDB versions or some kind of partial-write race.
Also, we might want better error wrapping in TSDB so the error tells us which file it actually relates to.

cc @codesome
