fix(kvstore): Prevent data loss from crash during DB format conversion #838

Merged
mattisonchao merged 9 commits into main from fixes.datalost.conversion
Jan 7, 2026

Conversation

Member

@mattisonchao mattisonchao commented Dec 27, 2025

fixes: #846

The database format conversion process was not atomic. If the server crashed in the middle of the conversion, the database could be left in a corrupted state, leading to data loss.

This commit makes the conversion process crash-proof by:

  1. Backing up the original database before starting the conversion.
  2. Restoring the backup on the next startup if a crash interrupted the conversion.
  3. Deleting the backup only after the conversion has completed successfully.
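
The three steps above can be sketched as two renames followed by a delete. This is a minimal illustration under stated assumptions, not the PR's code: `swapConverted`, `runDemo`, the `.backup` suffix, and the directory names are hypothetical. Only the use of `os.Rename` and `os.RemoveAll` is taken from the disassembly in this description.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// swapConverted replaces the database at dbPath with the converted copy at
// newPath, keeping the original as a backup until the swap has finished.
// The function name and the ".backup" suffix are hypothetical.
func swapConverted(dbPath, newPath string) error {
	backupPath := dbPath + ".backup"

	// Step 1: move the original aside. A crash after this point leaves
	// only the backup on disk, which startup recovery can rename back.
	if err := os.Rename(dbPath, backupPath); err != nil {
		return fmt.Errorf("failed to back up old database: %w", err)
	}
	// Step 2: move the converted database into place.
	if err := os.Rename(newPath, dbPath); err != nil {
		return fmt.Errorf("failed to install new database: %w", err)
	}
	// Step 3: delete the backup only once the new database is in place.
	if err := os.RemoveAll(backupPath); err != nil {
		return fmt.Errorf("failed to remove old database: %w", err)
	}
	return nil
}

// runDemo builds two throwaway directories, swaps them, and returns the
// content of the marker file that ends up under the primary path.
func runDemo() (string, error) {
	root, err := os.MkdirTemp("", "swap-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(root)

	dbPath := filepath.Join(root, "db")
	newPath := filepath.Join(root, "db-converted")
	for _, dir := range []string{dbPath, newPath} {
		if err := os.Mkdir(dir, 0o755); err != nil {
			return "", err
		}
	}
	if err := os.WriteFile(filepath.Join(newPath, "marker"), []byte("natural"), 0o644); err != nil {
		return "", err
	}
	if err := swapConverted(dbPath, newPath); err != nil {
		return "", err
	}
	marker, err := os.ReadFile(filepath.Join(dbPath, "marker"))
	return string(marker), err
}

func main() {
	marker, err := runDemo()
	if err != nil {
		panic(err)
	}
	fmt.Println("marker after swap:", marker)
}
```

Each rename is a single filesystem operation, so at any crash point the disk holds either the primary database, the backup, or both, and startup can tell which case it is in.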

A trap mechanism was also added so that tests can simulate a crash at specific points in the conversion and verify the recovery process. The trap mechanism incurs zero cost in release mode: when the binary is built with the `disable_trap` tag, the compiler's optimisation passes remove the trap code entirely.
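
A trap of this kind could look roughly like the sketch below. The `trap` type, its `hooks` field, `trigger`, and `convert` are hypothetical names. Only the `map[string]func() error` type and the two trap names (`convertCrashAfterMoveOldDb`, `convertCrashAfterMoveNewDb`) appear in the disassembly in this description.

```go
package main

import (
	"errors"
	"fmt"
)

// trap maps a named crash point to a hook. The type and field names are
// hypothetical; the map type and the trap names come from the disassembly.
type trap struct {
	hooks map[string]func() error
}

// trigger runs the hook registered under name, if any. A nil *trap and a
// missing hook are both no-ops, so callers that do not test crashes can
// pass nil.
func (t *trap) trigger(name string) error {
	if t == nil {
		return nil
	}
	if hook := t.hooks[name]; hook != nil {
		return hook()
	}
	return nil
}

var errSimulatedCrash = errors.New("simulated crash")

// convert stands in for the conversion routine and reports which steps ran
// before it returned.
func convert(t *trap) ([]string, error) {
	steps := []string{"move old db to backup"}
	if err := t.trigger("convertCrashAfterMoveOldDb"); err != nil {
		return steps, err
	}
	steps = append(steps, "move new db into place")
	if err := t.trigger("convertCrashAfterMoveNewDb"); err != nil {
		return steps, err
	}
	steps = append(steps, "remove backup")
	return steps, nil
}

func main() {
	crashing := &trap{hooks: map[string]func() error{
		"convertCrashAfterMoveOldDb": func() error { return errSimulatedCrash },
	}}
	steps, err := convert(crashing)
	fmt.Println("with trap:", steps, err)

	steps, err = convert(nil)
	fmt.Println("without trap:", steps, err)
}
```

A test would register a hook at one trap point, let the conversion stop there, and then reopen the database to check that recovery restores it.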

The disassembly of the same code region in both build modes is as follows:

# default `disable_trap=false`
	0x03dc 00988 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:251)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore.oldDb-176(SP), R0
	0x03e0 00992 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:251)	PCDATA	$1, $15
	0x03e0 00992 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:251)	CALL	github.com/cockroachdb/pebble/v2.(*DB).Close(SB)
	0x03e4 00996 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:251)	CBNZ	R0, 1784
	0x03e8 01000 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:255)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore.newDb-168(SP), R0
	0x03ec 01004 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:255)	PCDATA	$1, $16
	0x03ec 01004 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:255)	CALL	github.com/cockroachdb/pebble/v2.(*DB).Close(SB)
	0x03f0 01008 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:255)	CBNZ	R0, 1756
	0x03f4 01012 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:259)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore.p(FP), R4
	0x03f8 01016 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:259)	LDP	(R4), (R0, R1)
	0x03fc 01020 (<unknown line number>)	NOP
	0x03fc 01020 (/Users/mattison/.local/share/mise/installs/go/1.25.2/src/os/file.go:441)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore.~r0.ptr-520(SP), R2
	0x0400 01024 (/Users/mattison/.local/share/mise/installs/go/1.25.2/src/os/file.go:441)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore.~r0.len-568(SP), R3
	0x0404 01028 (/Users/mattison/.local/share/mise/installs/go/1.25.2/src/os/file.go:441)	CALL	os.rename(SB)
	0x0408 01032 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:259)	CBNZ	R0, 1728
	0x040c 01036 (<unknown line number>)	NOP
	0x040c 01036 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:263)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore.p(FP), R4
	0x0410 01040 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:263)	MOVD	24(R4), R5
	0x0414 01044 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_trap.go:30)	CBNZ	R5, 1060
	0x0418 01048 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:263)	MOVD	ZR, R0
	0x041c 01052 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:263)	MOVD	ZR, R1
	0x0420 01056 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:263)	JMP	1124
	0x0424 01060 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_trap.go:33)	MOVD	(R5), R1
	0x0428 01064 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_trap.go:33)	MOVD	$type:map[string]func() error(SB), R0
	0x0430 01072 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_trap.go:33)	MOVD	$go:string."convertCrashAfterMoveOldDb"(SB), R2
	0x0438 01080 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_trap.go:33)	MOVD	$26, R3
	0x043c 01084 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_trap.go:33)	CALL	runtime.mapaccess1_faststr(SB)
	0x0440 01088 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_trap.go:33)	MOVD	(R0), R26
	0x0444 01092 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_trap.go:33)	CBNZ	R26, 1112
	0x0448 01096 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:267)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore.p(FP), R4
	0x044c 01100 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:263)	MOVD	ZR, R0
	0x0450 01104 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:263)	MOVD	ZR, R1
	0x0454 01108 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:263)	JMP	1124
	0x0458 01112 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_trap.go:34)	MOVD	(R26), R0
	0x045c 01116 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_trap.go:34)	CALL	(R0)
	0x0460 01120 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:267)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore.p(FP), R4
	0x0464 01124 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:263)	CBNZ	R0, 1716
	0x0468 01128 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:267)	LDP	(R4), (R2, R3)
	0x046c 01132 (<unknown line number>)	NOP
	0x046c 01132 (/Users/mattison/.local/share/mise/installs/go/1.25.2/src/os/file.go:441)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore.~r0.ptr-512(SP), R0
	0x0470 01136 (/Users/mattison/.local/share/mise/installs/go/1.25.2/src/os/file.go:441)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore.~r0.len-560(SP), R1
	0x0474 01140 (/Users/mattison/.local/share/mise/installs/go/1.25.2/src/os/file.go:441)	PCDATA	$1, $17
	0x0474 01140 (/Users/mattison/.local/share/mise/installs/go/1.25.2/src/os/file.go:441)	CALL	os.rename(SB)
	0x0478 01144 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:267)	CBNZ	R0, 1688
	0x047c 01148 (<unknown line number>)	NOP
	0x047c 01148 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:271)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore.p(FP), R4
	0x0480 01152 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:271)	MOVD	24(R4), R5
	0x0484 01156 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_trap.go:30)	CBNZ	R5, 1172
	0x0488 01160 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:271)	MOVD	ZR, R0
	0x048c 01164 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:271)	MOVD	ZR, R1
	0x0490 01168 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:271)	JMP	1236
	0x0494 01172 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_trap.go:33)	MOVD	(R5), R1
	0x0498 01176 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_trap.go:33)	MOVD	$type:map[string]func() error(SB), R0
	0x04a0 01184 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_trap.go:33)	MOVD	$go:string."convertCrashAfterMoveNewDb"(SB), R2
	0x04a8 01192 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_trap.go:33)	MOVD	$26, R3
	0x04ac 01196 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_trap.go:33)	CALL	runtime.mapaccess1_faststr(SB)
	0x04b0 01200 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_trap.go:33)	MOVD	(R0), R26
	0x04b4 01204 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_trap.go:33)	CBNZ	R26, 1224
	0x04b8 01208 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:283)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore.p(FP), R4
	0x04bc 01212 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:271)	MOVD	ZR, R0
	0x04c0 01216 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:271)	MOVD	ZR, R1
	0x04c4 01220 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:271)	JMP	1236
	0x04c8 01224 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_trap.go:34)	MOVD	(R26), R0
	0x04cc 01228 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_trap.go:34)	CALL	(R0)
	0x04d0 01232 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:283)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore.p(FP), R4
	0x04d4 01236 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:271)	CBNZ	R0, 1676
	0x04d8 01240 (<unknown line number>)	NOP
	0x04d8 01240 (/Users/mattison/.local/share/mise/installs/go/1.25.2/src/os/path.go:74)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore.~r0.ptr-520(SP), R0
	0x04dc 01244 (/Users/mattison/.local/share/mise/installs/go/1.25.2/src/os/path.go:74)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore.~r0.len-568(SP), R1
	0x04e0 01248 (/Users/mattison/.local/share/mise/installs/go/1.25.2/src/os/path.go:74)	PCDATA	$1, $18
	0x04e0 01248 (/Users/mattison/.local/share/mise/installs/go/1.25.2/src/os/path.go:74)	CALL	os.removeAll(SB)
	0x04e4 01252 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:276)	CBZ	R0, 1284
	0x04e8 01256 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:277)	MOVD	$go:string."failed to remove old database"(SB), R2
	0x04f0 01264 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:277)	MOVD	$29, R3
	0x04f4 01268 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:277)	PCDATA	$1, $19
	0x04f4 01268 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:277)	CALL	github.com/pkg/errors.Wrap(SB)
	0x04f8 01272 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:277)	LDP	-8(RSP), (R29, R30)
	0x04fc 01276 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:277)	ADD	$656, RSP
	0x0500 01280 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:277)	RET	(R30)
	0x0504 01284 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:280)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore..autotmp_305-528(SP), R0
	0x0508 01288 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:280)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore..autotmp_306-536(SP), R1
	0x050c 01292 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:280)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore..autotmp_307-136(SP), R2
	0x0510 01296 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:280)	PCDATA	$1, $20
	0x0510 01296 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:280)	CALL	time.Since(SB)
	0x0514 01300 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:283)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore.p(FP), R3
	0x0518 01304 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:283)	LDP	(R3), (R4, R5)
# `disable_trap=true` (release mode)

	0x03dc 00988 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:251)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore.oldDb-176(SP), R0
	0x03e0 00992 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:251)	PCDATA	$1, $15
	0x03e0 00992 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:251)	CALL	github.com/cockroachdb/pebble/v2.(*DB).Close(SB)
	0x03e4 00996 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:251)	CBNZ	R0, 1580
	0x03e8 01000 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:255)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore.newDb-168(SP), R0
	0x03ec 01004 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:255)	PCDATA	$1, $16
	0x03ec 01004 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:255)	CALL	github.com/cockroachdb/pebble/v2.(*DB).Close(SB)
	0x03f0 01008 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:255)	CBNZ	R0, 1552
	0x03f4 01012 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:259)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore.p(FP), R4
	0x03f8 01016 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:259)	LDP	(R4), (R0, R1)
	0x03fc 01020 (<unknown line number>)	NOP
	0x03fc 01020 (/Users/mattison/.local/share/mise/installs/go/1.25.2/src/os/file.go:441)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore.~r0.ptr-520(SP), R2
	0x0400 01024 (/Users/mattison/.local/share/mise/installs/go/1.25.2/src/os/file.go:441)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore.~r0.len-568(SP), R3
	0x0404 01028 (/Users/mattison/.local/share/mise/installs/go/1.25.2/src/os/file.go:441)	CALL	os.rename(SB)
	0x0408 01032 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:259)	CBNZ	R0, 1524
	0x040c 01036 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:267)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore.p(FP), R4
	0x0410 01040 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:267)	LDP	(R4), (R2, R3)
	0x0414 01044 (<unknown line number>)	NOP
	0x0414 01044 (/Users/mattison/.local/share/mise/installs/go/1.25.2/src/os/file.go:441)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore.~r0.ptr-512(SP), R0
	0x0418 01048 (/Users/mattison/.local/share/mise/installs/go/1.25.2/src/os/file.go:441)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore.~r0.len-560(SP), R1
	0x041c 01052 (/Users/mattison/.local/share/mise/installs/go/1.25.2/src/os/file.go:441)	PCDATA	$1, $17
	0x041c 01052 (/Users/mattison/.local/share/mise/installs/go/1.25.2/src/os/file.go:441)	CALL	os.rename(SB)
	0x0420 01056 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:267)	CBNZ	R0, 1496
	0x0424 01060 (<unknown line number>)	NOP
	0x0424 01060 (/Users/mattison/.local/share/mise/installs/go/1.25.2/src/os/path.go:74)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore.~r0.ptr-520(SP), R0
	0x0428 01064 (/Users/mattison/.local/share/mise/installs/go/1.25.2/src/os/path.go:74)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore.~r0.len-568(SP), R1
	0x042c 01068 (/Users/mattison/.local/share/mise/installs/go/1.25.2/src/os/path.go:74)	PCDATA	$1, $18
	0x042c 01068 (/Users/mattison/.local/share/mise/installs/go/1.25.2/src/os/path.go:74)	CALL	os.removeAll(SB)
	0x0430 01072 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:276)	CBZ	R0, 1104
	0x0434 01076 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:277)	MOVD	$go:string."failed to remove old database"(SB), R2
	0x043c 01084 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:277)	MOVD	$29, R3
	0x0440 01088 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:277)	PCDATA	$1, $19
	0x0440 01088 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:277)	CALL	github.com/pkg/errors.Wrap(SB)
	0x0444 01092 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:277)	LDP	-8(RSP), (R29, R30)
	0x0448 01096 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:277)	ADD	$656, RSP
	0x044c 01100 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:277)	RET	(R30)
	0x0450 01104 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:280)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore..autotmp_297-528(SP), R0
	0x0454 01108 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:280)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore..autotmp_298-536(SP), R1
	0x0458 01112 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:280)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore..autotmp_299-136(SP), R2
	0x045c 01116 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:280)	PCDATA	$1, $20
	0x045c 01116 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:280)	CALL	time.Since(SB)
	0x0460 01120 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:283)	MOVD	github.com/oxia-db/oxia/oxiad/dataserver/database/kvstore.p(FP), R3
	0x0464 01124 (/Users/mattison/projects/oxia-io/oxia/oxiad/dataserver/database/kvstore/kv_pebble_formats.go:283)	LDP	(R3), (R4, R5)
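
The release listing contains no `runtime.mapaccess1_faststr` call and no `kv_trap.go` lines: the second `os.rename` follows the first directly. That is what one would expect if the `disable_trap` build swaps in a trivial implementation that the compiler can inline and eliminate. A minimal sketch of such a no-op, assuming a hypothetical `noopTrap` type (in a real package it would sit in a file guarded by `//go:build disable_trap`):

```go
package main

import "fmt"

// noopTrap is a hypothetical stand-in for the release-mode trap.
type noopTrap struct{}

// trigger always returns nil. Because the body is trivial, the Go compiler
// can inline it, after which the error check at each call site is dead code.
func (*noopTrap) trigger(string) error { return nil }

// convert mirrors the shape of the conversion routine: each step is followed
// by a trap point that can never fire in this build.
func convert(t *noopTrap) (int, error) {
	steps := 1 // move old db to backup
	if err := t.trigger("convertCrashAfterMoveOldDb"); err != nil {
		return steps, err
	}
	steps++ // move new db into place
	if err := t.trigger("convertCrashAfterMoveNewDb"); err != nil {
		return steps, err
	}
	steps++ // remove backup
	return steps, nil
}

func main() {
	steps, err := convert(&noopTrap{})
	fmt.Println("steps completed:", steps, "error:", err)
}
```

Keeping the call sites identical in both builds means the conversion code needs no conditional compilation of its own. Only the trap file changes.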

Contributor

Copilot AI left a comment

Pull request overview

This PR implements crash-proof database format conversion by adding backup/restore logic to prevent data loss when the server crashes mid-conversion. The conversion process now backs up the original database, performs the conversion, and only deletes the backup after successful completion. A trap mechanism with build tags enables testing crash scenarios without runtime overhead in production builds.

Key Changes

  • Added atomic backup/restore mechanism for database conversion with recovery on startup
  • Implemented build-tag controlled trap mechanism for simulating crashes in tests
  • Updated build configuration to disable trap mechanism in release builds

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 10 comments.

Show a summary per file
| File | Description |
| --- | --- |
| `oxiad/dataserver/database/kvstore/kv_trap.go` | Implements trap mechanism for testing crash scenarios (enabled by default) |
| `oxiad/dataserver/database/kvstore/kv_trap_ignore.go` | No-op trap implementation for production builds (with `disable_trap` tag) |
| `oxiad/dataserver/database/kvstore/kv_pebble_formats.go` | Adds backup/restore logic and recovery checks during database conversion |
| `oxiad/dataserver/database/kvstore/kv_pebble.go` | Integrates trap mechanism into KV creation flow |
| `oxiad/dataserver/database/kvstore/kv.go` | Adds `NewKVWithTrap` method to `Factory` interface for test injection |
| `oxiad/dataserver/database/kvstore/kv_pebble_formats_test.go` | Adds tests verifying crash recovery and backup cleanup |
| `Makefile` | Updates build targets to include `disable_trap` tag for release builds |

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

Copilot AI commented Dec 28, 2025

@mattisonchao I've opened a new pull request, #839, to work on those changes. Once the pull request is ready, I'll request review from you.

@mattisonchao
Member Author

Hi @copilot, could you help resolve the outdated comments?

Contributor

Copilot AI commented Dec 28, 2025

@mattisonchao I've opened a new pull request, #840, to work on those changes. Once the pull request is ready, I'll request review from you.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.


Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.


Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.


Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.

Comments suppressed due to low confidence (1)

oxiad/dataserver/database/kvstore/kv_pebble_formats.go:167

  • The checkConvertDB method lacks documentation explaining its critical role in crash recovery. Consider adding a comment describing the recovery logic and the various scenarios it handles, for example: "checkConvertDB checks if database conversion is needed and handles recovery from crashes during previous conversions. It detects incomplete conversions by checking for backup databases and performs appropriate recovery or cleanup."
func (p *pebbleDbConversion) checkConvertDB(desiredEncoding compare.Encoder) error {
	dbBackPath := makeDbBackupPath(p.dbPath)

	if !pathExists(p.dbPath) {
		// No db, check if we need to recover from backup
		if !pathExists(dbBackPath) {
			// no backup, no need to convert DB
			return nil
		}
		p.log.Info("Database backup found without primary database, indicating crash during conversion")
		// recover backup and keep going
		if err := os.Rename(dbBackPath, p.dbPath); err != nil {
			return err
		}
	}

	// DB already exists
	// Check if we need to clean up the backup, as previous backup cleanup may have failed.
	if pathExists(dbBackPath) {
		p.log.Info("Database backup found alongside primary database, indicating incomplete cleanup after conversion")
		if err := os.RemoveAll(dbBackPath); err != nil {
			return err
		}
	}

	var keyEncodingMarker string
	if markerData, err := os.ReadFile(filepath.Join(p.dbPath, markerFileName)); err != nil {
		if !os.IsNotExist(err) {
			return err
		}
		// Older versions were not setting the marker
		keyEncodingMarker = keyEncodingFormatOldCompareHierarchical
	} else {
		keyEncodingMarker = string(markerData)
	}

	if keyEncodingMarker == desiredEncoding.Name() {
		// Format is already correct, nothing to do
		return nil
	}

	switch keyEncodingMarker {
	case keyEncodingFormatOldCompareHierarchical:
		confOld := p.configForOldCompareHierarchical()
		confNew := p.configForNewerFormat()
		return p.convertDb(
			confOld, compare.EncoderNatural,
			confNew, desiredEncoding)

	case compare.EncoderNatural.Name(),
		compare.EncoderHierarchical.Name():
		confOld := p.configForNewerFormat()
		confNew := p.configForNewerFormat()
		oldEncoder, err := compare.GetEncoder(keyEncodingMarker)
		if err != nil {
			return err
		}
		return p.convertDb(
			confOld, oldEncoder,
			confNew, desiredEncoding)
	default:
		p.log.Warn("Found unknown encoding type. No conversion performed",
			slog.String("keyEncodingMarker", keyEncodingMarker))
		return nil
	}
}
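
The recovery branches at the top of `checkConvertDB` reduce to a decision over two facts: whether the primary database exists and whether the backup exists. The sketch below restates that decision as a pure function so the four cases can be read at a glance. `recoveryAction` and its labels are hypothetical and are not part of the PR.

```go
package main

import "fmt"

// recoveryAction names what startup does for each combination of "primary
// database exists" and "backup exists". The function and its labels are
// hypothetical; the branches mirror the quoted checkConvertDB.
func recoveryAction(dbExists, backupExists bool) string {
	switch {
	case !dbExists && !backupExists:
		// Fresh start: there is no database to convert.
		return "nothing to do"
	case !dbExists && backupExists:
		// Crash after the old database was moved aside but before the
		// converted one was installed.
		return "rename backup to primary, then check marker"
	case dbExists && backupExists:
		// A primary database is in place but backup cleanup did not finish.
		return "remove backup, then check marker"
	default:
		// Normal startup with no leftover state.
		return "check marker"
	}
}

func main() {
	for _, dbExists := range []bool{false, true} {
		for _, backupExists := range []bool{false, true} {
			fmt.Printf("db=%-5t backup=%-5t -> %s\n",
				dbExists, backupExists, recoveryAction(dbExists, backupExists))
		}
	}
}
```

In every case except the first, control continues to the encoding-marker check, so a restored backup is converted again on the same startup if its format still differs from the desired one.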

Member

@coderzc coderzc left a comment

LGTM

@mattisonchao mattisonchao merged commit 81e9b58 into main Jan 7, 2026
15 of 17 checks passed
@mattisonchao mattisonchao deleted the fixes.datalost.conversion branch January 7, 2026 13:38

Development

Successfully merging this pull request may close these issues.

bug: data lost when upgrading from 0.14.4 to 0.15.1

4 participants