Conversation
the-mikedavis left a comment
I could reproduce the missing_key error by emulating the new test case ra_log_2_SUITE:writes_not_lost_after_multiple_wal_crashes/1 with a QQ in RabbitMQ.
rabbit_repro:run/0:

```erlang
-module(rabbit_repro).
-export([run/0]).

run() ->
    WalPid = whereis(ra_log_wal),
    erlang:suspend_process(WalPid),
    ok = ra:pipeline_command({'%2F_qq', node()}, rabbit_fifo:make_garbage_collection()),
    exit(WalPid, kill),
    wait_for_wal(WalPid),
    WalPid2 = whereis(ra_log_wal),
    exit(WalPid2, kill),
    wait_for_wal(WalPid2),
    ok = ra:pipeline_command({'%2F_qq', node()}, rabbit_fifo:make_garbage_collection()),
    ok = ra:pipeline_command({'%2F_qq', node()}, rabbit_fifo:make_garbage_collection()),
    timer:sleep(100),
    rabbit_quorum_queue:wal_force_roll_over(node()),
    ok.

wait_for_wal(OldPid) ->
    ok = ra_lib:retry(fun () ->
                              P = whereis(ra_log_wal),
                              is_pid(P) andalso P =/= OldPid
                      end, 100, 100).
```

I also edited the local copy of Ra to drop the ra_log:resend_from/2 call here (returning {State0, []} instead). To reproduce this we need the server's gap detection to fail to re-send commands. That might happen in the wild if the WAL crashes with the resend message in its mailbox, but the timing is very hard to reproduce.
- Start a broker on main with `make run-broker`
- Publish a few messages into a QQ named 'qq': `perf-test -x 1 -y 0 --pmessages 5 -u qq -qq`
- Run `rabbit_repro:run().` in the shell
- Dump the segment's index (`ra_log_segment:dump_index/1` on the segment file path) and notice the gap
- Reboot, or `rabbitmqctl stop_app` + `rabbitmqctl start_app`
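The gap detection being defeated here can be modeled roughly as follows. This is a minimal Python sketch of the idea, not the actual ra_log_wal code; the class and method names are invented. Each writer's last written index is tracked, and a non-contiguous write triggers a resend request; if the tracking state is lost (the scenario above), the gap is silently accepted.

```python
# Toy model of WAL per-writer gap detection (illustrative only; the
# real logic lives in ra_log_wal and these names are made up).
class WalModel:
    def __init__(self):
        self.writers = {}  # uid -> last index written by that server

    def write(self, uid, idx):
        last = self.writers.get(uid)
        if last is not None and idx > last + 1:
            # Gap detected: ask the server to resend from last + 1.
            return ("resend", last + 1)
        # First write from this writer, or a contiguous/overwriting
        # write: accept it. If tracking was lost in a crash, `last`
        # is None and a gap slips through silently -- the bug here.
        self.writers[uid] = idx
        return ("ok", idx)
```

With intact tracking, writing index 1 then index 3 yields a resend request for index 2; on a freshly recovered model with an empty map, index 3 is accepted outright and the gap lands in the log.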
Index dump...
```erlang
> ra_log_segment:dump_index("/tmp/rabbitmq-test-instances/rabbit@mango2/mnesia/rabbit@mango2/quorum/rabbit@mango2/2F_QQ8K3N38RGI2N8/0000000000000001.segment").
[{0,0,131080},
 {1,1,131092},
 {2,1,131121},
 {3,1,131655},
 {4,1,131838},
 {5,1,132095},
 {6,1,132352},
 {7,1,132609},
 {8,1,132866},
 {9,1,133123},
 %% 10 is missing here!!
 {11,1,133203},
 {12,1,133261},
 {0,0,0},
 {0,0,0},
 {0,0,0},
 {0,0,0},
 {0,0,0},
 {0,0,0},
 {0,0,0},
 {0,0,0},
 {0,0,0},
 {0,0,0},
 {0,0,0},
 {0,0,0},
 {0,0,0},
 {0,0,0},
 {0,0,...},
 {0,...},
 {...}|...]
```

Error logs on reboot...
```
2026-02-19 13:01:44.889972-05:00 [error] <0.1193.0> crasher:
2026-02-19 13:01:44.889972-05:00 [error] <0.1193.0>   initial call: ra_server_proc:init/1
2026-02-19 13:01:44.889972-05:00 [error] <0.1193.0>   pid: <0.1193.0>
2026-02-19 13:01:44.889972-05:00 [error] <0.1193.0>   registered_name: '%2F_qq'
2026-02-19 13:01:44.889972-05:00 [error] <0.1193.0>   exception exit: {missing_key,10,
2026-02-19 13:01:44.889972-05:00 [error] <0.1193.0>       <<"/tmp/rabbitmq-test-instances/rabbit@mango2/mnesia/rabbit@mango2/quorum/rabbit@mango2/2F_QQ8K3N38RGI2N8/0000000000000001.segment">>}
2026-02-19 13:01:44.889972-05:00 [error] <0.1193.0>     in function  ra_log_segment:fold0/8 (src/ra_log_segment.erl, line 426)
2026-02-19 13:01:44.889972-05:00 [error] <0.1193.0>     in call from ra_log_reader:'-segment_fold/5-fun-1-'/4 (src/ra_log_reader.erl, line 344)
2026-02-19 13:01:44.889972-05:00 [error] <0.1193.0>     in call from ra_log_reader:'-segment_fold/5-lists^foldl/2-1-'/3 (src/ra_log_reader.erl, line 341)
2026-02-19 13:01:44.889972-05:00 [error] <0.1193.0>     in call from ra_log_reader:segment_fold/5 (src/ra_log_reader.erl, line 341)
2026-02-19 13:01:44.889972-05:00 [error] <0.1193.0>     in call from ra_log:fold/5 (src/ra_log.erl, line 462)
2026-02-19 13:01:44.889972-05:00 [error] <0.1193.0>     in call from ra_server:recover/1 (src/ra_server.erl, line 456)
2026-02-19 13:01:44.889972-05:00 [error] <0.1193.0>     in call from ra_server_proc:recover/3 (src/ra_server_proc.erl, line 400)
```
I tried out this branch and rabbitmq/rabbitmq-server#13885 in the server together and the reproduction doesn't work. We can even add
Force-pushed eaad846 to bd15eae
The WAL writers map tracks the last index written by each ra server
and is used for gap detection. Previously this state was only recovered
by replaying .wal files on disk, so if the segment writer deleted a
WAL file and the WAL then crashed before the new file had any entries,
all tracking was lost. Subsequent writes with gaps would be silently
accepted, leading to gaps in segments.
Write a writers.snapshot file in the WAL directory on every roll-over.
During recovery, seed the writers map from this file before replaying
any WAL files. The snapshot is overwritten in place on each roll-over
so there is no cleanup needed.
Micro-benchmark shows < 200us overhead per roll-over for typical
deployments (up to a few hundred writers).
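The snapshot scheme described above can be sketched as follows. This is a hedged Python stand-in, not the actual ra_log_wal implementation: the writers.snapshot file name comes from the commit message, but the serialization format and function names here are invented.

```python
import json
import os

SNAPSHOT = "writers.snapshot"

def write_snapshot(wal_dir, writers):
    # Called on every roll-over. Write to a temp file and rename so
    # a reader never observes a partially written snapshot; the
    # rename also overwrites the previous snapshot in place, so no
    # cleanup of old snapshots is needed.
    tmp = os.path.join(wal_dir, SNAPSHOT + ".tmp")
    with open(tmp, "w") as f:
        json.dump(writers, f)
    os.replace(tmp, os.path.join(wal_dir, SNAPSHOT))

def recover_writers(wal_dir):
    # Called at WAL recovery, before any .wal files are replayed:
    # seed the writers map from the snapshot if one exists. Replay
    # of surviving .wal files then updates these entries, so the
    # snapshot only matters when the .wal files are gone.
    path = os.path.join(wal_dir, SNAPSHOT)
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```

The atomic write-then-rename is one plausible way to get the "overwritten in place" behavior safely; seeding before replay means replayed entries always win over the (older) snapshot.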
GC stale entries from the WAL writers map
The writers map grows monotonically -- when a ra server is deleted its
entry persists in memory and in the writers.snapshot file forever.
Add two complementary cleanup mechanisms:
1. At recovery time, trim the writers map against ra_directory to
remove entries for UIDs that are no longer registered.
2. At runtime, accept a forget_writer cast so that servers being
deleted can evict their entry immediately. Called from
ra_server_proc:terminate/3 on {shutdown, delete}.
Phase 1 is the reliable catch-all; phase 2 handles the common case
without waiting for a WAL restart.
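The two cleanup phases above amount to a small amount of map bookkeeping, sketched here in Python (illustrative function names; the real code works against ra_directory and a forget_writer cast):

```python
def trim_writers(writers, registered_uids):
    # Phase 1, at recovery time: keep only entries whose UID is
    # still registered in the directory. Catches anything phase 2
    # missed (e.g. a server deleted while the WAL was down).
    return {uid: idx for uid, idx in writers.items()
            if uid in registered_uids}

def forget_writer(writers, uid):
    # Phase 2, at runtime: evict a single writer immediately when
    # its server is deleted. A no-op if the UID is already absent.
    writers.pop(uid, None)
    return writers
```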
Fix log state divergence after mem table overwrites segment range
During init, when the mem table contains entries that overwrite part of
the segment range, the segment range upper bound was not being clamped.
This caused MaxConfirmedWrittenIdx and LastSegRefIdx to be derived from
stale segment indexes, leading to incorrect last_written_index_term and
pending state after restart.
Fix ra_range:combine/2 to clamp the upper bound to AddRange's end,
since AddRange represents newer data that supersedes the tail of the
existing range. Compute a TruncSegmentRange in ra_log:init/1 that
excludes segment indexes overwritten by the mem table, and use it
when recovering MaxConfirmedWrittenIdx and LastSegRefIdx.
Harden take_after_overwrite_and_init to assert that the log overview
is consistent across close and re-init.
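The clamping rule for ra_range:combine/2 can be illustrated with a toy model that treats ranges as (start, end) tuples. This mirrors the semantics described in the commit message, not the actual Erlang source:

```python
def combine(add_range, existing):
    # add_range is newer data that supersedes the tail of the
    # existing range, so the combined upper bound is clamped to
    # add_range's end rather than taking the max of the two ends.
    if add_range is None:
        return existing
    if existing is None:
        return add_range
    lo = min(add_range[0], existing[0])
    hi = add_range[1]  # clamp: never keep stale indexes past this
    return (lo, hi)
```

For example, if the segments cover (1, 12) but the mem table overwrites from index 5 up to 8, the combined range is (1, 8): indexes 9..12 are stale and must not feed into MaxConfirmedWrittenIdx or LastSegRefIdx.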
Force-pushed bd15eae to f68c255