Config parameter wal_data_dir leads to crash #560

@mbj4668

Describe the bug

When wal_data_dir is set to a directory other than data_dir, ra_log_segment_writer crashes with
a badmatch on node restart. It seems ra_log_segment_writer assumes that the WAL file will be in data_dir.

Reproduction steps

Tested on 2.17.1

$ make shell
1> ra:start([{wal_data_dir, "wal"}]).
{ok,[sasl,crypto,aten,gen_batch_server,seshat,ra]}
2>  halt().
$ make shell
1> ra:start([{wal_data_dir, "wal"}]).
=ERROR REPORT==== 4-Nov-2025::08:39:55.288171 ===
** Generic server ra_log_segment_writer terminating 
** Last message in was {'$gen_cast',{mem_tables,#{},"0000000000000001.wal"}}
** When Server state == {state,"/home/mbj/src/ra/nonode@nohost",default,
                            {write_concurrency,
                                #Ref<0.2884975887.1890189317.172623>},
                            #{max_size => 64000000,compute_checksums => true,
                              max_count => 4096,max_pending => 1024}}
** Reason for termination ==
** {{badmatch,{error,enoent}},
    [{ra_log_segment_writer,handle_cast,2,
                            [{file,"src/ra_log_segment_writer.erl"},
                             {line,180}]},
     {gen_server,try_handle_cast,3,[{file,"gen_server.erl"},{line,2371}]},
     {gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,2433}]},
     {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,329}]}]}
...

Expected behavior

I expect ra to restart cleanly, without ra_log_segment_writer crashing, when wal_data_dir differs from data_dir.

Additional context

The bug seems to be old: before 24c7c00 the code didn't check the return value from prim_file:delete(), so the error likely went unnoticed.
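To illustrate the suspected mechanism, here is a minimal sketch. It is not the ra source: the module name, function name, and scratch directory names are hypothetical, and only the WAL file name is copied from the log above. It assumes the failing step is a delete of the WAL file resolved against the data directory instead of the WAL directory, and uses file:delete/1 from the standard library.

```erlang
-module(wal_path_sketch).
-export([demo/0]).

%% Hypothetical illustration only; names and layout are assumptions.
demo() ->
    %% Scratch directories under the current working directory.
    Suffix = integer_to_list(erlang:unique_integer([positive])),
    Base = "wal_path_sketch_" ++ Suffix,
    DataDir = filename:join(Base, "data"),
    WalDir = filename:join(Base, "wal"),
    WalName = "0000000000000001.wal",

    %% ensure_dir/1 creates the parent directories of the given name.
    ok = filelib:ensure_dir(filename:join(DataDir, "placeholder")),
    ok = filelib:ensure_dir(filename:join(WalDir, "placeholder")),

    %% The WAL file lives in the WAL directory, not the data directory.
    ok = file:write_file(filename:join(WalDir, WalName), <<>>),

    %% Resolving the name against the data directory misses the file.
    Wrong = file:delete(filename:join(DataDir, WalName)),

    %% Resolving it against the WAL directory finds and removes it.
    Right = file:delete(filename:join(WalDir, WalName)),

    %% Clean up the now-empty scratch directories.
    ok = file:del_dir(DataDir),
    ok = file:del_dir(WalDir),
    ok = file:del_dir(Base),
    {Wrong, Right}.
```

Calling wal_path_sketch:demo() returns {{error,enoent},ok}. If the first result were matched against ok, as in `ok = file:delete(...)`, the process would fail with {badmatch,{error,enoent}}, which is the same shape as the termination reason in the report.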
