Dead loop with kSkipAnyCorruptedRecords mode selected in some cases #11968

@qiuchengxuan

Expected behavior

With kSkipAnyCorruptedRecords mode selected, RocksDB should ignore any error and salvage as much data as possible.

Actual behavior

Ceph/RocksDB hangs indefinitely.

Steps to reproduce the behavior

After a power cut, a corrupted block was produced in one of the WAL logs. Its contents are:

00000000: d907 0226 b8e3 071b 4577 fc00 2b04 040c  ...&....Ew..+...
00000010: 40df 9eb9 bd8a c3e8 5e53 2fdf bc6e 9cc1  @.......^S/..n..
00000020: cb20 896c f554 601a 1d51 ce00 1e43 6a3e  . .l.T`..Q...Cj>
00000030: 19bd b916 f981 ee6e 9468 204b 2b7c dc8a  .......n.h K+|..
00000040: a1c3 6293 7394 2f30 5c0b 9cd4 0a72 7ce4  ..b.s./0\....r|.
00000050: 3d07 0134 bae3 0743 0404 0c40 bade 3b7d  =..4...C...@..;}
00000060: 4242 fcb5 563b 5f36 b638 94c1 d677 3002  BB..V;_6.8...w0.
00000070: 4442 db4c cf7f 4ca1 4710 23d0 e4c3 fc82  DB.L..L.G.#.....
00000080: cfeb 359d b605 840b a4d5 4114 85d3 b111  ..5.......A.....
00000090: 2970 0a7a 9703 6b9f 79b0 98d9 0701 54ba  )p.z..k.y.....T.
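
To make the dump easier to read, here is a small sketch that decodes its first 11 bytes as a record header. The layout used (4-byte checksum, 2-byte length, 1-byte type, 4-byte number, all little-endian) and the type table are assumptions based on the recyclable WAL record format, not something stated in this report. The field the steps below call the sequence number is taken here to be that trailing 4-byte field.

```python
import struct

# First 11 bytes of the hex dump above.
header = bytes.fromhex("d9070226b8e3071b4577fc")

# Assumed recyclable header layout, little-endian, no padding:
# checksum (4 bytes), payload length (2 bytes), type (1 byte), number (4 bytes).
crc, length, record_type, number_le = struct.unpack("<IHBI", header)

# Assumed record type values (only the classic and recyclable ones).
RECORD_TYPES = {
    0: "kZeroType", 1: "kFullType", 2: "kFirstType",
    3: "kMiddleType", 4: "kLastType",
    5: "kRecyclableFullType", 6: "kRecyclableFirstType",
    7: "kRecyclableMiddleType", 8: "kRecyclableLastType",
}

# The same four bytes read in dump order (big-endian), as quoted in this report.
number_dump_order = int.from_bytes(header[7:11], "big")

print(f"checksum      = {crc:#010x}")
print(f"length        = {length}")
print(f"type          = {record_type} ({RECORD_TYPES.get(record_type, 'unknown')})")
print(f"number (LE)   = {number_le:#010x}")
print(f"number (dump) = {number_dump_order:#010x}")
```

Under these assumptions the type byte is 7, which matches the kRecyclableMiddleType reading in the steps below. The bytes `1b 45 77 fc` give 0x1b4577fc in dump order and 0xfc77451b when decoded little-endian. The decoded length of 58296 is larger than a 32 KiB block, which is consistent with the header being garbage.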

Before this corrupted block, a partial record exists.

Thus the reproduction procedure is:

  1. When the previous block is read, a partial record exists, so in_fragmented_record_ is set to true.
  2. The corrupted block is read as kRecyclableMiddleType with sequence number 0x1b4577fc, which is less than the expected sequence number, so kOldRecord is returned, but the buffer prefix is not removed.
  3. Since kSkipAnyCorruptedRecords mode is selected and in_fragmented_record_ is true, a corruption log is printed and the in_fragmented_record_ flag is cleared.
  4. ReadPhysicalRecord is called again at the same offset, and kOldRecord is returned again.
  5. Since in_fragmented_record_ is now false, nothing more is logged, and the thread falls into an infinite loop.
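
The five steps above can be modelled with a toy reader. This is only a sketch of the control flow as described in this report, not RocksDB code: `ToyReader`, its fields, and the `consume_on_old_record` switch are hypothetical names introduced for illustration, and the iteration cap stands in for the real infinite loop. The switch shows one conceivable way to make progress (advancing past the stale header) and is not necessarily how RocksDB addresses the problem.

```python
K_OLD_RECORD = "kOldRecord"


class ToyReader:
    """Hypothetical model of the read loop described in the steps above."""

    def __init__(self, consume_on_old_record):
        # A stale/corrupted header followed by one valid record.
        self.records = [K_OLD_RECORD, "kFullType"]
        self.offset = 0
        self.in_fragmented_record = True  # step 1: a partial record precedes the block
        self.consume_on_old_record = consume_on_old_record
        self.corruption_reports = 0

    def read_physical_record(self):
        kind = self.records[self.offset]
        if kind == K_OLD_RECORD and not self.consume_on_old_record:
            # Step 2: kOldRecord is returned without removing the buffer prefix,
            # so the offset does not move.
            return kind
        self.offset += 1
        return kind

    def read_record(self, max_iterations=1000):
        """Return the iteration at which a record was read, or None if stuck."""
        for iteration in range(1, max_iterations + 1):
            kind = self.read_physical_record()
            if kind == K_OLD_RECORD:
                if self.in_fragmented_record:
                    # Step 3: report once, then clear the flag.
                    self.corruption_reports += 1
                    self.in_fragmented_record = False
                # Steps 4 and 5: retry at the same offset, silently.
                continue
            return iteration
        return None  # the cap stands in for the infinite loop


stuck = ToyReader(consume_on_old_record=False)
stuck_result = stuck.read_record()

progressing = ToyReader(consume_on_old_record=True)
progressing_result = progressing.read_record()
```

In this model the first reader logs exactly one corruption report and then spins until the cap is hit, which mirrors the single log line followed by a hang. The second reader skips the stale header and reaches the next record on its second iteration.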
