Fix missing WAL in new manifest by rolling over the WAL deletion record from prev manifest#10892
Fix missing WAL in new manifest by rolling over the WAL deletion record from prev manifest#10892hx235 wants to merge 4 commits intofacebook:mainfrom
Conversation
b1f8684 to
c6d5247
Compare
|
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
c6d5247 to
8977d68
Compare
|
@hx235 has updated the pull request. You must reimport the pull request before landing. |
|
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
8977d68 to
3526676
Compare
|
@hx235 has updated the pull request. You must reimport the pull request before landing. |
|
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
ajkr
left a comment
There was a problem hiding this comment.
Not sure yet. Isn't it odd that we write a WalAddition for 4.log when it's already obsolete, and then ignore it later?
HISTORY.md
Outdated
There was a problem hiding this comment.
Too much implementation detail IMO. How about: "For users of track_and_verify_wals_in_manifest, fixed a race condition in MANIFEST rollover (see max_manifest_file_size) leading to false positive DB::Open() failures"?
edit: Or just delete the "obsoleted WAL and recording such WAL addition of an obsoleted WAL" part; I think that's the only part in yours that is too low level
ajkr
left a comment
There was a problem hiding this comment.
Probably a straightforward thing we can add is change WriteCurrentStateToManifest() to call DeleteWalsBefore()
Ideally we would not have WalAdditions following WalDeletions affecting the same WAL, but that is a bigger problem unrelated to rollover or not - it just works fine in case no rollover happened.
3526676 to
220ba5f
Compare
|
@hx235 has updated the pull request. You must reimport the pull request before landing. |
220ba5f to
d1ca4fb
Compare
|
@hx235 has updated the pull request. You must reimport the pull request before landing. |
|
Update: addressed feedback and going thru stress test now. |
|
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
d1ca4fb to
be3a9b0
Compare
|
@hx235 has updated the pull request. You must reimport the pull request before landing. |
1 similar comment
|
@hx235 has updated the pull request. You must reimport the pull request before landing. |
|
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
|
@hx235 has updated the pull request. You must reimport the pull request before landing. |
9d92fdc to
b3c5745
Compare
|
@hx235 has updated the pull request. You must reimport the pull request before landing. |
|
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
b3c5745 to
66bd569
Compare
|
@hx235 has updated the pull request. You must reimport the pull request before landing. |
db/version_set.cc
Outdated
66bd569 to
fc44539
Compare
|
@hx235 has updated the pull request. You must reimport the pull request before landing. |
|
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
fc44539 to
745be9f
Compare
|
@hx235 has updated the pull request. You must reimport the pull request before landing. |
|
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
…fest in VersionEditHandler::CheckIterationResult()
745be9f to
6da530d
Compare
|
@hx235 has updated the pull request. You must reimport the pull request before landing. |
|
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Summary: #10892 and #10955 mistakenly added two entries under sealed 7.9.history section. This PR fixes these two. No need to update 7.9 branch (https://github.com/facebook/rocksdb/blob/7.9.fb/HISTORY.md) cuz it's cut before these two PRs landed Pull Request resolved: #11013 Reviewed By: cbi42 Differential Revision: D41666514 Pulled By: hx235 fbshipit-source-id: c4bc7a29ff663664bf0be1ba1c7eab4d00a61528
…singMissingWAL) (#11186) Summary: **Context/Summary**: Simplify `TEST_F(DBWALTest, FixSyncWalOnObseletedWalWithNewManifestCausingMissingWAL)` based on #11016 (review) and delete unused sync points. Pull Request resolved: #11186 Test Plan: - UT failed before fix in #10892 and passes after - Check UT not flaky when running with https://app.circleci.com/pipelines/github/facebook/rocksdb/21985/workflows/5f6cc355-78c1-46d8-89ee-0fd679725a8a/jobs/540878 Reviewed By: ajkr Differential Revision: D43034723 Pulled By: hx235 fbshipit-source-id: f503774987b8f3718505f99e95080a7fad28ac66
Context
Options::track_and_verify_wals_in_manifest = trueverifies each of the WALs tracked in manifest indeed presents in the WAL folder. If not, a corruption "Missing WAL with log number" will be thrown.DB::SyncWAL()called at a specific timing (i.e, at theTEST_SYNC_POINT("FindObsoleteFiles::PostMutexUnlock")) can record in a new manifest the WAL addition of a WAL file that already had a WAL deletion recorded in the previous manifest.And the WAL deletion record is not rollover-ed to the new manifest. So the new manifest creates the illusion of such WAL never gets deleted and should presents at db re/open.
PurgeObsoleteFiles().As a consequence, upon
DB::Reopen(), this WAL file can be deleted while manifest still has its WAL addition record , which causes a false alarm of corruption "Missing WAL with log number" to be thrown.Summary
This PR fixes this false alarm by rolling over the WAL deletion record from prev manifest to the new manifest by adding the WAL deletion record to the new manifest.
Test
TEST_F(DBWALTest, FixSyncWalOnObseletedWalWithNewManifestCausingMissingWAL)that failed before the fix and passed after