Fix data race to VersionSet::io_status_#7034
Fix data race to VersionSet::io_status_#7034riversand963 wants to merge 2 commits intofacebook:masterfrom
Conversation
facebook-github-bot
left a comment
There was a problem hiding this comment.
@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
|
Just double check, if the version->io_status_ is not ok, resume will be called. During this process, if ProcessManifestWrites is successful, io_status_ will be change back to IOStatus::OK() (the else case test io_status_.IsIOError()). Therefore, we do not need the reset anymore. |
That's correct. |
fb3faee to
0f9daf1
Compare
|
@riversand963 has updated the pull request. Re-import the pull request |
facebook-github-bot
left a comment
There was a problem hiding this comment.
@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
zhichao-cao
left a comment
There was a problem hiding this comment.
LGTM. Thanks for the fix!
|
hmm, I think there are more data races to |
Summary: Currently, VersionSet::io_status_ can be concurrently accessed by multiple threads without lock, causing tsan test to fail. For example, a bg flush thread resets io_status_ before calling LogAndApply(), while another thread already in the process of LogAndApply() reads io_status_. We do not have to reset io_status_ each time we call LogAndApply(). io_status_ is part of the state of VersionSet, and it indicates the outcome of preceding MANIFEST/CURRENT files IO operations. Its value should be updated only when: 1. MANIFEST/CURRENT files IO fail for the first time. 2. MANIFEST/CURRENT files IO succeed as part of recovering from a prior failure without process restart, e.g. calling Resume(). Test Plan (devserver): COMPILE_WITH_TSAN=1 make check COMPILE_WITH_TSAN=1 make db_test2 ./db_test2 --gtest_filter=DBTest2.CompactionStall
0f9daf1 to
66d08fc
Compare
|
@riversand963 has updated the pull request. Re-import the pull request |
|
@zhichao-cao Thanks for the previous review. Could you take another look? |
facebook-github-bot
left a comment
There was a problem hiding this comment.
@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
| mu->Lock(); | ||
| } | ||
|
|
||
| if (!io_s.ok()) { |
There was a problem hiding this comment.
So the general fix logic is, io_status_ is only updated here. In other places, we use local variable to keep the status in e.g., io_s.
There was a problem hiding this comment.
Yes, access to io_status_ should be protected by the db_mutex.
There was a problem hiding this comment.
Yeah, I notice that WriteCurrentStateToManifest is not protected by the db_mutex
|
Thanks for the fix. I think it is good to go now. |
|
@riversand963 merged this pull request in d47c871. |
Summary: After #6949 , VersionSet::io_status_ can be concurrently accessed by multiple threads without lock, causing tsan test to fail. For example, a bg flush thread resets io_status_ before calling LogAndApply(), while another thread already in the process of LogAndApply() reads io_status_. This is a bug. We do not have to reset io_status_ each time we call LogAndApply(). io_status_ is part of the state of VersionSet, and it indicates the outcome of preceding MANIFEST/CURRENT files IO operations. Its value should be updated only when: 1. MANIFEST/CURRENT files IO fail for the first time. 2. MANIFEST/CURRENT files IO succeed as part of recovering from a prior failure without process restart, e.g. calling Resume(). Test Plan (devserver): COMPILE_WITH_TSAN=1 make check COMPILE_WITH_TSAN=1 make db_test2 ./db_test2 --gtest_filter=DBTest2.CompactionStall Pull Request resolved: #7034 Reviewed By: zhichao-cao Differential Revision: D22247137 Pulled By: riversand963 fbshipit-source-id: 77b83e05390f3ee3cd2d96d3fdd6fe4f225e3216
Summary:
After #6949 , VersionSet::io_status_ can be concurrently accessed by multiple
threads without lock, causing tsan test to fail. For example, a bg flush thread
resets io_status_ before calling LogAndApply(), while another thread already in
the process of LogAndApply() reads io_status_. This is a bug.
We do not have to reset io_status_ each time we call LogAndApply(). io_status_
is part of the state of VersionSet, and it indicates the outcome of preceding
MANIFEST/CURRENT files IO operations. Its value should be updated only when:
failure without process restart, e.g. calling Resume().
Test Plan (devserver):
COMPILE_WITH_TSAN=1 make check
COMPILE_WITH_TSAN=1 make db_test2
./db_test2 --gtest_filter=DBTest2.CompactionStall