Skip to content

Fix data race to VersionSet::io_status_#7034

Closed
riversand963 wants to merge 2 commits intofacebook:masterfrom
riversand963:tsan-fix-vset
Closed

Fix data race to VersionSet::io_status_#7034
riversand963 wants to merge 2 commits intofacebook:masterfrom
riversand963:tsan-fix-vset

Conversation

@riversand963
Copy link
Copy Markdown
Contributor

Summary:
After #6949 , VersionSet::io_status_ can be concurrently accessed by multiple
threads without lock, causing tsan test to fail. For example, a bg flush thread
resets io_status_ before calling LogAndApply(), while another thread already in
the process of LogAndApply() reads io_status_. This is a bug.

We do not have to reset io_status_ each time we call LogAndApply(). io_status_
is part of the state of VersionSet, and it indicates the outcome of preceding
MANIFEST/CURRENT files IO operations. Its value should be updated only when:

  1. MANIFEST/CURRENT files IO fail for the first time.
  2. MANIFEST/CURRENT files IO succeed as part of recovering from a prior
    failure without process restart, e.g. calling Resume().

Test Plan (devserver):
COMPILE_WITH_TSAN=1 make check
COMPILE_WITH_TSAN=1 make db_test2
./db_test2 --gtest_filter=DBTest2.CompactionStall

Copy link
Copy Markdown
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@zhichao-cao
Copy link
Copy Markdown
Contributor

Just double check, if the version->io_status_ is not ok, resume will be called. During this process, if ProcessManifestWrites is successful, io_status_ will be change back to IOStatus::OK() (the else case test io_status_.IsIOError()). Therefore, we do not need the reset anymore.

@riversand963
Copy link
Copy Markdown
Contributor Author

Just double check, if the version->io_status_ is not ok, resume will be called. During this process, if ProcessManifestWrites is successful, io_status_ will be change back to IOStatus::OK() (the else case test io_status_.IsIOError()). Therefore, we do not need the reset anymore.

That's correct.

@facebook-github-bot
Copy link
Copy Markdown
Contributor

@riversand963 has updated the pull request. Re-import the pull request

Copy link
Copy Markdown
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Copy Markdown
Contributor

@zhichao-cao zhichao-cao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the fix!

@riversand963
Copy link
Copy Markdown
Contributor Author

hmm, I think there are more data races to VersionSet::io_status_. Let me update.

@riversand963 riversand963 added the WIP Work in progress label Jun 26, 2020
Summary:
Currently, VersionSet::io_status_ can be concurrently accessed by multiple
threads without lock, causing tsan test to fail. For example, a bg flush thread
resets io_status_ before calling LogAndApply(), while another thread already in
the process of LogAndApply() reads io_status_.
We do not have to reset io_status_ each time we call LogAndApply(). io_status_
is part of the state of VersionSet, and it indicates the outcome of preceding
MANIFEST/CURRENT files IO operations. Its value should be updated only when:

1. MANIFEST/CURRENT files IO fail for the first time.
2. MANIFEST/CURRENT files IO succeed as part of recovering from a prior
   failure without process restart, e.g. calling Resume().

Test Plan (devserver):
COMPILE_WITH_TSAN=1 make check
COMPILE_WITH_TSAN=1 make db_test2
./db_test2 --gtest_filter=DBTest2.CompactionStall
@facebook-github-bot
Copy link
Copy Markdown
Contributor

@riversand963 has updated the pull request. Re-import the pull request

@riversand963 riversand963 requested a review from zhichao-cao June 26, 2020 21:44
@riversand963
Copy link
Copy Markdown
Contributor Author

@zhichao-cao Thanks for the previous review. Could you take another look?

Copy link
Copy Markdown
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@riversand963 riversand963 removed the WIP Work in progress label Jun 26, 2020
mu->Lock();
}

if (!io_s.ok()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the general fix logic is, io_status_ is only updated here. In other places, we use local variable to keep the status in e.g., io_s.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, access to io_status_ should be protected by the db_mutex.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I notice that WriteCurrentStateToManifest is not protected by the db_mutex

@zhichao-cao
Copy link
Copy Markdown
Contributor

Thanks for the fix. I think it is good to go now.

@facebook-github-bot
Copy link
Copy Markdown
Contributor

@riversand963 merged this pull request in d47c871.

@riversand963 riversand963 deleted the tsan-fix-vset branch June 27, 2020 23:31
ajkr pushed a commit that referenced this pull request Jul 9, 2020
Summary:
After #6949 , VersionSet::io_status_ can be concurrently accessed by multiple
threads without lock, causing tsan test to fail. For example, a bg flush thread
resets io_status_ before calling LogAndApply(), while another thread already in
the process of LogAndApply() reads io_status_. This is a bug.

We do not have to reset io_status_ each time we call LogAndApply(). io_status_
is part of the state of VersionSet, and it indicates the outcome of preceding
MANIFEST/CURRENT files IO operations. Its value should be updated only when:

1. MANIFEST/CURRENT files IO fail for the first time.
2. MANIFEST/CURRENT files IO succeed as part of recovering from a prior
   failure without process restart, e.g. calling Resume().

Test Plan (devserver):
COMPILE_WITH_TSAN=1 make check
COMPILE_WITH_TSAN=1 make db_test2
./db_test2 --gtest_filter=DBTest2.CompactionStall
Pull Request resolved: #7034

Reviewed By: zhichao-cao

Differential Revision: D22247137

Pulled By: riversand963

fbshipit-source-id: 77b83e05390f3ee3cd2d96d3fdd6fe4f225e3216
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants