Avoid self-deadlock in the translog#29520
Merged
jasontedor merged 3 commits intoelastic:masterfrom Apr 15, 2018
Merged
Conversation
Today when reading an operation from the current generation fails tragically we attempt to close the translog. However, by invoking close before releasing the read lock we end up in self-deadlock because closing tries to acquire the write lock and the read lock can not be upgraded to a write lock. To avoid this, we move the close invocation outside of the try-with-resources that acquired the read lock. As an extra guard against this, we document the problem and add an assertion that we are not trying to invoke close while holding the read lock.
Collaborator
|
Pinging @elastic/es-distributed |
bleskes
approved these changes
Apr 15, 2018
| */ | ||
| private void closeOnTragicEvent(final Exception ex) { | ||
| // we can not hold a read lock here because closing will attempt to obtain a write lock and that would result in self-deadlock | ||
| assert readLock.isHeldByCurrentThread() == false : Thread.currentThread().getName(); |
ywelsch
approved these changes
Apr 15, 2018
Contributor
ywelsch
left a comment
There was a problem hiding this comment.
LGTM, thanks @jasontedor. Can you add a note to the PR description that this was introduced with #29401, i.e. affects code that was never released?
| private void closeOnTragicEvent(final Exception ex) { | ||
| // we can not hold a read lock here because closing will attempt to obtain a write lock and that would result in self-deadlock | ||
| assert readLock.isHeldByCurrentThread() == false : Thread.currentThread().getName(); | ||
| Objects.requireNonNull(ex); |
Contributor
There was a problem hiding this comment.
As we never expect a null value to be passed as parameter to this internal method, I'm not a fan of sprinkling this check here. This is defensive programming at its worst.
jasontedor
added a commit
that referenced
this pull request
Apr 15, 2018
Today when reading an operation from the current generation fails tragically we attempt to close the translog. However, by invoking close before releasing the read lock we end up in self-deadlock because closing tries to acquire the write lock and the read lock can not be upgraded to a write lock. To avoid this, we move the close invocation outside of the try-with-resources that acquired the read lock. As an extra guard against this, we document the problem and add an assertion that we are not trying to invoke close while holding the read lock.
jasontedor
added a commit
that referenced
this pull request
Apr 16, 2018
* 6.x: [TEST] REST client request without leading '/' (#29471) Prevent accidental changes of default values (#29528) [Docs] Add definitions to glossary (#29127) Avoid self-deadlock in the translog (#29520) Minor cleanup in NodeInfo.groovy Lazy configure build tasks that require older JDKs (#29519) Fix usage of NodeInfo#nodeVersion in build Control max size/count of warning headers (#28427) (#29516) Simplify snapshot check in root build file Make NodeInfo#nodeVersion strongly-typed as Version (#29515) Enable license header exclusions (#29379) Use proper Java version for BWC builds (#29493) Mute TranslogTests#testFatalIOExceptionsWhileWritingConcurrently
jasontedor
added a commit
that referenced
this pull request
Apr 16, 2018
* master: [TEST] REST client request without leading '/' (#29471) Using ObjectParser in UpdateRequest (#29293) Prevent accidental changes of default values (#29528) [Docs] Add definitions to glossary (#29127) Avoid self-deadlock in the translog (#29520) Minor cleanup in NodeInfo.groovy Lazy configure build tasks that require older JDKs (#29519) Simplify snapshot check in root build file Make NodeInfo#nodeVersion strongly-typed as Version (#29515) Enable license header exclusions (#29379) Use proper Java version for BWC builds (#29493) Mute TranslogTests#testFatalIOExceptionsWhileWritingConcurrently Enable skipping fetching latest for BWC builds (#29497)
jasontedor
added a commit
to jasontedor/elasticsearch
that referenced
this pull request
Apr 17, 2018
* master: (96 commits) TEST: Mute testEnsureWeReconnect Mute full cluster restart test recovery test REST high-level client: add support for Indices Update Settings API [take 2] (elastic#29327) Plugins: Fix native controller confirmation for non-meta plugin (elastic#29434) Remove PipelineExecutionService#executeIndexRequest (elastic#29537) Fix overflow error in parsing of long geohashes (elastic#29418) Remove unused index.ttl.disable_purge setting (elastic#29527) FullClusterRestartIT.testRecovery should wait for all initializing shards Build: Fail if any libs depend on non-core libs (elastic#29336) [TEST] REST client request without leading '/' (elastic#29471) Using ObjectParser in UpdateRequest (elastic#29293) Prevent accidental changes of default values (elastic#29528) [Docs] Add definitions to glossary (elastic#29127) Avoid self-deadlock in the translog (elastic#29520) Minor cleanup in NodeInfo.groovy Lazy configure build tasks that require older JDKs (elastic#29519) Simplify snapshot check in root build file Make NodeInfo#nodeVersion strongly-typed as Version (elastic#29515) Enable license header exclusions (elastic#29379) Use proper Java version for BWC builds (elastic#29493) ...
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Today when reading an operation from the current generation fails tragically we attempt to close the translog. However, by invoking close before releasing the read lock we end up in self-deadlock because closing tries to acquire the write lock and the read lock can not be upgraded to a write lock. To avoid this, we move the close invocation outside of the try-with-resources that acquired the read lock. As an extra guard against this, we document the problem and add an assertion that we are not trying to invoke close while holding the read lock.
Closes #29509, relates #29401