Fixing a race condition in EnrichCoordinatorProxyAction that can leave an item stuck in its queue #90688
Merged
masseyke merged 6 commits into elastic:main on Oct 5, 2022
Collaborator
Pinging @elastic/es-data-management (Team:Data Management)
Collaborator
Hi @masseyke, I've created a changelog YAML for you.
Member
Author
This PR causes a few more loops in the code, but I don't think it will be a noticeable performance hit -- the additional loops are rare and fast. I ran the test (
Member
Author
@elasticmachine update branch
jbaiera
approved these changes
Oct 5, 2022
Member
jbaiera
left a comment
LGTM, fixed a small typo is all.
...enrich/src/main/java/org/elasticsearch/xpack/enrich/action/EnrichCoordinatorProxyAction.java
…ich/action/EnrichCoordinatorProxyAction.java Co-authored-by: James Baiera <james.baiera@gmail.com>
weizijun added a commit to weizijun/elasticsearch that referenced this pull request on Oct 10, 2022
There is a race condition in EnrichCoordinatorProxyAction that can result in an item being stuck in its queue even once all threads related to any schedule() calls have completed. The item will be flushed out on the next call to schedule(), but there is no guarantee if or when that will happen. This PR adds an additional check for orphaned items in the queue.
Here's what I believe is happening (I can only reproduce it in fewer than 1 in 10,000 tries so I don't have direct evidence):
(Note that there are actually more threads than just the 2 I mention since coordinateLookups() makes an async call back to itself)
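For readers unfamiliar with this kind of race, here is a minimal sketch of the general "re-check the queue after releasing a permit" pattern. It is not the code from this PR: the class OrphanCheckSketch, its methods, and the use of a Semaphore are all hypothetical, and the remote lookup is simulated by recording the item in a list. It assumes the orphaning happens when a producer enqueues an item but cannot get a permit, while the only thread holding a permit has already decided the queue is empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;

/** Hypothetical illustration only; not the actual EnrichCoordinatorProxyAction code. */
public class OrphanCheckSketch {
    private final Queue<String> queue = new ConcurrentLinkedQueue<>();
    private final Semaphore permits;
    // Items handed to the simulated remote call, in dispatch order.
    private final List<String> dispatched = new ArrayList<>();

    public OrphanCheckSketch(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    /** Enqueue an item, then try to dispatch whatever is queued. */
    public void schedule(String item) {
        queue.add(item);
        drain();
    }

    /** Called when a simulated remote call finishes. */
    public void onResponse() {
        permits.release();
        // Check the queue AFTER releasing the permit. An item enqueued while
        // the permit was held would otherwise sit there until the next schedule().
        drain();
    }

    private void drain() {
        while (!queue.isEmpty() && permits.tryAcquire()) {
            String item = queue.poll();
            if (item == null) {
                // Another thread took the item first; give the permit back.
                // The loop condition then re-checks the queue.
                permits.release();
            } else {
                synchronized (dispatched) {
                    dispatched.add(item);
                }
            }
        }
    }

    public List<String> dispatched() {
        synchronized (dispatched) {
            return new ArrayList<>(dispatched);
        }
    }

    public int queued() {
        return queue.size();
    }

    public static void main(String[] args) {
        OrphanCheckSketch c = new OrphanCheckSketch(1);
        c.schedule("a"); // takes the only permit
        c.schedule("b"); // no permit available, so "b" waits in the queue
        System.out.println(c.dispatched() + " queued=" + c.queued()); // [a] queued=1
        c.onResponse();  // releases the permit and picks up "b"
        System.out.println(c.dispatched() + " queued=" + c.queued()); // [a, b] queued=0
    }
}
```

The ordering is the point of the sketch: because the queue is checked only after the permit is released, either the releasing thread sees the new item, or the producer's own tryAcquire() sees the freed permit.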
Closes #90598