Fix NPE caused by race condition in async search when minimise round trips is true #117504
Merged
pawankartik-elastic merged 24 commits into elastic:main on Jan 27, 2025
…trips is true
Previously, `notifyListShards()` both initialised and updated the prerequisites of a search (`searchResponse` among them). The method takes arguments that carry shard-specific details, among others. Because that information is not available when the search begins, the method is not called immediately. In some specific cases, a race condition can cause the prerequisites (such as `searchResponse`) to be accessed before they are initialised, causing an NPE. This fix addresses the race condition by splitting initialisation and the subsequent updates into two separate methods. This way, the prerequisites are always initialised before they are accessed and no longer lead to an NPE.
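The ordering problem described above can be illustrated with a minimal, self-contained sketch. It is not the Elasticsearch code: `SketchResponse`, `CombinedInitTask`, `SplitInitTask`, `onSearchStart()` and `onFinalResponse()` are hypothetical names chosen for illustration, and the sketch assumes that the search-start callback always runs before any other callback.

```java
/** Drives both variants in the problematic order: final response first, shard details last. */
final class InitOrderSketch {
    static boolean combinedInitThrowsNpe() {
        try {
            new CombinedInitTask().onFinalResponse();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    static boolean splitInitSurvives() {
        SplitInitTask task = new SplitInitTask();
        task.onSearchStart();   // assumed to run as soon as the search begins
        task.onFinalResponse(); // arrives before the shard details
        task.onListShards(5);   // shard details arrive last
        return task.response().isFinished() && task.response().totalShards() == 5;
    }

    public static void main(String[] args) {
        System.out.println("combined init throws NPE: " + combinedInitThrowsNpe());
        System.out.println("split init survives late shard details: " + splitInitSurvives());
    }
}

/** Hypothetical stand-in for the response holder; not the real MutableSearchResponse. */
final class SketchResponse {
    private int totalShards = -1; // -1 means the shard count is not known yet
    private boolean finished;

    synchronized void setTotalShards(int totalShards) { this.totalShards = totalShards; }
    synchronized void markFinished() { this.finished = true; }
    synchronized int totalShards() { return totalShards; }
    synchronized boolean isFinished() { return finished; }
}

/** Before: one callback both creates and fills the holder. */
final class CombinedInitTask {
    private volatile SketchResponse response; // stays null until onListShards() runs

    void onListShards(int totalShards) {
        SketchResponse created = new SketchResponse();
        created.setTotalShards(totalShards);
        response = created;
    }

    void onFinalResponse() {
        response.markFinished(); // throws NullPointerException if onListShards() has not run
    }
}

/** After: creation and update live in two separate methods. */
final class SplitInitTask {
    private volatile SketchResponse response;

    void onSearchStart() { response = new SketchResponse(); }
    void onListShards(int totalShards) { response.setTotalShards(totalShards); }
    void onFinalResponse() { response.markFinished(); }
    SketchResponse response() { return response; }
}
```

In this sketch the split variant is only safe because `onSearchStart()` is assumed to precede every other callback; the shard count simply stays at its placeholder value until `onListShards()` arrives.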
341ec23 to 3328dae
…ize round trips is true
signify search start. To prevent polluting the progress listener with unnecessary search specific details, we now pass the `Clusters` object to `SearchTask` when a search op begins. This lets `AsyncSearchTask` access it and use it to initialise `MutableSearchResponse` appropriately.
quux00
reviewed
Jan 7, 2025
x-pack/plugin/async-search/src/main/java/org/elasticsearch/xpack/search/AsyncSearchTask.java
Outdated
javanna
reviewed
Jan 8, 2025
x-pack/plugin/async-search/src/main/java/org/elasticsearch/xpack/search/AsyncSearchTask.java
Outdated
.../plugin/async-search/src/main/java/org/elasticsearch/xpack/search/MutableSearchResponse.java
quux00
reviewed
Jan 8, 2025
.../plugin/async-search/src/main/java/org/elasticsearch/xpack/search/MutableSearchResponse.java
Outdated
Contributor
Author
Follow up:
Edit: After some internal discussions, it was decided to revert e7b8a7c and instead proceed with the approach in commit 2636c74.
Collaborator
Hi @pawankartik-elastic, I've created a changelog YAML for you.
Collaborator
Pinging @elastic/es-search-foundations (Team:Search Foundations)
javanna
approved these changes
Jan 21, 2025
Contributor
javanna
left a comment
Code change LGTM, I wonder if we have enough test coverage. Didn't you work on reproducing the original issue and tests already? Perhaps we could expand unit tests in AsyncSearchTaskTests and recreate the scenario that required the fix?
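One way such a scenario could be exercised is sketched below. This is not the test added to `AsyncSearchTaskTests`; it is a standalone, hypothetical stress sketch in which `EagerTask`, `onFinalResponse()` and `countFailures()` are invented names. It assumes the task state is created at construction time, releases the shard-listing and final-response callbacks together through a `CyclicBarrier`, and counts rounds in which a callback throws or the final state is incomplete.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class CallbackRaceSketch {
    /** Races the two callbacks for the given number of rounds and returns how many rounds went wrong. */
    static int countFailures(int rounds) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        int failures = 0;
        try {
            for (int i = 0; i < rounds; i++) {
                EagerTask task = new EagerTask();
                CyclicBarrier startLine = new CyclicBarrier(2); // releases both callbacks together
                Future<Object> shards = pool.submit(() -> {
                    startLine.await();
                    task.onListShards(3);
                    return null;
                });
                Future<Object> done = pool.submit(() -> {
                    startLine.await();
                    task.onFinalResponse();
                    return null;
                });
                try {
                    shards.get();
                    done.get();
                } catch (ExecutionException e) {
                    failures++; // a callback threw, for example a NullPointerException
                    continue;
                }
                if (!task.isFinished() || task.totalShards() != 3) {
                    failures++; // both callbacks ran, so both updates must be visible
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while racing callbacks", e);
        } finally {
            pool.shutdownNow();
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failed rounds: " + countFailures(200));
    }
}

/** Hypothetical task whose state exists from construction, so callbacks may arrive in any order. */
final class EagerTask {
    private final AtomicInteger totalShards = new AtomicInteger(-1); // -1 means not known yet
    private final AtomicBoolean finished = new AtomicBoolean(false);

    void onListShards(int total) { totalShards.set(total); }
    void onFinalResponse() { finished.set(true); }
    int totalShards() { return totalShards.get(); }
    boolean isFinished() { return finished.get(); }
}
```

A stress loop like this can only raise the odds of hitting a bad interleaving; a deterministic test that invokes the callbacks in the problematic order is the stronger check.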
x-pack/plugin/async-search/src/main/java/org/elasticsearch/xpack/search/AsyncSearchTask.java
Outdated
x-pack/plugin/async-search/src/main/java/org/elasticsearch/xpack/search/AsyncSearchTask.java
Outdated
.../plugin/async-search/src/main/java/org/elasticsearch/xpack/search/MutableSearchResponse.java
Outdated
776d9c3 to 066a805
quux00
reviewed
Jan 23, 2025
.../plugin/async-search/src/main/java/org/elasticsearch/xpack/search/MutableSearchResponse.java
quux00
approved these changes
Jan 23, 2025
Contributor
quux00
left a comment
LGTM - left one minor comment improvement suggestion
quux00
reviewed
Jan 24, 2025
.../plugin/async-search/src/main/java/org/elasticsearch/xpack/search/MutableSearchResponse.java
Outdated
quux00
reviewed
Jan 24, 2025
...k/plugin/async-search/src/test/java/org/elasticsearch/xpack/search/AsyncSearchTaskTests.java
Outdated
pawankartik-elastic
added a commit
to pawankartik-elastic/elasticsearch
that referenced
this pull request
Jan 27, 2025
…trips is true (elastic#117504)
* Fix NPE caused by race condition in async search when minimise round trips is true
  Previously, `notifyListShards()` both initialised and updated the prerequisites of a search (`searchResponse` among them). The method takes arguments that carry shard-specific details, among others. Because that information is not available when the search begins, the method is not called immediately. In some specific cases, a race condition can cause the prerequisites (such as `searchResponse`) to be accessed before they are initialised, causing an NPE. This fix addresses the race condition by splitting initialisation and the subsequent updates into two separate methods. This way, the prerequisites are always initialised before they are accessed and no longer lead to an NPE.
* Try: call `notifyListShards()` after `notifySearchStart()` when minimize round trips is true
* Add removed code comment
* Pass `Clusters` to `SearchTask` rather than using progress listener to signify search start.
  To prevent polluting the progress listener with unnecessary search-specific details, we now pass the `Clusters` object to `SearchTask` when a search op begins. This lets `AsyncSearchTask` access it and use it to initialise `MutableSearchResponse` appropriately.
* Use appropriate `clusters` object rather than re-building it
* Do not double set `mutableSearchResponse`
* Move mutable entities such as shard counts out of `MutableSearchResponse`
* Address PR review: revert moving out mutable entities from `MutableSearchResponse`
* Update docs/changelog/117504.yaml
* Get rid of `SetOnce` for `searchResponse`
* Drop redundant check around shards count
* Add a test that calls `onListShards()` last and clarify the comment on `updateShardsAndClusters()`
* Fix test: ref count
* Address review comment: rewrite comment and test
elasticsearchmachine
pushed a commit
that referenced
this pull request
Jan 27, 2025
…trips is true (#117504) (#120955)
elasticsearchmachine
pushed a commit
that referenced
this pull request
Jan 27, 2025
…trips is true (#117504) (#120954)
Previously, `notifyListShards()` both initialised and updated the prerequisites of a search (`searchResponse` among them). The method takes arguments that carry shard-specific details, among others. Because that information is not available when the search begins, the method is not called immediately. In some specific cases, a race condition can cause the prerequisites (such as `searchResponse`) to be accessed before they are initialised, causing an NPE.

This fix addresses the race condition by ensuring that `MutableSearchResponse` is instantiated right from the beginning, with the shard counts and `clusters` populated later via `onListShards()`.
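The final approach can be pictured with the following sketch, which models only the shape of the change. The classes are hypothetical stand-ins rather than the real `AsyncSearchTask` and `MutableSearchResponse`; the parameter list of `updateShardsAndClusters()` and the use of a plain `String` for the clusters are assumptions made to keep the example self-contained.

```java
/** Shows that status can be read safely before the shard details arrive. */
final class EagerResponseSketch {
    public static void main(String[] args) {
        SketchAsyncSearchTask task = new SketchAsyncSearchTask();
        System.out.println(task.status());      // placeholders only, but no NPE
        task.onListShards(10, 2, "remote1");
        System.out.println(task.status());      // shard counts and clusters filled in
    }
}

/** Hypothetical stand-in for the response holder. */
final class SketchMutableResponse {
    private int totalShards = -1;   // -1 means not known yet
    private int skippedShards = -1; // -1 means not known yet
    private String clusters;        // null means not known yet
    private boolean completed;

    // Assumed signature; the real method may take different arguments.
    synchronized void updateShardsAndClusters(int totalShards, int skippedShards, String clusters) {
        this.totalShards = totalShards;
        this.skippedShards = skippedShards;
        this.clusters = clusters;
    }

    synchronized void markCompleted() { this.completed = true; }

    synchronized String describe() {
        return "completed=" + completed
            + ", totalShards=" + totalShards
            + ", skippedShards=" + skippedShards
            + ", clusters=" + (clusters == null ? "unknown" : clusters);
    }
}

/** Hypothetical stand-in for the task: the holder exists from construction onwards. */
final class SketchAsyncSearchTask {
    private final SketchMutableResponse searchResponse = new SketchMutableResponse();

    void onListShards(int totalShards, int skippedShards, String clusters) {
        searchResponse.updateShardsAndClusters(totalShards, skippedShards, clusters);
    }

    void onResponse() { searchResponse.markCompleted(); }

    String status() { return searchResponse.describe(); }
}
```

Because the holder is a `final` field in this sketch, no callback can observe it as `null`; the trade-off is that readers must tolerate placeholder shard counts until `onListShards()` has run.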