Improving cache lookup to reduce recomputing / searches #77259
mjmbischoff merged 14 commits into elastic:master
Conversation
- Avoids a race condition while keeping cache lookup fast / non-blocking, using a combination of `computeIfAbsent` and `CompletableFuture`.
- Invalidation of the cache is idempotent, avoiding invalidating again, but it does require a lookup.
- Cache entries are slightly bigger due to bookkeeping, but this seems like a good trade-off.
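The dedup pattern the description refers to can be sketched roughly as follows. This is a minimal, self-contained illustration of the idea (not the actual `EnrichCache` code, which caches `SearchResponse` objects): `computeIfAbsent` installs a `CompletableFuture` atomically, so only the first thread for a given key triggers the expensive computation, and later threads attach callbacks to the same future instead of recomputing.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

class DedupCache {
    private final ConcurrentMap<String, CompletableFuture<String>> cache = new ConcurrentHashMap<>();
    final AtomicInteger computations = new AtomicInteger();

    CompletableFuture<String> get(String key) {
        // computeIfAbsent invokes the mapping function at most once per key,
        // so concurrent callers all receive the same future.
        return cache.computeIfAbsent(key, k -> {
            computations.incrementAndGet();
            // In the real cache this would kick off an asynchronous search;
            // here we complete immediately for demonstration.
            return CompletableFuture.completedFuture("value-for-" + k);
        });
    }

    // Idempotent invalidation: removing an absent key is a no-op.
    void invalidate(String key) {
        cache.remove(key);
    }

    public static void main(String[] args) throws Exception {
        DedupCache cache = new DedupCache();
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> cache.get("same-key"));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        // Despite eight concurrent lookups, the value was computed once.
        System.out.println("computations=" + cache.computations.get());
        System.out.println("value=" + cache.get("same-key").join());
    }
}
```

The "bookkeeping" cost mentioned above is visible here: each entry stores a future wrapper around the value rather than the value itself.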
5ba57cf to b994286
Hey @mjmbischoff, thanks for putting this PR up. However, I wonder whether the current approach in the PR can still lead to blocking threads? If I understand the
Yes, the first thread will do the work. Technically all threads 'block', because as you know any call blocks, just for a very short time. The 'winning' thread will have a little bit more work to do, while the others are blocked / parked at that point in time. The computation, however, is only a matter of creating the future.

TL;DR: the thread on the threadpool of the originClient is the one that does all the work; the rest is callbacks and cheap calls. Feel free to make the EnrichCoordinatorProxyAction slow to assert this. Now that I think of it, I guess I could write a test for this.
Ah, also forgot to mention that this approach also allows caching failures, which it now immediately invalidates, but perhaps it makes sense to observe some timeout to save resources.
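The failure-caching behaviour described here can be sketched as below. This is an illustrative simplification (placeholder names, not the actual enrich code): the future is cached before the work runs, and if it completes exceptionally the entry is invalidated right away, so the next lookup retries instead of replaying the cached failure. The suggested timeout would simply delay that invalidation to throttle retries.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class FailureCachingLookup {
    private final ConcurrentMap<String, CompletableFuture<String>> cache = new ConcurrentHashMap<>();

    // 'fail' stands in for the remote search failing; the real code would
    // perform an asynchronous search and complete the future from a callback.
    CompletableFuture<String> get(String key, boolean fail) {
        CompletableFuture<String> created = new CompletableFuture<>();
        CompletableFuture<String> existing = cache.putIfAbsent(key, created);
        if (existing != null) {
            return existing; // another caller is already resolving this key
        }
        // whenComplete observes both normal and exceptional completion:
        // failures are evicted immediately so a later lookup can retry.
        created.whenComplete((value, error) -> {
            if (error != null) {
                cache.remove(key, created);
            }
        });
        if (fail) {
            created.completeExceptionally(new RuntimeException("search failed"));
        } else {
            created.complete("hit:" + key);
        }
        return created;
    }

    public static void main(String[] args) {
        FailureCachingLookup lookup = new FailureCachingLookup();
        System.out.println("first attempt failed: " + lookup.get("key", true).isCompletedExceptionally());
        System.out.println("retry succeeded: " + lookup.get("key", false).join());
    }
}
```

Note the two-step `putIfAbsent` rather than doing the work inside `computeIfAbsent`: mutating the map from within the mapping function is not allowed for `ConcurrentHashMap`.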
Apologies, I somehow thought that the remote call was done as part of other threads waiting... That is wrong. Just the creation of the
martijnvg
left a comment
Thanks for explaining @mjmbischoff. I left a few comments.
x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/EnrichCache.java (outdated, resolved)
x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/EnrichCache.java (outdated, resolved)
x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/EnrichProcessorFactory.java (outdated, resolved)
Pinging @elastic/es-data-management (Team:Data Management)
Made the cache read-through. Added / updated tests.
…chUncaughtExceptionHandler
Think we're getting close. Added comments to the code to clarify (or to make my mis-assumptions explicit ;-))
martijnvg
left a comment
Thanks @mjmbischoff for the update. I left a few comments.
x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/EnrichCache.java (outdated, resolved)
x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/EnrichCache.java (outdated, resolved)
```diff
  assertThat(cacheStats.getCount(), equalTo(3L));
  assertThat(cacheStats.getHits(), equalTo(0L));
- assertThat(cacheStats.getMisses(), equalTo(0L));
+ assertThat(cacheStats.getMisses(), equalTo(3L));
```
Why is the expected miss count increased here?
Without introducing a method specifically for the tests, warming the cache requires a cache miss to trigger a resolve.
The class is also final, so we can't do the usual trickery to break encapsulation. A specific put method, or a constructor where we can pass in a `Cache<CacheKey, CompletableFuture<SearchResponse>>`, are alternatives - I don't know what's preferred?
I see. Maybe we should make EnrichCache non-final?
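The constructor-injection alternative mentioned above could look roughly like this. Names and types here are hypothetical placeholders, not the actual `EnrichCache` API: a package-private constructor accepts a pre-built cache map, so a test can warm the cache directly without going through a miss.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class WarmableCache {
    private final ConcurrentMap<String, CompletableFuture<String>> cache;

    WarmableCache() {
        this(new ConcurrentHashMap<>());
    }

    // Package-private: tests in the same package can inject pre-warmed entries
    // without a test-only put method and without making the class non-final.
    WarmableCache(ConcurrentMap<String, CompletableFuture<String>> cache) {
        this.cache = cache;
    }

    CompletableFuture<String> get(String key) {
        // Counts as a miss only when the key is absent.
        return cache.computeIfAbsent(key, k -> CompletableFuture.completedFuture("resolved:" + k));
    }

    public static void main(String[] args) {
        ConcurrentMap<String, CompletableFuture<String>> warmed = new ConcurrentHashMap<>();
        warmed.put("warm-key", CompletableFuture.completedFuture("warm-value"));
        WarmableCache cache = new WarmableCache(warmed);
        System.out.println(cache.get("warm-key").join()); // served without a miss
        System.out.println(cache.get("cold-key").join());
    }
}
```

Either approach (injectable cache or non-final class) avoids inflating the miss count just to set up test fixtures.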
@elasticmachine update branch
Remove indirection as caching negative lookups is unlikely to be a good idea.
@martijnvg As always, many thanks for reviewing! Looking at #76800, I think it's targeted for 7.16, but I'm unsure if that train has already left the station. Can I leave the backporting, if any, to you?
@mjmbischoff I will backport this to 7.16; there is still some time before feature freeze.
Backporting #77259 to 7.x branch. Improved the cache logic to avoid duplicate searches when multiple requests target the same, not-yet-cached, value. Co-authored-by: Michael Bischoff <michael.bischoff@elastic.co>
This PR reverts the optimisation that was added via #77259. This optimisation cleverly ensures no duplicate searches happen if multiple threads concurrently execute the same search. However, there are issues with the implementation that cause problems like #84781. The optimisation makes use of CompletableFuture, and in this case we don't check whether the result has completed exceptionally, which causes the callback not to be invoked; this leads to bulk requests not being completed and hanging around. The ingest framework, due to its asynchronous nature, is already complex, and adding CompletableFuture into the mix makes debugging these issues very time consuming. This is the main reason why we would like to revert this commit.
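The failure mode described here, a callback silently skipped on exceptional completion, can be demonstrated in isolation. This is an illustrative sketch, not the actual enrich cache code: an action registered with `thenAccept` never runs when the future completes exceptionally, so anything waiting on that callback (here, a bulk request) hangs, while `whenComplete` observes both outcomes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

class ExceptionalCompletionDemo {
    public static void main(String[] args) {
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(new RuntimeException("search failed"));

        AtomicBoolean thenAcceptRan = new AtomicBoolean();
        AtomicBoolean whenCompleteRan = new AtomicBoolean();

        // thenAccept is skipped on exceptional completion: the exception just
        // propagates to the dependent future, and the action never runs.
        failed.thenAccept(value -> thenAcceptRan.set(true));

        // whenComplete runs for both outcomes, receiving (null, error) here.
        failed.whenComplete((value, error) -> whenCompleteRan.set(true));

        System.out.println("thenAccept ran: " + thenAcceptRan.get());
        System.out.println("whenComplete ran: " + whenCompleteRan.get());
    }
}
```

This is why the revert argues the exceptional path must be checked explicitly: a completion callback that only handles the success path leaves the caller waiting forever when the search fails.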
Forwardporting #85000 to master branch.
Forward port #85000 (Revert enrich cache lookup optimisation) and #84838 (CompoundProcessor should also catch exceptions when executing a processor) to 8.1 branch.

* Revert enrich cache lookup optimisation (#85028): see #85000 for the full rationale.

* CompoundProcessor should also catch exceptions when executing a processor (#84838) (#85035)

Currently, CompoundProcessor does not catch Exception, and if a processor throws an error and a method higher in the call stack doesn't catch the exception, then pipeline execution stalls and bulk requests may not complete. Usually these exceptions are caught by the IngestService#executePipelines(...) method, but when a processor executes asynchronously (for example: the enrich processor) and the thread that executes enrich is no longer the original write thread, then there is no logic that deals with failing pipeline execution and cleaning up resources. This then leads to memory leaks. Closes #84781

Also change how the 'pipeline doesn't exist' error is thrown in TrackingResultProcessor. With the change to CompoundProcessor, thrown exceptions are caught and delegated to the handler.
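The CompoundProcessor fix described above boils down to wrapping processor execution in a catch that delegates to the completion handler. The sketch below uses simplified placeholder types (not the actual Elasticsearch `Processor` interface) to show the shape of the fix: without the catch, a processor throwing on an async thread would escape a stack with no one left to handle it, stalling the pipeline.

```java
import java.util.function.BiConsumer;

class SafeProcessorRunner {
    // Simplified stand-in for an ingest processor: it either calls the
    // handler (asynchronously in the real code) or throws.
    interface Processor {
        void execute(String document, BiConsumer<String, Exception> handler);
    }

    static void run(Processor processor, String document, BiConsumer<String, Exception> handler) {
        try {
            processor.execute(document, handler);
        } catch (Exception e) {
            // Without this catch, a throwing processor on a non-write thread
            // would leave the bulk request hanging; instead, the failure is
            // routed to the same handler that success would have used.
            handler.accept(null, e);
        }
    }

    public static void main(String[] args) {
        SafeProcessorRunner.run(
            (doc, handler) -> { throw new RuntimeException("processor blew up"); },
            "{\"field\":1}",
            (result, e) -> System.out.println("handler saw: " + (e != null ? e.getMessage() : result))
        );
    }
}
```

The key property is that the handler is invoked exactly once for every execution, whether the processor succeeds, fails via its callback, or throws outright.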
SimulateExecutionService in verbose mode ignores exceptions delegated to its handler, since it assumes that processorResultList contains the result (successful or not) of every processor in the pipeline. If the TrackingResultProcessor for a PipelineProcessor can't find the mentioned pipeline, it just throws an error without updating the processorResultList. This commit addresses that.
Addressing: `// intentionally non-locking for simplicity... it's OK if we re-put the same key/value in the cache during a race condition`, using `computeIfAbsent(..)` and `CompletableFuture`.