[ML] Move PyTorch request ID and cache hit indicator to top level #88901
Merged
droberts195 merged 10 commits into elastic:main Aug 3, 2022
Conversation
This change will facilitate a performance improvement on the C++ side. The request ID and cache hit indicator are the parts that need to be changed when the C++ process responds to an inference request. Having them at the top level means we do not need to parse and manipulate the original response - we can simply cache the inner object of the response and add the outer fields around it when serializing it.
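The caching shortcut described above can be sketched as follows. This is a hypothetical standalone sketch, not the actual Elasticsearch or ml-cpp code; the class and field names are invented for illustration. The cache stores only the inner result object as an opaque string, and the per-request top-level fields (request ID, cache hit flag, and time taken, which the discussion notes is calculated differently for cache hits) are appended around it at serialization time, with no re-parsing of the cached result:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the "outer fields at top level" idea:
// the cached value is the raw inner JSON, and request-specific
// fields are concatenated around it when serializing.
class InferenceResponseSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    void put(String docHash, String innerResultJson) {
        cache.put(docHash, innerResultJson);
    }

    // Wrap the cached inner object with per-request top-level fields.
    String serialize(String docHash, String requestId, long timeMs) {
        String inner = cache.get(docHash);
        boolean cacheHit = inner != null;
        if (inner == null) {
            inner = "{}"; // placeholder: a real implementation would run inference here
        }
        return "{\"request_id\":\"" + requestId + "\","
             + "\"cache_hit\":" + cacheHit + ","
             + "\"time_ms\":" + timeMs + ","
             + "\"result\":" + inner + "}";
    }
}
```

If the request ID and cache hit flag instead lived inside the inner object, the cached JSON would have to be parsed and rewritten on every hit, which is exactly the cost this change avoids.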
Collaborator
Pinging @elastic/ml-core (Team:ML)
Author
Ah, the time taken should move up too, as that is calculated differently for cache hits.
benwtrent approved these changes Jul 28, 2022
errorCount++;
logger.trace(() -> format("[%s] Parsed error with id [%s]", deploymentId, errorResult.requestId()));
Member
Seeing how processResult is synchronized and processErrorResult isn't, errorCount can be woefully incorrect due to race conditions.
That can be fixed in a different commit. This particular change looks good.
Author
Since it's quite a simple change, I addressed it in 1bdf3a0.
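For reference, one common way to make a counter safe when only some of the methods touching it are synchronized is an atomic counter. This is a hypothetical sketch of that pattern, not the code from commit 1bdf3a0; the class and method names are invented:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a plain `long errorCount++` in an unsynchronized
// method can lose updates under concurrent calls. AtomicLong makes the
// increment safe from any thread without adding synchronization.
class ResultProcessorSketch {
    private final AtomicLong errorCount = new AtomicLong();

    void processErrorResult(String requestId) {
        errorCount.incrementAndGet(); // atomic read-modify-write
    }

    long errorCount() {
        return errorCount.get();
    }
}
```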
droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Aug 3, 2022
These tests will fail if elastic/ml-cpp#2376 is merged with them unmuted. elastic#88901 will follow up with the Java side changes.
This was referenced Aug 3, 2022
droberts195 added a commit that referenced this pull request Aug 3, 2022
These tests will fail if elastic/ml-cpp#2376 is merged with them unmuted. #88901 will follow up with the Java side changes.
droberts195 added a commit to elastic/ml-cpp that referenced this pull request Aug 3, 2022
…2376) Previously the inference cache stored complete results, including a request ID and time taken. This was inefficient as it then meant the original response had to be parsed and modified before sending back to the Java side. This PR changes the cache to store just the inner portion of the inference result. Then the outer layer is added per request after retrieving from the cache. Additionally, the result writing functions are moved into a class of their own, which means they can be unit tested. Companion to elastic/elasticsearch#88901
Author
@elasticmachine update branch
Replaces #88485
Companion to elastic/ml-cpp#2376