[ML] Change inference cache to store only the inner part of results#2376
Merged
droberts195 merged 6 commits into elastic:main on Aug 3, 2022
Conversation
Hacky work in progress to test - do not review yet

Previously the inference cache stored complete results, including a request ID and the time taken. This was inefficient, as it meant the original response had to be parsed and modified before being sent back to the Java side.

This PR changes the cache to store just the inner portion of the inference result; the outer layer is then added per request after retrieval from the cache.
droberts195 added a commit to droberts195/elasticsearch that referenced this pull request on Aug 3, 2022
These tests will fail if elastic/ml-cpp#2376 is merged with them unmuted. elastic#88901 will follow up with the Java side changes.
This was referenced Aug 3, 2022
droberts195 added a commit to elastic/elasticsearch that referenced this pull request on Aug 3, 2022
These tests will fail if elastic/ml-cpp#2376 is merged with them unmuted. #88901 will follow up with the Java side changes.
Author commented: retest
droberts195 added a commit to elastic/elasticsearch that referenced this pull request on Aug 3, 2022
(#88901) This change will facilitate a performance improvement on the C++ side. The request ID and cache hit indicator are the parts that need to change when the C++ process responds to an inference request. Having them at the top level means we do not need to parse and manipulate the original response: we can simply cache the inner object of the response and add the outer fields around it when serializing it. Companion to elastic/ml-cpp#2376
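The shape of the change might look roughly like the following. This is an illustrative sketch only: the exact field names and payload are not spelled out in this PR, so the `request_id`, `cache_hit`, and `result` keys here are assumptions.

```json
// Before: per-request fields buried inside the result object,
// so the cached response had to be re-parsed to update them.
{"result": {"request_id": "abc", "time_ms": 5, "predicted_value": [0.1, 0.9]}}

// After: per-request fields at the top level, wrapped around an
// unmodified cached inner object.
{"request_id": "abc", "cache_hit": true, "result": {"predicted_value": [0.1, 0.9]}}
```

With the per-request fields at the top level, the cached inner object can be emitted verbatim on every hit.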
pull bot pushed a commit to abitmore/ml-cpp that referenced this pull request on Aug 4, 2022
In elastic#2376 the results writer for inference was changed to splice a preformatted cached section of the final result document into an outer wrapper that varies per request. This approach worked when assertions were disabled, but it tripped assertions in RapidJSON because the library's internal counters treated the spliced portion as a single value rather than a key and value together. This PR fixes the failure when assertions are enabled by taking advantage of the fact that the internal value count of the RapidJSON writer is accessible to derived classes: we add a method that writes a key and value together and increments the counter twice.
davidkyle added a commit that referenced this pull request on Jun 19, 2023
Adapt the test script for the result format changes in #2376
Previously the inference cache stored complete results, including
a request ID and time taken. This was inefficient as it then meant
the original response had to be parsed and modified before sending
back to the Java side.
This PR changes the cache to store just the inner portion of the
inference result. Then the outer layer is added per request after
retrieving from the cache.
Additionally, the result writing functions are moved into a class
of their own, which means they can be unit tested.
Companion to elastic/elasticsearch#88901