[ML] Change inference cache to store only the inner part of results #2376

Merged
droberts195 merged 6 commits into elastic:main from droberts195:refactor_pytorch_results
Aug 3, 2022
Conversation


droberts195 commented Jul 28, 2022

Previously the inference cache stored complete results, including
a request ID and time taken. This was inefficient as it then meant
the original response had to be parsed and modified before sending
back to the Java side.

This PR changes the cache to store just the inner portion of the
inference result. Then the outer layer is added per request after
retrieving from the cache.

Additionally, the result writing functions are moved into a class
of their own, which means they can be unit tested.

Companion to elastic/elasticsearch#88901

Hacky work in progress to test - do not review yet

droberts195 marked this pull request as ready for review August 3, 2022 09:13
droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Aug 3, 2022
These tests will fail if elastic/ml-cpp#2376 is merged with
them unmuted. elastic#88901 will follow up with the Java
side changes.
droberts195 added a commit to elastic/elasticsearch that referenced this pull request Aug 3, 2022
These tests will fail if elastic/ml-cpp#2376 is merged with
them unmuted. #88901 will follow up with the Java
side changes.
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

LGTM

@droberts195
Author

retest

droberts195 merged commit 4266662 into elastic:main Aug 3, 2022
droberts195 deleted the refactor_pytorch_results branch August 3, 2022 14:25
droberts195 added a commit to elastic/elasticsearch that referenced this pull request Aug 3, 2022
…8901)

This change will facilitate a performance improvement on the C++
side. The request ID and cache hit indicator are the parts that
need to be changed when the C++ process responds to an inference
request. Having them at the top level means we do not need to
parse and manipulate the original response - we can simply cache
the inner object of the response and add the outer fields around
it when serializing it.

Companion to elastic/ml-cpp#2376
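A sketch of the response shape this commit message implies, with the mutable per-request fields at the top level wrapping the cacheable inner object. The exact field names here are assumptions for illustration, not the confirmed wire format:

```json
{
  "request_id": "abc123",
  "cache_hit": true,
  "time_ms": 7,
  "result": {
    "predicted_value": 0.97
  }
}
```

With this layout, only the three outer fields change between requests; everything under `result` can be cached and spliced in verbatim.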
pull bot pushed a commit to abitmore/ml-cpp that referenced this pull request Aug 4, 2022
In elastic#2376 the results writer for inference was changed
to splice a preformatted cached section of the final result
document into an outer wrapper that varies per request.

The approach used worked when assertions were disabled but
tripped assertions in RapidJSON due to its internal counters
thinking that the spliced portion was a single value instead
of a key and value together.

This PR fixes the code when assertions are enabled by
taking advantage of the fact that the internal value
count of the RapidJSON writer is accessible to derived
classes, so we can add a method that adds a key and
value together and increments the counter twice.
davidkyle added a commit that referenced this pull request Jun 19, 2023
Adapt the test script for the result format changes in #2376