[ML] Limit in flight requests when indexing model download parts #112992
Conversation
Pinging @elastic/ml-core (Team:ML)
pxsalehi left a comment
LGTM for the newer changes, although ideally there should be a test showing that the number of parallel downloads is limited. In any case, I'm going to leave the final LGTM to ML, if that's ok.
```java
client.execute(PutTrainedModelDefinitionPartAction.INSTANCE, modelPartRequest).actionGet();
```
This is the part that is blocking and/or slowing down the indexing, correct? We'll wait for the client response rather than continuing asynchronously?
The gain comes from using 5 threads to stream the download and index the parts. The non-blocking write meant that we had more than 5 in-flight requests (the download is faster than the indexing), and that was causing the OOM. To limit the number of requests to at most 5 there has to be some element of blocking. Model download uses a dedicated thread pool, so the block does not starve other parts of the code of resources.
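The failure mode described above can be reproduced outside Elasticsearch. The following is a minimal, self-contained sketch (the class and names are hypothetical, not the PR's code): a fixed pool of 5 downloader threads hands each write off asynchronously, so the pool size no longer bounds the number of in-flight writes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncInFlightDemo {

    // Returns the peak number of simultaneously in-flight index writes.
    static int run() throws InterruptedException {
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService downloaders = Executors.newFixedThreadPool(5); // the fixed download pool
        ExecutorService transport = Executors.newCachedThreadPool();   // stands in for async response threads
        CountDownLatch done = new CountDownLatch(50);
        for (int part = 0; part < 50; part++) {
            downloaders.execute(() -> {
                int now = inFlight.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                // The write is handed off asynchronously, so this downloader
                // thread is immediately free to fetch the next part.
                transport.execute(() -> {
                    try {
                        Thread.sleep(50); // indexing is slower than downloading
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    inFlight.decrementAndGet();
                    done.countDown();
                });
            });
        }
        done.await();
        downloaders.shutdown();
        transport.shutdown();
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("peak in-flight writes = " + run());
    }
}
```

With fast downloads and slow writes, the peak climbs well past the pool size of 5, which is the shape of the memory blow-up the revert addressed.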
```java
for (int i = 0; i < ranges.size() - 1; i++) {
    assertThat(ranges.get(i).rangeStart(), is(startBytes));
    long end = startBytes + ((long) ranges.get(i).numParts() * chunkSize) - 1;
    assertThat(ranges.get(i).rangeEnd(), is(end));
    long expectedNumBytesInRange = (long) chunkSize * ranges.get(i).numParts() - 1;
    assertThat(ranges.get(i).rangeEnd() - ranges.get(i).rangeStart(), is(expectedNumBytesInRange));
    assertThat(ranges.get(i).startPart(), is(startPartIndex));
```
@elasticmachine update branch
…stic#112992) Restores the changes from elastic#111684, which uses multiple streams to improve the time to download and install the built-in ML models. The first iteration had a problem where the number of in-flight requests was not properly limited, which is fixed here. Additionally, there are now circuit breaker checks on allocating the buffer used to store the model definition.
💔 Backport failed
You can use sqren/backport to manually backport by running
…2992) (#113514) Restores the changes from #111684, which uses multiple streams to improve the time to download and install the built-in ML models. The first iteration had a problem where the number of in-flight requests was not properly limited, which is fixed here. Additionally, there are now circuit breaker checks on allocating the buffer used to store the model definition.
…stic#112992) Restores the changes from elastic#111684, which uses multiple streams to improve the time to download and install the built-in ML models. The first iteration had a problem where the number of in-flight requests was not properly limited, which is fixed here. Additionally, there are now circuit breaker checks on allocating the buffer used to store the model definition.
# Conflicts:
# x-pack/plugin/ml-package-loader/src/main/java/org/elasticsearch/xpack/ml/packageloader/action/TransportLoadTrainedModelPackage.java
# x-pack/plugin/ml-package-loader/src/test/java/org/elasticsearch/xpack/ml/packageloader/action/TransportLoadTrainedModelPackageTests.java
…2992) (#113710) Restores the changes from #111684, which uses multiple streams to improve the time to download and install the built-in ML models. The first iteration had a problem where the number of in-flight requests was not properly limited, which is fixed here. Additionally, there are now circuit breaker checks on allocating the buffer used to store the model definition.
# Conflicts:
# x-pack/plugin/ml-package-loader/src/main/java/org/elasticsearch/xpack/ml/packageloader/action/TransportLoadTrainedModelPackage.java
# x-pack/plugin/ml-package-loader/src/test/java/org/elasticsearch/xpack/ml/packageloader/action/TransportLoadTrainedModelPackageTests.java
#111684 improved the model install time by using multiple streams and threads to download and write the model parts. The change was reverted in #112961 after it was discovered to be the cause of out-of-memory exceptions.
The design relied on a fixed size thread pool to limit the concurrent downloads and hence also manage memory usage. However, indexing of the downloaded parts was performed async, which meant a new download request would be forked and executed while the write request was still in flight, leading to large numbers of in-flight requests. The fix here is to block on the index write.
The first commit is the revert of the revert; the later commits introduce the blocking write and reuse a byte buffer that was previously recreated for every downloaded part. Allocating that byte buffer is now protected by a circuit breaker.
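The effect of the blocking write can be sketched with a minimal example (hypothetical names, not the PR's code): when each of the 5 pool threads blocks until its write completes, as `actionGet()` does, the pool size itself caps the in-flight requests.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BlockingWriteDemo {

    // Returns the peak number of simultaneously in-flight index writes.
    static int run() throws InterruptedException {
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService downloaders = Executors.newFixedThreadPool(5); // the fixed download pool
        for (int part = 0; part < 50; part++) {
            downloaders.execute(() -> {
                int now = inFlight.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    // Stand-in for the blocking index write (actionGet()):
                    // this thread cannot start the next download until it returns.
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                inFlight.decrementAndGet();
            });
        }
        downloaders.shutdown();
        downloaders.awaitTermination(1, TimeUnit.MINUTES);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("peak in-flight writes = " + run()); // never exceeds the pool size
    }
}
```

Since a thread cannot fork a new download while its own write is outstanding, the peak can never exceed the pool size of 5, at the cost of the download stalling briefly while each write completes.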
Labelled as a non-issue because the code that caused the OOM was reverted before it made it to a production environment.