[ML] Write downloaded model parts async#111684
Merged
Conversation
Collaborator
Hi @davidkyle, I've created a changelog YAML for you.
jimczi
reviewed
Aug 8, 2024
...kage-loader/src/main/java/org/elasticsearch/xpack/ml/packageloader/action/ModelImporter.java
...kage-loader/src/main/java/org/elasticsearch/xpack/ml/packageloader/action/ModelImporter.java
Member
Author
@elasticmachine update branch
Collaborator
Pinging @elastic/ml-core (Team:ML)
jimczi
reviewed
Aug 20, 2024
...kage-loader/src/main/java/org/elasticsearch/xpack/ml/packageloader/action/ModelImporter.java
...kage-loader/src/main/java/org/elasticsearch/xpack/ml/packageloader/action/ModelImporter.java
Member
Author
@elasticmachine update branch
Member
Author
In classic cloud this change has taken the model download & install time down from 30 seconds to 7-10 seconds, with the total time to download and deploy ELSER optimised at 14 seconds. In serverless the download & install time is down to 21 seconds and the total time to download and deploy ELSER optimised is 31 seconds. Those serverless numbers aren't good enough, so I will try another approach.
Member
Author
@elasticmachine update branch
davidkyle
added a commit
to davidkyle/elasticsearch
that referenced
this pull request
Sep 13, 2024
…#111684) Uses the range header to split the model download into multiple streams using a separate thread for each stream
elasticsearchmachine
pushed a commit
that referenced
this pull request
Sep 13, 2024
#112859) Uses the range header to split the model download into multiple streams using a separate thread for each stream
davidkyle
added a commit
to davidkyle/elasticsearch
that referenced
this pull request
Sep 13, 2024
…#111684) Uses the range header to split the model download into multiple streams using a separate thread for each stream # Conflicts: # x-pack/plugin/ml-package-loader/src/main/java/org/elasticsearch/xpack/ml/packageloader/action/TransportLoadTrainedModelPackage.java # x-pack/plugin/ml-package-loader/src/test/java/org/elasticsearch/xpack/ml/packageloader/action/TransportLoadTrainedModelPackageTests.java
davidkyle
added a commit
to davidkyle/elasticsearch
that referenced
this pull request
Sep 16, 2024
…lastic#111684)" This reverts commit 13bd6c0.
elasticsearchmachine
pushed a commit
that referenced
this pull request
Sep 16, 2024
davidkyle
added a commit
to davidkyle/elasticsearch
that referenced
this pull request
Sep 17, 2024
…lastic#111684) (elastic#112859)" This reverts commit 4fe2851.
This was referenced Sep 17, 2024
elasticsearchmachine
pushed a commit
that referenced
this pull request
Sep 17, 2024
elasticsearchmachine
pushed a commit
that referenced
this pull request
Sep 17, 2024
davidkyle
added a commit
that referenced
this pull request
Sep 25, 2024
…2992) Restores the changes from #111684 which uses multiple streams to improve the time to download and install the built in ml models. The first iteration has a problem where the number of in-flight requests was not properly limited which is fixed here. Additionally there are now circuit breaker checks on allocating the buffer used to store the model definition.
davidkyle
added a commit
to davidkyle/elasticsearch
that referenced
this pull request
Sep 25, 2024
…stic#112992) Restores the changes from elastic#111684 which uses multiple streams to improve the time to download and install the built in ml models. The first iteration has a problem where the number of in-flight requests was not properly limited which is fixed here. Additionally there are now circuit breaker checks on allocating the buffer used to store the model definition.
elasticsearchmachine
pushed a commit
that referenced
this pull request
Sep 25, 2024
…2992) (#113514) Restores the changes from #111684 which uses multiple streams to improve the time to download and install the built in ml models. The first iteration has a problem where the number of in-flight requests was not properly limited which is fixed here. Additionally there are now circuit breaker checks on allocating the buffer used to store the model definition.
davidkyle
added a commit
to davidkyle/elasticsearch
that referenced
this pull request
Sep 27, 2024
…stic#112992) Restores the changes from elastic#111684 which uses multiple streams to improve the time to download and install the built in ml models. The first iteration has a problem where the number of in-flight requests was not properly limited which is fixed here. Additionally there are now circuit breaker checks on allocating the buffer used to store the model definition. # Conflicts: # x-pack/plugin/ml-package-loader/src/main/java/org/elasticsearch/xpack/ml/packageloader/action/TransportLoadTrainedModelPackage.java # x-pack/plugin/ml-package-loader/src/test/java/org/elasticsearch/xpack/ml/packageloader/action/TransportLoadTrainedModelPackageTests.java
elasticsearchmachine
pushed a commit
that referenced
this pull request
Sep 27, 2024
…2992) (#113710) Restores the changes from #111684 which uses multiple streams to improve the time to download and install the built in ml models. The first iteration has a problem where the number of in-flight requests was not properly limited which is fixed here. Additionally there are now circuit breaker checks on allocating the buffer used to store the model definition. # Conflicts: # x-pack/plugin/ml-package-loader/src/main/java/org/elasticsearch/xpack/ml/packageloader/action/TransportLoadTrainedModelPackage.java # x-pack/plugin/ml-package-loader/src/test/java/org/elasticsearch/xpack/ml/packageloader/action/TransportLoadTrainedModelPackageTests.java
Downloading and installing the built-in `.elser_model_2` and `.multilingual-e5-small` models has been observed to be much slower than expected. The cause is in the `ModelImporter` class, which downloads the model definition in 1MB chunks and then blocks while each model part is written to the index.
The download server supports the Range header. To speed up the download and install, multiple connections are made to the server, each asking for a separate range. A dedicated thread handles downloading and indexing the parts in each range. 5 connections are used in this PR, each reading a 1MB chunk at a time to limit the amount of memory used.
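As a rough illustration of the range splitting described above, the following sketch divides a download of known size into contiguous byte ranges aligned to the chunk size and formats a Range header value for each. It is not the code from this PR: the class `RangeSplitSketch`, the record `ByteRange`, and the method `split` are hypothetical names, and the example size is made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the ModelImporter code from this PR.
final class RangeSplitSketch {

    /** Inclusive byte range, the index of its first model part, and how many parts it covers. */
    record ByteRange(long start, long end, int firstPart, int numParts) {
        /** Value for an HTTP Range request header, e.g. "bytes=0-1048575". */
        String headerValue() {
            return "bytes=" + start + "-" + end;
        }
    }

    /** Splits totalBytes into at most `streams` ranges, each a whole number of chunks (except the tail). */
    static List<ByteRange> split(long totalBytes, int chunkBytes, int streams) {
        if (totalBytes <= 0 || chunkBytes <= 0 || streams <= 0) {
            throw new IllegalArgumentException("sizes and stream count must be positive");
        }
        int totalParts = (int) ((totalBytes + chunkBytes - 1) / chunkBytes); // ceiling division
        int used = Math.min(streams, totalParts);
        int base = totalParts / used;
        int extra = totalParts % used; // the first `extra` ranges take one more part
        List<ByteRange> ranges = new ArrayList<>();
        int part = 0;
        for (int i = 0; i < used; i++) {
            int n = base + (i < extra ? 1 : 0);
            long start = (long) part * chunkBytes;
            long end = Math.min(totalBytes, (long) (part + n) * chunkBytes) - 1;
            ranges.add(new ByteRange(start, end, part, n));
            part += n;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Assumed example: a file slightly larger than 10MB, 1MB chunks, 5 streams.
        for (ByteRange r : split(10L * 1024 * 1024 + 5, 1024 * 1024, 5)) {
            System.out.println("parts " + r.firstPart() + ".." + (r.firstPart() + r.numParts() - 1)
                + " -> Range: " + r.headerValue());
        }
    }
}
```

Aligning each range to the chunk size keeps the part numbering independent of which stream downloads a part, so every stream can index its parts without coordinating with the others.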
The final part of the model definition must be written last because it triggers an index refresh that makes the full model definition visible. If the refresh occurs before all parts are written, some parts will not be visible and deploying the model will fail. This is achieved by indexing the final part only once all the other streams have completed.
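The ordering constraint can be sketched with plain `java.util.concurrent` primitives. This is only an illustration under assumptions, not the PR's implementation: `FinalPartLastSketch` and `importModel` are hypothetical names, and adding a string to a list stands in for downloading and indexing the parts of a range.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: the list stands in for the index, strings stand in for model parts.
final class FinalPartLastSketch {

    static List<String> importModel(int streams) {
        List<String> indexed = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(streams);
        try {
            CompletableFuture<?>[] rangeTasks = new CompletableFuture<?>[streams];
            for (int i = 0; i < streams; i++) {
                final int id = i;
                // Each task represents one stream downloading and indexing its range.
                rangeTasks[i] = CompletableFuture.runAsync(() -> indexed.add("range-" + id), pool);
            }
            // Block until every range stream has finished, then write the final part.
            CompletableFuture.allOf(rangeTasks).join();
            indexed.add("final-part");
        } finally {
            pool.shutdown();
        }
        return indexed;
    }

    public static void main(String[] args) {
        // The order of the range entries may vary between runs; "final-part" is always last.
        System.out.println(importModel(5));
    }
}
```

Waiting on all range tasks before the last write means the refresh triggered by the final part can only happen once every other part is already in the index.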
There is a problem with calculating the SHA-256 message digest of the downloaded model. For one, the MessageDigest is not thread safe. More problematically, the model parts are not downloaded sequentially, and the resulting digest changes depending on the order in which the parts are downloaded.
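The order dependence is easy to reproduce with the standard `java.security.MessageDigest` API. The sketch below only demonstrates the problem and does not show how it is resolved; `DigestOrderSketch` and `sha256` are hypothetical names, and the two short byte arrays stand in for model parts.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical sketch showing that a streaming digest depends on the order of updates.
final class DigestOrderSketch {

    /** Feeds the parts into a single SHA-256 digest in the order given. */
    static byte[] sha256(List<byte[]> parts) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (byte[] part : parts) {
                digest.update(part);
            }
            return digest.digest();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] first = "part-0".getBytes(StandardCharsets.UTF_8);
        byte[] second = "part-1".getBytes(StandardCharsets.UTF_8);
        byte[] inOrder = sha256(List.of(first, second));
        byte[] swapped = sha256(List.of(second, first));
        // Prints false: the same parts in a different order give a different digest.
        System.out.println(MessageDigest.isEqual(inOrder, swapped));
    }
}
```

Any checksum verification over parts that arrive out of order therefore has to feed them to the digest in sequence order, from a single thread or under a lock.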