[ML] Add PyTorch model configuration #71035

davidkyle merged 9 commits into elastic:feature/pytorch-inference
Conversation
Pinging @elastic/ml-core (Team:ML)
Because the model definitions are always streamed as compressed strings, we never look up the named writeable for TrainedModel.
I know this is temporary, but we will definitely need this populated in the future. This way we know if there is enough free resources to assign a model to the node.
Also, I suggest simply making estimatedNumOperations 1 if we are not setting it for now.
I've put this on my TODO list. Yes, the model will use memory, but that is native memory, not JVM heap, so for the purpose of accounting the shallow size is right. Perhaps we add a long nativeRamBytesUsed() method for models loaded in a native process.
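A minimal sketch of how that accounting split could look. This is not the actual codebase: the class shape, the shallow-size arithmetic, and the 2x multiplier are all illustrative assumptions.

```java
// Hypothetical sketch: a model whose JVM footprint is shallow because the
// weights live in a native (C++) process, not on the JVM heap.
public class PyTorchModelSketch {
    private final String modelId;
    private final long compressedDefinitionSize; // size of the stored .pt blob

    public PyTorchModelSketch(String modelId, long compressedDefinitionSize) {
        this.modelId = modelId;
        this.compressedDefinitionSize = compressedDefinitionSize;
    }

    // JVM accounting: only the shallow object size plus the id string.
    public long ramBytesUsed() {
        return 16 + modelId.length() * 2L; // rough shallow estimate
    }

    // Native accounting: memory the C++ process needs once the model is loaded.
    // The 2x multiplier is a placeholder until real measurements exist.
    public long nativeRamBytesUsed() {
        return 2 * compressedDefinitionSize;
    }
}
```

The point of the split is that node selectors can budget the two pools separately: JVM heap pressure and native process memory are different resources.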
Before release this needs to be integrated with the MlMemoryTracker and NodeLoadDetector classes. They will need to track memory requirement for the 3 types of things we now have (anomaly detector jobs, data frame analytics jobs and native trained models), and then the node selectors will need to take into account the sum of all the requirements on each node.
I guess it's a non-trivial problem to know how much memory a PyTorch model will require when loaded given only the size of its .pt file to work with. We'll have to do some experiments and see if there's an approximate formula that gives reasonable results. We also need to account for the fact that the .pt file is held in the C++ process's heap while it loads the model, so total requirement will be sizeof(loaded model) + sizeof(.pt file) + sizeof(static overhead) + sizeof(code).
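The sum in the last sentence can be written down as a back-of-envelope estimator. All constants here are placeholder assumptions, not measured values; the experiments described above would be needed to pick real ones.

```java
// Illustrative only: estimates total native memory a PyTorch model might need,
// following the sum sketched above:
//   sizeof(loaded model) + sizeof(.pt file) + sizeof(static overhead) + sizeof(code)
public final class PyTorchMemoryEstimate {
    static final long STATIC_OVERHEAD_BYTES = 30L * 1024 * 1024; // assumed process overhead
    static final long CODE_SIZE_BYTES = 20L * 1024 * 1024;       // assumed code footprint
    static final double LOAD_FACTOR = 2.0;                       // assumed loaded-model/.pt ratio

    static long requiredBytes(long ptFileSizeBytes) {
        long loadedModel = (long) (ptFileSizeBytes * LOAD_FACTOR);
        // The .pt file itself sits on the C++ heap while the model loads,
        // so it counts in addition to the loaded model.
        return loadedModel + ptFileSizeBytes + STATIC_OVERHEAD_BYTES + CODE_SIZE_BYTES;
    }
}
```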
Resolved review threads (outdated) on:

...c/main/java/org/elasticsearch/xpack/core/ml/inference/trainedmodel/pytorch/PyTorchModel.java
.../plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/inference/TrainedModelConfig.java
...l/src/main/java/org/elasticsearch/client/ml/inference/trainedmodel/pytorch/PyTorchModel.java
...lugin/ml/src/main/java/org/elasticsearch/xpack/ml/action/TransportPutTrainedModelAction.java
Spaces are there for readability.
It may be more readable, but it makes the test very brittle. I've found this is still quite readable, as each column is delineated by a \s+ and the regex is easier to grok.
> It may be more readable but it makes the test very brittle.

I don't understand how it makes the test more brittle. Tests don't fail more or less often due to the spaces, and to me it seems WAY easier to know which column my current regex clause is matching when there are white spaces.
Resolved review thread (outdated) on:

x-pack/plugin/src/yamlRestTest/resources/rest-api-spec/test/ml/trained_model_cat_apis.yml
This reverts commit 83fc8c98c7592690790157b9ad1a0c6e4e781.
The feature branch contains changes to configure PyTorch models with a TrainedModelConfig and defines a format to store the binary models. The _start and _stop deployment actions control the model lifecycle, and the model can be evaluated directly with the _infer endpoint. Two types of NLP task are supported: Named Entity Recognition and Fill Mask. The feature branch consists of these PRs: #73523, #72218, #71679, #71323, #71035, #71177, #70713
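As a hedged sketch, the lifecycle described above might look like the following console requests. The model id and the exact endpoint paths are illustrative assumptions; the feature-branch API may differ in detail.

```
# Assumed paths, for illustration only
POST _ml/trained_models/my_pytorch_model/deployment/_start

POST _ml/trained_models/my_pytorch_model/_infer
{ "docs": [ { "text_field": "some input text" } ] }

POST _ml/trained_models/my_pytorch_model/deployment/_stop
```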
Adds the `model_type` field to `TrainedModelConfig` for distinguishing between models that can be loaded via the model loading service and those that require a native process. `model_type` now appears in the CAT trained models action.

Existing models without a `model_type` must be either a tree ensemble or the lang ident model. I've added the field to the lang ident config, so all models with a null `model_type` must be a tree ensemble. `model_type` is set on creation of new models, either by the user or by interrogating the TrainedModelDefinition. I didn't want to break the API for existing users by requiring `model_type` to be set, since it can be set automatically.

The new class `PyTorchModel` implements `TrainedModel` and has a simple definition which is just the ID of the PyTorch model it uses. Loading the PyTorch model and checking it exists when the `TrainedModelConfig` is PUT is a TODO.
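The shape described in the last paragraph can be sketched roughly as below. The real `TrainedModel` interface in x-pack has many more methods; this keeps only the parts the description names, and the method bodies are assumptions.

```java
// Hedged sketch, not the actual x-pack interface.
interface TrainedModel {
    long ramBytesUsed();
    long estimatedNumOperations();
}

// A PyTorch model's "definition" is just a reference to the stored
// binary model; the weights themselves live in a native process.
final class PyTorchModel implements TrainedModel {
    private final String modelId;

    PyTorchModel(String modelId) {
        this.modelId = modelId;
    }

    @Override
    public long ramBytesUsed() {
        // JVM cost is shallow: the weights are not on the JVM heap.
        return 16 + modelId.length() * 2L;
    }

    @Override
    public long estimatedNumOperations() {
        return 1; // placeholder default until a real estimate exists
    }

    String getModelId() {
        return modelId;
    }
}
```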