[ML] Load and evaluate 3rd Party Model #72218
davidkyle merged 11 commits into elastic:feature/pytorch-inference
Conversation
retest this please
Pinging @elastic/ml-core (Team:ML)
benwtrent left a comment
location should be a named object.
The initial one is called index, with a default location indicating the ML inference index. This way the logic for restoring models can be slightly more unified: we always restore a definition from a location object, and each location type knows how to query for its docs and provide them.
Example:
"location": {"index": {"name": ".ml-inference-*"}} // I am not sure if we should include model id
^ The default location for current models (if not shipped in resource files)
Also, within the index location type, we shouldn't supply the model_id. The model id should ALWAYS be the same as the one in the model config. If we allow them to be different, we will needlessly complicate matters.
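The "each location type knows how to query for its docs" idea can be sketched with a small interface. This is an illustrative sketch only; the names here are assumptions, not the actual Elasticsearch classes:

```java
// Hypothetical sketch: restore logic depends only on the location
// abstraction, and each concrete location type knows where its
// definition docs live.
interface TrainedModelLocation {
    // For an index location this is the index to query for definition docs
    String getResourceName();
}

class IndexLocation implements TrainedModelLocation {
    private final String indexName;

    IndexLocation(String indexName) {
        this.indexName = indexName;
    }

    @Override
    public String getResourceName() {
        return indexName;
    }
}

public class LocationSketch {
    public static void main(String[] args) {
        // The default location proposed above for current models
        TrainedModelLocation location = new IndexLocation(".ml-inference-*");
        System.out.println(location.getResourceName()); // prints .ml-inference-*
    }
}
```

A future non-index location type would only need to implement the same interface, leaving the restore path unchanged.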
Is the idea that we could then support a future location that is not an index?
Yes, the hope is that one day Elasticsearch will have a large binary blob storage facility (several people have requested this), and we'll use that for models added after that time.
I like the idea that all trained model configurations have a location, so conceptually there is little difference between the large models uploaded externally and the smaller DFA models. For existing models the location is
In this case the model id is the name of the PyTorch model and may be different for a few reasons.
I'm not sure 2. is a valid use case, as the main option to tweak is the input field and that can be achieved with a field map. Let's keep this open for review; I'd love to prune redundant configuration and will do so if we can't find a use for it once we have more experience.
++ I'm not convinced we will ever have another storage option, but the proposed config allows flexibility and is no more or less readable than the current implementation.
```java
if (hasModelDefinition) {
    trainedModelConfig.setEstimatedHeapMemory(config.getModelDefinition().ramBytesUsed())
        .setEstimatedOperations(config.getModelDefinition().getTrainedModel().estimatedNumOperations());
}
```
🤔 We will have to rethink this. It would be good to somehow indicate the work and size required for a PyTorch model.
++ we will need a measure of the model size in TrainedModelConfig to allocate the persistent task by memory
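Until a PyTorch model can report its own footprint, one rough measure is the total byte length of its stored definition chunks. This is a hypothetical sketch, not the branch's implementation; a real estimate would also account for decompression and runtime overhead:

```java
import java.util.List;

// Hypothetical sketch: approximate a model's heap requirement from the
// stored definition chunks, so the persistent task allocator has a
// number to work with.
public class ModelSizeEstimate {
    static long estimateHeapBytes(List<byte[]> definitionChunks) {
        // Sum the raw chunk sizes as a lower bound on required memory
        return definitionChunks.stream().mapToLong(chunk -> chunk.length).sum();
    }

    public static void main(String[] args) {
        List<byte[]> chunks = List.of(new byte[1024], new byte[2048]);
        System.out.println(estimateHeapBytes(chunks)); // prints 3072
    }
}
```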
```java
TrainedModelConfig trainedModelConfig = getModelResponse.getResources().results().get(0);
if (trainedModelConfig.getModelType() != TrainedModelType.PYTORCH) {
```
I am assuming that this is just for this development spike. Before merging we will allow deploying of all models (I would think)
I thought it was a sane check to perform right now, but it may not make sense in the future.
The feature branch contains changes to configure PyTorch models with a TrainedModelConfig and defines a format to store the binary models. The _start and _stop deployment actions control the model lifecycle, and the model can be evaluated directly with the _infer endpoint. Two types of NLP task are supported: Named Entity Recognition and Fill Mask. The feature branch consists of these PRs: #73523, #72218, #71679, #71323, #71035, #71177, #70713
This adds a `location` field to `TrainedModelConfig` for large models that cannot be PUT inline with the config. If `location` is set then the definition is not required; instead the model will be loaded from the name and index specified in the location object, following the convention used for restoring `TrainedModelDefinitionDoc`s. I've deleted the class `PyTorchModel`, which was the previous attempt to achieve the same by putting the model location in the definition, but the model definition is not as easy to read as it needs to be explicitly requested, and having the location in the config makes more sense both in code and API usage. When a model deployment starts, the model is loaded from the location in the config.

Example model configuration
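A minimal sketch of such a configuration, assuming the index location format discussed earlier in this thread (the `model_id` and `model_type` values are illustrative):

```json
{
  "model_id": "my_pytorch_model",
  "model_type": "pytorch",
  "location": {
    "index": {
      "name": ".ml-inference-*"
    }
  }
}
```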
The `_infer` action has been updated to accept a JSON document which is passed straight through to the inference request without any validation. This is a temporary measure; more thought needs to go into the API design.
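As a sketch of the pass-through behaviour, assuming a deployment-scoped `_infer` URL (the exact path and body fields here are illustrative, not the final API), the raw JSON body is forwarded unmodified to the model:

```
POST _ml/trained_models/my_pytorch_model/deployment/_infer
{
  "input": "some text to evaluate"
}
```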