[ML] Load and evaluate 3rd Party Model#72218

Merged
davidkyle merged 11 commits into elastic:feature/pytorch-inference from davidkyle:load-and-infer on Apr 29, 2021

Conversation

@davidkyle (Member) commented Apr 26, 2021

This adds a location field to TrainedModelConfig for large models that cannot be PUT inline with the config. If location is set, the definition is not required; instead, the model is loaded from the name and index specified in the location object, following the convention used for restoring TrainedModelDefinitionDocs. I've deleted the PyTorchModel class, which was a previous attempt to achieve the same thing by putting the model location in the definition. However, the definition is not as easy to read, as it needs to be explicitly requested, so having the location in the config makes more sense both in code and in API usage. When a model deployment starts, the model is loaded from the location in the config.

Example model configuration

{
    "description": "a large model that needs to be split into chunks",
    "model_type": "pytorch",
    "inference_config": {
        "classification": {
            "num_top_classes": 1
        }
    },
    "input": {
        "field_names": ["text_field"]
    },
    "location": {
        "model_id": "distilbert-finetuned-sst",
        "index": "big_model"
    }
}
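
As a purely illustrative sketch, a config like the one above would be uploaded and then deployed via the ML trained-model APIs. The endpoint paths below are an assumption based on the Elasticsearch ML API conventions of the time, not something specified in this PR:

```json
PUT _ml/trained_models/distilbert-finetuned-sst
{
    "description": "a large model that needs to be split into chunks",
    "model_type": "pytorch",
    "inference_config": { "classification": { "num_top_classes": 1 } },
    "input": { "field_names": ["text_field"] },
    "location": { "model_id": "distilbert-finetuned-sst", "index": "big_model" }
}

POST _ml/trained_models/distilbert-finetuned-sst/deployment/_start
```

On _start, the model definition chunks are read from the big_model index named in location rather than from an inline definition.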

The _infer action has been updated to accept a JSON document which is passed straight through to the inference request without any validation. This is a temporary measure; more thought needs to go into the API design.
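
For illustration, an evaluation call against the deployed model might look like the following. Both the endpoint path and the body shape are assumptions here; since the PR deliberately forwards the body unvalidated, any JSON the native process understands would be accepted at this stage:

```json
POST _ml/trained_models/distilbert-finetuned-sst/deployment/_infer
{
    "docs": [
        { "text_field": "The movie was surprisingly good" }
    ]
}
```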

@davidkyle davidkyle added the :ml Machine learning label Apr 26, 2021
@davidkyle (Member, Author) commented:

retest this please

@davidkyle davidkyle force-pushed the load-and-infer branch 2 times, most recently from 4423b8c to e5c5bec Compare April 27, 2021 12:48
@davidkyle davidkyle marked this pull request as ready for review April 27, 2021 13:58
@elasticmachine elasticmachine added the Team:ML Meta label for the ML team label Apr 27, 2021
@elasticmachine (Collaborator) commented:

Pinging @elastic/ml-core (Team:ML)

@benwtrent (Member) left a comment:

location should be a named object.

The initial one is called index, with a default location indicating the ML inference index. This way the logic for restoring models can be slightly more unified: we always restore a definition from a location object, and each location type knows how to query for docs and provide them.

Example:

"location": {"index": {"name": ".ml-inference-*"}} // I am not sure if we should include model id

^ The default location for current models (if not shipped in resource files)

Also, within the index location type, we shouldn't supply the model_id. The model id should ALWAYS be the same as the model config. If we allow them to be different, we will needlessly complicate matters.

@dimitris-athanasiou (Contributor) commented Apr 28, 2021

location should be a named object.

Is the idea that we could then support a future location that is not an index?

@droberts195 commented:

Is the idea that we could then support a future location that is not an index?

Yes, the hope is that one day Elasticsearch will have a large binary blob storage facility (several people have requested this), and we'll use that for models added after that time.

@davidkyle (Member, Author) commented:

I like the idea that all trained model configurations have a location so conceptually there is little difference between the large models uploaded externally and the smaller DFA models. For existing models the location is .ml-inference-* so there are no BWC problems.

The model id should ALWAYS be the same as the model config

In this case the model id is the name of the PyTorch model and may be different for a few reasons:

  1. To give the config a more meaningful name other than distillBert-XXX
  2. If multiple configs reference the same PyTorch model

I'm not sure 2. is a valid use case, as the main option to tweak is the input field and that can be achieved with a field map. Let's keep this open for review; I love to prune redundant configuration and will do so if we can't find a use for it once we have more experience.

location should be a named object.

++

I'm not convinced we will ever have another storage option but the proposed config allows flexibility and is no more or less readable than the current implementation.

Comment on lines +151 to +154
if (hasModelDefinition) {
    trainedModelConfig.setEstimatedHeapMemory(config.getModelDefinition().ramBytesUsed())
        .setEstimatedOperations(config.getModelDefinition().getTrainedModel().estimatedNumOperations());
}
Member:

🤔

we will have to rethink this. It would be good to somehow indicate the work and size required for a pytorch model.

Member (Author):

++ we will need a measure of the model size in TrainedModelConfig to allocate the persistent task by memory



TrainedModelConfig trainedModelConfig = getModelResponse.getResources().results().get(0);
if (trainedModelConfig.getModelType() != TrainedModelType.PYTORCH) {
Member:

I am assuming that this is just for this development spike. Before merging we will allow deploying of all models (I would think)

Member (Author):

I thought it was a sane check to perform right now, but it may not make sense in the future.

@davidkyle davidkyle merged commit 63bb0c6 into elastic:feature/pytorch-inference Apr 29, 2021
@davidkyle davidkyle deleted the load-and-infer branch April 29, 2021 14:28
davidkyle added a commit that referenced this pull request Jun 3, 2021
The feature branch contains changes to configure PyTorch models with a 
TrainedModelConfig and defines a format to store the binary models. 
The _start and _stop deployment actions control the model lifecycle 
and the model can be directly evaluated with the _infer endpoint. 
Two types of NLP tasks are supported: Named Entity Recognition and Fill Mask.

The feature branch consists of these PRs: #73523, #72218, #71679
#71323, #71035, #71177, #70713
Labels

:ml Machine learning, Team:ML Meta label for the ML team

6 participants