[ML] Natural Language Processing tasks and models#73523
Merged
davidkyle merged 21 commits into elastic:feature/pytorch-inference on Jun 2, 2021
Collaborator
Pinging @elastic/ml-core (Team:ML)
Member
run elasticsearch-ci/part-1
Contributor
jenkins test this please
Contributor
dimitris-athanasiou left a comment
Looks good. Just a couple of test related comments.
Contributor
This one is left empty. Should we add some tests here?
Contributor
We should add some tests for this one
dimitris-athanasiou approved these changes Jun 1, 2021
Contributor
dimitris-athanasiou left a comment
LGTM. Just a question about the name of the fill mask results field. Good to merge, though, even if you decide to change that.
Contributor
Should this also be predictions?
davidkyle added a commit that referenced this pull request Jun 3, 2021
The feature branch contains changes to configure PyTorch models with a TrainedModelConfig and defines a format to store the binary models. The _start and _stop deployment actions control the model lifecycle, and the model can be evaluated directly with the _infer endpoint. Two types of NLP tasks are supported: Named Entity Recognition and Fill Mask. The feature branch consists of these PRs: #73523, #72218, #71679, #71323, #71035, #71177, #70713
Following on from #72218, which defined how large PyTorch models can be stored, this PR introduces the concept of Natural Language Processing tasks and defines a way to evaluate BERT models.
Mask Fill and Named Entity Recognition tasks are implemented here, but others could easily be added now that the framework is in place. In particular, this PR implements tokenisation of input text for BERT models and defines a structure for post-graph processing.
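For readers unfamiliar with the two ends of that pipeline, here is a rough Python sketch, not the code in this PR. It assumes the standard greedy longest-match-first WordPiece scheme used by BERT vocabularies, and a fill mask post-processing step that applies softmax to the logits at the `[MASK]` position and keeps the top results. The function names, the toy vocabulary, and the assumption that input text is already normalised and whitespace-separated are all illustrative.

```python
import math


def wordpiece_tokenize(text, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first WordPiece over whitespace-split words.

    Assumes `text` is already normalised (case, punctuation); hypothetical helper.
    """
    tokens = []
    for word in text.split():
        pieces = []
        start = 0
        while start < len(word):
            end = len(word)
            match = None
            # Shrink the candidate from the right until it is in the vocabulary.
            while start < end:
                candidate = word[start:end]
                if start > 0:
                    candidate = "##" + candidate  # continuation-piece prefix
                if candidate in vocab:
                    match = candidate
                    break
                end -= 1
            if match is None:
                # No piece matched, so the whole word becomes the unknown token.
                pieces = [unk_token]
                break
            pieces.append(match)
            start = end
        tokens.extend(pieces)
    return tokens


def top_k_mask_predictions(mask_logits, id_to_token, k=3):
    """Softmax the logits at the [MASK] position and return the k likeliest tokens."""
    peak = max(mask_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in mask_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id_to_token[i], probs[i]) for i in ranked[:k]]


# Toy vocabulary, purely illustrative.
vocab = {"un", "##aff", "##able", "the", "[MASK]"}
print(wordpiece_tokenize("the unaffable [MASK]", vocab))
print(top_k_mask_predictions([0.0, 2.0, 1.0], ["a", "b", "c"], k=2))
```

A real BERT tokeniser also handles lower-casing, punctuation splitting and the `[CLS]`/`[SEP]` special tokens, which this sketch leaves out.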
Once the PyTorch model is uploaded, a trained model config referencing it must be PUT:
And the model deployed:
Mask Fill Example
Returns:
NER Example
Returns:
Feature branch PR
Co-authored-by: Dimitris Athanasiou <dimitris@elastic.co>