[ML] Natural Language Processing tasks and models by davidkyle · Pull Request #73523 · elastic/elasticsearch

davidkyle · 2021-05-28T11:39:12Z

Following on from #72218, which defined how large PyTorch models can be stored, this PR introduces the concept of Natural Language Processing tasks and defines a way to evaluate BERT models.

Mask Fill and Named Entity Recognition tasks are implemented here, but others can easily be added now that the framework is in place. In particular, this PR implements tokenisation of input text for BERT models and defines a structure for post-graph processing.
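
To make the tokenisation step concrete, here is a minimal sketch of greedy longest-match WordPiece splitting, the scheme commonly used with BERT vocabularies. It is written in Python for brevity and is not the Java code from this PR. The toy vocabulary, the function names and the regular expression that keeps special tokens such as [MASK] whole are all assumptions made for the example.

```python
import re

# Toy vocabulary invented for this example; real BERT vocabularies are far larger.
VOCAB = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]",
         "paris", "is", "the", "of", "france", ".", "capital", "##s", "el", "##astic"]
TOKEN_TO_ID = {token: index for index, token in enumerate(VOCAB)}
SPECIAL_TOKENS = {"[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"}


def basic_split(text, do_lower_case=True):
    """Split on whitespace and punctuation, keeping bracketed special tokens whole."""
    pieces = re.findall(r"\[[A-Z]+\]|\w+|[^\w\s]", text)
    return [p if p in SPECIAL_TOKENS else (p.lower() if do_lower_case else p)
            for p in pieces]


def wordpiece(word):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    if word in SPECIAL_TOKENS:
        return [word]
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in TOKEN_TO_ID:
                match = candidate
                break
            end -= 1
        if match is None:
            return ["[UNK]"]  # no piece matched, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


def tokenize(text):
    """Return (tokens, token ids) wrapped in [CLS] ... [SEP]."""
    tokens = ["[CLS]"]
    for word in basic_split(text):
        tokens.extend(wordpiece(word))
    tokens.append("[SEP]")
    return tokens, [TOKEN_TO_ID[t] for t in tokens]
```

With this toy vocabulary, tokenize("Paris is the [MASK] of France.") keeps [MASK] as a single token and splits the trailing full stop into its own token, which is the kind of input a BERT fill-mask model expects.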

Once the PyTorch model is uploaded, a trained model config referencing it must be PUT:

PUT _ml/trained_models/bert-model-for-maskfill
{
    "description": "Mask fill model",
    "model_type": "pytorch",
    "inference_config": {
        "classification": {
            "num_top_classes": 1
        }
    },
    "input": {
        "field_names": ["text_field"]
    },
    "location": {
        "index": {
            "model_id": "bert-model-for-maskfill",
            "name": "big_model"
        }
    }
}

The model must then be deployed:

POST _ml/trained_models/deployment/bert-model-for-maskfill/_start
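
The configuration and deployment steps can also be scripted. The sketch below only builds the two HTTP requests with the Python standard library and does not send them. The base URL http://localhost:9200 and the helper name build_request are assumptions, the _ml prefix is assumed to match the other endpoints in this description, and the paths are those of this feature branch, so they may differ in released versions.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a local, unsecured test cluster
MODEL_ID = "bert-model-for-maskfill"


def build_request(method, path, body=None):
    """Build (but do not send) a JSON request against the cluster."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(
        f"{BASE_URL}/{path}", data=data, headers=headers, method=method
    )


# Trained model config copied from the PR description.
config = {
    "description": "Mask fill model",
    "model_type": "pytorch",
    "inference_config": {"classification": {"num_top_classes": 1}},
    "input": {"field_names": ["text_field"]},
    "location": {"index": {"model_id": MODEL_ID, "name": "big_model"}},
}

put_config = build_request("PUT", f"_ml/trained_models/{MODEL_ID}", config)
start_deployment = build_request(
    "POST", f"_ml/trained_models/deployment/{MODEL_ID}/_start"
)

# To actually send them against a running cluster:
#   urllib.request.urlopen(put_config)
#   urllib.request.urlopen(start_deployment)
```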

Mask Fill Example

POST _ml/trained_models/deployment/bert-model-for-maskfill/_infer
{
  "input": "Paris is the [MASK] of France."
}

Returns:

[
  {
    "token" : "capital",
    "score" : 0.9861745037766138,
    "sequence" : "Paris is the capital of France."
  },
  {
    "token" : "center",
    "score" : 0.00372138405614492,
    "sequence" : "Paris is the center of France."
  },
  {
    "token" : "Capital",
    "score" : 0.003259749401778711,
    "sequence" : "Paris is the Capital of France."
  },
  {
    "token" : "centre",
    "score" : 0.002157122475609145,
    "sequence" : "Paris is the centre of France."
  },
  {
    "token" : "city",
    "score" : 9.026127599384262E-4,
    "sequence" : "Paris is the city of France."
  }
]
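
The response above lists the five highest-scoring candidate tokens, and the commit list mentions a heap or priority-queue based top k. One way to select such candidates is sketched below in Python rather than the PR's Java. The softmax step and the function names are assumptions made for illustration; the PR description does not say how the scores are normalised.

```python
import heapq
import math


def softmax(logits):
    """Turn raw scores into probabilities, subtracting the max for numerical stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(scores, k):
    """Return the k largest (index, score) pairs, highest first.

    A min-heap bounded at k entries is kept, so each score costs O(log k)
    and the full vocabulary never needs to be sorted.
    """
    if k <= 0:
        return []
    heap = []  # entries are (score, index); heap[0] is the smallest kept score
    for index, score in enumerate(scores):
        if len(heap) < k:
            heapq.heappush(heap, (score, index))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, index))
    return [(index, score) for score, index in sorted(heap, reverse=True)]
```

For a fill-mask task, scores would be the probabilities at the [MASK] position and each returned index would be mapped back to a vocabulary token.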

NER Example

POST _ml/trained_models/deployment/bert-model-fine-tuned-for-ner/_infer
{
  "input": "Today's GAH is live from Amsterdam, BC, London, Munich and Texas"
}

Returns:

[
  {
    "label" : "organisation",
    "score" : 0.940775243737086,
    "word" : "GAH"
  },
  {
    "label" : "location",
    "score" : 0.9987588832004948,
    "word" : "Amsterdam"
  },
  {
    "label" : "location",
    "score" : 0.9958452874139202,
    "word" : "BC"
  },
  {
    "label" : "location",
    "score" : 0.9981461858828271,
    "word" : "London"
  },
  {
    "label" : "location",
    "score" : 0.9991212183928049,
    "word" : "Munich"
  },
  {
    "label" : "location",
    "score" : 0.9994121461792658,
    "word" : "Texas"
  }
]
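
To illustrate the kind of post-processing an NER task needs, the sketch below merges WordPiece continuation tokens back into whole words and drops tokens tagged as non-entities. It is a Python illustration, not the PR's NER result processor. The "O" outside label, taking the label of the first piece and averaging the piece scores are all assumptions made for the example.

```python
def group_entities(tokens, labels, scores, outside_label="O"):
    """Merge ## continuation pieces into words and keep only labelled entities."""
    words = []
    for token, label, score in zip(tokens, labels, scores):
        if token in ("[CLS]", "[SEP]", "[PAD]"):
            continue  # special tokens never form part of an entity
        if token.startswith("##") and words:
            # Continuation piece: append to the previous word, keep its label.
            words[-1]["word"] += token[2:]
            words[-1]["scores"].append(score)
        else:
            words.append({"word": token, "label": label, "scores": [score]})
    return [
        {
            "label": w["label"],
            "score": sum(w["scores"]) / len(w["scores"]),  # assumption: mean of piece scores
            "word": w["word"],
        }
        for w in words
        if w["label"] != outside_label
    ]
```

Given per-token predictions for "live from Am ##sterdam", this yields a single location entity with the word "Amsterdam", in the same shape as the response above.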

Feature branch PR

Co-authored-by: Dimitris Athanasiou <dimitris@elastic.co>

elasticmachine · 2021-05-28T11:39:16Z

Pinging @elastic/ml-core (Team:ML)

benwtrent · 2021-05-28T11:53:48Z

run elasticsearch-ci/part-1

mark-vieira · 2021-05-28T16:21:35Z

jenkins test this please

dimitris-athanasiou

Looks good. Just a couple of test-related comments.

dimitris-athanasiou · 2021-06-01T11:26:56Z

x-pack/plugin/ml/src/test/java/org/elasticsearch/xpack/ml/inference/nlp/TaskTypeTests.java

This one is left empty. Should we add some tests here?

dimitris-athanasiou · 2021-06-01T11:27:37Z

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/nlp/FillMaskProcessor.java

We should add some tests for this one

dimitris-athanasiou

LGTM. Just a question about the name of the fill mask results field. Good to merge, though, even if you decide to change that.

dimitris-athanasiou · 2021-06-01T16:08:20Z

...in/core/src/main/java/org/elasticsearch/xpack/core/ml/inference/results/FillMaskResults.java

Should this also be predictions?

dimitris-athanasiou · 2021-06-01T16:09:31Z

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/nlp/FillMaskProcessor.java

The feature branch contains changes to configure PyTorch models with a TrainedModelConfig and defines a format to store the binary models. The _start and _stop deployment actions control the model lifecycle, and the model can be evaluated directly with the _infer endpoint. Two types of NLP tasks are supported: Named Entity Recognition and Fill Mask. The feature branch consists of these PRs: #73523, #72218, #71679, #71323, #71035, #71177, #70713

davidkyle added >feature :ml Machine learning labels May 28, 2021

elasticmachine added the Team:ML label May 28, 2021

davidkyle force-pushed the bert-tokenizer branch from 45c44a4 to 2b5b0e2 Compare June 1, 2021 08:10

davidkyle mentioned this pull request Jun 1, 2021

[ML] Add Trained Model Post-Processors #69571

Closed

dimitris-athanasiou reviewed Jun 1, 2021

dimitris-athanasiou approved these changes Jun 1, 2021

davidkyle and others added 20 commits June 1, 2021 21:23

WIP

e3fafa2

Add the tokenization pipeline

7b74033

Pass 'inputs' to infer request instead of the big whole doc

ea665e4

Add special tokens and do_lower_case setting

d38a054

Add pipeline post processor

558ce9f

Fixing tests

568fabd

Implement NER result processor

8ce7ede

Add fill_mask processor

f3aef86

Move results into core and add tests

5edff63

Drop Pipeline terminology

b76f14e

Remove big config file

5d2491f

Use a common BERT request builder

4b26720

Add top k function

9aa0457

Handle punctuation chars next to the [MASK] token

0f0424b

Ner Processor tests

bc050ae

tidy up

1788374

Heap based top k

92c4123

Implement top k using a priority queue

b7a4a7f

Fixes

b744aa3

Fill Mask test

5abec25

Check for error from pytorch results

ff2a6c1

davidkyle force-pushed the bert-tokenizer branch from 199e247 to ff2a6c1 Compare June 1, 2021 20:24

davidkyle merged commit 8e51034 into elastic:feature/pytorch-inference Jun 2, 2021

davidkyle deleted the bert-tokenizer branch June 2, 2021 10:13

davidkyle mentioned this pull request Jun 2, 2021

[ML] Merge the pytorch-inference feature branch #73660

Merged

davidkyle merged 21 commits into elastic:feature/pytorch-inference from davidkyle:bert-tokenizer

davidkyle commented May 28, 2021 • edited by dimitris-athanasiou

6 participants
