[ML] Use perAllocation and perDeployment memory usage in the model assignment planner by valeriy42 · Pull Request #98874 · elastic/elasticsearch

valeriy42 · 2023-08-25T12:31:52Z

Building upon #98139, this PR extends the model assignment planning algorithms and the linear solver to use the extended memory fields. It also adds unit tests to verify the new behavior.

I needed to adjust the old unit tests since we use the estimateMemoryUsage routine, which computes 2*memoryBytes + 240 MB as the memory requirement. Previously, the unit tests simply used the value of the memoryBytes field.
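
To make the arithmetic concrete, here is a minimal sketch of the estimate quoted above. It is not the Elasticsearch implementation: the class name, the method signature, and the reading of "240 MB" as 240 * 1024 * 1024 bytes are assumptions made for illustration.

```java
// Illustrative sketch only; names and the byte value of "240 MB" are assumptions.
final class MemoryEstimateSketch {
    // Fixed overhead from the PR description, assumed to mean 240 MiB.
    static final long OVERHEAD_BYTES = 240L * 1024 * 1024;

    // Returns 2 * memoryBytes + 240 MB, failing loudly on overflow or negative input.
    static long estimateMemoryUsageBytes(long memoryBytes) {
        if (memoryBytes < 0) {
            throw new IllegalArgumentException("memoryBytes must be non-negative");
        }
        return Math.addExact(Math.multiplyExact(2L, memoryBytes), OVERHEAD_BYTES);
    }

    public static void main(String[] args) {
        long modelBytes = 50L * 1024 * 1024; // hypothetical 50 MB model
        long estimate = estimateMemoryUsageBytes(modelBytes);
        System.out.println(estimate / (1024 * 1024)); // prints 340
    }
}
```

A unit test that previously compared node memory against memoryBytes directly would, under this estimate, need node sizes large enough for the doubled size plus the overhead.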

elasticsearchmachine · 2023-10-09T09:13:31Z

Hi @valeriy42, I've created a changelog YAML for you.

elasticsearchmachine · 2023-10-09T09:13:31Z

Pinging @elastic/ml-core (Team:ML)

droberts195

LGTM

droberts195 · 2023-11-01T11:55:34Z

@valeriy42 this is failing CI because of this:

Forbidden method invocation: java.lang.String#format(java.lang.String,java.lang.Object[]) [Uses default locale]
in org.elasticsearch.upgrades.MlAssignmentPlannerUpgradeIT (MlAssignmentPlannerUpgradeIT.java:141)

You can fix that by using Strings.format instead of String.format. (Strings is an Elastic class that takes care of adding the locale.)
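
For readers unfamiliar with the forbidden-API check, the sketch below shows the general idea using only the JDK. The `format` helper here is a hypothetical stand-in, not the Elastic `Strings` class; it assumes the wrapper simply passes `Locale.ROOT`, which the comment above implies but does not spell out.

```java
import java.util.Locale;

// Hypothetical stand-in for a locale-pinning format helper; not the Elastic Strings class.
final class LocaleSafeFormat {
    // Formats with an explicit locale so the output does not depend on the JVM default.
    static String format(String pattern, Object... args) {
        return String.format(Locale.ROOT, pattern, args);
    }

    public static void main(String[] args) {
        // String.format(pattern, args) would use the default locale, where the
        // decimal separator may be a comma; the explicit locale avoids that.
        System.out.println(format("%d allocations, %.1f MB", 2, 240.0));
        // prints "2 allocations, 240.0 MB"
    }
}
```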

valeriy42 · 2023-11-02T12:14:34Z

@elasticmachine update branch

tveasey

Good work on this! LGTM

This PR adds the ability to estimate the per-deployment and per-allocation memory usage of NLP transformer models. It uses torch.profiler to log the peak memory usage during inference. This information is then used in Elasticsearch to provision models with sufficient memory (elastic/elasticsearch#98874).
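
The thread does not state how the two figures are combined on the Elasticsearch side. As one plausible reading, the sketch below assumes a one-off per-deployment cost plus a per-allocation cost that scales linearly with the number of allocations. The class, method, and parameter names are hypothetical, and the real planner may add overheads or take a maximum with the older estimate.

```java
// Hypothetical combination of the two memory figures; not the Elasticsearch formula.
final class ExtendedMemorySketch {
    // Assumption: total = perDeploymentBytes + perAllocationBytes * allocations.
    static long requiredBytes(long perDeploymentBytes, long perAllocationBytes, int allocations) {
        if (perDeploymentBytes < 0 || perAllocationBytes < 0 || allocations < 0) {
            throw new IllegalArgumentException("inputs must be non-negative");
        }
        long allocationCost = Math.multiplyExact(perAllocationBytes, (long) allocations);
        return Math.addExact(perDeploymentBytes, allocationCost);
    }

    public static void main(String[] args) {
        // Hypothetical figures: 100 bytes per deployment, 10 bytes per allocation, 3 allocations.
        System.out.println(requiredBytes(100L, 10L, 3)); // prints 130
    }
}
```

Under this reading, adding an allocation to an existing deployment costs only the per-allocation amount, which is what lets a planner trade off more allocations of one model against deploying another.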

…model assignment planner (#98874)" (#101834) There were a number of BWC test failures after the PR was merged today. I'll revert it and investigate the failures locally. Reverts #98874

…in the model assignment planner" (#101853) The original PR #98874 missed the memory overhead adjustment from #86416. As it caused some BWC test failures on the CI, I reverted it in #101834. This PR reintegrates the functionality and extends the BWC integration test with the memory constant depending on the version of the old cluster.

…model assignment planner" This reverts commit 31ca2f7. The functionality of elastic#98874 is being removed from 8.12 because it means that models which were working successfully on 2GB nodes in 8.11 will no longer fit on 2GB nodes. This will be frustrating for trial users. Before 8.13 we need to do a more thorough assessment of which models will and won't fit on 2GB nodes as a result of better memory estimation. It may be possible to tweak the memory usage estimation so that we require more memory than 8.11 but not so much more that our recommended trial models no longer fit onto 2GB nodes.

valeriy42 added 3 commits August 24, 2023 17:42

comments on work scope

08e3c53

add memory estimation to AssignmentPlan.Deployment

96c15d3

Updated linear solver and rounding routines

6e00431

elasticsearchmachine added the v8.11.0 label Aug 25, 2023

fix unit test compilation errors

259f883

valeriy42 added the >non-issue and :ml (Machine learning) labels Aug 25, 2023

valeriy42 added 3 commits August 30, 2023 16:49

extend unit tests

613b92e

change memoryUsage to memoryBytes in deployments

dc8070d

fixing original unit tests

9f83f09

mattc58 added v8.12.0 and removed v8.11.0 labels Oct 4, 2023

valeriy42 added 5 commits October 6, 2023 10:56

Unit test for down scaling cluster

3ba327c

optimal allocation test works

be7cfcd

extend unit tests with new memory fields

0d6a11b

formatting

2a58efd

Merge branch 'main' of https://github.com/elastic/elasticsearch into …

f9c338b

…update-assignment-planner

valeriy42 changed the title ~~[ML] WIP Assignment Planner change~~ [ML] Assignment Planner change Oct 9, 2023

valeriy42 changed the title ~~[ML] Assignment Planner change~~ [ML] Use perAllocation and perDeployment memory usage in the model assignment planner Oct 9, 2023

valeriy42 added >enhancement and removed >non-issue labels Oct 9, 2023

valeriy42 marked this pull request as ready for review October 9, 2023 09:13

elasticsearchmachine added the Team:ML (Meta label for the ML team) label Oct 9, 2023

Update docs/changelog/98874.yaml

71756f8

valeriy42 added 2 commits October 9, 2023 11:13

Update .gitignore

6badb52

remove dead code

d9ba0bd

valeriy42 requested review from droberts195 and tveasey October 9, 2023 09:18

valeriy42 mentioned this pull request Oct 31, 2023

[ML] Refactor assignment planning code #101612

Closed

valeriy42 added 2 commits October 31, 2023 15:27

add references to the refactoring issue

8b3963d

fix integration test

b3eb3f4

valeriy42 requested review from droberts195 and tveasey October 31, 2023 14:36

fix forbidden api check

92753e8

droberts195 approved these changes Oct 31, 2023

assign models to node only when possible

b8f3511

Merge branch 'main' into update-assignment-planner

ccf5555

valeriy42 mentioned this pull request Nov 2, 2023

[ML] Better memory estimation for NLP models elastic/eland#568

Merged

tveasey approved these changes Nov 6, 2023

valeriy42 merged commit aa2f6e7 into elastic:main Nov 6, 2023

valeriy42 deleted the update-assignment-planner branch November 6, 2023 11:18

valeriy42 restored the update-assignment-planner branch November 6, 2023 14:33

valeriy42 added a commit that referenced this pull request Nov 6, 2023

Revert "[ML] Use perAllocation and perDeployment memory usage in the …

9906009

…model assignment planner (#98874)" This reverts commit aa2f6e7.

valeriy42 mentioned this pull request Nov 6, 2023

Revert "[ML] Use perAllocation and perDeployment memory usage in the model assignment planner" #101834

Merged

valeriy42 mentioned this pull request Nov 7, 2023

Revert Revert "[ML] Use perAllocation and perDeployment memory usage in the model assignment planner" #101853

Merged

williamrandolph mentioned this pull request Nov 8, 2023

[CI] MlAssignmentPlannerUpgradeIT testMlAssignmentPlannerUpgrade failing #101926

Closed

droberts195 mentioned this pull request Dec 11, 2023

Revert "[ML] Use perAllocation and perDeployment memory usage in the model assignment planner" #103283

Closed

droberts195 added v8.13.0 v8.12.0 and removed v8.12.0 v8.13.0 labels Dec 11, 2023

valeriy42 merged 42 commits into elastic:main from valeriy42:update-assignment-planner
