[ML] Better memory estimation for NLP models #568
Conversation
@qherreros, since I ported your code for memory estimation, would you mind looking at the relevant part in
force-pushed from 5647287 to f618672
@davidkyle could you please look at the PR since you have the best overview of the
davidkyle left a comment
Looks good. Please add an assertion to the model config creation tests that these new settings are present and have sensible values:
https://github.com/elastic/eland/blob/main/tests/ml/pytorch/test_pytorch_model_config_pytest.py#L149
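The requested check could look something like the sketch below. The field names per_deployment_memory_bytes and per_allocation_memory_bytes are assumptions inferred from the PR description, not taken from the linked eland test file, and the helper is hypothetical rather than the PR's actual test code.

```python
# Hypothetical sketch of the requested assertion: verify that a generated
# model config contains the new memory settings with sensible values.
# Field names are assumptions inferred from the PR description.
def assert_memory_settings_present(config: dict) -> None:
    for key in ("per_deployment_memory_bytes", "per_allocation_memory_bytes"):
        assert key in config, f"missing setting: {key}"
        value = config[key]
        assert isinstance(value, int) and value > 0, (
            f"{key} should be a positive byte count, got {value!r}"
        )

# Example config fragment (values are illustrative only).
sample_config = {
    "per_deployment_memory_bytes": 2_621_440,
    "per_allocation_memory_bytes": 1_048_576,
}
assert_memory_settings_present(sample_config)
```

In a pytest suite this would typically be folded into the existing parametrized config-creation tests rather than kept as a standalone helper.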
Thank you for the review, @davidkyle. I addressed your comments; it would be great if you could take another look.
@pquentin do you know why the Read the Docs build has started failing? I notice the build is using Python 3.12.0, which could have something to do with it.
Yes, we were using the latest Python version supported by Read the Docs, and that recently became Python 3.12. The build was failing to compile numpy because we pin it to an older version that does not support Python 3.12. #627 fixes this by requesting Python 3.10 explicitly.
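For context, pinning the build interpreter on Read the Docs is done in the project's .readthedocs.yaml. The fragment below is a minimal sketch of that kind of fix; the exact contents of #627 are not shown in this thread, so the os/tools values here are illustrative assumptions following the Read the Docs v2 config schema.

```yaml
# Hypothetical .readthedocs.yaml fragment pinning the build Python to 3.10
# (illustrative only; see #627 for the actual change).
version: 2
build:
  os: ubuntu-22.04
  tools:
    python: "3.10"
```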
Thanks for fixing the docs build @pquentin
This PR adds the ability to estimate the per-deployment and per-allocation memory usage of NLP transformer models. It uses torch.profiler and logs the peak memory usage during inference. This information is then used in Elasticsearch to provision models with sufficient memory (elastic/elasticsearch#98874).
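The general technique can be sketched as follows. This is not the PR's actual code: the tiny linear model stands in for an NLP transformer, and taking the max of per-operator cpu_memory_usage from the profiler's aggregated events is one simple way to approximate peak memory, under the assumption that CPU profiling with profile_memory=True is sufficient for the illustration.

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Stand-in model and input; the real PR profiles NLP transformer inference.
model = torch.nn.Linear(256, 256)
inputs = torch.randn(1, 256)

# Record per-operator memory usage during a no-grad forward pass.
with profile(activities=[ProfilerActivity.CPU], profile_memory=True) as prof:
    with torch.no_grad():
        model(inputs)

# Rough proxy for peak memory: the largest CPU allocation attributed to any
# single profiled operator (bytes). default=0 guards against an empty trace.
peak_bytes = max(
    (event.cpu_memory_usage for event in prof.key_averages()), default=0
)
print(f"approximate peak CPU memory during inference: {peak_bytes} bytes")
```

Elasticsearch can then use figures like this to decide how much memory to reserve per deployment and per model allocation.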