[FR] Allow argument for 'local_dst_path' when loading Pyfunc Models #4154

Open
ameya-parab opened this issue Mar 1, 2021 · 7 comments

@ameya-parab commented Mar 1, 2021

Willingness to contribute

The MLflow Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature (either as an MLflow Plugin or an enhancement to the MLflow code base)?

  • Yes. I can contribute this feature independently.
  • Yes. I would be willing to contribute this feature with guidance from the MLflow community.
  • No. I cannot contribute this feature at this time.

Proposal Summary

Currently, `mlflow.pyfunc.load_model(MODEL_URI)` accepts only the remote model URI (S3 in our case) when loading a Python-flavored MLflow model and its artifacts. As part of this call, the `load_model` method downloads the artifacts registered when logging the MLflow model to a temporary directory on the local filesystem for serving. I would like to open a feature request to allow specifying a local path when calling the `load_model` function; this would enable users to download the model and its artifacts to a specific location for further analysis.

Motivation

  • What is the use case for this feature?
    - Enables downloading the remote model and its artifacts to a specified location, which reduces model loading
    times because the model is loaded directly from a local file path and can be reused by other programs if required.
  • Why is this use case valuable to support for MLflow users in general?
    - The feature will reduce the overall model serving time for large models, as the artifacts and the model itself would be
    available at a local filesystem path for multiple programs to use.
  • Why is this use case valuable to support for your project(s) or organization?
    - It will help us tackle the long loading times when initializing the model serving framework, as our production
    deployment consists of an ensemble of models rather than a single model.
  • Why is it currently difficult to achieve this use case?
    - We have a workaround in place that copies the models and their artifacts (via shutil) from the temporary
    directories to a different location every time a program initializes them; this is not optimized and can be
    error-prone (see the sketch after this list).
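
For illustration, a minimal sketch of this kind of workaround, assuming the private helper `_download_artifact_from_uri` from `mlflow.tracking.artifact_utils` (the same helper `load_model` uses internally); the URI and paths are illustrative:

```python
import os
import shutil

import mlflow
from mlflow.tracking.artifact_utils import _download_artifact_from_uri

MODEL_URI = "s3://bucket/path/to/the/model/artifacts"  # illustrative URI
STABLE_PATH = os.path.join(os.path.expanduser("~"), "model")  # illustrative path

# load_model() only downloads to an MLflow-managed temporary directory, so we
# download first and copy the whole tree to a stable location ourselves.
tmp_path = _download_artifact_from_uri(artifact_uri=MODEL_URI)
if not os.path.exists(STABLE_PATH):
    shutil.copytree(tmp_path, STABLE_PATH)

# Any program can now load the model directly from the stable local path.
model = mlflow.pyfunc.load_model(STABLE_PATH)
```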

What component(s), interfaces, languages, and integrations does this feature affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: Local serving, model deployment tools, spark UDFs
  • area/server-infra: MLflow server, JavaScript dev server
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interfaces

  • area/uiux: Front-end, user experience, JavaScript, plotting
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Languages

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

Details

import os

import mlflow

LOGGED_MODEL_S3_URI = "s3://bucket/path/to/the/model/artifacts"

# Current API call
model = mlflow.pyfunc.load_model(LOGGED_MODEL_S3_URI)

# Proposed API call
model = mlflow.pyfunc.load_model(
    LOGGED_MODEL_S3_URI,
    local_dst_path=os.path.join(os.path.expanduser("~"), "model"),
)

Proposed solution

Changes in `mlflow/pyfunc/__init__.py`

# Current load_model implementation

def load_model(model_uri: str, suppress_warnings: bool = True) -> PyFuncModel:
    """
    Load a model stored in Python function format.

    :param model_uri: The location, in URI format, of the MLflow model. For example:

                      - ``/Users/me/path/to/local/model``
                      - ``relative/path/to/local/model``
                      - ``s3://my_bucket/path/to/model``
                      - ``runs:/<mlflow_run_id>/run-relative/path/to/model``
                      - ``models:/<model_name>/<model_version>``
                      - ``models:/<model_name>/<stage>``

                      For more information about supported URI schemes, see
                      `Referencing Artifacts <https://www.mlflow.org/docs/latest/concepts.html#
                      artifact-locations>`_.
    :param suppress_warnings: If ``True``, non-fatal warning messages associated with the model
                              loading process will be suppressed. If ``False``, these warning
                              messages will be emitted.
    """
    local_path = _download_artifact_from_uri(artifact_uri=model_uri)
    model_meta = Model.load(os.path.join(local_path, MLMODEL_FILE_NAME))

    conf = model_meta.flavors.get(FLAVOR_NAME)
    if conf is None:
        raise MlflowException(
            'Model does not have the "{flavor_name}" flavor'.format(flavor_name=FLAVOR_NAME),
            RESOURCE_DOES_NOT_EXIST,
        )
    model_py_version = conf.get(PY_VERSION)
    if not suppress_warnings:
        _warn_potentially_incompatible_py_version_if_necessary(model_py_version=model_py_version)
    if CODE in conf and conf[CODE]:
        code_path = os.path.join(local_path, conf[CODE])
        mlflow.pyfunc.utils._add_code_to_system_path(code_path=code_path)
    data_path = os.path.join(local_path, conf[DATA]) if (DATA in conf) else local_path
    model_impl = importlib.import_module(conf[MAIN])._load_pyfunc(data_path)
    return PyFuncModel(model_meta=model_meta, model_impl=model_impl)


# Proposed load_model implementation

def load_model(
    model_uri: str, local_dst_path: str = None, suppress_warnings: bool = True
) -> PyFuncModel:
    """
    Load a model stored in Python function format.

    :param model_uri: The location, in URI format, of the MLflow model. For example:

                      - ``/Users/me/path/to/local/model``
                      - ``relative/path/to/local/model``
                      - ``s3://my_bucket/path/to/model``
                      - ``runs:/<mlflow_run_id>/run-relative/path/to/model``
                      - ``models:/<model_name>/<model_version>``
                      - ``models:/<model_name>/<stage>``

                      For more information about supported URI schemes, see
                      `Referencing Artifacts <https://www.mlflow.org/docs/latest/concepts.html#
                      artifact-locations>`_.
    :param local_dst_path: The local filesystem path to which the model and its artifacts
                           are downloaded. If ``None`` (default), a temporary directory is
                           used.
    :param suppress_warnings: If ``True``, non-fatal warning messages associated with the model
                              loading process will be suppressed. If ``False``, these warning
                              messages will be emitted.
    """
    local_path = _download_artifact_from_uri(artifact_uri=model_uri, output_path=local_dst_path)
    model_meta = Model.load(os.path.join(local_path, MLMODEL_FILE_NAME))

    conf = model_meta.flavors.get(FLAVOR_NAME)
    if conf is None:
        raise MlflowException(
            'Model does not have the "{flavor_name}" flavor'.format(flavor_name=FLAVOR_NAME),
            RESOURCE_DOES_NOT_EXIST,
        )
    model_py_version = conf.get(PY_VERSION)
    if not suppress_warnings:
        _warn_potentially_incompatible_py_version_if_necessary(model_py_version=model_py_version)
    if CODE in conf and conf[CODE]:
        code_path = os.path.join(local_path, conf[CODE])
        mlflow.pyfunc.utils._add_code_to_system_path(code_path=code_path)
    data_path = os.path.join(local_path, conf[DATA]) if (DATA in conf) else local_path
    model_impl = importlib.import_module(conf[MAIN])._load_pyfunc(data_path)
    return PyFuncModel(model_meta=model_meta, model_impl=model_impl)
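
Assuming the proposed signature, a short usage sketch (URI and path illustrative): the first call downloads the model to the given directory, and later programs can load it straight from that local path without re-downloading:

```python
import os

import mlflow

DST_PATH = os.path.join(os.path.expanduser("~"), "model")  # illustrative path

# First program: downloads the model from S3 into the given local directory.
model = mlflow.pyfunc.load_model(
    "s3://bucket/path/to/the/model/artifacts", local_dst_path=DST_PATH
)

# Later programs: load directly from the local copy, skipping the download.
model = mlflow.pyfunc.load_model(DST_PATH)
```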

ameya-parab changed the title from "[FR] Allow argument for 'local_dest_path' when loading Pyfunc Models" to "[FR] Allow argument for 'local_dst_path' when loading Pyfunc Models" on Mar 1, 2021

@dmatrix (Contributor) commented Mar 5, 2021

@ameyaparab28 Thanks for filing this feature request. By local filesystem, do you mean the FS available on your production server machine? Do you want to load those models from that local FS path on the production server rather than loading them from S3 for each model?

This seems like a good first issue since most of what you suggest might work. Do you want to contribute?

cc: @harupy

@ameya-parab (Author) commented Mar 9, 2021

@dmatrix @harupy - Yes, by local file system I meant the file system on the production server. The reasoning behind opening the FR was that we would ideally like to download the model(s) once from the remote URI to the production server's file system and use them multiple times in different applications.

I can take up the issue, but would require some guidance for testing and merging it into the master branch.

@ankh6 commented Mar 30, 2021

Hey @dmatrix, I'd like to make my first contribution if the issue is free.
If it is the case, I'll work on it :)

@dmatrix (Contributor) commented Apr 1, 2021

@ankh6 Yes, we welcome contributions. This is an API change, so we have to ensure backward compatibility too (see the sketch below).
@ameya-parab are you working on this?
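
For illustration, a hypothetical sketch of that compatibility concern, assuming callers that pass `suppress_warnings` positionally:

```python
# An existing caller may pass suppress_warnings positionally:
model = mlflow.pyfunc.load_model(MODEL_URI, False)

# With local_dst_path inserted as the second parameter (as proposed above),
# this call would now be read as local_dst_path=False instead of
# suppress_warnings=False. Appending the new parameter last (or making it
# keyword-only) preserves backward compatibility:
def load_model(model_uri, suppress_warnings=True, local_dst_path=None):
    ...
```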

@ameya-parab (Author) commented Apr 2, 2021

@dmatrix, I was planning on working on the issue and would create a PR in a couple of weeks. Thanks!

@ameya-parab (Author) commented Apr 5, 2021

@dmatrix ...I was working on running the linter and unit tests before raising a PR but ran into a bunch of package import and other errors. I have pasted the bash output below:

source lint.sh

========== pycodestyle ==========

mlflow tests:1:1: E902 FileNotFoundError: [Errno 2] No such file or directory: 'mlflow tests'

========== pylint ==========

Using config file /Users/parabam1/Projects/OpenSource/mlflow/pylintrc
************* Module mlflow.mlflow
mlflow/__init__.py (30,0): [E0611 no-name-in-module] No name 'version' in module 'mlflow'

.
.
.


========== rstcheck ==========

Traceback (most recent call last):
  File "/Users/parabam1/miniconda3/envs/mlflow-dev-env/bin/rstcheck", line 8, in <module>
    sys.exit(main())
  File "/Users/parabam1/miniconda3/envs/mlflow-dev-env/lib/python3.6/site-packages/rstcheck.py", line 918, in main
    with enable_sphinx_if_possible():
  File "/Users/parabam1/miniconda3/envs/mlflow-dev-env/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/Users/parabam1/miniconda3/envs/mlflow-dev-env/lib/python3.6/site-packages/rstcheck.py", line 876, in enable_sphinx_if_possible
    status=None)
  File "/Users/parabam1/miniconda3/envs/mlflow-dev-env/lib/python3.6/site-packages/sphinx/application.py", line 172, in __init__
    raise ApplicationError(__('Source directory and destination '
sphinx.errors.ApplicationError: Source directory and destination directory cannot be identical

One of the previous steps failed, check above

I have set up my development env using the instructions mentioned here.

@dmatrix (Contributor) commented Apr 6, 2021

@ameya-parab can you try resyncing with master and merging the changes into your branch?
Also, make sure you have all the requirements installed for your dev environment.
