Skip to content

Ensure all autologging callbacks are picklable#5039

Merged
harupy merged 20 commits into
mlflow:masterfrom
harupy:fix-autolog-callbacks
Nov 17, 2021
Merged

Ensure all autologging callbacks are picklable#5039
harupy merged 20 commits into
mlflow:masterfrom
harupy:fix-autolog-callbacks

Conversation

@harupy

@harupy harupy commented Nov 10, 2021

Copy link
Copy Markdown
Member

Signed-off-by: harupy hkawamura0130@gmail.com

What changes are proposed in this pull request?

Ensure all autologging callbacks are picklable.

How is this patch tested?

Unit tests

Does this PR change the documentation?

  • No. You can skip the rest of this section.
  • Yes. Make sure the changed pages / sections render correctly by following the steps below.
  1. Check the status of the ci/circleci: build_doc check. If it's successful, proceed to the
    next step, otherwise fix it.
  2. Click Details on the right to open the job page of CircleCI.
  3. Click the Artifacts tab.
  4. Click docs/build/html/index.html.
  5. Find the changed pages / sections and make sure they render correctly.

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

@github-actions github-actions Bot added the rn/none List under Small Changes in Changelogs. label Nov 10, 2021
Comment thread tests/xgboost/test_xgboost_autolog.py Outdated
@harupy harupy changed the title [WIP] Investigate autologging integrations that don't work with multiprocessing Add autologging test for multiprocessing Nov 10, 2021
Comment thread mlflow/ml-package-versions.yml Outdated
@harupy harupy force-pushed the fix-autolog-callbacks branch from 9a9a895 to 7618e83 Compare November 12, 2021 04:33
Signed-off-by: harupy <hkawamura0130@gmail.com>
@harupy harupy force-pushed the fix-autolog-callbacks branch from 2f0ffed to 61bd6ad Compare November 15, 2021 02:10
@harupy

harupy commented Nov 15, 2021

Copy link
Copy Markdown
Member Author

Fastai doesn't seem to support distributed training using CPUs:

https://docs.fast.ai/distributed.html

@harupy

harupy commented Nov 15, 2021

Copy link
Copy Markdown
Member Author

Distributed training in MXnet seems pretty complex:

https://mxnet.apache.org/versions/1.8.0/api/faq/distributed_training

@harupy

harupy commented Nov 15, 2021

Copy link
Copy Markdown
Member Author

For sklearn, we already have a test for parallelised training:

with sklearn.utils.parallel_backend(backend=backend):
cv_model.fit(X, y)

Is there anything else that we need to test for scikit-learn?

Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
@harupy harupy changed the title Add autologging test for multiprocessing Ensure all autologging callbacks are picklable Nov 16, 2021

@dbczumar dbczumar left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harupy Looks great! Can we address LightGBM as well?

@dbczumar

Copy link
Copy Markdown
Collaborator

For sklearn, we already have a test for parallelised training:

with sklearn.utils.parallel_backend(backend=backend):
cv_model.fit(X, y)

Is there anything else that we need to test for scikit-learn?

Does this test case use a multiprocess backend?

@harupy

harupy commented Nov 16, 2021

Copy link
Copy Markdown
Member Author

@dbczumar Thanks for the comment!

Does this test case use a multiprocess backend?

We run the test using lowky which is a multi-processing based backend (doc):

@pytest.mark.parametrize("backend", [None, "threading", "loky"])
@pytest.mark.parametrize("max_tuning_runs", [None, 3])
def test_parameter_search_estimators_produce_expected_outputs(
cv_class, search_space, backend, max_tuning_runs
):
mlflow.sklearn.autolog(
log_input_examples=True, log_model_signatures=True, max_tuning_runs=max_tuning_runs,
)
svc = sklearn.svm.SVC()
cv_model = cv_class(svc, search_space, n_jobs=5, return_train_score=True)
X, y = get_iris()
def train_cv_model():
if backend is None:
cv_model.fit(X, y)
else:
with sklearn.utils.parallel_backend(backend=backend):
cv_model.fit(X, y)

@dbczumar

Copy link
Copy Markdown
Collaborator

@dbczumar Thanks for the comment!

Does this test case use a multiprocess backend?

We run the test using lowky which is a multi-processing based backend (doc):

@pytest.mark.parametrize("backend", [None, "threading", "loky"])
@pytest.mark.parametrize("max_tuning_runs", [None, 3])
def test_parameter_search_estimators_produce_expected_outputs(
cv_class, search_space, backend, max_tuning_runs
):
mlflow.sklearn.autolog(
log_input_examples=True, log_model_signatures=True, max_tuning_runs=max_tuning_runs,
)
svc = sklearn.svm.SVC()
cv_model = cv_class(svc, search_space, n_jobs=5, return_train_score=True)
X, y = get_iris()
def train_cv_model():
if backend is None:
cv_model.fit(X, y)
else:
with sklearn.utils.parallel_backend(backend=backend):
cv_model.fit(X, y)

Sounds good!

Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
@harupy

harupy commented Nov 16, 2021

Copy link
Copy Markdown
Member Author

@dbczumar

Can we address LightGBM as well?

Done!

Signed-off-by: harupy <hkawamura0130@gmail.com>
@harupy harupy force-pushed the fix-autolog-callbacks branch from 9c27efe to 6d2c7ef Compare November 16, 2021 10:34
Comment on lines +64 to +72
def picklable_exception_safe_function(function):
"""
Wraps the specified function with broad exception handling to guard
against unexpected errors during autologging while preserving picklability.
"""
if is_testing():
setattr(function, _ATTRIBUTE_EXCEPTION_SAFE, True)

return update_wrapper_extended(functools.partial(_safe_function, function), function)

@harupy harupy Nov 16, 2021

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dbczumar I tried using picklable_exception_safe_function in _ExceptionSafeClass:

diff --git a/mlflow/utils/autologging_utils/safety.py b/mlflow/utils/autologging_utils/safety.py
index 11ffde645..4b15e38ea 100644
--- a/mlflow/utils/autologging_utils/safety.py
+++ b/mlflow/utils/autologging_utils/safety.py
@@ -96,7 +96,7 @@ def _exception_safe_class_factory(base_class):
     class _ExceptionSafeClass(base_class):
         def __new__(cls, name, bases, dct):
             for m in dct:
                 # class methods or static methods are not callable.
                 if callable(dct[m]):
-                    dct[m] = exception_safe_function(dct[m])
+                    dct[m] = picklable_exception_safe_function(dct[m])
             return base_class.__new__(cls, name, bases, dct)
 
     return _ExceptionSafeClass

but this gave:

% pytest tests/xgboost/test_xgboost_autolog.py -k is_pickable
============================================================ test session starts =============================================================
platform linux -- Python 3.7.10, pytest-6.2.5, py-1.10.0, pluggy-1.0.0 -- /home/haru/miniconda3/envs/mlflow-dev-env/bin/python
cachedir: .pytest_cache
rootdir: /home/haru/Desktop/repositories/mlflow, configfile: pytest.ini
collected 29 items / 27 deselected / 2 selected                                                                                              

tests/xgboost/test_xgboost_autolog.py::test_callback_func_is_pickable PASSED                                                           [ 50%]
tests/xgboost/test_xgboost_autolog.py::test_callback_class_is_pickable FAILED                                                          [100%]

================================================================== FAILURES ==================================================================
______________________________________________________ test_callback_class_is_pickable _______________________________________________________

    @pytest.mark.skipif(
        Version(xgb.__version__.replace("SNAPSHOT", "dev")) < Version("1.3.0"),
        reason="`xgboost.callback.TrainingCallback` is not supported",
    )
    def test_callback_class_is_pickable():
        from mlflow.xgboost._autolog import AutologCallback
    
>       cb = AutologCallback(BatchMetricsLogger(run_id="1234"), eval_results={})

AutologCallback = <class 'mlflow.xgboost._autolog.AutologCallback'>

tests/xgboost/test_xgboost_autolog.py:577: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

function = <function AutologCallback.__init__ at 0x7fd1c2699dd0>
args = (<mlflow.utils.autologging_utils.BatchMetricsLogger object at 0x7fd19e0ac890>,), kwargs = {'eval_results': {}}

    def _safe_function(function, *args, **kwargs):
        try:
>           return function(*args, **kwargs)
E           TypeError: __init__() missing 1 required positional argument: 'metrics_logger'

args       = (<mlflow.utils.autologging_utils.BatchMetricsLogger object at 0x7fd19e0ac890>,)
function   = <function AutologCallback.__init__ at 0x7fd1c2699dd0>
kwargs     = {'eval_results': {}}

mlflow/utils/autologging_utils/safety.py:56: TypeError

It appears that functools.partial alters the __init__ behavior. I'm investigating a workaround.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harupy This solution makes sense once we can resolve the __init__ issues. Let's also make sure to remove exception_safe_function and replace it with picklable_exception_safe_function.

@harupy harupy Nov 17, 2021

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This SO post says:

partial objects are like function objects in that they are callable, weak referencable, and can have attributes. There are some important differences. For instance, the name and doc attributes are not created automatically. Also, partial objects defined in classes behave like static methods and do not transform into bound methods during instance attribute look-up.

This SO post suggests using functools.partialmethod, but this makes a class non-callable:

__________________________________________________________________________________ test_callback_class_is_pickable __________________________________________________________________________________

    @pytest.mark.skipif(
        not IS_TRAINING_CALLBACK_SUPPORTED,
        reason="`xgboost.callback.TrainingCallback` is not supported",
    )
    def test_callback_class_is_pickable():
        from mlflow.xgboost._autolog import AutologCallback
    
>       cb = AutologCallback(BatchMetricsLogger(run_id="1234"), eval_results={})

AutologCallback = <class 'mlflow.xgboost._autolog.AutologCallback'>

tests/xgboost/test_xgboost_autolog.py:577: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

function = <mlflow.xgboost._autolog.AutologCallback object at 0x7f8a8bb92850>
args = (<function AutologCallback.__init__ at 0x7f8ab0311050>, <mlflow.utils.autologging_utils.BatchMetricsLogger object at 0x7f8a8bcf04d0>), kwargs = {'eval_results': {}}

    def _safe_function(function, *args, **kwargs):
        try:
>           return function(*args, **kwargs)
E           TypeError: 'AutologCallback' object is not callable

args       = (<function AutologCallback.__init__ at 0x7f8ab0311050>, <mlflow.utils.autologging_utils.BatchMetricsLogger object at 0x7f8a8bcf04d0>)
function   = <mlflow.xgboost._autolog.AutologCallback object at 0x7f8a8bb92850>
kwargs     = {'eval_results': {}}

Git diff:

diff --git a/mlflow/utils/autologging_utils/safety.py b/mlflow/utils/autologging_utils/safety.py
index 11ffde645..e76932f7c 100644
--- a/mlflow/utils/autologging_utils/safety.py
+++ b/mlflow/utils/autologging_utils/safety.py
@@ -72,6 +72,17 @@ def picklable_exception_safe_function(function):
     return update_wrapper_extended(functools.partial(_safe_function, function), function)
 
 
+def picklable_exception_safe_method(function):
+    """
+    Wraps the specified function with broad exception handling to guard
+    against unexpected errors during autologging while preserving picklability.
+    """
+    if is_testing():
+        setattr(function, _ATTRIBUTE_EXCEPTION_SAFE, True)
+
+    return update_wrapper_extended(functools.partialmethod(_safe_function, function), function)
+
+
 def _exception_safe_class_factory(base_class):
     """
     Creates an exception safe metaclass that inherits from `base_class`.
@@ -96,7 +107,7 @@ def _exception_safe_class_factory(base_class):
             for m in dct:
                 # class methods or static methods are not callable.
                 if callable(dct[m]):
-                    dct[m] = exception_safe_function(dct[m])
+                    dct[m] = picklable_exception_safe_method(dct[m])
             return base_class.__new__(cls, name, bases, dct)
 
     return _ExceptionSafeClass
                                 

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Let's ignore https://github.com/mlflow/mlflow/pull/5039/files#r750767771 and use a separate function for methods defined on classes :). Thanks @harupy !

Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>

@dbczumar dbczumar left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks a bunch, @harupy!

Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
@harupy harupy merged commit 27b3cd2 into mlflow:master Nov 17, 2021
@harupy harupy deleted the fix-autolog-callbacks branch November 17, 2021 04:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

rn/none List under Small Changes in Changelogs.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants