Ensure all autologging callbacks are picklable by harupy · Pull Request #5039 · mlflow/mlflow

harupy · 2021-11-10T07:57:00Z

Signed-off-by: harupy <hkawamura0130@gmail.com>

What changes are proposed in this pull request?

Ensure all autologging callbacks are picklable.

How is this patch tested?

Unit tests

Does this PR change the documentation?

No. You can skip the rest of this section.
Yes. Make sure the changed pages / sections render correctly by following the steps below.

Check the status of the ci/circleci: build_doc check. If it's successful, proceed to the
next step, otherwise fix it.
Click Details on the right to open the job page of CircleCI.
Click the Artifacts tab.
Click docs/build/html/index.html.
Find the changed pages / sections and make sure they render correctly.

Release Notes

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)

What component(s), interfaces, languages, and integrations does this PR affect?

Components

Interface

area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

harupy · 2021-11-15T05:31:50Z

Fastai doesn't seem to support distributed training using CPUs:

https://docs.fast.ai/distributed.html

harupy · 2021-11-15T05:32:49Z

Distributed training in MXnet seems pretty complex:

https://mxnet.apache.org/versions/1.8.0/api/faq/distributed_training

harupy · 2021-11-15T05:37:06Z

For sklearn, we already have a test for parallelised training:

mlflow/tests/sklearn/test_sklearn_autolog.py

Lines 732 to 733 in d6ae841

    
    with sklearn.utils.parallel_backend(backend=backend):
        cv_model.fit(X, y)

Is there anything else that we need to test for scikit-learn?

dbczumar

@harupy Looks great! Can we address LightGBM as well?

dbczumar · 2021-11-16T09:19:12Z

For sklearn, we already have a test for parallelised training:

mlflow/tests/sklearn/test_sklearn_autolog.py

Lines 732 to 733 in d6ae841

    with sklearn.utils.parallel_backend(backend=backend):
        cv_model.fit(X, y)

Is there anything else that we need to test for scikit-learn?

Does this test case use a multiprocess backend?

harupy · 2021-11-16T09:24:58Z

@dbczumar Thanks for the comment!

Does this test case use a multiprocess backend?

We run the test using loky, which is a multiprocessing-based backend (doc):

mlflow/tests/sklearn/test_sklearn_autolog.py

Lines 715 to 733 in d6ae841

    
    @pytest.mark.parametrize("backend", [None, "threading", "loky"])
    @pytest.mark.parametrize("max_tuning_runs", [None, 3])
    def test_parameter_search_estimators_produce_expected_outputs(
        cv_class, search_space, backend, max_tuning_runs
    ):
        mlflow.sklearn.autolog(
            log_input_examples=True, log_model_signatures=True, max_tuning_runs=max_tuning_runs,
        )

        svc = sklearn.svm.SVC()
        cv_model = cv_class(svc, search_space, n_jobs=5, return_train_score=True)
        X, y = get_iris()

        def train_cv_model():
            if backend is None:
                cv_model.fit(X, y)
            else:
                with sklearn.utils.parallel_backend(backend=backend):
                    cv_model.fit(X, y)

dbczumar · 2021-11-16T09:31:57Z

Sounds good!

harupy · 2021-11-16T10:06:16Z

@dbczumar

Can we address LightGBM as well?

Done!

harupy · 2021-11-16T10:37:54Z

+def picklable_exception_safe_function(function):
+    """
+    Wraps the specified function with broad exception handling to guard
+    against unexpected errors during autologging while preserving picklability.
+    """
+    if is_testing():
+        setattr(function, _ATTRIBUTE_EXCEPTION_SAFE, True)
+
+    return update_wrapper_extended(functools.partial(_safe_function, function), function)
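To illustrate why the wrapper is built with functools.partial, here is a minimal sketch, not MLflow's actual code. It assumes a simplified `_safe_function` that swallows exceptions, and the names `closure_exception_safe`, `partial_exception_safe`, `on_iteration`, and `is_picklable` are hypothetical. A closure-based wrapper returns a local function that pickle cannot locate by name, while a partial object pickles as a reference to two module-level functions.

```python
import functools
import pickle


def _safe_function(function, *args, **kwargs):
    """Simplified stand-in: call `function` and swallow any exception."""
    try:
        return function(*args, **kwargs)
    except Exception:
        return None


def closure_exception_safe(function):
    """Closure-based wrapper: the returned object is a local function."""

    def wrapper(*args, **kwargs):
        return _safe_function(function, *args, **kwargs)

    return wrapper


def partial_exception_safe(function):
    """partial-based wrapper: holds references to module-level functions only."""
    return functools.partial(_safe_function, function)


def on_iteration(value):
    """Hypothetical callback that fails on negative input."""
    if value < 0:
        raise ValueError("negative value")
    return value * 2


def is_picklable(obj):
    """Return True if pickle.dumps succeeds for `obj`."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


closure_cb = closure_exception_safe(on_iteration)
partial_cb = partial_exception_safe(on_iteration)

print("closure wrapper picklable:", is_picklable(closure_cb))
print("partial wrapper picklable:", is_picklable(partial_cb))
```

The wrapped function keeps its own name here on purpose: rebinding the original name to the wrapper (as a decorator would) can prevent pickle from resolving the original function by name.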


@dbczumar I tried using picklable_exception_safe_function in _ExceptionSafeClass:

diff --git a/mlflow/utils/autologging_utils/safety.py b/mlflow/utils/autologging_utils/safety.py
index 11ffde645..4b15e38ea 100644
--- a/mlflow/utils/autologging_utils/safety.py
+++ b/mlflow/utils/autologging_utils/safety.py
@@ -96,7 +96,7 @@ def _exception_safe_class_factory(base_class):
     class _ExceptionSafeClass(base_class):
         def __new__(cls, name, bases, dct):
             for m in dct:
                 # class methods or static methods are not callable.
                 if callable(dct[m]):
-                    dct[m] = exception_safe_function(dct[m])
+                    dct[m] = picklable_exception_safe_function(dct[m])
             return base_class.__new__(cls, name, bases, dct)

     return _ExceptionSafeClass

but this gave:

% pytest tests/xgboost/test_xgboost_autolog.py -k is_pickable
============================= test session starts ==============================
platform linux -- Python 3.7.10, pytest-6.2.5, py-1.10.0, pluggy-1.0.0 -- /home/haru/miniconda3/envs/mlflow-dev-env/bin/python
cachedir: .pytest_cache
rootdir: /home/haru/Desktop/repositories/mlflow, configfile: pytest.ini
collected 29 items / 27 deselected / 2 selected

tests/xgboost/test_xgboost_autolog.py::test_callback_func_is_pickable PASSED [ 50%]
tests/xgboost/test_xgboost_autolog.py::test_callback_class_is_pickable FAILED [100%]

=================================== FAILURES ===================================
_______________________ test_callback_class_is_pickable ________________________

    @pytest.mark.skipif(
        Version(xgb.__version__.replace("SNAPSHOT", "dev")) < Version("1.3.0"),
        reason="`xgboost.callback.TrainingCallback` is not supported",
    )
    def test_callback_class_is_pickable():
        from mlflow.xgboost._autolog import AutologCallback

>       cb = AutologCallback(BatchMetricsLogger(run_id="1234"), eval_results={})

AutologCallback = <class 'mlflow.xgboost._autolog.AutologCallback'>

tests/xgboost/test_xgboost_autolog.py:577:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

function = <function AutologCallback.__init__ at 0x7fd1c2699dd0>
args = (<mlflow.utils.autologging_utils.BatchMetricsLogger object at 0x7fd19e0ac890>,), kwargs = {'eval_results': {}}

    def _safe_function(function, *args, **kwargs):
        try:
>           return function(*args, **kwargs)
E           TypeError: __init__() missing 1 required positional argument: 'metrics_logger'

args       = (<mlflow.utils.autologging_utils.BatchMetricsLogger object at 0x7fd19e0ac890>,)
function   = <function AutologCallback.__init__ at 0x7fd1c2699dd0>
kwargs     = {'eval_results': {}}

mlflow/utils/autologging_utils/safety.py:56: TypeError

It appears that functools.partial alters the __init__ behavior. I'm investigating a workaround.

@harupy This solution makes sense once we can resolve the __init__ issues. Let's also make sure to remove exception_safe_function and replace it with picklable_exception_safe_function.

This SO post says:

partial objects are like function objects in that they are callable, weak referencable, and can have attributes. There are some important differences. For instance, the __name__ and __doc__ attributes are not created automatically. Also, partial objects defined in classes behave like static methods and do not transform into bound methods during instance attribute look-up.
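The quoted behavior can be reproduced in a few lines. This is a minimal sketch with hypothetical names (`_forward`, `_greet`, `PlainFunction`, `PartialAttribute`), not code from the PR: a plain function stored on a class receives the instance as `self`, whereas a functools.partial stored on a class does not, so the wrapped function ends up one argument short, much like the `__init__` failure above. Recent Python versions may emit a FutureWarning for this pattern.

```python
import functools


def _forward(function, *args, **kwargs):
    """Stand-in for a wrapper such as _safe_function: only forwards the call."""
    return function(*args, **kwargs)


def _greet(self, name):
    return f"{self.prefix} {name}"


class PlainFunction:
    prefix = "hello"
    # Plain functions are descriptors, so the instance is bound as `self`.
    greet = _greet


class PartialAttribute:
    prefix = "hello"
    # A partial object is looked up like a static method: no `self` is passed.
    greet = functools.partial(_forward, _greet)


plain_result = PlainFunction().greet("mlflow")

try:
    PartialAttribute().greet("mlflow")
    partial_error = None
except TypeError as exc:
    # _greet received "mlflow" as `self` and nothing for `name`.
    partial_error = exc

print(plain_result)
print(type(partial_error).__name__, partial_error)
```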

This SO post suggests using functools.partialmethod, but this makes a class non-callable:

_______________________ test_callback_class_is_pickable ________________________

    @pytest.mark.skipif(
        not IS_TRAINING_CALLBACK_SUPPORTED,
        reason="`xgboost.callback.TrainingCallback` is not supported",
    )
    def test_callback_class_is_pickable():
        from mlflow.xgboost._autolog import AutologCallback

>       cb = AutologCallback(BatchMetricsLogger(run_id="1234"), eval_results={})

AutologCallback = <class 'mlflow.xgboost._autolog.AutologCallback'>

tests/xgboost/test_xgboost_autolog.py:577:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

function = <mlflow.xgboost._autolog.AutologCallback object at 0x7f8a8bb92850>
args = (<function AutologCallback.__init__ at 0x7f8ab0311050>, <mlflow.utils.autologging_utils.BatchMetricsLogger object at 0x7f8a8bcf04d0>), kwargs = {'eval_results': {}}

    def _safe_function(function, *args, **kwargs):
        try:
>           return function(*args, **kwargs)
E           TypeError: 'AutologCallback' object is not callable

args       = (<function AutologCallback.__init__ at 0x7f8ab0311050>, <mlflow.utils.autologging_utils.BatchMetricsLogger object at 0x7f8a8bcf04d0>)
function   = <mlflow.xgboost._autolog.AutologCallback object at 0x7f8a8bb92850>
kwargs     = {'eval_results': {}}

Git diff:

diff --git a/mlflow/utils/autologging_utils/safety.py b/mlflow/utils/autologging_utils/safety.py
index 11ffde645..e76932f7c 100644
--- a/mlflow/utils/autologging_utils/safety.py
+++ b/mlflow/utils/autologging_utils/safety.py
@@ -72,6 +72,17 @@ def picklable_exception_safe_function(function):
     return update_wrapper_extended(functools.partial(_safe_function, function), function)


+def picklable_exception_safe_method(function):
+    """
+    Wraps the specified function with broad exception handling to guard
+    against unexpected errors during autologging while preserving picklability.
+    """
+    if is_testing():
+        setattr(function, _ATTRIBUTE_EXCEPTION_SAFE, True)
+
+    return update_wrapper_extended(functools.partialmethod(_safe_function, function), function)
+
+
 def _exception_safe_class_factory(base_class):
     """
     Creates an exception safe metaclass that inherits from `base_class`.
@@ -96,7 +107,7 @@ def _exception_safe_class_factory(base_class):
             for m in dct:
                 # class methods or static methods are not callable.
                 if callable(dct[m]):
-                    dct[m] = exception_safe_function(dct[m])
+                    dct[m] = picklable_exception_safe_method(dct[m])
             return base_class.__new__(cls, name, bases, dct)

     return _ExceptionSafeClass
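The "object is not callable" error above comes from argument order: functools.partialmethod passes the instance first, ahead of the stored arguments, so a helper with the signature `(function, *args, **kwargs)` receives the instance in its `function` slot. The sketch below shows that, along with one possible shape for a method-aware helper. `_record`, `_safe_method`, `_scale`, `Misbound`, and `Callback` are hypothetical names for illustration and are not taken from the merged change.

```python
import functools

received = []


def _record(function, *args, **kwargs):
    """Records whatever arrives in the `function` slot instead of calling it."""
    received.append(function)


def _safe_method(self, function, *args, **kwargs):
    """Hypothetical method-aware helper: `self` first, then the wrapped function."""
    try:
        return function(self, *args, **kwargs)
    except Exception:
        return None


def _scale(self, value):
    if value < 0:
        raise ValueError("negative value")
    return self.factor * value


class Misbound:
    # Called as _record(instance, _scale, 2): the instance lands in `function`.
    scale = functools.partialmethod(_record, _scale)


class Callback:
    factor = 3
    # Called as _safe_method(instance, _scale, value): arguments line up.
    scale = functools.partialmethod(_safe_method, _scale)


misbound = Misbound()
misbound.scale(2)
callback = Callback()

print(received[0] is misbound)
print(callback.scale(2), callback.scale(-1))
```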

Got it. Let's ignore https://github.com/mlflow/mlflow/pull/5039/files#r750767771 and use a separate function for methods defined on classes :). Thanks @harupy !

dbczumar

LGTM! Thanks a bunch, @harupy!

github-actions bot added the rn/none (List under Small Changes in Changelogs) label Nov 10, 2021

harupy requested review from WeichenXu123 and dbczumar November 10, 2021 07:57

harupy commented Nov 10, 2021

Comment thread tests/xgboost/test_xgboost_autolog.py Outdated

harupy changed the title from "[WIP] Investigate autologging integrations that don't work with multiprocessing" to "Add autologging test for multiprocessing" Nov 10, 2021

WeichenXu123 reviewed Nov 11, 2021

Comment thread mlflow/ml-package-versions.yml Outdated

harupy force-pushed the fix-autolog-callbacks branch from 9a9a895 to 7618e83 Compare November 12, 2021 04:33

revert unrelated changes

61bd6ad

Signed-off-by: harupy <hkawamura0130@gmail.com>

harupy force-pushed the fix-autolog-callbacks branch from 2f0ffed to 61bd6ad Compare November 15, 2021 02:10

harupy added 10 commits November 16, 2021 10:45

use __init__.py

508e572

Signed-off-by: harupy <hkawamura0130@gmail.com>

test callback picklability

aaa9517

Signed-off-by: harupy <hkawamura0130@gmail.com>

conditionally define AutologCallback

9bc3e9b

Signed-off-by: harupy <hkawamura0130@gmail.com>

simplify tests

642b211

Signed-off-by: harupy <hkawamura0130@gmail.com>

simplify tests again

cc42bee

Signed-off-by: harupy <hkawamura0130@gmail.com>

fix imports

74d1bab

Signed-off-by: harupy <hkawamura0130@gmail.com>

use __init__.py for gluon

3955d16

Signed-off-by: harupy <hkawamura0130@gmail.com>

gluon

8071406

Signed-off-by: harupy <hkawamura0130@gmail.com>

use __init__.py

ff25a6f

Signed-off-by: harupy <hkawamura0130@gmail.com>

tensorflow

c733669

Signed-off-by: harupy <hkawamura0130@gmail.com>

harupy changed the title from "Add autologging test for multiprocessing" to "Ensure all autologging callbacks are picklable" Nov 16, 2021

dbczumar reviewed Nov 16, 2021

harupy added 3 commits November 16, 2021 18:55

fix xgboost

58330d8

Signed-off-by: harupy <hkawamura0130@gmail.com>

lightgbm

ff74cc2

Signed-off-by: harupy <hkawamura0130@gmail.com>

add fastai test

d59d073

Signed-off-by: harupy <hkawamura0130@gmail.com>

update docstring

6d2c7ef

Signed-off-by: harupy <hkawamura0130@gmail.com>

harupy force-pushed the fix-autolog-callbacks branch from 9c27efe to 6d2c7ef Compare November 16, 2021 10:34

harupy commented Nov 16, 2021

refactor xgboost files

866b150

Signed-off-by: harupy <hkawamura0130@gmail.com>

harupy requested review from WeichenXu123 and dbczumar November 16, 2021 10:49

harupy added 2 commits November 17, 2021 03:39

fix skipif condition

a00b1d1

Signed-off-by: harupy <hkawamura0130@gmail.com>

skip test_tf_keras_autolog_distributed_training

5afc66a

Signed-off-by: harupy <hkawamura0130@gmail.com>

dbczumar approved these changes Nov 17, 2021

harupy added 2 commits November 17, 2021 12:21

rename exception_safe_function

f3c40e2

Signed-off-by: harupy <hkawamura0130@gmail.com>

replace remain with be

a9f1ecc

Signed-off-by: harupy <hkawamura0130@gmail.com>

harupy merged commit 27b3cd2 into mlflow:master Nov 17, 2021

harupy deleted the fix-autolog-callbacks branch November 17, 2021 04:51

jwyyy mentioned this pull request Nov 17, 2021

Autologging functionality for scikit-learn integration with XGBoost (Part 2) #5078

Merged

29 tasks
