Fix a leaked run error in `test_pytorch_test_metrics_logged` by harupy · Pull Request #4302 · mlflow/mlflow

harupy · 2021-04-29T12:57:23Z

Signed-off-by: harupy 17039389+harupy@users.noreply.github.com

What changes are proposed in this pull request?

Fix a leaked run error in test_pytorch_test_metrics_logged.

https://github.com/mlflow/mlflow/runs/2464693969?check_suite_focus=true#step:10:23

==================================== ERRORS ====================================
____________ ERROR at teardown of test_pytorch_test_metrics_logged _____________

    @pytest.fixture(autouse=True)
    def clean_up_leaked_runs():
        """
        Certain test cases validate safety API behavior when runs are leaked. Leaked runs that
        are not cleaned up between test cases may result in cascading failures that are hard to
        debug. Accordingly, this fixture attempts to end any active runs it encounters and
        throws an exception (which reported as an additional error in the pytest execution output).
        """
        try:
            yield
>           assert (
                not mlflow.active_run()
            ), "test case unexpectedly leaked a run. Run info: {}. Run data: {}".format(
                mlflow.active_run().info, mlflow.active_run().data
            )
E           AssertionError: test case unexpectedly leaked a run. Run info: <RunInfo: artifact_uri='./mlruns/0/2a9d070e8c4b4d4b8e27cf0754d2190e/artifacts', end_time=None, experiment_id='0', lifecycle_stage='active', run_id='2a9d070e8c4b4d4b8e27cf0754d2190e', run_uuid='2a9d070e8c4b4d4b8e27cf0754d2190e', start_time=1619683570521, status='RUNNING', user_id='runner'>. Run data: <RunData: metrics={}, params={}, tags={'mlflow.source.name': '/usr/share/miniconda/bin/pytest',
E              'mlflow.source.type': 'LOCAL',
E              'mlflow.user': 'runner'}>
E           assert not <ActiveRun: >
E            +  where <ActiveRun: > = <function active_run at 0x7fabc1ac96a8>()
E            +    where <function active_run at 0x7fabc1ac96a8> = mlflow.active_run

How is this patch tested?

Fixed test

Release Notes

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)

What component(s), interfaces, languages, and integrations does this PR affect?

Components

Interface

area/uiux: Front-end, user experience, JavaScript, plotting
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

harupy · 2021-04-29T13:14:32Z

-    trainer.fit(model, dm)
-    trainer.test()
+    with mlflow.start_run() as run:
+        trainer.fit(model, dm)
+        trainer.test()


This change in the pytorch-lightning repo caused the issue:

https://github.com/PyTorchLightning/pytorch-lightning/pull/7232/files#diff-667a2513c158c2b01d73b479e4dea75587d8531f9cd286891d34570a2fd145dcR986-R987

Prior to this change, Trainer.test calls Trainer.fit (which is patched and automatically creates / terminates a run). This means when on_test_end is called, an active run exists.

Now Trainer.test no longer calls Trainer.fit. This means when on_test_end is called, no active run exists and a new run will be created by mlflow.set_tag and leaked.

def on_test_end(self, trainer, pl_module): """ Logs accuracy and other relevant metrics on the testing end :param trainer: pytorch lightning trainer instance :param pl_module: pytorch lightning base module """ try_mlflow_log(mlflow.set_tag, "Mode", "testing") for key, value in trainer.callback_metrics.items(): try_mlflow_log(mlflow.log_metric, key, float(value))

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com> Signed-off-by: Yiqing Wang <yiqing@wangemail.com>

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

Run trainer.test in with mlflow.start_run

29288dc

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

github-actions Bot added the rn/none List under Small Changes in Changelogs. label Apr 29, 2021

harupy commented Apr 29, 2021

View reviewed changes

harupy mentioned this pull request Apr 29, 2021

list experiments page token #3881

Merged

27 tasks

wamartin-aml approved these changes Apr 29, 2021

View reviewed changes

WeichenXu123 approved these changes Apr 30, 2021

View reviewed changes

harupy merged commit 780b3bf into mlflow:master Apr 30, 2021

harupy deleted the fix-pl-leaked-run branch April 30, 2021 08:32

YQ-Wang pushed a commit to YQ-Wang/mlflow that referenced this pull request May 29, 2021

Run trainer.test in with mlflow.start_run (mlflow#4302)

2897c1c

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com> Signed-off-by: Yiqing Wang <yiqing@wangemail.com>

harupy added a commit to wamartin-aml/mlflow that referenced this pull request Jun 7, 2021

Run trainer.test in with mlflow.start_run (mlflow#4302)

3761e89

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix a leaked run error in `test_pytorch_test_metrics_logged`#4302

Fix a leaked run error in `test_pytorch_test_metrics_logged`#4302
harupy merged 1 commit intomlflow:masterfrom
harupy:fix-pl-leaked-run

harupy commented Apr 29, 2021 •

edited

Loading

Uh oh!

harupy Apr 29, 2021 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

harupy commented Apr 29, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes are proposed in this pull request?

How is this patch tested?

Release Notes

Is this a user-facing change?

What component(s), interfaces, languages, and integrations does this PR affect?

How should the PR be classified in the release notes? Choose one:

Uh oh!

harupy Apr 29, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

harupy commented Apr 29, 2021 •

edited

Loading

harupy Apr 29, 2021 •

edited

Loading