Automatically update metric plots for in-progress runs #2099 by cedkoffeto · Pull Request #5017 · mlflow/mlflow

cedkoffeto · 2021-11-07T20:13:58Z

Signed-off-by: Cedric Koffeto cedkoffeto@gmail.com

What changes are proposed in this pull request?

Adds a setInterval call to the MetricsPlotPanel component. Every 8 seconds it fetches run statuses and metric histories from the API and updates the plot. Polling stops automatically once all runs have finished. Closes #2099.
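The polling loop described above can be sketched roughly as follows. This is a hypothetical illustration, not the code in this PR. It assumes:

- `fetchStatuses` and `refreshPlot` are caller-supplied callbacks.
- Only the FINISHED, FAILED and KILLED run statuses count as terminal.

```javascript
// Hypothetical sketch: names and signatures are illustrative, not MLflow's API.
// Run statuses after which a run will not log further metrics (assumption).
const TERMINAL_STATUSES = new Set(['FINISHED', 'FAILED', 'KILLED']);

// True once every run ID maps to a terminal status.
function allRunsFinished(statusesByRunId, runIds) {
  return runIds.every((id) => TERMINAL_STATUSES.has(statusesByRunId[id]));
}

// Polls every `intervalMs` until all runs are finished.
// fetchStatuses(runIds) resolves to { [runId]: status } (hypothetical callback);
// refreshPlot(runIds) re-fetches metric histories and redraws (hypothetical callback).
function startPolling({ runIds, fetchStatuses, refreshPlot, intervalMs = 8000 }) {
  let busy = false;
  const timer = setInterval(async () => {
    if (busy) return; // skip this tick if the previous request is still pending
    busy = true;
    try {
      const statuses = await fetchStatuses(runIds);
      await refreshPlot(runIds);
      if (allRunsFinished(statuses, runIds)) {
        clearInterval(timer); // nothing left to update
      }
    } catch (err) {
      // Log and skip a failed tick; the next tick simply retries.
      console.error('metrics polling failed:', err);
    } finally {
      busy = false;
    }
  }, intervalMs);
  // Lets the caller stop early, e.g. when the component unmounts.
  return () => clearInterval(timer);
}
```

The `busy` flag keeps slow responses from stacking up overlapping requests. The returned function gives a component a single place to cancel the timer when it unmounts.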

How is this patch tested?

Unit tests
Manually verified the proposed feature works correctly using this script:

import time
import numpy as np
import mlflow
import multiprocessing as mp


def log(run_id, slope, repeat):
    sleep = 10
    with mlflow.start_run(run_id=run_id):
        for epoch in range(1, repeat + 1):
            print(epoch)
            mlflow.log_metric(key="metric1", value=slope * epoch * np.log(epoch), step=epoch)
            mlflow.log_metric(key="metric2", value=slope * (1 / epoch) * np.log(epoch), step=epoch)
            time.sleep(sleep)


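# The __main__ guard is required because multiprocessing's "spawn" start method
# (the default on Windows, and on macOS since Python 3.8) re-imports this module in
# every worker; without it each worker would create new runs and start its own pool.
if __name__ == "__main__":
    client = mlflow.tracking.MlflowClient()
    # Two runs in experiment "0", so the plot shows two curves updating at once.
    run_uuids = [client.create_run("0").info.run_id for _ in range(2)]
    # URL-encoded JSON list of run IDs (%22 is a double quote).
    runs_param = "[" + ",".join(map(lambda s: f"%22{s}%22", run_uuids)) + "]"

    print(
        "URL:",
        r"http://localhost:3000/#/metric/metric1?runs=<<< runs_param >>>&experiment=0&plot_metric_keys=[%22metric1%22]&plot_layout={%22autosize%22:true,%22xaxis%22:{},%22yaxis%22:{}}&x_axis=step&y_axis_scale=linear&line_smoothness=1&show_point=true&deselected_curves=[]&last_linear_y_axis_range=[]".replace(
            "<<< runs_param >>>", runs_param
        ),
    )

    # (run_id, slope, number of epochs); the second run lasts longer than the first.
    args_list = [(run_uuid, idx + 1, 5 + idx * 3) for idx, run_uuid in enumerate(run_uuids)]

    with mp.Pool() as pool:
        pool.starmap(log, args_list)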

A unit test could also check that the plot is actually updated.

Release Notes

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

Automatically update metric plots for in-progress runs

What component(s), interfaces, languages, and integrations does this PR affect?

Components

Interface

area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

github-actions · 2021-11-07T20:14:15Z

@cedkoffeto Thanks for the contribution! The DCO check failed. Please sign off your commits by following the instructions here: https://github.com/mlflow/mlflow/runs/4132478673. See https://github.com/mlflow/mlflow/blob/master/CONTRIBUTING.rst#sign-your-work for more details.

harupy · 2021-11-08T00:11:08Z

Hi @cedkoffeto, thanks for the PR! I'll review it soon :)

cedkoffeto · 2021-11-08T00:19:51Z

@harupy Glad to be able to help 😉

harupy · 2021-11-08T00:44:58Z

@cedkoffeto btw could you take a screen record of how the metric plot automatically gets updated?

cedkoffeto · 2021-11-08T02:23:21Z

Here it is @harupy

Enregistrement.de.l.ecran.2021-11-08.a.03.09.24.mp4

harupy · 2021-11-08T05:49:24Z

Thanks for the screen recording. It looks like the entire plot gets re-rendered.

I'm investigating how we can update only the lines, like this:

only-update-lines.mov

harupy · 2021-11-08T09:05:21Z

Here's my attempt: https://github.com/harupy/mlflow/tree/5017-harupy. The implementation is almost the same as yours. In my implementation, I don't call this.setState((prevState) => ({historyRequestIds: [...]})). I'm investigating whether this is required or not.

          this.setState((prevState) => ({
            historyRequestIds: [...prevState.historyRequestIds, ...requestIds],
          }));

The message showing up at the top is just for demo purposes.

auto-plot-update.mov

Python script I used:

import time
import numpy as np
import mlflow


with mlflow.start_run() as run:
    print(
        "URL:",
        r"http://localhost:3000/#/metric/metric1?runs=[%22<<< RUN_ID >>>%22]&experiment=0&plot_metric_keys=[%22metric1%22]&plot_layout={%22autosize%22:true,%22xaxis%22:{},%22yaxis%22:{}}&x_axis=relative&y_axis_scale=linear&line_smoothness=1&show_point=true&deselected_curves=[]&last_linear_y_axis_range=[]".replace(
            "<<< RUN_ID >>>", run.info.run_id
        ),
    )
    for epoch in range(1, 10):
        print(epoch)
        mlflow.log_metric(key="metric1", value=epoch * np.log(epoch), step=epoch)
        mlflow.log_metric(key="metric2", value=(1 / epoch) * np.log(epoch), step=epoch)
        time.sleep(3)

dbczumar · 2021-11-08T19:20:47Z

@cedkoffeto @harupy Awesome stuff! Does the proposal from https://github.com/harupy/mlflow/tree/5017-harupy also preserve plot customizations and zoom?

cedkoffeto · 2021-11-08T20:06:04Z

Hi @harupy,
I agree: storing the request IDs seems unnecessary in our case.

cedkoffeto · 2021-11-08T20:12:31Z

Thanks @dbczumar
I think it does preserve plot customizations and zoom, but I'll let @harupy confirm.

harupy · 2021-11-09T00:31:17Z

@dbczumar Yep, it does. Here's a quick demo.

auto-plot-update-customization.mov

cedkoffeto · 2021-11-19T17:59:59Z

Hi @harupy, any update?

harupy · 2021-11-26T08:32:17Z

Hi @cedkoffeto, sorry for the late reply. We internally discussed this feature. Here's our latest prototype:

automatic-metric-plot-update.mov

code: https://github.com/harupy/mlflow/pull/28/files

cedkoffeto · 2021-11-26T21:09:09Z

Hi @harupy, that's great!

harupy · 2021-11-29T06:55:39Z

@cedkoffeto I pushed some commits to update the PR.

dbczumar · 2021-11-29T21:42:18Z

+// Stop polling when the polling duration exceeds this value
+export const METRICS_PLOT_POLLING_DURATION_MS = 3600 * 1000; // 1 hour


I think we should only stop polling when there's no new data.

@dbczumar I remember we discussed setting an appropriate polling threshold for runs that never end. Not having such a threshold sounds OK to me, though, because most runs do end.

Offline discussion: check the timestamp of the latest logged metric. If it is more than one week old, stop refreshing that run.
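The rule agreed offline could look roughly like this. It is a sketch, not the PR's implementation. The threshold value comes from the METRICS_PLOT_HANGING_RUN_THRESHOLD_MS constant that appears later in this review. The other pieces are assumptions:

- The `{ [metricKey]: [{ step, value, timestamp }] }` shape of `histories`.
- Both function names.
- The choice to keep polling a RUNNING run that has not logged any metric yet.

```javascript
// One week in milliseconds (value taken from this PR's constant).
const METRICS_PLOT_HANGING_RUN_THRESHOLD_MS = 3600 * 24 * 7 * 1000;

// Hypothetical helper: newest timestamp across all of a run's metric histories.
// Returns -Infinity when the run has no metric entries at all.
function latestMetricTimestamp(histories) {
  let latest = -Infinity;
  for (const entries of Object.values(histories)) {
    for (const { timestamp } of entries) {
      if (timestamp > latest) latest = timestamp;
    }
  }
  return latest;
}

// Hypothetical helper: poll only RUNNING runs whose latest metric is at most one week old.
function shouldKeepPolling(status, histories, now = Date.now()) {
  if (status !== 'RUNNING') return false;
  const latest = latestMetricTimestamp(histories);
  if (latest === -Infinity) return true; // assumption: nothing logged yet, keep polling
  return now - latest <= METRICS_PLOT_HANGING_RUN_THRESHOLD_MS;
}
```

With this rule, a run whose process died without marking the run finished stops being polled one week after its last metric. Runs that are actually progressing keep refreshing.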

dbczumar · 2021-11-29T21:43:15Z

 export const CHART_TYPE_BAR = 'bar';

+// Polling interval
+export const METRICS_PLOT_POLLING_INTERVAL_MS = 5000;


Can we increase this to 10 seconds? 5 seems aggressive.

In general, what happens if the refresh fails? Does the page crash?

> Can we increase this to 10 seconds? 5 seems aggressive.

Sure!

> In general, what happens if the refresh fails? Does the page crash?

Let me test.

when-request-fails.mov

The page doesn't crash.

The page keeps polling.

cedkoffeto · 2021-11-29T22:52:14Z

👍🏽

harupy · 2021-11-30T11:56:13Z

+export const METRICS_PLOT_POLLING_INTERVAL_MS = 10 * 1000; // 10 seconds
+// A run is considered as 'hanging' if its status is 'RUNNING' but its latest metric was logged
+// prior to this threshold. The metrics plot doesn't automatically update hanging runs.
+export const METRICS_PLOT_HANGING_RUN_THRESHOLD_MS = 3600 * 24 * 7 * 1000; // 1 week


Does "hanging" make sense?

harupy · 2021-11-30T11:56:46Z

    "description": "Text for registered model link in the title for model comparison page"
  },
+  "UEDu0c": {
+    "defaultMessage": "MLflow UI automatically fetches metric histories for active runs and updates the metrics plot with a {interval} second interval.",


Included the interval in the message so users don't need to guess or measure how long it is.
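Here is how the `{interval}` placeholder relates to the polling constant, shown with naive string substitution. This is a swapped-in technique for illustration only. The UI presumably fills the placeholder through its i18n layer (for example, react-intl's `FormattedMessage` with a `values` prop), not like this. `formatMessage` here is a hypothetical helper.

```javascript
// Polling interval from this PR: 10 seconds.
const METRICS_PLOT_POLLING_INTERVAL_MS = 10 * 1000;

const MESSAGE =
  'MLflow UI automatically fetches metric histories for active runs and updates ' +
  'the metrics plot with a {interval} second interval.';

// Hypothetical helper: replaces each {name} with values[name], leaving unknown placeholders untouched.
function formatMessage(template, values) {
  return template.replace(/\{(\w+)\}/g, (match, name) =>
    name in values ? String(values[name]) : match
  );
}

// Derive the displayed seconds from the constant so the two can never drift apart.
const text = formatMessage(MESSAGE, { interval: METRICS_PLOT_POLLING_INTERVAL_MS / 1000 });
console.log(text);
```

Computing the number from the constant, rather than hardcoding "10" in the message, means that a later change to the interval automatically updates the text.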

harupy · 2021-12-02T00:46:37Z

-            { key: 'metric_1', value: 100, step: 2, timestamp: 1556662044000 },
-            { key: 'metric_1', value: 50, step: 1, timestamp: 1556662043000 },
+            { key: 'metric_1', value: 100, step: 2, timestamp: now },
+            { key: 'metric_1', value: 50, step: 1, timestamp: now - 1 },


Replaced the hardcoded timestamps with the current time (now) so the metrics plot doesn't treat these runs as hanging.
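A small fixture builder generalizes the test change above. It stamps the newest entry with `now` and spaces earlier steps 1 ms apart. This keeps the fixtures from ever aging past the one-week hanging threshold, which the hardcoded 2019 timestamps already exceed. `makeMetricHistory` is a hypothetical test helper, not part of the PR.

```javascript
// Hypothetical test helper: builds a metric history ending at `now`.
// Entry i gets step i + 1; timestamps increase by 1 ms per step.
function makeMetricHistory(key, values, now = Date.now()) {
  const n = values.length;
  return values.map((value, i) => ({
    key,
    value,
    step: i + 1,
    timestamp: now - (n - 1 - i),
  }));
}

const now = Date.now();
const history = makeMetricHistory('metric_1', [50, 100], now);
// -> [{ key: 'metric_1', value: 50, step: 1, timestamp: now - 1 },
//     { key: 'metric_1', value: 100, step: 2, timestamp: now }]
```

This mirrors the two entries in the diff above, though in ascending step order.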

dbczumar

LGTM! Awesome work, @cedkoffeto, @harupy ! Thank you so much for this contribution, @cedkoffeto!

cedkoffeto · 2021-12-03T16:02:00Z

Thanks! It was a pleasure :)
Thanks also to @harupy for your great help 🙏🏽

github-actions bot added the area/uiux and rn/feature labels Nov 7, 2021

cedkoffeto added 3 commits November 7, 2021 21:27

Automatically update metric plots for in-progress runs mlflow#2099

4552061

Signed-off-by: Cedric Koffeto cedkoffeto@gmail.com Signed-off-by: cedric koffeto <cedkoffeto@gmail.com>

eslint corrections

42bb25e

Signed-off-by: cedric koffeto <cedkoffeto@gmail.com>

eslint

8af2942

Signed-off-by: cedric koffeto <cedkoffeto@gmail.com>

cedkoffeto force-pushed the master branch from 662bb4a to 8af2942 November 7, 2021 20:28

harupy self-requested a review November 8, 2021 00:10

bug fix

18b9fac

Signed-off-by: cedric koffeto <cedkoffeto@gmail.com>

This was referenced Nov 10, 2021

Use ... when comparing diff between master and PR branch for cross version tests #5040

Closed

Obtain changed files using GitHub /pulls/{ pr_number }/files API in cross version tests #5041

Merged

harupy reviewed Nov 10, 2021

Comment thread mlflow/server/js/src/experiment-tracking/components/MetricsPlotPanel.js Outdated

commit

e4a22cb

Signed-off-by: harupy <hkawamura0130@gmail.com>

harupy reviewed Nov 10, 2021

Comment thread mlflow/server/js/src/experiment-tracking/components/MetricsPlotPanel.js Outdated

harupy reviewed Nov 11, 2021

Comment thread mlflow/server/js/src/experiment-tracking/components/MetricsPlotPanel.js Outdated

checkOnRunUnfinished() replaced

a91c53c

Merge branch 'master' into pr/cedkoffeto/5017

dbc980a

Signed-off-by: harupy <hkawamura0130@gmail.com>

harupy added 3 commits November 29, 2021 11:49

cherry pick

7777420

Signed-off-by: harupy <hkawamura0130@gmail.com>

add tests

70408f2

Signed-off-by: harupy <hkawamura0130@gmail.com>

i18n

164eb69

Signed-off-by: harupy <hkawamura0130@gmail.com>

harupy added 3 commits November 29, 2021 15:57

check state

90c976b

Signed-off-by: harupy <hkawamura0130@gmail.com>

refactor

df7ad32

Signed-off-by: harupy <hkawamura0130@gmail.com>

add rendering test

c0895fe

Signed-off-by: harupy <hkawamura0130@gmail.com>

dbczumar reviewed Nov 29, 2021

harupy added 6 commits November 30, 2021 09:51

increase polling duration

c41d94f

Signed-off-by: harupy <hkawamura0130@gmail.com>

ignore hanging runs

712f9a7

Signed-off-by: harupy <hkawamura0130@gmail.com>

show interval in tooltip

037cefb

Signed-off-by: harupy <hkawamura0130@gmail.com>

rename test

2bb4b62

Signed-off-by: harupy <hkawamura0130@gmail.com>

i18n

a5ed069

Signed-off-by: harupy <hkawamura0130@gmail.com>

lint

ebad40f

Signed-off-by: harupy <hkawamura0130@gmail.com>

harupy reviewed Nov 30, 2021

harupy reviewed Dec 2, 2021

dbczumar approved these changes Dec 2, 2021

harupy added 2 commits December 2, 2021 17:29

refactor

643e2ce

Signed-off-by: harupy <hkawamura0130@gmail.com>

fix flaky test

f246b63

Signed-off-by: harupy <hkawamura0130@gmail.com>

harupy merged commit 1b4bbb6 into mlflow:master Dec 2, 2021

sim-san mentioned this pull request Jan 19, 2022

[FR] Automatic reload #1849

Closed

kbumsik mentioned this pull request May 8, 2024

[FR] Auto-refresh metrics #11936

Open
