Automatically update metric plots for in-progress runs #2099#5017
Conversation
|
@cedkoffeto Thanks for the contribution! The DCO check failed. Please sign off your commits by following the instructions here: https://github.com/mlflow/mlflow/runs/4132478673. See https://github.com/mlflow/mlflow/blob/master/CONTRIBUTING.rst#sign-your-work for more details. |
Signed-off-by: Cedric Koffeto cedkoffeto@gmail.com Signed-off-by: cedric koffeto <cedkoffeto@gmail.com>
Signed-off-by: cedric koffeto <cedkoffeto@gmail.com>
|
Hi @cedkoffeto, thanks for the PR! I'll review it soon :) |
@harupy Glad to be able to help 😉 |
|
@cedkoffeto btw could you take a screen record of how the metric plot automatically gets updated? |
Here it is @harupy Enregistrement.de.l.ecran.2021-11-08.a.03.09.24.mp4 |
|
Thanks for the screen recording. It looks like the entire plot gets rendered. I'm investigating how we can only update the lines like this: only-update-lines.mov |
|
Here's my attempt: https://github.com/harupy/mlflow/tree/5017-harupy. The implementation is almost the same as yours. In my implementation, I don't call this.setState((prevState) => ({
historyRequestIds: [...prevState.historyRequestIds, ...requestIds],
}));The message showing up at the top is just for demo purposes. auto-plot-update.movPython script I used:import time
import numpy as np
import mlflow
with mlflow.start_run() as run:
print(
"URL:",
r"http://localhost:3000/#/metric/metric1?runs=[%22<<< RUN_ID >>>%22]&experiment=0&plot_metric_keys=[%22metric1%22]&plot_layout={%22autosize%22:true,%22xaxis%22:{},%22yaxis%22:{}}&x_axis=relative&y_axis_scale=linear&line_smoothness=1&show_point=true&deselected_curves=[]&last_linear_y_axis_range=[]".replace(
"<<< RUN_ID >>>", run.info.run_id
),
)
for epoch in range(1, 10):
print(epoch)
mlflow.log_metric(key="metric1", value=epoch * np.log(epoch), step=epoch)
mlflow.log_metric(key="metric2", value=(1 / epoch) * np.log(epoch), step=epoch)
time.sleep(3) |
|
@cedkoffeto @harupy Awesome stuff! Does the proposal from https://github.com/harupy/mlflow/tree/5017-harupy also preserve plot customizations and zoom? |
Hi @harupy, |
Thanks @dbczumar |
|
@dbczumar Yep, it does. Here's a quick demo. auto-plot-update-customization.mov |
|
Hi @harupy, any update? |
|
Hi @cedkoffeto, sorry for the late reply. We internally discussed this feature. Here's our latest prototype: automatic-metric-plot-update.mov |
Hi @harupy, that's great! |
Signed-off-by: harupy <hkawamura0130@gmail.com>
|
@cedkoffeto I pushed some commits to update the PR. |
Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
| // Stop polling when the polling duration exceeds this value | ||
| export const METRICS_PLOT_POLLING_DURATION_MS = 3600 * 1000; // 1 hour |
There was a problem hiding this comment.
I think we should only stop polling when there's no new data.
There was a problem hiding this comment.
@dbczumar I remember we discussed that we should set an appropriate polling threshold when a run never ends, but not setting such a threshold sounds ok to me because runs end in most cases.
There was a problem hiding this comment.
Offline discussion: check the timestamp of the last metric, and if it's more than 1 week, then we won't refresh.
| export const CHART_TYPE_BAR = 'bar'; | ||
|
|
||
| // Polling interval | ||
| export const METRICS_PLOT_POLLING_INTERVAL_MS = 5000; |
There was a problem hiding this comment.
Can we increase this to 10 seconds? 5 seems aggressive.
There was a problem hiding this comment.
In general, what happens if the refresh fails? Does the page crash?
There was a problem hiding this comment.
Can we increase this to 10 seconds? 5 seems aggressive.
Sure!
There was a problem hiding this comment.
In general, what happens if the refresh fails? Does the page crash?
Let me test.
There was a problem hiding this comment.
when-request-fails.mov
- The page doesn't crash.
- The page keeps polling.
|
Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com>
| export const METRICS_PLOT_POLLING_INTERVAL_MS = 10 * 1000; // 10 seconds | ||
| // A run is considered as 'hanging' if its status is 'RUNNING' but its latest metric was logged | ||
| // prior to this threshold. The metrics plot doesn't automatically update hanging runs. | ||
| export const METRICS_PLOT_HANGING_RUN_THRESHOLD_MS = 3600 * 24 * 7 * 1000; // 1 week |
| "description": "Text for registered model link in the title for model comparison page" | ||
| }, | ||
| "UEDu0c": { | ||
| "defaultMessage": "MLflow UI automatically fetches metric histories for active runs and updates the metrics plot with a {interval} second interval.", |
There was a problem hiding this comment.
Included interval so a user doesn't need to guess or measure how long the interval is.
| { key: 'metric_1', value: 100, step: 2, timestamp: 1556662044000 }, | ||
| { key: 'metric_1', value: 50, step: 1, timestamp: 1556662043000 }, | ||
| { key: 'metric_1', value: 100, step: 2, timestamp: now }, | ||
| { key: 'metric_1', value: 50, step: 1, timestamp: now - 1 }, |
There was a problem hiding this comment.
Replaced hardcoded timestamps with now to prevent the metrics plot from considering these runs as hanging.
There was a problem hiding this comment.
LGTM! Awesome work, @cedkoffeto, @harupy ! Thank you so much for this contribution, @cedkoffeto!
Signed-off-by: harupy <hkawamura0130@gmail.com>
Thanks! It was a pleasure :) |
Signed-off-by: Cedric Koffeto cedkoffeto@gmail.com
What changes are proposed in this pull request?
setInterval function added in metricsPlotPanel component which retrieves execution status and metrics history from API every 8 seconds to update the plot and automatically stops when all executions are complete. closes #2099
How is this patch tested?
It can be tested with a unit test to check if the plot is actually updated
Release Notes
Is this a user-facing change?
Automatically update metric plots for in-progress runs
What component(s), interfaces, languages, and integrations does this PR affect?
Components
area/artifacts: Artifact stores and artifact loggingarea/build: Build and test infrastructure for MLflowarea/docs: MLflow documentation pagesarea/examples: Example codearea/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registryarea/models: MLmodel format, model serialization/deserialization, flavorsarea/projects: MLproject format, project running backendsarea/scoring: MLflow Model server, model deployment tools, Spark UDFsarea/server-infra: MLflow Tracking server backendarea/tracking: Tracking Service, tracking client APIs, autologgingInterface
area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev serverarea/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Modelsarea/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registryarea/windows: Windows supportLanguage
language/r: R APIs and clientslanguage/java: Java APIs and clientslanguage/new: Proposals for new client languagesIntegrations
integrations/azure: Azure and Azure ML integrationsintegrations/sagemaker: SageMaker integrationsintegrations/databricks: Databricks integrationsHow should the PR be classified in the release notes? Choose one:
rn/breaking-change- The PR will be mentioned in the "Breaking Changes" sectionrn/none- No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" sectionrn/feature- A new user-facing feature worth mentioning in the release notesrn/bug-fix- A user-facing bug fix worth mentioning in the release notesrn/documentation- A user-facing documentation change worth mentioning in the release notes