
Issue 25323: Fixed error catch and route handling#43338

Closed
igoristic wants to merge 5 commits intoelastic:masterfrom
igoristic:25323

Conversation

@igoristic
Contributor

@igoristic igoristic commented Aug 15, 2019

EDIT: This is an old iteration, please check out the new PR at: #44800


Resolves #25323

In some cases we weren't catching monitoringClusters failures, which resulted in a blank error screen.

The default route was set to /no-data, which didn't have any loading state, so I rerouted all the entry points to /loading.

I also removed some on-route resolve triggers, since their controllers were already doing the same loading logic.


Steps to cause monitoringClusters() to time out (on master):

  1. Start with a fresh ES/Kibana stack from master
  2. Enable Monitoring in Stack Monitoring
  3. Start a second, independent ES instance (without security), e.g.:
bin/elasticsearch -E xpack.license.self_generated.type=trial -E xpack.monitoring.collection.enabled=true -E xpack.security.enabled=false
  4. Go to Dev Tools and register the second cluster as a remote cluster (https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-remote-clusters.html) with something like:
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster_two": {
          "seeds": [
            "127.0.0.1:9201"
          ],
          "skip_unavailable": true
        }
      }
    }
  }
}
  5. Go to Stack Monitoring and notice that Kibana goes to /no-data for about 30 seconds with a completely blank screen. After that you'll see a toast error that goes away, but the blank screen will still be there

The fixed code goes to the /loading page instead (so there is now a loading indicator) and then to /no-data (with the relevant error details) after the timeout
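The described behavior can be sketched roughly as follows. All names here (router, enterApp, the stubbed monitoringClusters) are hypothetical stand-ins, not the actual PR code — the point is only the flow: land on /loading first, then route to /no-data with the error details once the clusters call rejects.

```javascript
// Sketch only: stand-ins for Kibana's routing and the monitoringClusters() call.
const router = {
  path: '/',
  goTo(path) { this.path = path; },
};

// Pretend monitoringClusters() times out against the unreachable remote cluster.
function monitoringClusters() {
  return Promise.reject(new Error('Request timed out'));
}

async function enterApp() {
  router.goTo('/loading'); // show a loading indicator instead of a blank screen
  try {
    const clusters = await monitoringClusters();
    router.goTo(clusters.length ? '/home' : '/no-data');
  } catch (err) {
    // Previously this rejection went uncaught, leaving a blank page;
    // now we land on /no-data with the error attached.
    router.goTo(`/no-data?err=${encodeURIComponent(err.message)}`);
  }
  return router.path;
}
```

The key change is that every entry point passes through a route that renders *something* while the request is in flight, and every rejection has a destination.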


TODO:

  • Added cluster.remoteInfo pre-check
  • Removed AppState and added a temp state to kbnUrl
  • Fix unit tests
  • Fix unhandled rejected promises

@igoristic igoristic added bug Fixes for quality problems that affect the customer experience review release_note:fix Team:Monitoring Stack Monitoring team v8.0.0 v7.4.0 labels Aug 15, 2019
@elasticmachine
Contributor

Pinging @elastic/stack-monitoring

@elasticmachine
Contributor

💔 Build Failed

@igoristic igoristic requested review from cachedout and removed request for ycombinator August 15, 2019 13:20
@cachedout
Contributor

After reading the description of this PR, I'm not sure I understand what the new behavior will be, so it's hard to test. Could you please update the description to further explain the change?

@elasticmachine
Contributor

💔 Build Failed

@igoristic
Contributor Author

@cachedout I added step-by-step instructions and clarified the current behavior vs. the fixed/expected behavior. Hope it helps. Let me know if you have any questions; I'm happy to walk you through it

Contributor

@chrisronline chrisronline left a comment


The problem with using AppState is we need to ensure we are removing it properly.

If I bring down the remote cluster, I see the error page (which is great!), but then once I bring that same remote cluster back up, the error is stuck in app state:
[Screenshot: error stuck in app state — Screen Shot 2019-08-19 at 4:35:47 PM]

We need to clear that out


initSetupModeState($scope, $injector);

const setupMode = getSetupModeState();
Contributor


Why was this moved?

Contributor Author


The '/elasticsearch/nodes' route does its own monitoringClusters() call through routeInitProvider, and if that fails it will be stuck at a blank screen again. There is no point in sending it to the nodes route if the clusters call fails.

We are calling monitoringClusters() twice this way, but at least it has less chance of failing (right after it has just succeeded). I figured this is fine for now, since this will go away once the setup becomes obsolete

uiRoutes
.when('/no-data', {
template,
resolve: {
Contributor


What's the thinking behind this page? I'm guessing it's to avoid doing the check that we might fail, but we still should be doing this check so we can do the redirect, if necessary.

Contributor Author


The redirect logic will still happen here: https://github.com/elastic/kibana/pull/43338/files#diff-87ec11d43f1e502792408786cf726eb6R96

We should avoid doing any kind of request in resolve, since it provides no template in which we can indicate a loading state to the user.

@igoristic
Contributor Author

igoristic commented Aug 20, 2019

The problem with using AppState is we need to ensure we are removing it properly.

@chrisronline I agree 💯

Unfortunately there is no way to clear it without refreshing the state again. This is why I wanted to create something that is not coupled with URL state.

I was actually thinking of extending kbnUrl.changePath to accept state that would be destroyed on the next hash/URL change. Or maybe we can do something similar with AppState
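The "state destroyed on the next URL change" idea could look something like the sketch below. This is purely illustrative (kbnUrl itself is not modified here, and createTransientState is an invented name): each navigation drops whatever one-shot state the previous navigation installed, so a stale error can't get stuck the way it does with AppState.

```javascript
// Illustrative sketch of one-shot navigation state, not actual kbnUrl code.
// State installed alongside a path change survives until the NEXT path
// change, at which point it is discarded automatically.
function createTransientState() {
  let state = null;
  return {
    changePath(path, nextState = null) {
      // Any navigation drops the previous one-shot state and
      // optionally installs a new one for the destination page.
      state = nextState;
      // ...the actual hash/URL change would happen here...
      return path;
    },
    getState() { return state; },
  };
}
```

With this shape, routing to /no-data with error details and then navigating anywhere else leaves no residue in the URL or app state.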

@chrisronline
Contributor

So I was rereading the original issue and I'm thinking we might be able to simplify and improve this fix.

Instead of needing to incur even the first 30s of timeout, I think we might be able to use the cluster.remoteInfo api to do a sort of "pre check" in the server-side code and then throw an appropriate error (which should result in a toast showing). This feels more closely aligned with some other checks we're doing on the server side, the amount of code is much less, and we can avoid the first timeout altogether!

Do you mind opening a new PR and trying to get this approach working? I think it'll be nice to compare both solutions
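A rough sketch of the suggested cluster.remoteInfo pre-check might look like the following. The callCluster wrapper, the error message, and the response shape are assumptions for illustration, not the eventual implementation; the Elasticsearch remote info API does report a connected flag per remote cluster.

```javascript
// Sketch of a server-side remote-cluster pre-check. `callCluster` stands in
// for Kibana's ES client wrapper; names here are illustrative only.
async function verifyRemoteClusters(callCluster) {
  const remotes = await callCluster('cluster.remoteInfo');
  const unreachable = Object.entries(remotes)
    .filter(([, info]) => info.connected === false)
    .map(([name]) => name);
  if (unreachable.length) {
    // Throwing here should surface as an error toast on the client side,
    // instead of a 30s timeout followed by a blank screen.
    throw new Error(`Unable to connect to remote cluster(s): ${unreachable.join(', ')}`);
  }
  return remotes;
}
```

Because the check runs before the expensive monitoring queries, the user gets immediate feedback when a remote cluster is down.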

@cachedout
Contributor

@igoristic Could you please comment on the plan for this? Are you going to open a new PR as @chrisronline requested above, or are you waiting on a review from me on this?

@igoristic
Contributor Author

@cachedout Sorry, I should have mentioned. Yes, I'm adding the cluster.remoteInfo pre-check to this PR.

It's working pretty well, just need to check a couple more cases. Should have it up soon

@elasticmachine
Contributor

💔 Build Failed

@chrisronline
Contributor

This seems like a lot of code to solve this problem. I'm wondering if we can do something even simpler.

What if we add a verify check like we do for auth, and if there is a remote cluster that is not found, we throw an error like the auth logic does? Any error thrown in the server code results in a toast on the client side. We might be able to get away with only changing a couple of files if this approach works.

I think you made some efforts to remove data fetching in resolve blocks, but we should probably separate that out into a separate PR (though I do agree we should change that).

WDYT?

@igoristic
Contributor Author

igoristic commented Aug 28, 2019

@chrisronline This might seem like a trivial issue with a "quick fix", but it's a little more complex and tightly coupled with the majority of our app.

I like your suggestion, but I don't think it'll yield less code (or a better-quality fix), just one fewer link in the clusters.js promise chain: https://github.com/elastic/kibana/pull/43338/files#diff-7bd0dcc7ef2ad499c1f11c39f0d092eb

The majority of the problem also came from route responses not catching any errors (hence the blank white screen), so it might be inefficient to add .catch to all the route responses in this PR and then completely remove the response requests in a different PR.

I might be missing your point though. Happy to zoom and discuss anytime 😄

@chrisronline
Contributor

@igoristic Check out this draft PR I just put up: #44297

I think this approach might also work, and it might be easier to reason about since it follows a more common convention in our monitoring code base.

Maybe we can bring in @cachedout for thoughts here too

@igoristic
Contributor Author

@chrisronline Thank you I really appreciate the draft PR!

Let me poke at it a little before I make a decision

@igoristic
Contributor Author

This is an old iteration, please check out the new PR at: #44800

@igoristic igoristic closed this Sep 4, 2019
@elasticmachine
Contributor

💔 Build Failed



Development

Successfully merging this pull request may close these issues.

[Monitoring] Check for timeouts to ES requests

4 participants