
Flakiness in Jenkins tiles

Open cachedout opened this issue 4 years ago • 3 comments

Hello!

First, nice work on this project. It's extremely well-designed and very easy to work with.

I'm experiencing a bit of a strange problem. I have a number of tiles organized into groups, where each group contains between 5 and 20 Jenkins jobs. Monitoror starts up fine, but after a minute or two I start seeing jobs flap between failure and success, even though nothing has changed in the job itself on the Jenkins side. (I have verified this repeatedly just to be sure.)

I have experimented with the core cache values but to no avail. I'm still seeing certain jobs flap between success and failure.

I'm certainly willing to believe that Monitoror is doing the right thing here and Jenkins is failing to send the correct API response, but my question is -- how can I tell?

Is there debug logging in Monitoror which can be enabled to watch responses as they are returned? If not, would you consider adding a flag to enable it?

My second question is about the caching options. Do they control the randomization splay for requests, or just the rate at which those requests are made upstream? If it's the latter, is there any way to increase the amount of splay between upstream requests?

Thanks very much in advance.

cachedout · Jul 13 '21 13:07

I should point out that I seem to be seeing multiple tiles update at once. That suggests there may be places where additional splay and randomization need to be added. I'll keep an eye on this and see if I can confirm the behavior.

cachedout · Jul 13 '21 14:07

From watching the requests from the browser to the app, here's what comes back when a job suddenly becomes flaky (sensitive information snipped):

{"type":"JENKINS-BUILD","status":"FAILURE","label":"<snipped>","message":"unable to find job","build":{"branch":"master"}}

cachedout · Jul 13 '21 14:07

We've (possibly) tracked this down to nginx in front of the Jenkins instance rate-limiting Monitoror. However, I still think this may be a bug: Monitoror shouldn't be sending bursts of requests quite so aggressively. :-/

cachedout · Jul 13 '21 14:07