[Alerting & Actions] More debug logging #84670
Conversation
Pinging @elastic/kibana-alerting-services (Team:Alerting Services)
@elasticmachine merge upstream |
mikecote
left a comment
Changes LGTM! This will be useful for debugging purposes!
```ts
const recoveredAlertInstances = pickBy(
  alertInstances,
  (alertInstance: AlertInstance) => !alertInstance.hasScheduledActions()
);
```
nit: 👏 we could also leverage this logic within generateNewAndRecoveredInstanceEvents.
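The nit above suggests reusing the same filter elsewhere. A minimal sketch of what hoisting it into a shared helper could look like — the `AlertInstance` stand-in, the `pickBy` shim, and `getRecoveredAlertInstances` are all illustrative here, not code from the PR:

```typescript
// Minimal stand-in for the real AlertInstance type, so the sketch is self-contained.
interface AlertInstance {
  hasScheduledActions(): boolean;
}

// Tiny pickBy shim mirroring the lodash signature used in the diff above.
function pickBy<T>(
  obj: Record<string, T>,
  predicate: (value: T) => boolean
): Record<string, T> {
  return Object.fromEntries(
    Object.entries(obj).filter(([, value]) => predicate(value))
  );
}

// Hypothetical shared helper that both the executor and
// generateNewAndRecoveredInstanceEvents could call.
function getRecoveredAlertInstances(
  alertInstances: Record<string, AlertInstance>
): Record<string, AlertInstance> {
  // An instance with no scheduled actions this cycle is considered recovered.
  return pickBy(
    alertInstances,
    (alertInstance) => !alertInstance.hasScheduledActions()
  );
}
```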
```ts
const throttled = alertInstance.isThrottled(throttle);
const muted = mutedInstanceIdsSet.has(alertInstanceName);
const shouldExecuteAction = !throttled && !muted;
```
👏 awesome, much more descriptive!
```ts
} = this.taskInstance;

const runDate = new Date().toISOString();
this.logger.debug(`executing alert ${this.alertType.id}:${alertId} at ${runDate}`);
```
Was the runDate added to match the Kibana logs with the event log data timestamp when debugging?
```ts
}

const actionLabel = `${actionTypeId}:${actionId}: ${name}`;
logger.debug(`executing action ${actionLabel} at ${new Date().toISOString()}`);
```
With the logger prefixing all the logs with a timestamp, is there a reason for the extra `at ${new Date().toISOString()}` here?
I was also wondering if we should log (as debug) the raw errors the connectors receive when they encounter an error. I know some of them do some processing on the error to ensure we don't return sensitive information (ex: credentials), but maybe using debug is ok? I'm not sure. cc @kobelb @pmuellr

I'm thinking of the scenario of the webhook connector. Today, if you put in an invalid URL, it will give a generic message with no indication that it can't communicate with the external service (see below). That becomes hard to debug for a customer. Other scenarios where this could happen: the customer forgets to set proxy settings, or didn't configure their network properly for external communications.
As long as we're reasonably confident that the error messages won't include credentials, this seems reasonable to me.
I haven't seen cases where credentials are leaked by any services directly. But in some cases, like Slack, the secret is also the primary URL (it contains a token in the URL path), so we'd need to be careful in any Slack action implementation code not to print the URL in a message, since it might not be clear from the context that it's not just a URL but also a secret.

Webhook is probably the worst case, since there might be passwords/tokens/etc. in non-secret config or params. E.g., if you used webhook to post Slack messages, the URL for the webhook will contain a Slack API token. Our error messages there are also not great, because how do we really know what to pull from the webhook response as an "error message"? Or even know it's an error in the first place, if the webhook returns errors as 200 responses (it happens)?

It might be a nice enhancement to have a mode to run actions where we provide more of the response info back to the user, e.g. if you're using the web UI and the "test connector" functionality.
In this situation, are the end-users who are authorized to view the webhook connector able to view the passwords/tokens/etc. in the non-secret params? If so, we're already "leaking" this information and arguably aren't making it any worse...
True, we aren't making that specific case worse. We could be making it worse by writing that URL into the Kibana log, though. Of all the connectors, webhook is the hardest to "lock down" in terms of providing useful feedback, since it is (by definition) open ended.
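One way to reconcile the two concerns above (debug-log raw connector errors, but never write secret-bearing URLs into the Kibana log) is to redact every known secret value from a message before logging it. This is a hedged sketch of that idea, not Kibana's actual implementation; `redactSecrets` and its signature are hypothetical:

```typescript
// Replace every occurrence of each known secret value with a placeholder,
// so raw connector errors can be debug-logged without leaking e.g. a Slack
// webhook URL that doubles as a credential.
function redactSecrets(message: string, secrets: string[]): string {
  return secrets.reduce(
    (redacted, secret) =>
      // Skip empty secrets; split/join avoids regex-escaping issues.
      secret ? redacted.split(secret).join('[REDACTED]') : redacted,
    message
  );
}
```

The caller would pass the connector's decrypted secret values (and, for webhook, any non-secret params known to carry tokens), so even an error message that echoes the request URL comes out scrubbed.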
Yikes! Bad merge. Sorry for the notifications. |
💔 Build Failed

Failed CI steps / test failures:
- X-Pack Jest Tests: x-pack/plugins/actions/server/lib – successfully executes
- Chrome X-Pack UI Functional Tests: x-pack/test/functional/apps/uptime/locations.ts – Uptime app with generated data Observer location displays less monitor availability
To update your PR or re-run it, just comment with:
Resolves #49401
Summary
Added debug logging for:
- muteAll
- mute or throttle

Verified that error logging for alert and action execution errors logs any detailed messages that bubble up.
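To illustrate the mute/throttle part of this summary, here is a hedged sketch of the kind of debug message such a change could emit when action scheduling is skipped — `skipReason` and its wording are illustrative, not the PR's actual code:

```typescript
// Build a debug-level message explaining why actions were not scheduled
// for an alert instance, or return null when nothing was skipped.
function skipReason(
  alertId: string,
  instanceId: string,
  muted: boolean,
  throttled: boolean
): string | null {
  if (muted) {
    return `skipping scheduling of actions for '${instanceId}' in alert ${alertId}: instance is muted`;
  }
  if (throttled) {
    return `skipping scheduling of actions for '${instanceId}' in alert ${alertId}: rule is throttled`;
  }
  return null;
}
```

An executor could call this next to the `shouldExecuteAction` check shown earlier in the thread and pass any non-null result to `logger.debug`.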