Skip to content

[Metrics][Discover] internal/search/esql_async returns 200 but METRICS_INFO responds with error#260237

Merged
jorgeoliveira117 merged 11 commits intoelastic:mainfrom
jorgeoliveira117:259815-metricsdiscover-internalsearchesql_async-returns-200-but-metrics_info-responds-with-error
Apr 1, 2026
Merged

[Metrics][Discover] internal/search/esql_async returns 200 but METRICS_INFO responds with error#260237
jorgeoliveira117 merged 11 commits intoelastic:mainfrom
jorgeoliveira117:259815-metricsdiscover-internalsearchesql_async-returns-200-but-metrics_info-responds-with-error

Conversation

@jorgeoliveira117
Copy link
Copy Markdown
Contributor

@jorgeoliveira117 jorgeoliveira117 commented Mar 30, 2026

Closes #259815

Summary

For the async ES|QL query we do when we're in the Metrics Experience, there's a chance that METRICS_INFO ES|QL request returns a 200 answer but it actually fails.
To make things more clear, after getESQLResults resolves for the background METRICS_INFO ES|QL request, we now detect if there's an error from ES, throwing an error if needed.
This won't change anything in the UI yet but will enable future work to provide a better UX. Which will be a follow-up on #258493

Changes

  • Add getErrorMessageFromEsqlResponse to build an error message from error.type / error.reason, or error.root_cause[0], with a generic fallback.
  • If a message is present, an error is thrown so executeEsqlQuery returns rejected promise.

@jorgeoliveira117 jorgeoliveira117 added release_note:skip Skip the PR/issue when compiling release notes backport:skip This PR does not require backporting Team:obs-exploration Observability Exploration team v9.4.0 labels Mar 30, 2026
…ync-returns-200-but-metrics_info-responds-with-error
@jorgeoliveira117 jorgeoliveira117 marked this pull request as ready for review March 30, 2026 12:04
@jorgeoliveira117 jorgeoliveira117 requested a review from a team as a code owner March 30, 2026 12:04
@elasticmachine
Copy link
Copy Markdown
Contributor

Pinging @elastic/obs-exploration-team (Team:obs-exploration)

@jorgeoliveira117 jorgeoliveira117 changed the title Add simple error handling for ESQL query responses on metrics info call [Metrics][Discover] internal/search/esql_async returns 200 but METRICS_INFO responds with error Mar 30, 2026
Copy link
Copy Markdown
Contributor

@justinkambic justinkambic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good. One main thing I'd ask for would be some unit tests to cover what we're adding here with getErrorMessageFromEsqlResponse.

  • error w/type or reason
  • error with only root_cause
  • empty error
  • successful happy path run

I am OK if we just export the function for the sake of simplified testing, but you could also add the tests to the existing test file for this module.

}
const rc = e.root_cause?.[0];
const fromRoot = [rc?.type, rc?.reason].filter((x): x is string => Boolean(x?.trim())).join(': ');
return fromRoot || 'Elasticsearch returned an error';
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I'd like to call out here. If Elasticsearch returns an object missing all the properties that we care about, is this a case where we want to throw? If ES for some reason sends:

{
  error: { }
}

Do we care about throwing an error for that? Would that happen if we didn't? Just telling the user "ES threw an unknown error" is not going to be super helpful for them.

@kpatticha since you're more aware of the actual situation we are addressing here, do you have a thought on a case where the error has none of the keys we're filtering for here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When it comes to the "why", even though it might be rare for ES to thrown an empty error, it can happen. We got a similar behaviour here, where we fallback to a default message

const message = causeReason ? causeReason : i18n.translate('searchErrors.esError.unknownRootCause', { defaultMessage: 'unknown' });

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since you're more aware of the actual situation we are addressing here, do you have a thought on a case where the error has none of the keys we're filtering for here?

Good call out @justinkambic . I’m not aware of a specific case, but I think we should align with the pattern used in Discover here.

Copy link
Copy Markdown
Contributor

@lucaslopezf lucaslopezf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great Jorge! A few comments!

uiSettings: IUiSettingsClient;
}

function getErrorMessageFromEsqlResponse(response: object): string | undefined {
Copy link
Copy Markdown
Contributor

@lucaslopezf lucaslopezf Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this Jorge!

I didn't test it, but a few early suggestions

  • Try to reuse type, taking a quickly look maybe we can reuse ErrorCause from @elastic/elasticsearch instead of the inline cast { type?; reason?; root_cause? } .
  • Returning string | undefined is ok but maybe we want to make some logic in the future and it will depends of the reason, type, etc. Instead of manage with strings what if we create a Class with properties?
  • Instead of checking the response inline after getESQLResults, a small wrapper could wrap the API call and guarantee the response is clean so executeEsqlQuery stays linear.
  • Add tests to cover new functions
  • One concern: since we're throwing an error, we should make sure every consumer of executeEsqlQuery handles it properly. Today we're safe because useAsyncFn in useFetchMetricsData catches everything. But if a future consumer calls executeEsqlQuery without a try/catch or useAsyncFn, will we break the experience?

I put together one possible approach to illustrate the idea, just a starting point:

// esql_response_error.ts

export class EsqlResponseError extends Error {
  public readonly type?: string;
  public readonly reason?: string;
  public readonly rootCause?: ErrorCause[];

  constructor(errorCause: ErrorCause) {
    super(formatErrorCause(errorCause));
    this.name = 'EsqlResponseError';
    this.type = errorCause.type;
    this.reason = errorCause.reason ?? undefined;
    this.rootCause = errorCause.root_cause;
  }
}

// execute_esql_query.ts

const fetchEsqlResponseOrThrow = async (
  params: Parameters<typeof getESQLResults>[0]
): Promise<ESQLSearchResponse> => {
  const { response } = await getESQLResults(params);
  const errorCause = extractEsqlResponseErrorCause(response);
  if (errorCause) {
    throw new EsqlResponseError(errorCause);
  }
  return response;
};

// execute_esql_query.ts (usage in executeEsqlQuery)

const response = await fetchEsqlResponseOrThrow({
  esqlQuery, search, signal, filter, timeRange, variables,
  ...getMetricsExecutionContext(
    MetricsExecutionContextAction.FETCH,
    MetricsExecutionContextName.METRICS_INFO
  ),
});

return esqlResultToPlainObjects<TDocument>(response);

Let me know what you think!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your comment!

When it comes to your suggestions:

  • Sounds great and I already have a small change in place
  • If we're talking a simple, small class, it could make sense even if we're most likely just going to use a general message
  • Sounds good
  • Yes, some are already in place but with these changes I'll add a few more most likely
  • That makes sense, we could change the return type to force the users to handle the response properly but, it could also be a bit more costly. Shall we create a small follow-up for this specific point?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, we could change the return type to force the users to handle the response properly but, it could also be a bit more costly. Shall we create a small follow-up for this specific point?

It depends, what happens when we throw an error? We should now how it behaves now before taking a decision

Copy link
Copy Markdown
Contributor

@justinkambic justinkambic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from my side, I'll let @lucaslopezf also follow-up with his feedback from before and approve for the both of us.

Copy link
Copy Markdown
Contributor

@lucaslopezf lucaslopezf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great Jorge, much better, a few comments.
Also, thinking about it, as second iteration we should manage the errors for this in a core way, same for Discover, profiles, etc, so I'd like to create a follow-up issue to review the error handling architecture and be consistent with Discover, like:

  • How errors are captured
  • Use the same error type
  • Handle it homogeneously cross-profiles

and so on. Could you create it and add a comment with the TODO and the issue related?

…-200-but-metrics_info-responds-with-error' of https://github.com/jorgeoliveira117/kibana into 259815-metricsdiscover-internalsearchesql_async-returns-200-but-metrics_info-responds-with-error
Copy link
Copy Markdown
Contributor

@lucaslopezf lucaslopezf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just two minor things before merge it!

@elasticmachine
Copy link
Copy Markdown
Contributor

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
discover 1954 1955 +1

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
discover 1.6MB 1.6MB +1016.0B

History

@jorgeoliveira117 jorgeoliveira117 merged commit 7e6ff7a into elastic:main Apr 1, 2026
19 checks passed
response: object
): EsqlResponseErrorCause | undefined => extractEsqlEmbeddedError(response)?.cause;

export class EsqlResponseError extends Error {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably fine for now, but we should not implement custom error messages. If such handlers are needed, they should be handled at the ESQL level rather than on our side.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agree, we created this issue to handle things like this and more

: undefined;

const { response } = await getESQLResults({
const response = await fetchEsqlResponseOrThrow({
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A try/catch handler would likely handle the same

jorgeoliveira117 added a commit that referenced this pull request Apr 1, 2026
…S_INFO responds with error (#260746)

## Summary

This PR covers small code changes missed in #260237

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
eokoneyo pushed a commit to davismcphee/kibana that referenced this pull request Apr 2, 2026
…S_INFO responds with error (elastic#260237)

Closes elastic#259815 

## Summary

For the async ES|QL query we do when we're in the Metrics Experience,
there's a chance that `METRICS_INFO` ES|QL request returns a 200 answer
but it actually fails.
To make things more clear, after `getESQLResults` resolves for the
background `METRICS_INFO` ES|QL request, we now detect if there's an
`error` from ES, throwing an error if needed.
**This won't change anything in the UI yet** but will enable future work
to provide a better UX. Which will be a follow-up on elastic#258493

### Changes

- Add `getErrorMessageFromEsqlResponse` to build an error message from
`error.type` / `error.reason`, or `error.root_cause[0]`, with a generic
fallback.
- If a message is present, an error is thrown so `executeEsqlQuery`
returns rejected promise.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
eokoneyo pushed a commit to davismcphee/kibana that referenced this pull request Apr 2, 2026
…S_INFO responds with error (elastic#260746)

## Summary

This PR covers small code changes missed in elastic#260237

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
paulinashakirova pushed a commit to paulinashakirova/kibana that referenced this pull request Apr 2, 2026
…S_INFO responds with error (elastic#260237)

Closes elastic#259815 

## Summary

For the async ES|QL query we do when we're in the Metrics Experience,
there's a chance that `METRICS_INFO` ES|QL request returns a 200 answer
but it actually fails.
To make things more clear, after `getESQLResults` resolves for the
background `METRICS_INFO` ES|QL request, we now detect if there's an
`error` from ES, throwing an error if needed.
**This won't change anything in the UI yet** but will enable future work
to provide a better UX. Which will be a follow-up on elastic#258493

### Changes

- Add `getErrorMessageFromEsqlResponse` to build an error message from
`error.type` / `error.reason`, or `error.root_cause[0]`, with a generic
fallback.
- If a message is present, an error is thrown so `executeEsqlQuery`
returns rejected promise.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
paulinashakirova pushed a commit to paulinashakirova/kibana that referenced this pull request Apr 2, 2026
…S_INFO responds with error (elastic#260746)

## Summary

This PR covers small code changes missed in elastic#260237

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport:skip This PR does not require backporting release_note:skip Skip the PR/issue when compiling release notes Team:obs-exploration Observability Exploration team v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Metrics][Discover] internal/search/esql_async returns 200 but METRICS_INFO responds with error

6 participants