[kbn-evals] Fix APM/OTel tracing conflict and inference endpoint connector resolution #259446

viduni94 merged 9 commits into elastic:main
Conversation
Pinging @elastic/obs-ai-team (Team:obs-ai)
```yaml
elastic.apm.active: false
elastic.apm.contextPropagationOnly: false
```
Some additional doc references we'll probably want to update along with this one:
- This file:
- Lines ~356-358 — the "configure the HTTP exporter in kibana.dev.yml" block only shows exporters, but the note about APM flags is added nearby (good).
- Lines ~423-434 — the trace-based evaluators prerequisite section also shows exporter config without APM flags.
These should either include the flags or cross-reference the note about them.
- Evals plugin readme:
x-pack/platform/plugins/shared/evals/README.md
Missing elastic.apm.active: false and elastic.apm.contextPropagationOnly: false before the telemetry lines. Users following this README to configure their kibana.dev.yml will hit the same APM/OTel conflict.
- Agent Builder readme:
x-pack/platform/plugins/shared/agent_builder/README.md
First example on lines 21-28 and the second example at lines 38-44. Neither snippet includes the APM-disable flags. Users enabling tracing for agent_builder will hit the same conflict.
- Entity Analytics readme:
kbn-evals-suite-entity-analytics/README.md
This config is contradictory -- it sets sample_rate and exporters but disables tracing. This looks like an oversight and should probably enable tracing (in which case it also needs the APM flags). Not our config, but we might as well clean it up. cc @ymao1
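For the READMEs above, the combined `kibana.dev.yml` shape would be roughly the following (only the two APM flags come from this PR; the exporter keys are whatever each README already documents):

```yaml
# Disable Elastic APM so it does not conflict with OTel tracing
# (initTelemetry now throws if both are active):
elastic.apm.active: false
elastic.apm.contextPropagationOnly: false

# ...followed by the existing tracing/exporter configuration from each README...
```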
spong left a comment
Code reviewed and LGTM! 👍 Thanks for these fixes @viduni94 🙏
Left one comment about some further documentation updates, if you don't mind. Also confirmed the new Scout server config fix propagates to all 11 eval suites, so those all LGTM.
```ts
// CI sets ELASTIC_APM_ACTIVE=true globally. The APM config loader merges env vars
// after Kibana config (CLI args), so the env var would override our --elastic.apm.active=false.
```
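Separately from the env-var precedence above, CLI override values arrive as strings, so `--elastic.apm.active=false` would otherwise be set as the truthy string `'false'`. A minimal sketch of the kind of coercion the PR's `coerceCliValue` helper performs (a hypothetical reimplementation for illustration, not the actual `kbn-apm-config-loader` source):

```typescript
// Hypothetical sketch: convert 'true'/'false' to booleans and numeric
// strings to numbers; leave everything else as the original string.
function coerceCliValue(value: string): string | number | boolean {
  if (value === 'true') return true;
  if (value === 'false') return false;
  const asNumber = Number(value);
  // Guard against '' coercing to 0, and NaN for non-numeric strings.
  if (value.trim() !== '' && !Number.isNaN(asNumber)) return asNumber;
  return value;
}
```

With this, `--elastic.apm.active=false` sets a real boolean `false` in the config object instead of a truthy string.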
csr left a comment
Recent custom servers config changes LGTM
```ts
// .inference connectors are backed by ES inference endpoints.
```
I think this is okay for now, but the better solution would be to adapt it to accept both. We shouldn't need to do this transformation, as the inference plugin should accept both connector IDs and inference IDs. If it doesn't, then that's a bug we need to address in the inference plugin.
Thanks @sphilipse
My initial thought was to add the fix here -
(But I wasn't sure whether that would affect anything else, as we are moving away from stack connectors AFAIK)
The problem flow was:

1. `resolveAndCreatePipeline` checks `endpointIdCache.has('elastic-llm-claude-46-opus')` → `false` (cache only has ES endpoint IDs)
2. Falls into the stack-connector path, calls `getInferenceExecutor`
3. Inside, `getConnectorById` resolves the stack connector to an inference endpoint with `connectorId: '.anthropic-claude-4.6-opus-chat_completion'` and `isInferenceEndpoint: true`
4. But `createInferenceExecutor` still calls `actionsClient.execute()` with the endpoint ID → "Saved object not found"
Proposed fix:

```ts
: async () => {
  const executor = await getInferenceExecutor({
    connectorId,
    request,
    actions,
    esClient,
    logger,
  });

  const connector = executor.getConnector();

  // If the stack connector resolved to an inference endpoint,
  // use the endpoint executor instead of the actions-based one.
  if (connector.isInferenceEndpoint) {
    const inferenceId = connector.connectorId;
    const endpointMeta = await resolveInferenceEndpoint({ inferenceId, esClient });
    const endpointExecutor = createInferenceEndpointExecutor({ inferenceId, esClient });

    return {
      callbackContext: {
        model: endpointMeta.modelId ? { id: endpointMeta.modelId } : undefined,
      },
      getSpanModel: (modelName) =>
        endpointMeta.provider
          ? { id: modelName ?? endpointMeta.modelId, provider: endpointMeta.provider }
          : undefined,
      chatComplete: (options) =>
        inferenceEndpointAdapter.chatComplete({ ...options, executor: endpointExecutor }),
    };
  }

  // ... rest of existing stack connector logic
```
Does this make sense to you? Or do you have any suggestions to resolve it in a better way?
Appreciate any thoughts 🙏🏻
Yes, I think this makes sense! Thanks @viduni94, you caught a bug for us :)
Thanks @sphilipse
I'll open a follow-up PR with the fix.
/ci

run docs-build

/sync-ci
💚 Build Succeeded
cc @viduni94
…oints (#259656)

Closes #259641

## Summary

When a stack connector ID (e.g. a preconfigured `.inference` connector) is passed to the inference plugin's `chatComplete` API, `getConnectorById` may resolve it to an `InferenceConnector` with `isInferenceEndpoint: true`.

Previously, `resolveAndCreatePipeline` only checked the `endpointIdCache` to decide whether to use the inference endpoint execution path. If the cache didn't contain the ID, it fell through to the stack connector adapter path, which then failed because there's no adapter for the `.inference` connector type. This caused a `Saved object [action/<inference-endpoint-id>] not found` error when callers (e.g. `kbn-evals`) passed preconfigured `.inference` connector IDs, because the code attempted to execute the ES inference endpoint ID as a Kibana saved-object action.

### Changes

- `callback_api.ts`: After `getInferenceExecutor` resolves the connector in the stack connector branch, check `connector.isInferenceEndpoint`. If true, redirect to the inference endpoint execution path (`resolveInferenceEndpoint` + `createInferenceEndpointExecutor` + `inferenceEndpointAdapter`).
- `api.test.ts`: Added tests covering the "stack connector resolving to inference endpoint" path.
- `create_connector_fixture.ts` (`kbn-evals`): Removed the client-side workaround that was extracting `inferenceId` from `.inference` connectors - no longer needed now that the inference plugin handles this server-side.
- `create_connector_fixture.test.ts`: Removed corresponding workaround tests.

## Related

- #258530 introduced unified connector listing via `getConnectorList()`/`getConnectorById()`, which returns inference endpoints with `isInferenceEndpoint: true`
- #259446 added a temporary client-side workaround in `kbn-evals` (now removed by this PR)

### Checklist

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
- [x] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
- [x] Review the [backport guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing) and apply applicable `backport:*` labels.
Closes #259472

## Summary

Fixes two issues breaking `kbn-evals` runs (both local and CI):

### 1. APM / OpenTelemetry tracing conflict

A recent validation in `initTelemetry` (#258303, #258663) throws when Elastic APM and OpenTelemetry tracing are both active. The `evals_tracing` Scout config enables OTel tracing but didn't explicitly disable APM, causing Kibana (and the Playwright worker) to crash on startup.

Fix:

- Added a `coerceCliValue` helper in `applyConfigOverrides` (`kbn-apm-config-loader`) that converts 'true'/'false' to booleans and numeric strings to numbers before they're set in the config object.
- Added `--elastic.apm.active=false` and `--elastic.apm.contextPropagationOnly=false` to the `evals_tracing` Scout server config and to `require_init_apm.js` (for the Playwright worker when `TRACING_EXPORTERS` is set).
- Updated the `kbn-evals` README to document the required APM settings when configuring tracing in `kibana.dev.yml`.

### 2. Inference endpoint connector resolution

#258530 consolidated LLM connector listing through the inference plugin's `getConnectorList()`, which now returns inference endpoint IDs (e.g. `.anthropic-claude-4.6-opus-chat_completion`) instead of Kibana stack connector keys (e.g. `elastic-llm-claude-46-opus`). `kbn-evals` was still passing the stack connector key to the inference API, which then tried to execute it as a Kibana action - resulting in "Saved object `[action/.anthropic-claude-4.6-opus-chat_completion]` not found".

Fix:

- `createConnectorFixture` now detects `.inference`-type connectors and extracts their `inferenceId` from the config, using the ES inference endpoint ID directly. This bypasses the Kibana actions framework and aligns with the unified connector model from #258530.

### Checklist

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
- [x] The PR description includes the appropriate Release Notes section, and the correct `release_note:*` label is applied per the [guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
- [x] Review the [backport guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing) and apply applicable `backport:*` labels.
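The `createConnectorFixture` detection in fix 2 can be sketched as follows. This is a hypothetical illustration, not the actual fixture code: the shape `ConnectorListItem` and the field names `actionTypeId` and `config.inferenceId` are assumptions about how a preconfigured `.inference` connector is represented.

```typescript
// Hypothetical connector shape for illustration.
interface ConnectorListItem {
  id: string;
  actionTypeId: string;
  config?: { inferenceId?: string };
}

// For .inference connectors, use the ES inference endpoint ID directly,
// bypassing the Kibana actions framework; otherwise keep the connector ID.
function resolveExecutableId(connector: ConnectorListItem): string {
  if (connector.actionTypeId === '.inference' && connector.config?.inferenceId) {
    return connector.config.inferenceId;
  }
  return connector.id;
}
```

Passing the resolved ID to the inference API avoids the `Saved object [action/...] not found` failure, since the endpoint ID is never executed as a saved-object action.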