[SigEvents] Convert feature duplication evaluators to createPrompt pattern by crespocarlos · Pull Request #256534 · elastic/kibana

crespocarlos · 2026-03-06T19:04:22Z

Summary

Converts the two LLM evaluators in the feature duplication evals (`createSemanticUniquenessEvaluator` and `createIdConsistencyEvaluator`) from `inferenceClient.output()` with inline system prompts to the `createPrompt` + `executeUntilValid` pattern used by `@kbn/evals`. System prompts are extracted into `.text` mustache template files, and tool schemas are preserved as tool definitions on the prompt version. The CODE evaluator (`featureDuplicationEvaluator`) is unchanged. All evaluator names, signatures, and return shapes are preserved.
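To make the template half of this pattern concrete, here is a minimal TypeScript sketch of rendering a versioned mustache-style template after checking its inputs. Everything in it is hypothetical: `PromptVersionSketch`, `renderTemplate`, `renderPrompt`, and the template text are illustrative names, not Kibana's `createPrompt` API, and a hand-written required-inputs check stands in for the Zod input schema.

```typescript
// Hypothetical sketch only: renders a versioned mustache-style template with
// checked inputs. It is not Kibana's createPrompt API.

type PromptInput = Record<string, string>;

interface PromptVersionSketch {
  // Template text, e.g. the contents of a hypothetical `.text` prompt file.
  template: string;
  // Inputs that must be present; stands in for the Zod input schema.
  requiredInputs: readonly string[];
}

// Replaces {{name}} placeholders. Unlike real Mustache, it does no HTML escaping
// and supports no sections or partials.
function renderTemplate(template: string, input: PromptInput): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_match: string, key: string) => {
    const value = Object.prototype.hasOwnProperty.call(input, key) ? input[key] : undefined;
    if (value === undefined) {
      throw new Error(`Missing template input: ${key}`);
    }
    return value;
  });
}

// Validates inputs first, then renders, so a bad call fails before any model request.
function renderPrompt(version: PromptVersionSketch, input: PromptInput): string {
  for (const name of version.requiredInputs) {
    if (typeof input[name] !== "string") {
      throw new Error(`Invalid prompt input: ${name}`);
    }
  }
  return renderTemplate(version.template, input);
}

// Hypothetical template; the PR's actual prompt text is not reproduced here.
const idConsistencySketch: PromptVersionSketch = {
  template: "Compare feature IDs for stream {{streamName}}.\nRun: {{ runLabel }}",
  requiredInputs: ["streamName", "runLabel"],
};

console.log(renderPrompt(idConsistencySketch, { streamName: "logs", runLabel: "run-1" }));
```

Failing on a missing input before rendering means a bad evaluator call surfaces as an error rather than as a prompt with an empty slot.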

Checklist

- [x] Any text added follows EUI's writing guidelines, uses sentence case text and includes i18n support
- [ ] Documentation was added for features that require explanation or tutorials
- [x] Unit or functional tests were updated or added to match the most common scenarios
- [ ] If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the docker list
- [ ] This was checked for breaking HTTP API changes, and any breaking changes have been approved by the breaking-change committee. The release_note:breaking label should be applied in these situations.
- [ ] Flaky Test Runner was used on any tests changed
- [x] The PR description includes the appropriate Release Notes section, and the correct release_note:* label is applied per the guidelines
- [ ] Review the backport guidelines and apply applicable backport:* labels.

Identify risks

No significant risks. This is a structural refactor of evaluator internals — the evaluator contracts (names, signatures, return shapes) are unchanged. The LLM prompts are semantically identical, just moved from inline strings to versioned mustache templates.

🤖 Co-authored with AI assistance.

Refactor `createSemanticUniquenessEvaluator` and `createIdConsistencyEvaluator` from `inferenceClient.output()` to the `createPrompt` + `executeUntilValid` pattern used by the rest of the `@kbn/evals` ecosystem.

- Extract system/user prompts into `.text` mustache templates
- Define prompt objects with `createPrompt` and Zod input schemas
- Replace `inferenceClient.output()` with `executeUntilValid`
- Preserve all score calculation logic, evaluator signatures, and return shapes
- Move evaluators from a flat file into the `feature_duplication/` directory

Closes elastic#255869
Made-with: Cursor
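The `executeUntilValid` step can be pictured as a validate-and-retry loop around the model call. The sketch below assumes that behavior from the helper's name and from the later commit that validates tool call arguments; `executeUntilValidSketch`, `ToolCallSketch`, `Validator`, and the score payload are hypothetical, and the real `@kbn/evals` helper's signature and retry policy may differ.

```typescript
// Hypothetical sketch of a validate-and-retry loop. The names and signature are
// illustrative assumptions, not the actual @kbn/evals executeUntilValid API.

interface ToolCallSketch {
  name: string;
  arguments: unknown;
}

type ValidationResult<T> = { ok: true; value: T } | { ok: false; error: string };
type Validator<T> = (args: unknown) => ValidationResult<T>;

// Calls the model until its tool-call arguments pass validation or attempts run out.
// The previous validation error is passed back so a caller can add it to the retry prompt.
async function executeUntilValidSketch<T>(
  callModel: (attempt: number, lastError?: string) => Promise<ToolCallSketch>,
  validate: Validator<T>,
  maxAttempts = 3
): Promise<T> {
  let lastError: string | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const toolCall = await callModel(attempt, lastError);
    const result = validate(toolCall.arguments);
    if (result.ok) {
      return result.value;
    }
    lastError = result.error;
  }
  throw new Error(`No valid tool call after ${maxAttempts} attempts: ${lastError}`);
}

// Example validator for a hypothetical { score: number between 0 and 1 } payload.
const validateScore: Validator<{ score: number }> = (args) => {
  const score = (args as { score?: unknown } | null | undefined)?.score;
  return typeof score === "number" && score >= 0 && score <= 1
    ? { ok: true, value: { score } }
    : { ok: false, error: "score must be a number between 0 and 1" };
};

// Mock model: the first attempt returns an out-of-range score, the second a valid one.
const mockModel = async (attempt: number): Promise<ToolCallSketch> => ({
  name: "hypothetical_score_tool",
  arguments: attempt === 1 ? { score: 7 } : { score: 0.5 },
});

executeUntilValidSketch(mockModel, validateScore).then((parsed) => console.log(parsed.score)); // logs 0.5
```

A fixed `maxAttempts` bounds cost when a model keeps returning malformed arguments, and the final error keeps the last validation message for debugging.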

crespocarlos · 2026-03-09T09:40:44Z

@elasticmachine merge upstream

crespocarlos · 2026-03-09T13:36:04Z

/ci

crespocarlos · 2026-03-17T11:05:12Z

@elasticmachine merge upstream

crespocarlos · 2026-03-18T13:01:21Z

@elasticmachine merge upstream

crespocarlos · 2026-03-18T14:02:29Z

@elasticmachine merge upstream

crespocarlos · 2026-03-18T16:46:04Z

@elasticmachine merge upstream

crespocarlos · 2026-03-18T17:12:32Z

@elasticmachine merge upstream

crespocarlos · 2026-03-18T17:38:58Z

@elasticmachine merge upstream

crespocarlos · 2026-03-18T18:10:18Z

@elasticmachine merge upstream

crespocarlos · 2026-03-18T21:27:39Z

@elasticmachine merge upstream

elasticmachine · 2026-03-18T23:14:04Z

💛 Build succeeded, but was flaky

Buildkite Build
Commit: e6878a7

Failed CI Steps

x-pack/platform/test/serverless/functional/configs/security/config.group1.ts

Test Failures

[job] [logs] x-pack/platform/test/serverless/functional/configs/security/config.group1.ts / Serverless Common UI - Management Transform List renders the transform list

Metrics [docs]

✅ unchanged

…ttern (elastic#256534) Closes elastic#255869
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Viduni Wickramarachchi <viduni.wickramarachchi@elastic.co>

crespocarlos requested review from a team as code owners March 6, 2026 19:04

crespocarlos mentioned this pull request Mar 6, 2026

Convert feature duplication evals to @kbn/evals evaluator format crespocarlos/kibana#12

Closed

4 tasks

crespocarlos changed the title ~~Convert feature duplication evals to @kbn/evals evaluator format~~ refactor(evals): convert feature duplication evaluators to createPrompt pattern Mar 6, 2026

crespocarlos marked this pull request as draft March 6, 2026 19:21

crespocarlos added `backport:skip` (This PR does not require backporting) and `release_note:skip` (Skip the PR/issue when compiling release notes) labels Mar 6, 2026

Changes from node scripts/regenerate_moon_projects.js --update

9df3621

jasonrhodes added `author:sig-events` and `closes:sig-events` (PR closes an issue labeled for Significant Events) labels and removed the `author:sig-events` label Mar 6, 2026

crespocarlos changed the title ~~refactor(evals): convert feature duplication evaluators to createPrompt pattern~~ [SigEvents] Convert feature duplication evaluators to createPrompt pattern Mar 9, 2026

elasticmachine and others added 2 commits March 9, 2026 10:40

Merge branch 'main' into feature/convert-feature-duplication-evals-to-kbn-evals-format

ae95b5d

refactor(evals): rename feature_duplication evaluators dir to feature_duplication_evaluators
Made-with: Cursor

fa63531

crespocarlos marked this pull request as ready for review March 9, 2026 12:45

viduni94 reviewed Mar 11, 2026

...n-evals-suite-streams/src/evaluators/feature_duplication_evaluators/id_consistency_prompt.ts

...ckages/shared/kbn-evals-suite-streams/src/evaluators/feature_duplication_evaluators/index.ts (outdated)

elasticmachine and others added 2 commits March 17, 2026 04:05

Merge branch 'main' into feature/convert-feature-duplication-evals-to-kbn-evals-format

88c3b06

fix(evals): validate tool call arguments in feature duplication evaluators
Made-with: Cursor

1be3e13

crespocarlos removed `models:eis/anthropic-claude-4.5-sonnet` (Run LLM evals against model: eis/anthropic-claude-4.5-sonnet) and `models:eis/google-gemini-2.5-flash` (Run LLM evals against model: eis/google-gemini-2.5-flash) labels Mar 18, 2026

Merge branch 'main' into feature/convert-feature-duplication-evals-to-kbn-evals-format

0678064

crespocarlos removed the `models:eis/google-gemini-3.0-flash` label (Run LLM evals against model: eis/google-gemini-3.0-flash) Mar 18, 2026

Merge branch 'main' into feature/convert-feature-duplication-evals-to-kbn-evals-format

ea45df1

Merge branch 'main' into feature/convert-feature-duplication-evals-to-kbn-evals-format

a5237c6

crespocarlos added `models:judge:llm-gateway/gemini-3.1-pro-preview` (Override LLM-as-a-judge connector for evals: llm-gateway/gemini-3.1-pro-preview) and removed `models:judge:eis/google-gemini-3.1-pro` (Override LLM-as-a-judge connector for evals: eis/google-gemini-3.1-pro) labels Mar 18, 2026

Merge branch 'main' into feature/convert-feature-duplication-evals-to-kbn-evals-format

5fc7268

Merge branch 'main' into feature/convert-feature-duplication-evals-to-kbn-evals-format

0f7be25

Merge branch 'main' into feature/convert-feature-duplication-evals-to-kbn-evals-format

3ec919c

crespocarlos removed the `(deprecated) evals:streams-sigevents` label (This label is deprecated. Use `evals:significant-events` to run the Significant Events eval suite.) Mar 18, 2026

Merge branch 'main' into feature/convert-feature-duplication-evals-to-kbn-evals-format

e6878a7

crespocarlos merged commit 5e2b8e8 into elastic:main Mar 18, 2026
18 checks passed

crespocarlos deleted the feature/convert-feature-duplication-evals-to-kbn-evals-format branch March 18, 2026 23:18

crespocarlos merged 22 commits into elastic:main from crespocarlos:feature/convert-feature-duplication-evals-to-kbn-evals-format

5 participants