[SigEvents] Convert feature duplication evaluators to createPrompt pattern #256534

Merged
crespocarlos merged 22 commits into elastic:main from crespocarlos:feature/convert-feature-duplication-evals-to-kbn-evals-format
Mar 18, 2026
Conversation

@crespocarlos
Contributor

@crespocarlos crespocarlos commented Mar 6, 2026

Closes #255869

Summary

Converts the two LLM evaluators in feature duplication evals (`createSemanticUniquenessEvaluator` and `createIdConsistencyEvaluator`) from `inferenceClient.output()` with inline system prompts to the `createPrompt` + `executeUntilValid` pattern used by `@kbn/evals`. System prompts are extracted into `.text` mustache template files, and tool schemas are preserved as prompt version tool definitions. The CODE evaluator (`featureDuplicationEvaluator`) is unchanged. All evaluator names, signatures, and return shapes are preserved.
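
To illustrate the idea of moving an inline prompt into a template, here is a minimal, self-contained sketch. It does not use the real `createPrompt` API or the mustache library: `PromptDefinition`, `renderTemplate`, the prompt text, and the `analyze` tool are all hypothetical stand-ins invented for this example.

```typescript
// Hypothetical stand-in for a prompt object; NOT the @kbn/inference-common API.
interface PromptDefinition {
  name: string;
  systemTemplate: string; // in the PR, text like this lives in a .text file
  userTemplate: string;
  // Tool schema kept next to the prompt version (shape is an assumption).
  tools: Record<string, { description: string; requiredFields: string[] }>;
}

// Minimal {{variable}} substitution. Real mustache also supports sections,
// escaping, partials, and more; this only covers plain variables.
function renderTemplate(template: string, input: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_match, key: string) => {
    const value = input[key];
    if (value === undefined) {
      throw new Error(`Missing template variable: ${key}`);
    }
    return value;
  });
}

// Made-up prompt content, for illustration only.
const semanticUniquenessPrompt: PromptDefinition = {
  name: 'semantic_uniqueness',
  systemTemplate: 'You compare features and flag semantic duplicates.',
  userTemplate: 'Features to analyze:\n{{features}}',
  tools: {
    analyze: { description: 'Report duplicate clusters', requiredFields: ['clusters'] },
  },
};

// The caller supplies only the variables; the wording stays in the template.
const userMessage = renderTemplate(semanticUniquenessPrompt.userTemplate, {
  features: JSON.stringify([{ id: 'a', title: 'Nginx access logs' }]),
});
console.log(userMessage);
```

Keeping the wording in a template and passing only variables at call time is what lets the prompt text be versioned separately from the evaluator logic.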

Checklist

  • [x] Any text added follows EUI's writing guidelines, uses sentence case text and includes i18n support
  • [ ] Documentation was added for features that require explanation or tutorials
  • [x] Unit or functional tests were updated or added to match the most common scenarios
  • [ ] If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the docker list
  • [ ] This was checked for breaking HTTP API changes, and any breaking changes have been approved by the breaking-change committee. The release_note:breaking label should be applied in these situations.
  • [ ] Flaky Test Runner was used on any tests changed
  • [x] The PR description includes the appropriate Release Notes section, and the correct release_note:* label is applied per the guidelines
  • [ ] Review the backport guidelines and apply applicable backport:* labels.

Identify risks

No significant risks. This is a structural refactor of evaluator internals — the evaluator contracts (names, signatures, return shapes) are unchanged. The LLM prompts are semantically identical, just moved from inline strings to versioned mustache templates.


🤖 Co-authored with AI assistance.

Refactor `createSemanticUniquenessEvaluator` and `createIdConsistencyEvaluator`
from `inferenceClient.output()` to the `createPrompt` + `executeUntilValid`
pattern used by the rest of the `@kbn/evals` ecosystem.

- Extract system/user prompts into `.text` mustache templates
- Define prompt objects with `createPrompt` and Zod input schemas
- Replace `inferenceClient.output()` with `executeUntilValid`
- Preserve all score calculation logic, evaluator signatures, and return shapes
- Move evaluators from flat file into `feature_duplication/` directory
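
The steps above can be sketched roughly as follows. This is a simplified illustration of a retry-until-valid loop feeding an unchanged score calculation, not the real `executeUntilValid` signature: `ModelCall`, `runUntilValid`, `UniquenessOutput`, and `evaluateUniqueness` are hypothetical names, the output fields are invented, and the model call is synchronous for brevity (a real one would be async).

```typescript
// Hypothetical model call: takes a rendered prompt, returns untrusted output.
type ModelCall = (prompt: string) => unknown;

// Invented output shape; the real tool schema in the PR may differ.
interface UniquenessOutput {
  uniqueCount: number;
  totalCount: number;
}

// Hand-rolled validation standing in for a schema check (for example Zod).
function isUniquenessOutput(value: unknown): value is UniquenessOutput {
  if (typeof value !== 'object' || value === null) return false;
  const { uniqueCount, totalCount } = value as {
    uniqueCount?: unknown;
    totalCount?: unknown;
  };
  return (
    typeof uniqueCount === 'number' &&
    typeof totalCount === 'number' &&
    totalCount > 0 &&
    uniqueCount >= 0 &&
    uniqueCount <= totalCount
  );
}

// Call the model repeatedly until the output validates or attempts run out.
function runUntilValid<T>(
  call: ModelCall,
  prompt: string,
  isValid: (value: unknown) => value is T,
  maxAttempts = 3
): T {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const output = call(prompt);
    if (isValid(output)) return output;
  }
  throw new Error(`No valid output after ${maxAttempts} attempts`);
}

// The evaluator's outward shape ({ score, explanation }) does not depend on
// how the model is invoked, which is the point of the refactor.
function evaluateUniqueness(call: ModelCall, prompt: string) {
  const { uniqueCount, totalCount } = runUntilValid(call, prompt, isUniquenessOutput);
  return {
    score: uniqueCount / totalCount,
    explanation: `${uniqueCount}/${totalCount} features are unique`,
  };
}

// Usage with a fake model that returns a well-formed answer.
const fakeModel: ModelCall = () => ({ uniqueCount: 3, totalCount: 4 });
console.log(evaluateUniqueness(fakeModel, 'Features to analyze: ...'));
```

Because validation and retry sit in one helper, the score calculation only ever sees well-formed output, so it can stay exactly as it was before the refactor.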

Closes elastic#255869

Made-with: Cursor
@crespocarlos crespocarlos requested review from a team as code owners March 6, 2026 19:04
@crespocarlos crespocarlos changed the title from "Convert feature duplication evals to @kbn/evals evaluator format" to "refactor(evals): convert feature duplication evaluators to createPrompt pattern" Mar 6, 2026
@crespocarlos crespocarlos marked this pull request as draft March 6, 2026 19:21
@crespocarlos crespocarlos added the backport:skip (This PR does not require backporting) and release_note:skip (Skip the PR/issue when compiling release notes) labels Mar 6, 2026
@jasonrhodes jasonrhodes added the author:sig-events and closes:sig-events (PR closes an issue labeled for Significant Events) labels and removed the author:sig-events label Mar 6, 2026
@crespocarlos crespocarlos changed the title from "refactor(evals): convert feature duplication evaluators to createPrompt pattern" to "[SigEvents] Convert feature duplication evaluators to createPrompt pattern" Mar 9, 2026
@crespocarlos
Contributor Author

@elasticmachine merge upstream

@crespocarlos crespocarlos marked this pull request as ready for review March 9, 2026 12:45
@crespocarlos
Contributor Author

/ci

@crespocarlos
Contributor Author

@elasticmachine merge upstream

@crespocarlos crespocarlos added the following labels Mar 17, 2026:

  • models:llm-gateway/claude-sonnet-4-5 (Run LLM evals against model: llm-gateway/claude-sonnet-4-5)
  • models:eis/anthropic-claude-4.6-opus (Run LLM evals against model: eis/anthropic-claude-4.6-opus)
  • models:eis/google-gemini-3.0-flash (Run LLM evals against model: eis/google-gemini-3.0-flash)
  • models:eis/openai-gpt-5.2 (Run LLM evals against model: eis/openai-gpt-5.2)
  • models:eis/openai-gpt-oss-120b (Run LLM evals against model: eis/openai-gpt-oss-120b)
  • deprecated:models:judge:eis/google-gemini-3.0-pro (DEPRECATED - model no longer available)
  • deprecated:models:llm-gateway/gemini-2.5-flash (DEPRECATED - model no longer available)
  • models:llm-gateway/claude-sonnet-4-6 (Run LLM evals against model: llm-gateway/claude-sonnet-4-6)

@crespocarlos crespocarlos removed the models:eis/anthropic-claude-4.5-sonnet (Run LLM evals against model: eis/anthropic-claude-4.5-sonnet) and models:eis/google-gemini-2.5-flash (Run LLM evals against model: eis/google-gemini-2.5-flash) labels Mar 18, 2026
@crespocarlos
Contributor Author

@elasticmachine merge upstream

@crespocarlos crespocarlos removed the models:eis/google-gemini-3.0-flash (Run LLM evals against model: eis/google-gemini-3.0-flash) label Mar 18, 2026
@crespocarlos
Contributor Author

@elasticmachine merge upstream

@crespocarlos
Contributor Author

@elasticmachine merge upstream

@crespocarlos crespocarlos added the models:judge:llm-gateway/gemini-3.1-pro-preview (Override LLM-as-a-judge connector for evals: llm-gateway/gemini-3.1-pro-preview) label and removed the models:judge:eis/google-gemini-3.1-pro (Override LLM-as-a-judge connector for evals: eis/google-gemini-3.1-pro) label Mar 18, 2026
@crespocarlos
Contributor Author

@elasticmachine merge upstream

@crespocarlos
Contributor Author

@elasticmachine merge upstream

@crespocarlos crespocarlos removed the following labels Mar 18, 2026:

  • models:eis/anthropic-claude-4.6-opus (Run LLM evals against model: eis/anthropic-claude-4.6-opus)
  • models:eis/openai-gpt-5.2 (Run LLM evals against model: eis/openai-gpt-5.2)
  • models:eis/openai-gpt-oss-120b (Run LLM evals against model: eis/openai-gpt-oss-120b)
  • models:judge:llm-gateway/gemini-3.1-pro-preview (Override LLM-as-a-judge connector for evals: llm-gateway/gemini-3.1-pro-preview)

@crespocarlos
Contributor Author

@elasticmachine merge upstream

@crespocarlos crespocarlos removed the (deprecated) evals:streams-sigevents label (This label is deprecated. Use `evals:significant-events` to run the Significant Events eval suite.) Mar 18, 2026
@crespocarlos
Contributor Author

@elasticmachine merge upstream

@crespocarlos crespocarlos merged commit 5e2b8e8 into elastic:main Mar 18, 2026
18 checks passed
@elasticmachine
Contributor

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • x-pack/platform/test/serverless/functional/configs/security/config.group1.ts / Serverless Common UI - Management Transform List renders the transform list

Metrics

✅ unchanged

@crespocarlos crespocarlos deleted the feature/convert-feature-duplication-evals-to-kbn-evals-format branch March 18, 2026 23:18
mbondyra added a commit to mbondyra/kibana that referenced this pull request Mar 19, 2026
…d_agent_navigation2

* commit '9289d6b5502db245e645e190b0246554396c6c20': (34 commits)
  [api-docs] 2026-03-19 Daily api_docs build (elastic#258471)
  [Shared UX][DateRangePicker] Missing parts (elastic#258229)
  [Dashboard] Keep pinned_panels separate in read response (elastic#258444)
  Move inheritance: true to top level in .coderabbit.yml (elastic#258461)
  [DOCS] 9.3.2 Kibana release notes (elastic#257332)
  adds routing accept metric attribute to the cps metric (elastic#258168)
  [ML] AI/Inference Connector creation: use 'location' field to correctly set provider config  (elastic#250838)
  [Lens] Add e2e test for legend list layout (elastic#258160)
  [SigEvents] Convert feature duplication evaluators to createPrompt pattern (elastic#256534)
  Add actionable-obs author to .coderabbit.yml (elastic#257922)
  [DOCS] 9.2.7 Kibana release notes (elastic#257331)
  Grant Serverless editor/viewer access to ES v2 indices (elastic#258384)
  [SigEvents][Evals] Rename terminology for KI features and KI queries (elastic#258361)
  [EDR Workflows][Osquery] Add shared table toolbar components and redesign saved queries list (elastic#258394)
  [Automatic Import V2] Upload samples using an existing index (elastic#258074)
  Add GET /inference_features route to expose feature registry (elastic#258044)
  fix additional fields not included (elastic#257625)
  [Discover] [Metrics] Add tier 2 journeys for Metrics in Discover E2E (elastic#255036)
  [Lens as code] Support correct X-Axis types in ES|QL visualizations (elastic#258159)
  Update APM (main) (elastic#254880)
  ...
flash1293 pushed a commit to flash1293/kibana that referenced this pull request Mar 19, 2026
…ttern (elastic#256534)

Closes elastic#255869

## Summary

Converts the two LLM evaluators in feature duplication evals
(`createSemanticUniquenessEvaluator` and `createIdConsistencyEvaluator`)
from `inferenceClient.output()` with inline system prompts to the
`createPrompt` + `executeUntilValid` pattern used by `@kbn/evals`.
System prompts are extracted into `.text` mustache template files, and
tool schemas are preserved as prompt version tool definitions. The CODE
evaluator (`featureDuplicationEvaluator`) is unchanged. All evaluator
names, signatures, and return shapes are preserved.

### Checklist

- [x] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/src/platform/packages/shared/kbn-i18n/README.md)
- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [ ] If a plugin configuration key changed, check if it needs to be
allowlisted in the cloud and added to the [docker
list](https://github.com/elastic/kibana/blob/main/src/dev/build/tasks/os_packages/docker_generator/resources/base/bin/kibana-docker)
- [ ] This was checked for breaking HTTP API changes, and any breaking
changes have been approved by the breaking-change committee. The
`release_note:breaking` label should be applied in these situations.
- [ ] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [x] The PR description includes the appropriate Release Notes section,
and the correct `release_note:*` label is applied per the
[guidelines](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
- [ ] Review the [backport
guidelines](https://docs.google.com/document/d/1VyN5k91e5OVumlc0Gb9RPa3h1ewuPE705nRtioPiTvY/edit?usp=sharing)
and apply applicable `backport:*` labels.

### Identify risks

No significant risks. This is a structural refactor of evaluator
internals — the evaluator contracts (names, signatures, return shapes)
are unchanged. The LLM prompts are semantically identical, just moved
from inline strings to versioned mustache templates.

---
🤖 Co-authored with AI assistance.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Viduni Wickramarachchi <viduni.wickramarachchi@elastic.co>
jeramysoucy pushed a commit to jeramysoucy/kibana that referenced this pull request Mar 26, 2026
…ttern (elastic#256534)


Labels

  • backport:skip (This PR does not require backporting)
  • closes:sig-events (PR closes an issue labeled for Significant Events)
  • release_note:skip (Skip the PR/issue when compiling release notes)
  • v9.4.0

Development

Successfully merging this pull request may close these issues.

[SigEvents][Evals] Convert feature duplication evals to the @kbn/evals evaluator format

5 participants