
ES|QL categorize options#131104

Merged
jan-elastic merged 7 commits into elastic:main from jan-elastic:esql-categorize-options
Jul 17, 2025

Conversation

@jan-elastic
Contributor

@jan-elastic jan-elastic commented Jul 11, 2025

The categorize text aggregation has some configuration options. ES|QL categorize does not.

This PR adds them with syntax comparable to options of the ES|QL match function.

The exposed options are:

  • analyzer
  • similarity threshold
  • output format (regex (default) or space-separated tokens)

Furthermore, the options functionality of match is refactored to make it reusable.
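A minimal sketch of how an options map like the one described above might be checked and turned into typed values. This is not the code from this PR: the class `CategorizeOptionsSketch`, the record `Options`, the key names `output_format` and `similarity_threshold`, and the 1 to 100 threshold range are all assumptions made for illustration. Only the `analyzer` key appears in the review excerpt.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; these are not the classes added by this PR.
final class CategorizeOptionsSketch {

    enum OutputFormat { REGEX, TOKENS }

    // A null analyzer or threshold means "use the default".
    record Options(String analyzer, OutputFormat outputFormat, Integer similarityThreshold) {}

    // Assumed key names; only "analyzer" is visible in the review excerpt.
    private static final Set<String> ALLOWED = Set.of("analyzer", "output_format", "similarity_threshold");

    static Options parse(Map<String, Object> optionsMap) {
        // Reject unknown keys early so typos do not pass silently.
        for (String key : optionsMap.keySet()) {
            if (ALLOWED.contains(key) == false) {
                throw new IllegalArgumentException("unknown option [" + key + "]");
            }
        }

        Object rawAnalyzer = optionsMap.get("analyzer");
        if (rawAnalyzer != null && (rawAnalyzer instanceof String) == false) {
            throw new IllegalArgumentException("option [analyzer] must be a string");
        }

        // Regex is the default output format according to the PR description.
        OutputFormat format = OutputFormat.REGEX;
        Object rawFormat = optionsMap.get("output_format");
        if (rawFormat != null) {
            try {
                format = OutputFormat.valueOf(rawFormat.toString().toUpperCase(Locale.ROOT));
            } catch (IllegalArgumentException e) {
                throw new IllegalArgumentException("invalid output format [" + rawFormat + "]", e);
            }
        }

        Integer threshold = null;
        Object rawThreshold = optionsMap.get("similarity_threshold");
        if (rawThreshold != null) {
            if ((rawThreshold instanceof Integer) == false) {
                throw new IllegalArgumentException("option [similarity_threshold] must be an integer");
            }
            int value = (Integer) rawThreshold;
            // Assumed valid range of 1..100 (a percentage); not confirmed by this PR.
            if (value < 1 || value > 100) {
                throw new IllegalArgumentException("option [similarity_threshold] must be between 1 and 100");
            }
            threshold = value;
        }

        return new Options((String) rawAnalyzer, format, threshold);
    }
}
```

Calling `CategorizeOptionsSketch.parse(Map.of("output_format", "tokens"))` would yield the tokens format and leave the analyzer and threshold unset, while an unknown key is rejected with an `IllegalArgumentException`.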

@jan-elastic jan-elastic added the >feature, :ml (Machine learning), Team:ML (Meta label for the ML team), v9.2.0, and v8.20.0 labels Jul 11, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine
Collaborator

Hi @jan-elastic, I've created a changelog YAML for you.

@jan-elastic jan-elastic force-pushed the esql-categorize-options branch from 0faee91 to b308397 Compare July 11, 2025 14:43
@jan-elastic jan-elastic marked this pull request as draft July 11, 2025 14:44
@jan-elastic jan-elastic force-pushed the esql-categorize-options branch 4 times, most recently from ddf2f1f to 5ed9dfa Compare July 14, 2025 14:31
@jan-elastic jan-elastic marked this pull request as ready for review July 14, 2025 18:01
@elasticsearchmachine
Collaborator

Hi @jan-elastic, I've created a changelog YAML for you.

@jan-elastic jan-elastic requested review from alex-spies and removed request for alex-spies July 15, 2025 07:04
Contributor

@ivancea ivancea left a comment

It looks good!

}
}
return new CategorizeDef(
(String) optionsMap.get("analyzer"),
Contributor

Can we validate the analyzer here?

Contributor Author

Not that I know of. You'd need the AnalysisRegistry here.

During execution, that comes via the EsPhysicalOperationProviders from the SearchService. I don't see how to obtain something similar at this stage.

Contributor

Ok! As it's checked now at planning time, no problem then
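The split discussed in this thread, where the analyzer name cannot be resolved when the options are parsed and is only checked later, could be sketched roughly as follows. Everything here is hypothetical: `DeferredAnalyzerCheckSketch` and the `AnalyzerLookup` interface are stand-ins invented for illustration and are not the `AnalysisRegistry` API or the code in this PR.

```java
// Illustrative sketch only; not the code from this PR.
final class DeferredAnalyzerCheckSketch {

    // Hypothetical stand-in for a registry of analyzers that is only reachable late.
    interface AnalyzerLookup {
        boolean exists(String name);
    }

    // Stage 1: no registry is available, so only the shape of the value is checked.
    static String checkShape(Object rawAnalyzer) {
        if (rawAnalyzer == null) {
            return null; // option not set, fall back to the default analyzer
        }
        if ((rawAnalyzer instanceof String) == false || ((String) rawAnalyzer).isBlank()) {
            throw new IllegalArgumentException("option [analyzer] must be a non-empty string");
        }
        return (String) rawAnalyzer;
    }

    // Stage 2: a registry is available, so the name can actually be resolved.
    static void checkExists(String analyzer, AnalyzerLookup lookup) {
        if (analyzer != null && lookup.exists(analyzer) == false) {
            throw new IllegalArgumentException("unknown analyzer [" + analyzer + "]");
        }
    }
}
```

With this shape, a misspelled analyzer name passes the first stage and is rejected by the second, once the lookup is available.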

@ghudgins ghudgins removed the v8.20.0 label Jul 15, 2025
@ghudgins

@jan-elastic FYI there's no v8.20.0 as of now. I removed your label

@jan-elastic
Contributor Author

@ghudgins does that mean we're not backporting new functionality anymore? Just bugfixes to 8.19.x?

@jan-elastic jan-elastic force-pushed the esql-categorize-options branch from a154ede to 5300744 Compare July 16, 2025 07:39
@jan-elastic jan-elastic requested a review from ivancea July 16, 2025 07:46
@jan-elastic
Contributor Author

@ivancea Thanks for the thorough review. Fixed all your comments. PTAL

Contributor

@bpintea bpintea left a comment

Lgtm

@jan-elastic jan-elastic force-pushed the esql-categorize-options branch from 5300744 to 6248657 Compare July 16, 2025 14:34
@jan-elastic
Contributor Author

@bpintea Thanks for your review. Fixed all your comments

@jan-elastic jan-elastic force-pushed the esql-categorize-options branch from 6248657 to db90b33 Compare July 17, 2025 06:41
@jan-elastic jan-elastic enabled auto-merge (squash) July 17, 2025 06:43
@jan-elastic jan-elastic merged commit ec7f77b into elastic:main Jul 17, 2025
33 checks passed
ywangd pushed a commit to ywangd/elasticsearch that referenced this pull request Jul 17, 2025
* ES|QL categorize options

* refactor options

* fix serialization

* polish

* add verfications

* better test coverage + polish code

* better test coverage + polish code

Labels

>feature, :ml (Machine learning), Team:ML (Meta label for the ML team), v9.2.0

5 participants