ES|QL Add initial support for semantic_text field type #113920
ioanatia merged 38 commits into elastic:main
Hi @ioanatia, I've created a changelog YAML for you.
Pinging @elastic/es-search-relevance (Team:Search Relevance)
    .setting("cluster.remote.connections_per_cluster", "1")
    .shared(true)
    .setting("cluster.routing.rebalance.enable", "none")
    .plugin("inference-service-test")
This plugin is used for testing semantic_text. It provides an inference service that can create sparse or dense embeddings. The embeddings have no actual "semantic" meaning because no model is involved, but they are supposed to be deterministic.
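To make "deterministic but not semantic" concrete, here is a minimal sketch of how a mock sparse embedding of that kind could be produced. It is not the code of the inference-service-test plugin: the class name, the `sparseEmbedding` method, and the hash-based weighting are all assumptions chosen for illustration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not taken from the inference-service-test plugin.
class DeterministicEmbeddingSketch {

    // Maps each whitespace-separated token to a weight in [0, 1) derived from
    // String.hashCode(), which is specified by the JDK and therefore stable
    // across runs and JVMs. The weights carry no semantic meaning.
    static Map<String, Float> sparseEmbedding(String text) {
        Map<String, Float> weights = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            float weight = Math.floorMod(token.hashCode(), 1000) / 1000f;
            weights.put(token, weight);
        }
        return weights;
    }

    public static void main(String[] args) {
        // The same input always yields the same token weights.
        System.out.println(sparseEmbedding("live long and prosper"));
    }
}
```

Because the output depends only on the input text, tests that index and query the same strings can rely on stable results without loading a model.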
    Request request = new Request("PUT", "_inference/sparse_embedding/test_sparse_inference");
    request.setJsonEntity("""
        {
          "service": "test_service",
this is the inference service from the inference-service-test test plugin
This one's probably worth javadoc to explain that it's for the semantic text fields.
It kinda surprises me that this is the first capability conditional in the test infra. How expensive is this service endpoint? Conditionally creating it is of course fine, but the test infra would be simpler if it was unconditionally registered.
The test endpoint is quite lightweight. We could always register it.
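If the endpoint were always registered, as suggested here, the setup would reduce to a single unconditional PUT. The sketch below builds that request with the JDK HTTP client instead of the Elasticsearch test client, purely so it is self-contained. The class name, method name, and base URL are hypothetical, and the body contains only the one field visible in the excerpt above; the real request carries further settings that are not shown in this conversation.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical sketch; the test infrastructure in the PR uses the
// Elasticsearch low-level REST client, not java.net.http.
class RegisterTestEndpointSketch {

    // Builds (but does not send) the PUT that registers the sparse test endpoint.
    static HttpRequest putSparseEndpoint(String baseUrl) {
        // Only "service" is known from the excerpt; other settings are omitted.
        String body = """
            {
              "service": "test_service"
            }
            """;
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_inference/sparse_embedding/test_sparse_inference"))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    public static void main(String[] args) {
        HttpRequest request = putSparseEndpoint("http://localhost:9200");
        System.out.println(request.method() + " " + request.uri());
    }
}
```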
nik9000
left a comment
Makes sense to me. @not-napoleon might want to take a look because this is beginning to follow your lead.
  public class EsqlSpecIT extends EsqlSpecTestCase {
      @ClassRule
-     public static ElasticsearchCluster cluster = Clusters.testCluster(spec -> {});
+     public static ElasticsearchCluster cluster = Clusters.testCluster(spec -> spec.plugin("inference-service-test"));
Will everything need this plugin? Should it be in testCluster?
I think only this one needs it - it's the only one for multi_node that extends from EsqlSpecTestCase that loads the CSV data sets.
x-pack/plugin/esql/qa/testFixtures/src/main/resources/semantic_text.csv-spec
...nference/src/main/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapper.java
carlosdelest
left a comment
semantic_text support, yay! 💯
There are some issues with loading this plugin for multi-cluster and mixed-version tests when one of the cluster nodes is on 8.16.0. Multi-node and single-node should continue to work just fine. The checks
OK, mixed-node testing will be avoided until #115166 is resolved as a follow-up (mixed mode will only be relevant when this PR is backported).
@elasticmachine update branch
@elasticmachine test this please
Thank you folks, especially @fang-xing-esql and @ChrisHegarty for your help on this PR!
💔 Backport failed
The backport operation could not be completed due to the following error:
You can use sqren/backport to manually backport by running
* Add initial support for semantic_text field type
* Update docs/changelog/113920.yaml
* More tests and fixes
* Use mock inference service
* Fix tests
* Spotless
* Fix mixed-cluster and multi-clusters tests
* sort
* Attempt another fix for bwc tests
* Spotless
* Fix merge
* Attempt another fix
* Don't load the inference-service-test plugin for mixed versions/clusters
* Add more tests, address review comments
* trivial
* revert
* post-merge fix block loader
* post-merge fix compile
* add mixed version testing
* whitespace
* fix MultiClusterSpecIT
* add more fields to mapping
* Revert mixed version testing
* whitespace

---------

Co-authored-by: ChrisHegarty <chegar999@gmail.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Support is added behind a feature flag. We could not simply use EsqlCapabilities since that's not available in esql-core.
Right now we have no support in existing functions.
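As a rough illustration of gating a new field type behind a flag that lives in the lower-level module, the sketch below reads a system property and only reports semantic_text as supported when it is set. Everything here is an assumption for illustration: the class, the property name `sketch.semantic_text_feature_flag_enabled`, and the short list of always-supported types are hypothetical and are not the names used in this PR.

```java
import java.util.Set;

// Hypothetical sketch of feature-flag gating; names are not from the PR.
class SemanticTextFlagSketch {

    // Hypothetical property name, chosen only for this example.
    static final String FLAG_PROPERTY = "sketch.semantic_text_feature_flag_enabled";

    // A deliberately tiny stand-in for the types that need no flag.
    static final Set<String> ALWAYS_SUPPORTED = Set.of("keyword", "text", "long", "date");

    // The flag defaults to off, so the new type stays hidden unless enabled.
    static boolean flagEnabled() {
        return Boolean.parseBoolean(System.getProperty(FLAG_PROPERTY, "false"));
    }

    static boolean isSupported(String fieldType) {
        if ("semantic_text".equals(fieldType)) {
            return flagEnabled();
        }
        return ALWAYS_SUPPORTED.contains(fieldType);
    }

    public static void main(String[] args) {
        System.out.println("semantic_text supported: " + isSupported("semantic_text"));
    }
}
```

Keeping the check next to the type definitions means the lower module does not need to depend on anything defined in the module above it.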
I followed the approach for adding initial support for date_nanos, which was also added behind a feature flag. That allowed for incremental progress, rather than adding support for everything in one PR: #110205
With this PR we will return semantic_text fields as part of the results, and it will also allow us to refer to semantic_text fields in the match function (to run semantic search):
relates: #115103
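A minimal sketch of what a semantic search through the match function could look like, built as a JSON body for the ES|QL `_query` endpoint. The index name, field name, and the `LIMIT 10` clause are hypothetical, and the helper class is only an illustration; the exact query shape supported at the time of this PR may differ.

```java
// Hypothetical sketch; index and field names are placeholders.
class SemanticMatchQuerySketch {

    // Escapes backslashes and double quotes only, which is enough for the
    // simple strings used here (control characters are not handled).
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds an ES|QL query that filters on a semantic_text field with match.
    static String esqlQuery(String index, String field, String text) {
        return "FROM " + index + " | WHERE match(" + field + ", \"" + escape(text) + "\") | LIMIT 10";
    }

    // Wraps the query in the JSON body expected by POST /_query.
    static String requestBody(String esql) {
        return "{\"query\": \"" + escape(esql) + "\"}";
    }

    public static void main(String[] args) {
        String esql = esqlQuery("my_index", "my_semantic_field", "live long");
        System.out.println(requestBody(esql));
    }
}
```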