…84/elasticsearch into 27243_collapse_with_rescore
|
Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually? |
|
@elasticmachine ok to test |
|
https://github.com/elastic/elasticsearch/blob/master/rest-api-spec/src/main/resources/rest-api-spec/test/search/110_field_collapsing.yml#L241 is failing. I think we could remove it; I've already added an integration test for this behaviour. |
|
SearchResponse searchResponse = client().prepareSearch("test")
    .setTypes("type1")
    .setQuery(new MatchQueryBuilder("name", "one"))
The score of this query depends on the number of shards, the default similarity, ... To make sure that we have consistent scoring you can use a function_score query like the following:
QueryBuilder query = functionScoreQuery(
    termQuery("name", "one"),
    ScoreFunctionBuilders.fieldValueFactorFunction("my_static_doc_score")
).boostMode(CombineFunction.REPLACE);
... and add the my_static_doc_score field to each document at indexing time.
SearchResponse searchResponse = client().prepareSearch("test")
    .setTypes("type1")
    .setQuery(new MatchQueryBuilder("name", "one"))
    .addRescorer(new QueryRescorerBuilder(new MatchQueryBuilder("name", "two")))
You can use the same approach for the rescore query, with another field for instance.
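To illustrate why static scores make the test deterministic, here is a minimal plain-Java sketch of the arithmetic only; it does not call the Elasticsearch API. It assumes a field_value_factor function with its default factor and no modifier, boost mode REPLACE, and a query rescorer that combines scores as a weighted sum (score mode total) with both weights set to 1. The class and method names are hypothetical.

```java
public class StaticScoreSketch {

    /**
     * Models function_score with boost mode REPLACE: the function value
     * (here the raw value of the static score field) replaces the query score.
     */
    public static double replaceScore(double queryScore, double staticFieldValue) {
        // queryScore is deliberately ignored: it varies with shard count and similarity.
        return staticFieldValue;
    }

    /** Models a query rescorer in score mode total: a weighted sum of both scores. */
    public static double rescoreTotal(double originalScore, double rescoreScore,
                                      double queryWeight, double rescoreQueryWeight) {
        return queryWeight * originalScore + rescoreQueryWeight * rescoreScore;
    }

    public static void main(String[] args) {
        // Two runs where the text-relevance score differs (e.g. 1 shard vs 5 shards).
        double runA = replaceScore(0.37, 3.0);
        double runB = replaceScore(12.9, 3.0);
        System.out.println(runA == runB);

        // The rescore query uses a second static field (value 2.0 here).
        System.out.println(rescoreTotal(runA, replaceScore(4.2, 2.0), 1.0, 1.0));
    }
}
```

Under these assumptions the test can assert exact scores, because neither the first pass nor the rescore pass depends on term statistics.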
|
@jimczi Thanks for reviewing. I'll update PR next week. |
|
@jimczi PR updated; the integration test now uses static scoring. |
|
@elasticmachine ok to test |
This change adds the ability to rescore collapsed documents.
* master:
  [TEST] AwaitsFix QueryRescorerIT.testRescoreAfterCollapse
  Decouple XContentType from StreamInput/Output (elastic#28927)
  Remove BytesRef usage from XContentParser and its subclasses (elastic#28792)
  [DOCS] Correct typo in configuration (elastic#28903)
  Fix incorrect datemath example (elastic#28904)
  Add a usage example of the JLH score (elastic#28905)
  Wrap stream passed to createParser in try-with-resources (elastic#28897)
  Rescore collapsed documents (elastic#28521)
  Fix (simple)_query_string to ignore removed terms (elastic#28871)
  [Docs] Fix typo in composite aggregation (elastic#28891)
  Try if tombstone is eligable for pruning before locking on it's key (elastic#28767)
|
I had to revert this change since it doesn't work as expected. I forgot that the collapsed values would also need to be re-sorted by the rescorer. We use these values on the coordinating node to collapse the results of each shard, but the rescorer in Lucene cannot access them. I am really sorry I missed that, but since fixing it would require rewriting the rescorer in Lucene, and the collapsing code lives only in es, I don't think it is worth the effort. |
|
Doesn't Solr support collapse + rescore (rerank)? The claim that Lucene's rescorer needs a rewrite seems dubious. |
|
I agree that we should be able to rescore collapsed documents, but this is higher-hanging fruit than I thought, which is why I reverted and closed the issue for now (sorry @fred84). |
|
@jimczi let me know when I can start on this issue again :)
* es/master: (48 commits)
  Update bucket-sort-aggregation.asciidoc (#28937)
  [Docs] REST high-level client: Fix code for most basic search request (#28916)
  Improved percolator's random candidate query duel test and fixed bugs that were exposed by this:
  Revert "Rescore collapsed documents (#28521)"
  Build: Fix test logger NPE when no tests are run (#28929)
  [TEST] AwaitsFix QueryRescorerIT.testRescoreAfterCollapse
  Decouple XContentType from StreamInput/Output (#28927)
  Remove BytesRef usage from XContentParser and its subclasses (#28792)
  [DOCS] Correct typo in configuration (#28903)
  Fix incorrect datemath example (#28904)
  Add a usage example of the JLH score (#28905)
  Wrap stream passed to createParser in try-with-resources (#28897)
  Rescore collapsed documents (#28521)
  Fix (simple)_query_string to ignore removed terms (#28871)
  [Docs] Fix typo in composite aggregation (#28891)
  Try if tombstone is eligable for pruning before locking on it's key (#28767)
  Limit analyzed text for highlighting (improvements) (#28808)
  Missing `timeout` parameter from the REST API spec JSON files (#28328)
  Clarifies how query_string splits textual part (#28798)
  Update outdated java version reference (#28870)
  ...
* es/6.x: (48 commits)
  Update bucket-sort-aggregation.asciidoc (#28937)
  [Docs] REST high-level client: Fix code for most basic search request (#28916)
  Improved percolator's random candidate query duel test and fixed bugs that were exposed by this:
  Revert "Rescore collapsed documents (#28521)"
  Build: Fix test logger NPE when no tests are run (#28929)
  [TEST] AwaitsFix QueryRescorerIT.testRescoreAfterCollapse
  Decouple XContentType from StreamInput/Output (#28927)
  Remove BytesRef usage from XContentParser and its subclasses (#28792)
  Add doc note for -server flag on Windows service
  [DOCS] Correct typo in configuration (#28903)
  Fix incorrect datemath example (#28904)
  Add a usage example of the JLH score (#28905)
  Limit analyzed text for highlighting (improvements) (#28907)
  Wrap stream passed to createParser in try-with-resources (#28897)
  [Docs] Fix typo in composite aggregation (#28891)
  Rescore collapsed documents (#28521)
  Fix (simple)_query_string to ignore removed terms (#28871)
  Missing `timeout` parameter from the REST API spec JSON files (#28328)
  Clarifies how query_string splits textual part (#28798)
  Update outdated java version reference (#28870)
  ...
This change adds the ability to rescore collapsed documents.
This reverts commit f057fc2. The rescorer does not resort the collapsed values inside the top docs during rescoring. For this reason the Lucene rescorer is not compatible with collapsing. Relates elastic#27243
|
Sorry it took me some time to come back to this. I checked why Solr is able to rescore the collapsed documents seamlessly and found out that it forces the routing of each group to a single shard. This means that all the documents belonging to a group are on the same shard, so the rescoring is always done on the final head of the group. In es we don't enforce the routing, so each group can be spread over multiple shards. This complicates the rescoring, since it is always applied at the shard level and in this case on the temporary head of each group (we don't know the final head within the shard, since another shard can contain a better document for that group). For this reason I am reluctant to add this functionality, because it might be surprising to see a head in a group that is not the best document of that group in the final response. This can happen if the rescoring gives a document in one shard a score that is better than the score of the best document in the group, which is in another shard. I don't see how we could avoid this unless we force the routing of the groups. |
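The scenario described above can be sketched with a small plain-Java simulation. This is a simplified model under stated assumptions, not Elasticsearch or Lucene code: each shard picks its temporary head by first-pass score, rescores only that head, and the coordinating node keeps the head with the best rescored value. The class, the method names, and the scores are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CollapseRescoreSketch {

    /** One document of a single collapse group, with a first-pass and a rescored value. */
    static final class Doc {
        final String id;
        final double firstPass;
        final double rescored;

        Doc(String id, double firstPass, double rescored) {
            this.id = id;
            this.firstPass = firstPass;
            this.rescored = rescored;
        }
    }

    /** Collapse step: the head is the document with the best first-pass score. */
    static Doc collapse(List<Doc> docs) {
        Doc head = null;
        for (Doc d : docs) {
            if (head == null || d.firstPass > head.firstPass) {
                head = d;
            }
        }
        return head;
    }

    /** No routing: every shard rescores its temporary head; the coordinator keeps the best one. */
    public static String headWithShardLevelRescore(List<List<Doc>> shards) {
        Doc best = null;
        for (List<Doc> shard : shards) {
            Doc head = collapse(shard);
            if (head != null && (best == null || head.rescored > best.rescored)) {
                best = head;
            }
        }
        return best == null ? null : best.id;
    }

    /** Routing by group: all documents share a shard, so the head is final before rescoring. */
    public static String headWithGroupRouting(List<List<Doc>> shards) {
        List<Doc> all = new ArrayList<>();
        for (List<Doc> shard : shards) {
            all.addAll(shard);
        }
        Doc head = collapse(all);
        return head == null ? null : head.id;
    }

    /** Example data: the group is split across two shards. */
    public static List<List<Doc>> demoShards() {
        return Arrays.asList(
            Arrays.asList(new Doc("A", 1.0, 9.0)),   // shard 0: weak first pass, strong rescore
            Arrays.asList(new Doc("B", 5.0, 6.0)));  // shard 1: best first pass in the group
    }

    public static void main(String[] args) {
        System.out.println(headWithShardLevelRescore(demoShards())); // prints "A"
        System.out.println(headWithGroupRouting(demoShards()));      // prints "B"
    }
}
```

In this model, document B is the best document of the group by first-pass score, yet without routing the response shows A as the head because its shard-local rescore is higher. With the group routed to one shard, B is selected as the head before rescoring is applied.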
|
Is there a way to revisit this functionality? This limitation curtails the use of any LTR algorithms |
|
Excuse me, is there a way to apply rescoring before collapsing? |
I am looking for this option for LTR rescoring |
|
Hello @jimczi, I am looking for a possible (even limited) solution to make collapse work with rescore (especially LTR).
"ext": {
  "post-rescore" : { /* ... */ }
}
The present solution rescores only the results present on the current page. Code of the solution is here: I will be thankful for any feedback and suggestions. |
Add support for rescoring collapsed docs (#27243). Documents are first collapsed and then rescored.
@jimczi please take a look