Skip to content

Explain Function Score Query#111807

Merged
john-wagster merged 15 commits intoelastic:mainfrom
john-wagster:explain-function-score-query
Aug 13, 2024
Merged

Explain Function Score Query#111807
john-wagster merged 15 commits intoelastic:mainfrom
john-wagster:explain-function-score-query

Conversation

@john-wagster
Copy link
Copy Markdown
Contributor

@john-wagster john-wagster commented Aug 12, 2024

Addressing an NPE found when building a custom plugin leveraging a script.
#109177

This occurs when calling execute on a ScoreScript like this:

public double execute(ExplanationHolder explanation) {
    explanation.set("An example optional custom description to explain details for this script's execution; we'll provide a default one if you leave this out.");
   ...
}
Example Returned Hit with the Explanation Customized
{
  "_shard" : "[test][0]",
  "_node" : "6aOP4_Q5SOGIyr4oIxj0TQ",
  "_index" : "test",
  "_id" : "2",
  "_score" : 0.5685853,
  "_source" : {
    "important_field" : "foo foo foo"
  },
  "_explanation" : {
    "value" : 0.5685853,
    "description" : "function score, product of:",
    "details" : [
      ...
      {
        "value" : 3.0,
        "description" : "min of:",
        "details" : [
          {
            "value" : 3.0,
            "description" : "An example optional custom description to explain details for this script's execution; we'll provide a default one if you leave this out.",
            "details" : [
              {
                "value" : 0.18952844,
                "description" : "_score: ",
                "details" : [
                  {
                    "value" : 0.18952844,
                    "description" : "weight(important_field:foo in 1) [PerFieldSimilarity], result of:",
                    "details" : [
                      {
                        "value" : 0.18952844,
                        "description" : "score(freq=3.0), computed as boost * idf * tf from:",
                        "details" : [
                          {
                            "value" : 2.2,
                            "description" : "boost",
                            "details" : [ ]
                          },
                          {
                            "value" : 0.13353139,
                            "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                            "details" : [
                              {
                                "value" : 3,
                                "description" : "n, number of documents containing term",
                                "details" : [ ]
                              },
                              {
                                "value" : 3,
                                "description" : "N, total number of documents with field",
                                "details" : [ ]
                              }
                            ]
                          },
                          {
                            "value" : 0.6451613,
                            "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                            "details" : [
                              {
                                "value" : 3.0,
                                "description" : "freq, occurrences of term within document",
                                "details" : [ ]
                              },
                              {
                                "value" : 1.2,
                                "description" : "k1, term saturation parameter",
                                "details" : [ ]
                              },
                              {
                                "value" : 0.75,
                                "description" : "b, length normalization parameter",
                                "details" : [ ]
                              },
                              {
                                "value" : 3.0,
                                "description" : "dl, length of field",
                                "details" : [ ]
                              },
                              {
                                "value" : 2.0,
                                "description" : "avgdl, average length of field",
                                "details" : [ ]
                              }
                            ]
                          }
                        ]
                      }
                    ]
                  }
                ]
              }
            ]
          },
          {
            "value" : 3.4028235E38,
            "description" : "maxBoost",
            "details" : [ ]
          }
        ]
      }
    ]
  }
}
What the Hit Would Normally Look Like Without the Custom Explanation
{
  "_shard" : "[test][0]",
  "_node" : "Fg3ne_PXSi2ljhqljA6QOQ",
  "_index" : "test",
  "_id" : "2",
  "_score" : 0.5685853,
  "_source" : {
    "important_field" : "foo foo foo"
  },
  "_explanation" : {
    "value" : 0.5685853,
    "description" : "function score, product of:",
    "details" : [
      ...
      {
        "value" : 3.0,
        "description" : "min of:",
        "details" : [
          {
            "value" : 3.0,
            "description" : "script score function, computed with script:\"Script{type=inline, lang='expert_scripts', idOrCode='pure_df', options={}, params={field=important_field, term=foo}}\"",
            "details" : [
              {
                "value" : 0.18952844,
                "description" : "_score: ",
                "details" : [
                  {
                    "value" : 0.18952844,
                    "description" : "weight(important_field:foo in 1) [PerFieldSimilarity], result of:",
                    "details" : [
                      {
                        "value" : 0.18952844,
                        "description" : "score(freq=3.0), computed as boost * idf * tf from:",
                        "details" : [
                          {
                            "value" : 2.2,
                            "description" : "boost",
                            "details" : [ ]
                          },
                          {
                            "value" : 0.13353139,
                            "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                            "details" : [
                              {
                                "value" : 3,
                                "description" : "n, number of documents containing term",
                                "details" : [ ]
                              },
                              {
                                "value" : 3,
                                "description" : "N, total number of documents with field",
                                "details" : [ ]
                              }
                            ]
                          },
                          {
                            "value" : 0.6451613,
                            "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                            "details" : [
                              {
                                "value" : 3.0,
                                "description" : "freq, occurrences of term within document",
                                "details" : [ ]
                              },
                              {
                                "value" : 1.2,
                                "description" : "k1, term saturation parameter",
                                "details" : [ ]
                              },
                              {
                                "value" : 0.75,
                                "description" : "b, length normalization parameter",
                                "details" : [ ]
                              },
                              {
                                "value" : 3.0,
                                "description" : "dl, length of field",
                                "details" : [ ]
                              },
                              {
                                "value" : 2.0,
                                "description" : "avgdl, average length of field",
                                "details" : [ ]
                              }
                            ]
                          }
                        ]
                      }
                    ]
                  }
                ]
              }
            ]
          },
          {
            "value" : 3.4028235E38,
            "description" : "maxBoost",
            "details" : [ ]
          }
        ]
      }
    ]
  }
}

@john-wagster john-wagster added v8.15.1 :Search Relevance/Search Catch all for Search Relevance labels Aug 12, 2024
@elasticsearchmachine elasticsearchmachine added v8.16.0 Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch labels Aug 12, 2024
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @john-wagster, I've created a changelog YAML for you.

leafScript._setIndexName(indexName);
leafScript._setShard(shardId);
return new LeafScoreFunction() {
ScoreScript.ExplanationHolder explanation = new ScoreScript.ExplanationHolder();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably only want to create a new ExplanationHolder if explain is true for this query.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense to me; I'll poke around to see if I can figure out how to get that param down to that level

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So one problem with this is that if the plugin attempts to set an explanation even if explain isn't set to true then we will still throw an NPE. So it may be problematic to conditionally create an ExplanationHolder. I could create a dummy one at a higher level and pass that in, which would just be a no-op. Thoughts on this are welcome.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pushed up a change for this; would love some feedback on whether what I've done to both pass down that explain param and provide a dummy placeholder are reasonable.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe with the refactoring I've just done this is no longer a concern; let me know if you disagree for any reason @jdconrad

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct that it requires a null check in the script. I should have added a bit more context. In our docs, our example has a null check on the explanation holder for this reason. It's not ideal to have the user do this, but it seemed like the better trade off at the time rather than creating possibly millions of additional objects for a script that won't use them.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's my miss. The docs mentioned this as did Ben in the issue raised and I missed that on my initial pass at this.

);
queryBoost = request.indexBoost();
this.lowLevelCancellation = lowLevelCancellation;
SearchSourceBuilder builder = request.source();
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is a reasonable expectation that source will be available when we expect to check for the explain flag.

this.lowLevelCancellation = lowLevelCancellation;
SearchSourceBuilder builder = request.source();
if (builder != null) {
Boolean requestExplain = request.source().explain();
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seemed like the most logic way to get explain from the original query but would be good to validate this.

Copy link
Copy Markdown
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is overcomplicated. It can be simplified with minimal changes.

public double execute(
ExplanationHolder explanation
) {
explanation.set("An example optional custom description to explain details for this script's execution; we'll provide a default one if you leave this out.");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should check for null. These will be null when explain: false


public class ScriptScoreFunction extends ScoreFunction {

private static final ScoreScript.ExplanationHolder DUMMY_EXPLAIN_HOLDER = new ScoreScript.ExplanationHolder();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

underlying folks should check for null. Passing this in implies that explain is occurring. It might be the script's plugin explanation takes compute power, we need to be able to signal that explain actually isn't occurring.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we shouldn't have this dummy holder.

scorer.docid = docId;
scorer.score = subQueryScore;
double result = leafScript.execute(null);
double result;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why we are updating the score here at all. Doesn't this work with explainScore ? And if explainScore is being called, we know we are "explaining" and thus should create a ScoreScript.ExplanationHolder ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes a lot of sense; I appreciate all your comments / direction. I'll come back with changes around all of your comments.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe all of your concerns have been addressed in the lastest push; let me know if you have additional concerns at this point @benwtrent

@@ -85,9 +110,13 @@ public Explanation explainScore(int docId, Explanation subQueryScore) throws IOE
} else {
double score = score(docId, subQueryScore.getValue().floatValue());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we should do here, is since we know explainScore means we are doing an explain call, we should handle the logic for creating a ScoreScript.ExplanationHolder here.

We should add a new public double score(int docId, float subQueryScore, ExplanationHolder holder) throws IOException { that both score and explainScore use where explainScore passes in a non-null holder, and scorer passes in null. This way we only create this object if we are explaining.

Comment on lines +728 to +729
public void explain(boolean explain) {
this.explain = explain;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need all this. All this logic is already handled, we just need to take advantage of it. Let's remove all these context updates.

ScoreScript.Factory factory = context.compile(script, ScoreScript.CONTEXT);
SearchLookup lookup = context.lookup();
ScoreScript.LeafFactory searchScript = factory.newFactory(script.getParams(), lookup);
return new ScriptScoreFunction(script, searchScript, lookup, context.index().getName(), context.getShardId());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, not necessary.

);
queryBoost = request.indexBoost();
this.lowLevelCancellation = lowLevelCancellation;
SearchSourceBuilder builder = request.source();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not necessary. explainScore is called already, which tells us to explain things. Thus we should use that as a signal to do so.

QueryBuilder rewrittenForInnerHits = Rewriteable.rewrite(query, innerHitsRewriteContext, true);
InnerHitContextBuilder.extractInnerHits(rewrittenForInnerHits, innerHitBuilders);
searchExecutionContext.setAliasFilter(context.request().getAliasFilter().getQueryBuilder());
searchExecutionContext.explain(context.explain());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary.

Comment on lines 81 to 87
if (leafScript instanceof ExplainableScoreScript) {
leafScript.setDocument(docId);
scorer.docid = docId;
scorer.score = subQueryScore.getValue().floatValue();
exp = ((ExplainableScoreScript) leafScript).explain(subQueryScore);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why this was removed, but it should be added back.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, you just beat me to it; that's why the build was failing. Misunderstanding on my part. It's added back now.

rest_total_hits_as_int: true
index: test
body:
explain: true
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not mentioning this before, but we should have two tests here. One with explain and the other without.

The one with explain & your new assertion will likely need to skip old cluster versions (cluster_features: gte_v8.16.0` (or something to that effect).).

Then when backport to 8.15.1, that skip logic will need to be updated to account for 8.15.1.

Then once the backport is merged, main will need to be updated so that it tests against 8.15.1 as well.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense; i'll update the test and track this as it's backported. Thanks!

…ces/rest-api-spec/test/script_expert_scoring/20_score.yml

Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
@john-wagster john-wagster merged commit 935c0e4 into elastic:main Aug 13, 2024
john-wagster added a commit to john-wagster/elasticsearch that referenced this pull request Aug 13, 2024
allowing for a custom explanation to be passed through as part of supporting building a plugin with a custom script score; previously threw an npe
@john-wagster
Copy link
Copy Markdown
Contributor Author

PR to backport to 8.15.1: #111864

john-wagster added a commit that referenced this pull request Aug 15, 2024
* Explain Function Score Query (#111807)

allowing for a custom explanation to be passed through as part of supporting building a plugin with a custom script score; previously threw an npe

* updated test for 8.15.1
@john-wagster
Copy link
Copy Markdown
Contributor Author

ported the test update for 8.15.1 forward into main: #111929

cbuescher pushed a commit to cbuescher/elasticsearch that referenced this pull request Sep 4, 2024
allowing for a custom explanation to be passed through as part of supporting building a plugin with a custom script score; previously threw an npe
davidkyle pushed a commit to davidkyle/elasticsearch that referenced this pull request Sep 5, 2024
allowing for a custom explanation to be passed through as part of supporting building a plugin with a custom script score; previously threw an npe
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Search Relevance/Search Catch all for Search Relevance Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v8.15.1 v8.16.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants