Currently the ranking evaluation API accepts the full query syntax for the queries specified in the test set and executes them as a multi search. This means that costly aggregations and suggestions can get executed as well. Since I think aggregations, suggestions and highlighting don't play a role in search evaluation, we ignore those parts and remove them from the query if they were added accidentally, which reduces the cost of running it.
I'm not sure yet whether it would be better to throw an error in these cases or to just silently ignore the parts
that are irrelevant in the context of this API.
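
To make the two options concrete, here is a rough sketch of the idea on a plain JSON search body. It is not the code in this PR: the function name `strip_irrelevant_sections`, the `strict` flag and the `IGNORED_SECTIONS` list are hypothetical and only used for illustration. It assumes the sections show up as the top-level request keys `aggs`/`aggregations`, `suggest` and `highlight`.

```python
import copy

# Assumption: these top-level search body keys do not influence ranking.
IGNORED_SECTIONS = ("aggs", "aggregations", "suggest", "highlight")


def strip_irrelevant_sections(search_body, strict=False):
    """Return a copy of search_body without the sections listed above.

    With strict=True, raise an error instead of silently dropping them.
    """
    present = [key for key in IGNORED_SECTIONS if key in search_body]
    if strict and present:
        raise ValueError(
            "not supported in ranking evaluation: " + ", ".join(present)
        )
    # Work on a copy so the caller's request is left untouched.
    cleaned = copy.deepcopy(search_body)
    for key in present:
        del cleaned[key]
    return cleaned


request = {
    "query": {"match": {"title": "berlin"}},
    "size": 10,
    "aggs": {"by_tag": {"terms": {"field": "tag"}}},
    "highlight": {"fields": {"title": {}}},
}
cleaned = strip_irrelevant_sections(request)
print(sorted(cleaned))  # ['query', 'size']
```

The default path corresponds to silently ignoring the extra sections, while `strict=True` corresponds to the "throw an error" alternative mentioned above.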