[Search Pipelines] Add a processor that provides fine-grained control over what queries are allowed

**Is your feature request related to a problem? Please describe.**
Years ago, I worked on a search service where my team would vend search capabilities to other teams around the company. Since my team would be woken up in the middle of the night if/when people sent very expensive queries that would use all CPU or memory, we built functionality to block any remotely-expensive query by default. We'd just throw an exception saying, "We are not going to let you run that kind of query unless you talk with us and explain your use-case." Based on the particular use-case, we would usually suggest a different, better way to solve the give problem, but sometimes we would allow e.g. a wildcard query **on one or more specific fields**.

OpenSearch has the `search.allow_expensive_queries` cluster setting, but that's basically a giant on/off switch that lacks any nuance.

**Describe the solution you'd like**
I would like to be able to configure what query types (and possibly other features, like aggregations) are allowed, on what fields, and under what circumstances. Ideally, I'd like even more specificity than the safeguards component that my old team built (which was limited to query types or features on individual fields). I'd like to be able to say "Prefix queries are okay on this field, but only if the prefix has length 3 or more". 

I think a search pipeline SearchRequestProcessor would be the ideal place to configure something like that. 

Essentially, I'm thinking we could define a chain of rules for each query type like:

```
# Hand-wavy brainstorm idea for defining rules to restrict query/aggregation types.
# Whatever we build will probably look different. Suggestions welcome!

PUT /_search/pipeline/allowed_queries_pipeline
{
  "request_processors": [
    {
      "restrict_queries" : {
        "query_types": {
          "prefix" : [
            {
              "field": "foo",
              "value_length" : ">=3",
              "allow": true
            },
            {
              "field": "bar",
              "value_length" : ">=5",
              "allow": true
            },
            {
              "deny" : true
            }
          ],
          "wildcard": [
            {
              "field" : "low_cardinality_field",
              "allow": true
            },
            { "deny" : true }
          ],
          "range": [
            {
              "field" : "timestamp",
              "range_width" : "> 604800000",
              "deny" : true
            }
          ]
        },
        "aggregation_types": {
          "variable_width_histogram: [
            {
              "num_buckets": "> 5"
              "deny" : true
            },
            { "allow":true}
          ]
        }
    }
  ]
}
```

We would make use of QueryBuilderVisitor to traverse the query tree. I'm imagining that we would essentially apply the rules sequentially to each visited query -- as soon as an "allow" rule matches, we proceed; if a "deny" rule matches, we fail with a 4xx error explaining what rule was violated.

I like this approach because the "rule language" can start simpler than what is described above and evolve based on need.

**Describe alternatives you've considered**
We could do something much smaller like turning `search.allow_expensive_queries` into an index-level setting without much code change. Even reworking it to act per-field wouldn't be a huge burden (though the current "allowExpensiveQuery" checks would need to be updated to check per-field). I don't see how it would be practical to implement something like the property-based checks or checks on aggregations.

Instead of using search pipelines, we could define a whole new API for query restriction rules (maybe in a plugin), but then we would need the full set of CRUD operations on these rules. Since this is logic that we would likely want to run on the coordinator, before fanning out to shards, it feels like search pipelines already provide a pretty good persistence API. 

**Additional context**
Note that a SearchRequestProcessor that implements this functionality could be delivered via a plugin. It doesn't need to be core OpenSearch functionality. (It could theoretically start in a plugin and move into the search-pipeline-common module.)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Search Pipelines] Add a processor that provides fine-grained control over what queries are allowed #10938

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Search Pipelines] Add a processor that provides fine-grained control over what queries are allowed #10938

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions