Add FieldCapabilities (_field_caps) API#23007
Add FieldCapabilities (_field_caps) API#23007jimczi merged 4 commits intoelastic:masterfrom jimczi:field_caps
Conversation
|
Thanks @jimczi I have a couple of questions:
@spalger any thought about the above? |
For consistency yes. I'll fix this.
I thought that putting fields on top would simplify the navigation. I can switch back to a map keyed by index, not sure that it'd produce far fewer objects. It all depends on the number of indices vs number of fields. |
The typical case is requesting But before you change this back, I'd like to hear from @spalger about which access pattern makes more sense to him |
|
Looks nice @jimczi In regards to the {
"fields": {
"field1": {
"type": "string",
"searchable": true,
"aggregatable": true
},
"field2": {
"type": "conflict",
"type_indices": {
"keyword": ["t"],
"long": ["v"]
},
// should these still be true? Is a field that is both keyword and long really aggregatable?
"searchable": true,
"aggregatable": true
}
}
}re |
|
Also, I'm pretty sure we've decided not to go this route in the past, because consistency, but I'd like to reiterate my desire for for lists of objects as arrays rather than maps, like this: {
"fields": [
{
"name": "field1",
// ...
}
]
} |
I am on the fence regarding this. In most cases it's not but what if you use a script or whatever agg that can handle this ? For searching capability it's the same so I think it's more important to have good error messages when an agg or a query is conflicting rather than forbidding the capability completely.
The information is available so it could look like this:
What about map with consistent order ;) ? |
Yes consistency :) Can you explain why you would prefer an array? |
In this context it's mostly because the objects in the map are incomplete without the value they are keyed by, so our iteration logic of responses like this usually goes something like:
More generally though, my primary motivator is that the meaning of the keys in these maps is not described by the text alone. Only by knowing the API can know really know what the keys actually are. In the search/aggregation requests/responses for instance, some of the keys are for the type of query/aggregation, some are field names, some are user-generated id's, it's quite ambiguous. |
That's my preference. The request is executed with a specific list of indices and I think the merged value should represent the aggregatable-ness of that specific list (not a subset of that list).
A script agg isn't necessarily field dependent. In my opinion it's script dependent, and knowing whether a script is aggregatable is obviously a very different problem
However we present it, if we have the information I would prefer that it was included in the response |
If the above comment refers to the general case where there is no field conflict, then: If the comment refers only to the conflicting case, then I'm on the fence. Searches may still work but aggregations are much trickier and much more likely to fail. I like the syntax that @jimczi suggests in #23007 (comment) as it gives you all the information you need. The only downside is that there is no explicit |
|
I was talking about the conflict case: if I asked for the field capabilities of
Kibana is just going to treat conflicting fields with multiple types as not-aggregatable, regardless of what the API says, but I don't think the API should say that fields are aggregatable or searchable when that's the meaning behind it. |
I think this is problematic since multiple types are not necessary not-aggregatable. This is the problem with
Ok I can flip the logic to do that. For |
|
@spalger after some discussion in FixItFriday today, we've decided to go with the format suggested by @jimczi in #23007 (comment) as it is the most usable way we can present the required info. For The question arises when you have a field that has So the question is: should this field be marked as aggregatable or not? |
|
@sophiec20 Pinging you for input on this API as your team will be using it too |
|
@dimitris-athanasiou please review. |
I'm guessing that we all agree that the API should be correct as possible, but we don't seem to agree if we should treat I'm still leaning toward
|
OK, let's do that |
|
Looks good from ML side of things. |
|
@jimczi do you think it would be reasonable to include details like |
IMO no, the way we decide if the field is aggregatable or searchable depends on the field type. Some fields are searchable even though they have ... when |
|
@jimczi this looks great, do you think the response is summarized correctly below?
|
Only because of the field configuration in index "t". We have searchable and aggregatable for each type so the conflict is not taken into account. For Kibana you can consider every field with multiple types as conflicting but that should not be reflected in fieldcaps IMO |
|
Right, I'm suggesting that as how we could represent it to users in Kibana. Thanks! |
|
Another "capability" that we currently look at the mappings for is "sortable". Is aggregatable == sortable, or would these two attributes ever diverge in the future? If so would it be possible to also add a |
|
Oops, good catch @Bargs |
This change introduces a new API called `_field_caps` that allows to retrieve the capabilities of specific fields.
This field centric API relies solely on the mapping of the requested indices to extract the following infos:
* `types`: one or many field types if the type is not the same across the requested indices
* `searchable`: Whether this field is indexed for search on at least one requested indices.
* `aggregatable`: Whether this field can be aggregated on at least one requested indices.
Example:
````
GET t,v/_field_caps?fields=field1,field2,field3
````
returns:
````
{
"fields": {
"field1": {
"_all": {
"types": "text",
"searchable": true,
"aggregatable": false
}
},
"field3": {
"_all": {
"types": "text",
"searchable": true,
"aggregatable": false
}
},
"field2": {
"_all": {
"types": [
"keyword",
"long"
],
"searchable": true,
"aggregatable": true
}
}
}
}
````
In this example `field1` and `field3` have the same type `text` across the requested indices `t` and `v`.
Conversely `field2` is defined with two conflicting types `keyword` and `long`.
Note that `_field_caps` does not treat this case as an error but rather return the list of unique types seen for this field.
It is also possible to get a view of each index field capabilities with the `level` parameter:
````
GET t,v/_field_caps?fields=field2&level=indices
````
````
{
"fields": {
"field2": {
"t": {
"types": "keyword",
"searchable": true,
"aggregatable": true
},
"v": {
"types": "long",
"searchable": true,
"aggregatable": true
}
}
}
}
````
|
I pushed another iteration that implements the format described in #23007 (comment). |
jpountz
left a comment
There was a problem hiding this comment.
It looks great and clean. I left some questions about places where we use arrays that are logical sets, so maybe we should use sets directly? Also the documentation states clearly what null values mean for the indices arrays, but I think it would help to reiterate it in the code to have this information in context.
|
|
||
| private final String[] indices; | ||
| private final String[] nonSearchableIndices; | ||
| private final String[] nonAggregatableIndices; |
There was a problem hiding this comment.
The builder uses sets that are transformed in arrays when the final object is built.
| this.isAggregatable = isAggregatable; | ||
| this.indices = indices; | ||
| this.nonSearchableIndices = nonSearchableIndices; | ||
| this.nonAggregatableIndices = nonAggregatableIndices; |
There was a problem hiding this comment.
I see from the serialization code that thiese arrays can be null sometimes, is there any validation we should do, eg. I suspect that if one array is not null then other arrays should not be null either?
There was a problem hiding this comment.
Not necessarily. indices is null if all indices have the same type for the field, nonSearchableIndices is null only if all indices are either searchable or non-searchable and nonAggregatableIndices is null if all indices are either aggregatable or non-aggregatable.
| } | ||
|
|
||
| /** | ||
| * The types of the field. |
| FieldCapabilitiesIndexRequest(FieldCapabilitiesRequest request, String index) { | ||
| super(index); | ||
| Set<String> fields = new HashSet<>(); | ||
| fields.addAll(Arrays.asList(request.fields())); |
There was a problem hiding this comment.
let's pass the list as a constructor arg of the HashSet?
| public class FieldCapabilitiesIndexRequest | ||
| extends SingleShardRequest<FieldCapabilitiesIndexRequest> { | ||
|
|
||
| private String[] fields; |
There was a problem hiding this comment.
should we store it as a set?
| this.responseMap = responseMap; | ||
| } | ||
|
|
||
| FieldCapabilitiesResponse() { |
There was a problem hiding this comment.
can you leave a comment that this ctor should only be used for serialization?
| String[] concreteIndices = | ||
| indexNameExpressionResolver.concreteIndexNames(clusterState, request); | ||
| final AtomicInteger indexCounter = new AtomicInteger(); | ||
| final AtomicInteger completionCounter = new AtomicInteger(concreteIndices.length); |
There was a problem hiding this comment.
Could we just use a single atomic int for both purposes? eg. we could consider things are done when indexCounter.getAndIncrement returns concreteIndices.length-1?
| for (int i = 0; i < indexResponses.length(); i++) { | ||
| Object element = indexResponses.get(i); | ||
| if (element instanceof FieldCapabilitiesIndexResponse == false) { | ||
| continue; |
There was a problem hiding this comment.
can you assert it is an exception?
|
|
||
| @Override | ||
| protected FieldCapabilitiesIndexResponse | ||
| shardOperation(final FieldCapabilitiesIndexRequest request, |
There was a problem hiding this comment.
weird to go to a new line before the method name?
| shardOperation(final FieldCapabilitiesIndexRequest request, | ||
| ShardId shardId) { | ||
| MapperService mapperService = | ||
| indicesService.indexService(shardId.getIndex()).mapperService(); |
There was a problem hiding this comment.
should it use indexServiceSafe?
|
Thanks @jpountz
@Bargs @spalger if the field is @clintongormley next step is to deprecate the FieldStats API in 5.x and remove it in 6? |
|
@jimczi deprecate it, but let's not remove it from 6.0 just yet |
Cherry picked from a8250b2 This change introduces a new API called `_field_caps` that allows to retrieve the capabilities of specific fields. Example: ```` GET t,s,v,w/_field_caps?fields=field1,field2 ```` ... returns: ```` { "fields": { "field1": { "string": { "searchable": true, "aggregatable": true } }, "field2": { "keyword": { "searchable": false, "aggregatable": true, "non_searchable_indices": ["t"] "indices": ["t", "s"] }, "long": { "searchable": true, "aggregatable": false, "non_aggregatable_indices": ["v"] "indices": ["v", "w"] } } } } ```` In this example `field1` have the same type `text` across the requested indices `t`, `s`, `v`, `w`. Conversely `field2` is defined with two conflicting types `keyword` and `long`. Note that `_field_caps` does not treat this case as an error but rather return the list of unique types seen for this field.
|
🎉 🎉 💃 🎉 🎉 |
|
I like the API feature-wise, but can we reconsider the name? What does "caps" in "field_caps" stands for, "capabilities"? I understand the need to make the name short, but this is rather opaque? If the API will replace the "Field Stats" API, what if we just use |
| } | ||
| }, | ||
| "params": { | ||
| "fields": { |
There was a problem hiding this comment.
@jimczi, I think this parameter should be set as required? Getting illegal_argument_exception, "specified fields can't be null or empty" without it.
There was a problem hiding this comment.
It's not required because you can use the body of the request to set the fields to retrieve.
See the body section at the end of the spec.
There was a problem hiding this comment.
That's right, but the question then is how to document this in the spec, when either the fields attribute or the body is required? /cc @clintongormley
There was a problem hiding this comment.
I think that's OK - elasticsearch will complain
This change introduces a new API called
_field_capsthat allows to retrieve the capabilities of specific fields.This field centric API relies solely on the mapping of the requested indices to extract the following infos:
types: one or many field types if the type is not the same across the requested indicessearchable: Whether this field is indexed for search on at least one requested indices.aggregatable: Whether this field can be aggregated on at least one requested indices.Example:
returns:
In this example
field1andfield3have the same typetextacross the requested indicestandv.Conversely
field2is defined with two conflicting typeskeywordandlong.Note that
_field_capsdoes not treat this case as an error but rather return the list of unique types seen for this field.It is also possible to get a view of each index field capabilities with the
levelparameter:Closes #22438 (comment)