-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Relax field types supported by _termvectors API #31902
Description
Problem background
In adding a plugin for a new field type I've discovered a few places in core where we have hard-coded assumptions about what represents a searchable text field type:
- "All" queries expansion (work in progress here: Change the way that fields are selected for inclusion in
allqueries. #31655) - MLT queries
- TermVectors service
Proposed change
An open question is if all of these problems can be addressed with a new common field-checking service. For now, this issue just relates to solving 3) and the proposed change is to switch the code in TermVectorsService which has :
// must be a string
if (fieldType instanceof KeywordFieldMapper.KeywordFieldType == false
&& fieldType instanceof TextFieldMapper.TextFieldType == false) {
return false;
}
to this broader base class test:
if (fieldType instanceof KeywordFieldMapper.KeywordFieldType == false
&& fieldType instanceof StringFieldType == false) {
return false;
}
Side effect (desirable or undesirable?)
A side effect of this change is that other forms of indexed strings can now return termvectors info e.g. given this mapping:
PUT test
{
"mappings": {
"doc": {
"properties": {
"text": {
"type": "text",
"index_prefixes": {},
"index_phrases": true
...
and this call to the modified termvectors API:
GET /test/doc/_termvectors
{
"doc": {
"text": "my example text"
},
"term_statistics" : true,
"field_statistics" : true,
"positions": false,
"offsets": false,
"filter" : {
"max_num_terms" : 3,
"min_term_freq" : 1,
"min_doc_freq" : 1
}
}
..the new version of this API would also return termvectors for the text._index_phrase and text._index_prefix as well as the usual text field. The max_num_terms setting applies to each indexed field so in this case is 3 terms multiplied by 3 fields = 9 terms.
It's not clear if this change in behaviour is useful or not (opinions, @romseygeek ?)