“totalTermFreq must be at least docFreq” error after upgrading to 7.0.1 #41934

@TheRealChrisS

Elasticsearch version (bin/elasticsearch --version):
Version: 7.0.1, Build: default/tar/e4efcb5/2019-04-29T12:56:03.145736Z, JVM: 1.8.0_202

Plugins installed:
Just the plugins which are shipped with ES

JVM version (java -version):
openjdk version "1.8.0_202"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_202-b08)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.202-b08, mixed mode)

OS version (uname -a if on a Unix-like system):
Linux 3.10.0-514.21.1.el7.x86_64 #1 SMP Thu May 25 17:04:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:
After upgrading from 6.7.1 I get lots of “totalTermFreq must be at least docFreq” errors.
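For readers unfamiliar with the two statistics named in the message: docFreq is the number of documents that contain a term, and totalTermFreq is the total number of occurrences of that term across all documents, so the second should never be smaller than the first. The sketch below only illustrates that invariant on a toy corpus. The `term_stats` helper and the whitespace tokenization are assumptions made for illustration; this is not how Lucene or Elasticsearch compute or blend these values.

```python
from collections import Counter


def term_stats(docs, term):
    """Return (doc_freq, total_term_freq) for `term` over tokenized docs.

    Hypothetical helper for illustration only.
    """
    doc_freq = 0
    total_term_freq = 0
    for tokens in docs:
        count = Counter(tokens)[term]
        if count > 0:
            doc_freq += 1             # one more document contains the term
            total_term_freq += count  # add every occurrence in that document
    return doc_freq, total_term_freq


# Toy corpus: each entry is the token stream of one document's "tags" field.
# Assumption: array values are simply concatenated into one stream per document.
docs = [
    "foo bar foo bar".split(),  # tags: ["foo", "bar", "foo bar"]
    "bar".split(),              # tags: ["bar"]
]

doc_freq, total_term_freq = term_stats(docs, "foo")
print(doc_freq, total_term_freq)
```

With per-field statistics computed this way, totalTermFreq is always at least docFreq, because every document that counts toward docFreq contributes at least one occurrence.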

Steps to reproduce:

I don't have a simple reproduction, because the error only happens when I index large amounts of data. So far I haven't been able to reproduce it with just a few documents.

But I'll share what I've figured out so far.
Also see the following discussion, where two other users are encountering the same problem:
https://discuss.elastic.co/t/totaltermfreq-must-be-at-least-docfreq-error-after-upgrading-to-7-0-1/179977

I search using multi_match queries of type cross_fields. Changing the type to best_fields avoids the error, but that is not the behavior I want.

In my environment it seems that array fields are causing the problem.

I've got a mapping like:

{
	"properties": {
		"description": {
			"type": "text",
			"similarity": "custom_similarity",
			"term_vector" : "with_positions_offsets",
			"analyzer": "standard_analyzer",
			"search_analyzer": "standard_search_analyzer",
			"fields": {
				"ngram": {
					"type": "text",
					"similarity": "custom_similarity",
					"analyzer": "ngram_analyzer",
					"search_analyzer": "standard_search_analyzer"
				},
				"edge_ngram_prefix": {
					"type": "text",
					"similarity": "custom_similarity",
					"analyzer": "edge_ngram_1_analyzer",
					"search_analyzer": "standard_search_analyzer"
				}
			}
		},
		"tags": {
			"type": "text",
			"similarity": "custom_similarity",
			"analyzer": "standard_analyzer",
			"search_analyzer": "standard_search_analyzer",
			"fields": {
				"keyword": {
					"type": "keyword",
					"ignore_above": 256
				}
			}
		}
	}
}

And I post data like:

POST /some_index/_doc
{
	"description": "Some description",
	"tags": ["foo", "bar", "foo bar"]
}

When I search in both fields the query fails.

The query looks like:

{  
   "from":0,
   "size":100,
   "query":{  
      "bool":{  
         "must":[  
            {  
               "function_score":{  
                  "query":{  
                     "bool":{  
                        "must":[  
                           {  
                              "function_score":{  
                                 "query":{  
                                    "multi_match":{  
                                       "query":"foo",
                                       "fields":[  
                                          "description^3.0",
                                          "description.edge_ngram_prefix^0.90000004",
                                          "description.ngram^0.6",
                                          "tags^1.0"
                                       ],
                                       "type":"cross_fields",
                                       "operator":"AND",
                                       "slop":0,
                                       "prefix_length":0,
                                       "max_expansions":50,
                                       "tie_breaker":0.05,
                                       "zero_terms_query":"NONE",
                                       "auto_generate_synonyms_phrase_query":true,
                                       "fuzzy_transpositions":true,
                                       "boost":1.0
                                    }
                                 },
                                 "functions":[  
                                    {  
                                       "filter":{  
                                          "match_all":{  
                                             "boost":1.0
                                          }
                                       },
                                       "field_value_factor":{  
                                          "field":"boost",
                                          "factor":1.0,
                                          "modifier":"none"
                                       }
                                    }
                                 ],
                                 "score_mode":"sum",
                                 "max_boost":3.4028235E38,
                                 "boost":1.0
                              }
                           }
                        ],
                        "adjust_pure_negative":true,
                        "boost":1.0
                     }
                  }
               }
            }
         ],
         "adjust_pure_negative":true,
         "boost":1.0
      }
   }
}

When I remove all those array fields from the query, I no longer get this error.

So the problem seems to have something to do with the cross_fields option and array fields.
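To help narrow this down, the three cases described above (cross_fields with all fields, best_fields with all fields, and cross_fields without the array field) can be written as stripped-down request bodies. This is a minimal sketch: the `multi_match_body` helper is hypothetical, the function_score and bool wrappers are dropped, and it is an untested assumption that the reduced queries behave the same way as the full query with respect to the error.

```python
import json

# Fields and boosts copied from the failing query.
FIELDS = [
    "description^3.0",
    "description.edge_ngram_prefix^0.90000004",
    "description.ngram^0.6",
    "tags^1.0",
]


def multi_match_body(query, fields, match_type="cross_fields"):
    """Build a search body with only the multi_match part (hypothetical helper)."""
    return {
        "query": {
            "multi_match": {
                "query": query,
                "fields": list(fields),
                "type": match_type,
                "operator": "AND",
                "tie_breaker": 0.05,
            }
        }
    }


variants = {
    # Reported as failing in the full query.
    "cross_fields_all": multi_match_body("foo", FIELDS),
    # Reported as working, but not the desired scoring.
    "best_fields_all": multi_match_body("foo", FIELDS, "best_fields"),
    # Reported as working once the array field is removed.
    "cross_fields_no_tags": multi_match_body(
        "foo", [f for f in FIELDS if not f.startswith("tags")]
    ),
}

for name, body in variants.items():
    print(name)
    print(json.dumps(body, indent=2))
```

Each printed body can be sent as the request body of a search against the affected index to compare which variants raise the error.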

Let me know if you need more details, logs, etc.

Edit:
I also deleted and reindexed the data, but that didn't help either.
