-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Remove lowercase_expanded_terms option #9978
Copy link
Copy link
Labels
:Search Relevance/AnalysisHow text is split into tokensHow text is split into tokens>enhancementTeam:Search RelevanceMeta label for the Search Relevance team in ElasticsearchMeta label for the Search Relevance team in Elasticsearchhelp wantedadoptmeadoptmehigh hanging fruit
Metadata
Metadata
Assignees
Labels
:Search Relevance/AnalysisHow text is split into tokensHow text is split into tokens>enhancementTeam:Search RelevanceMeta label for the Search Relevance team in ElasticsearchMeta label for the Search Relevance team in Elasticsearchhelp wantedadoptmeadoptmehigh hanging fruit
Type
Fields
Give feedbackNo fields configured for issues without a type.
for the record, i think we should remove this lowercasing option completely in parsers, disable it, and let the analysis chain take care. For multitermqueries, its a little tricky, which subset of the filters should be used? For example lowercasing is reasonable, stemming is not.
But lucene already annotates each analyzer component deemed to be "reasonable" for wildcards with a marker interface (MultiTermAwareComponent). Things like lowercasefilter have it and things like stemmers dont have it. This is enough to build a "chain", automatically from the query analyzer, that acts reasonably for multitermqueries.
I know we don't use the lucene factories (es has its own), but we have a table that maps between them, i know because its in a test I wrote. So the information is there :)
All queryparsers have hooks (e.g. factory methods for prefix/wildcard/range) that make it possible to use this, for example solr does it by default, as soon as it did this, people stopped complaining about confusing behavior: both for the unanalyzed, and the analyzed case. it just works.
Sorry for the long explanation.
Compare: