Security for _field_names field should not override field statistics #33261
Merged
jimczi merged 5 commits into elastic:master on Sep 3, 2018
In Lucene 8 the statistics for a field (doc_count, sum_doc_count, ...) are checked and invalid values (v < 0) are rejected. For the _field_names field, however, we hide the field's statistics when security is enabled, since some terms (field names) may be filtered. These statistics are never used: the field is not used for ranking and cannot be used to generate term vectors. For these reasons, this commit restores the original statistics for the field in order to be compliant with Lucene 8.
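As a rough illustration of the check described above, here is a minimal sketch in plain Java. It does not use Lucene: `FieldStatsSketch`, `FieldStats`, `validate`, `hidden`, and `restored` are hypothetical names, and the `-1` value used to represent "hidden" statistics is an assumption made for the example, not something taken from this PR's diff.

```java
/** Hypothetical sketch; none of these types are Lucene or Elasticsearch classes. */
final class FieldStatsSketch {

    /** Simple immutable holder for two per-field statistics. */
    static final class FieldStats {
        final long sumDocFreq;
        final int docCount;

        FieldStats(long sumDocFreq, int docCount) {
            this.sumDocFreq = sumDocFreq;
            this.docCount = docCount;
        }
    }

    /** Rejects negative values, mirroring the "v < 0 is invalid" rule from the description. */
    static FieldStats validate(FieldStats stats) {
        if (stats.sumDocFreq < 0 || stats.docCount < 0) {
            throw new IllegalArgumentException("field statistics must not be negative");
        }
        return stats;
    }

    /** Assumed earlier behavior: statistics hidden behind a negative placeholder. */
    static FieldStats hidden() {
        return new FieldStats(-1L, -1);
    }

    /** Behavior described by this PR: the original statistics pass through unchanged. */
    static FieldStats restored(FieldStats original) {
        return original;
    }
}
```

Under these assumptions, `validate(hidden())` throws, while `validate(restored(original))` returns the original values.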
Collaborator
Pinging @elastic/es-search-aggs
jpountz reviewed on Aug 31, 2018
    /** An automaton that only accepts authorized fields. */
    private final CharacterRunAutomaton filter;
    /** {@link Terms} cache with filtered stats for the {@link FieldNamesFieldMapper} field. */
    private Terms fieldNamesFilterTerms;

    class FieldNamesTerms extends FilterTerms {
        long size = 0;
        long sumDocFreq;
        int docCount;
Contributor
can we make them final somehow?
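One possible way to address this, sketched without Lucene: compute the totals in a static factory method and pass them to a private constructor, so the fields can be declared `final`. `TermStats` and `fromDocFreqs` are hypothetical names, and the `int[]` of per-term document frequencies stands in for the `e.docFreq()` values read in the loop under review. This is not necessarily what the PR ended up doing.

```java
/** Hypothetical sketch: immutable term statistics computed before construction. */
final class TermStats {
    final long size;       // number of terms seen
    final long sumDocFreq; // sum of the per-term document frequencies

    private TermStats(long size, long sumDocFreq) {
        this.size = size;
        this.sumDocFreq = sumDocFreq;
    }

    /** Accumulates into locals first, then builds the immutable object. */
    static TermStats fromDocFreqs(int[] docFreqs) {
        long size = 0;
        long sumDocFreq = 0;
        for (int docFreq : docFreqs) {
            size++;
            sumDocFreq += docFreq;
        }
        return new TermStats(size, sumDocFreq);
    }
}
```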
    while (e.next() != null) {
        size ++;
        sumDocFreq += e.docFreq();
        docCount = Math.max(e.docFreq(), docCount);
Contributor
I don't think this is correct... Maybe we should assume docCount = maxDoc.
Contributor
Author
Oops, thanks. I changed it to return maxDoc instead.
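The reviewer does not spell out the objection; one plausible reading is that the largest per-term document frequency is only a lower bound on the number of documents that contain the field, while maxDoc is a safe upper bound. The sketch below illustrates that reading with plain Java collections. `DocCountBounds`, `maxDocFreq`, and `docCount` are hypothetical names, and each document is modeled as the set of terms it contains.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch comparing max(docFreq) with the true document count. */
final class DocCountBounds {

    /** Largest number of documents that share any single term. */
    static int maxDocFreq(List<Set<String>> docs) {
        Map<String, Integer> docFreqByTerm = new HashMap<>();
        for (Set<String> terms : docs) {
            for (String term : terms) {
                docFreqByTerm.merge(term, 1, Integer::sum);
            }
        }
        int max = 0;
        for (int docFreq : docFreqByTerm.values()) {
            max = Math.max(max, docFreq);
        }
        return max;
    }

    /** Number of documents that contain at least one term. */
    static int docCount(List<Set<String>> docs) {
        int count = 0;
        for (Set<String> terms : docs) {
            if (!terms.isEmpty()) {
                count++;
            }
        }
        return count;
    }
}
```

For three documents with terms {a}, {b}, and {} respectively, the largest document frequency is 1, the true document count is 2, and the total number of documents is 3, so max(docFreq) underestimates while the total overestimates.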
Contributor
Author
run gradle build tests
jpountz approved these changes on Aug 31, 2018
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request on Sep 3, 2018
* master: (197 commits)
  Prevent NPE parsing the stop datafeed request. (elastic#33347)
  HLRC: Add ML get overall buckets API (elastic#33297)
  Core: Fix epoch millis java time formatter (elastic#33302)
  [Docs] Improve tuning for speed advice (elastic#33315)
  [Rollup] Fix Caps Comparator to handle calendar/fixed time (elastic#33336)
  [CI] Mute IndexShardTests#testIndexCheckOnStartup fails elastic#33345
  [CI] Mute LuceneChangesSnapshotTests#testUpdateAndReadChangesConcurrently
  Security for _field_names field should not override field statistics (elastic#33261)
  Add early termination support to BucketCollector (elastic#33279)
  Fix extractjar task ci (elastic#33272)
  Mute testFollowIndexAndCloseNode
  Logging: Drop Settings from some logging ctors (elastic#33332)
  HLREST: add update by query API (elastic#32760)
  TEST: Increase timeout testFollowIndexAndCloseNode (elastic#33333)
  HLRC: ML Flush job (elastic#33187)
  HLRC: Adding ML Job stats (elastic#33183)
  LLREST: Drop deprecated methods (elastic#33223)
  Mute testSyncerOnClosingShard
  [DOCS] Moves machine learning APIs to docs folder (elastic#31118)
  Mute test watcher usage stats output
  ...