Add doc_count field mapper#58339
Add doc_count field mapper#58339csoulios wants to merge 53 commits intoelastic:feature/aggregate-metricsfrom
Conversation
|
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations) |
|
Pinging @elastic/es-search (:Search/Mapping) |
There was a problem hiding this comment.
Should we make this a metadata field _doc_count instead of a standard field type? It could be a nice fit, because it's more of an internal field rather than something we expect users to configure. Also we don't want multiple fields of type doc_count to be defined.
polyfractal
left a comment
There was a problem hiding this comment.
LGTM, although my mapper-skills are relatively poor so take it with a grain of salt. :)
Left a few comments about validation/tests but not critical (and might not be relevant). Exciting!
|
|
||
| [IMPORTANT] | ||
| ======== | ||
| * A `_doc_count` field can only store a single positive integer per document. Nested arrays are not allowed. |
There was a problem hiding this comment.
Random thought while reading the restrictions, is it possible to define _doc_count as an object? We should forbid that as well if it isn't already... but i suspect the current restrictions prevent it from being an object too.
There was a problem hiding this comment.
There is an assertion that the input is a VALUE_NUMBER in the parseCreateField() method.
Is there anything else that should be added?
.../org/elasticsearch/search/aggregations/bucket/terms/GlobalOrdinalsStringTermsAggregator.java
Outdated
Show resolved
Hide resolved
| import static java.util.Collections.singleton; | ||
|
|
||
|
|
||
| public class FieldBasedDocCountProviderTests extends AggregatorTestCase { |
There was a problem hiding this comment.
Is it possible to filter or aggregate on the _doc_counts field itself...I'm guessing it's not possible? Should we add some tests to make sure that's handled politely if a user tries to search/agg it?
There was a problem hiding this comment.
+1, it's common to have a test case like DocCountFieldTypeTests that checks query and field data construction works (or throws an error) as expected.
There was a problem hiding this comment.
I have added test DocCountFieldTypeTests that tests range, term and exists queries.
Also, DocCountFieldType extends the MappedFieldType class that by default throws an exception when calling the fielddataBuilder(). Does this cover aggregations?
jtibshirani
left a comment
There was a problem hiding this comment.
It looks good to me overall, I left some minor comments.
server/src/main/java/org/elasticsearch/index/mapper/DocCountFieldMapper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/DocCountFieldMapper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/DocCountFieldMapper.java
Outdated
Show resolved
Hide resolved
Cleaned up/simplified DocCountProvider class
|
@elasticmachine run elasticsearch-ci/bwc |
| try { | ||
| docCountValues = DocValues.getNumeric(ctx.reader(), DocCountFieldMapper.NAME); | ||
| } catch (IOException e) { | ||
| docCountValues = null; |
There was a problem hiding this comment.
I wonder if we should be swallowing an IOException and treating it as if there was no doc count field. What case does this logic address?
There was a problem hiding this comment.
Initially, I felt it was better to swallow the error and fallback to doc_count=1. Now, it feels more transparent to rethrow the exception.
I am not sure what can go wrong there and I don't have a strong opinion about this.
There was a problem hiding this comment.
I agree we should propagate it, an IOException could indicate something serious and it's best not to silently continue on. Perhaps this method could just declare throws IOException.
|
This PR is closed and opened as new PR (#64503) against master branch. |
| import java.io.IOException; | ||
|
|
||
| /** | ||
| * An implementation of a doc_count provider that reads the value |
There was a problem hiding this comment.
Nit: Javadoc still reflects having an interface
| } | ||
| } | ||
|
|
||
| public void setLeafReaderContext(LeafReaderContext ctx) throws IOException { |
There was a problem hiding this comment.
Why isn't this just a constructor argument? Under what conditions would it be correct to use a DocCountProvider without setting the LeafReaderContext?
Bucket aggregations compute bucket
doc_countvalues by incrementing thedoc_countby 1 for every document collected in the bucket.When using summary fields (such as
aggregate_metric_double) one field may represent more than one document. To provide this functionality we have implemented a new field mapper (nameddoc_countfield mapper). This field is a positive integer representing the number of documents aggregated in a single summary field.Bucket aggregations will check if a field of type
doc_countexists in a document and will take this value into consideration when computing doc counts.