Adding aggregations support for the _ignored field#101373
Adding aggregations support for the _ignored field#101373salvatore-campagna merged 123 commits intoelastic:mainfrom
_ignored field#101373Conversation
_ignored field
|
Documentation preview: |
|
Hi @eyalkoren, I've created a changelog YAML for you. |
| @Override | ||
| public IndexFieldData.Builder fielddataBuilder(FieldDataContext fieldDataContext) { | ||
| if (hasDocValues() == false) { | ||
| throw new IllegalArgumentException( |
There was a problem hiding this comment.
What happens when you're aggregating over a data stream that contains both old and new indices? Will this return a partial failure? How does Kibana/Lens handle the failure?
There was a problem hiding this comment.
I don't know. I will look for a way to add an integration test for that
There was a problem hiding this comment.
Since it's a general question about aggregations on a field in a data stream, I guess we can add a test that is not related to the _ignored field.
Something like:
- create a component template that has a mapping for field
foowith"doc_values": false - create a data stream index template that uses this mapping
- index a document
- update the component template so that the mapping for field
foois changed to"doc_values": true - rollover the data stream
- index another document
- check aggregation on field
foo
Would that cover it?
There was a problem hiding this comment.
you would indeed get partial failures in this case, one failure for each shard belonging to the old indices, as they can't be aggregated on.
server/src/main/java/org/elasticsearch/index/mapper/IgnoredFieldMapper.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/IgnoredFieldMapper.java
Outdated
Show resolved
Hide resolved
|
@javanna please take a look to see if this is the right direction.
|
I am leaning towards removing the stored field, but I think this requires further discussion, because it is going to be very difficult to go back to relying on the order if we stop storing the field. We said we won't focus for now on the ignored reason, and I would add that we should try not to have positional logic around it when we do so. Could you make the changes necessary to retrieve the field from doc values and stop storing the field in the meantime?
I am not sure that we need to test for that. We know we are going to have partial failures in that case. |
|
Thanks for the feedback @javanna 🙏
I will give it a try 🙂
OK, then I guess the next question is - what do we do about that? Only document it as a caveat related to aggregating on existing data streams? Something else? |
IMO, it would be enough to document that in the same place where you document that the field is supports aggregations now. |
|
@javanna the current state is non-functional, but I pushed so you can see what I have done so far. What I know is that the |
| "load_source_count": 5 | ||
| }, | ||
| "debug": { | ||
| "stored_fields": ["_id", "_ignored", "_routing", "_source"] |
There was a problem hiding this comment.
This is not rerturned anymore because the field is not stored anymore.
There was a problem hiding this comment.
that makes sense, I was expecting this change.
| { | ||
| GetResponse getResponse = client().prepareGet("test", "1").get(); | ||
| assertTrue(getResponse.isExists()); | ||
| assertThat(getResponse.getField("_ignored"), nullValue()); |
There was a problem hiding this comment.
This change is not required...will restore it.
javanna
left a comment
There was a problem hiding this comment.
I left a couple of comments, but I think this is very close.
| "load_source_count": 5 | ||
| }, | ||
| "debug": { | ||
| "stored_fields": ["_id", "_ignored", "_routing", "_source"] |
There was a problem hiding this comment.
that makes sense, I was expecting this change.
rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/get/120_stored_fields_ignored.yml
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/get/ShardGetService.java
Outdated
Show resolved
Hide resolved
server/src/internalClusterTest/java/org/elasticsearch/action/termvectors/GetTermVectorsIT.java
Show resolved
Hide resolved
|
Ideally we should stop using |
javanna
left a comment
There was a problem hiding this comment.
I left a couple questions, mostly around testing. LGTM otherwise! Great work!
| - gt: { profile.shards.0.fetch.breakdown.load_stored_fields_count: 0 } | ||
| - gt: { profile.shards.0.fetch.breakdown.load_stored_fields: 0 } | ||
| - match: { profile.shards.0.fetch.debug.stored_fields: [_id, _ignored, _routing, _source] } | ||
| - match: { profile.shards.0.fetch.debug.stored_fields: [_id, _routing, _source] } |
There was a problem hiding this comment.
I wonder if with this change, the skip above needs updating? Isn't it surprising that this test runs? 8.14 returns the _ignored field while 8.15 does not?
There was a problem hiding this comment.
I fixed this but the other was ok.
| import java.util.Locale; | ||
| import java.util.Map; | ||
|
|
||
| public class IgnoredMetaFieldRollingUpgradeIT extends ParameterizedRollingUpgradeTestCase { |
There was a problem hiding this comment.
This is a great test to have!
| - gt: { profile.shards.0.fetch.breakdown.load_stored_fields_count: 0 } | ||
| - gt: { profile.shards.0.fetch.breakdown.load_stored_fields: 0 } | ||
| - match: { profile.shards.0.fetch.debug.stored_fields: [_id, _ignored, _routing, _source] } | ||
| - match: { profile.shards.0.fetch.debug.stored_fields: [_id, _routing, _source] } |
There was a problem hiding this comment.
Similar to above, I wonder if we need to update the skip above, that's what I would expect.
There was a problem hiding this comment.
I think I forgot about this commit that already made that change: 19db490
| } | ||
| } | ||
|
|
||
| public void testIgnoredMetadataFieldFetch() { |
There was a problem hiding this comment.
This is a bit of a duplicate of MetadatFetchingIT#testWithIgnored ? Is it needed?
There was a problem hiding this comment.
I missed this comment...I can remove the test in another PR.
| public static final IndexVersion UPGRADE_TO_LUCENE_9_10 = def(8_503_00_0, Version.LUCENE_9_10_0); | ||
| public static final IndexVersion TIME_SERIES_ROUTING_HASH_IN_ID = def(8_504_00_0, Version.LUCENE_9_10_0); | ||
| public static final IndexVersion DEFAULT_DENSE_VECTOR_TO_INT8_HNSW = def(8_505_00_0, Version.LUCENE_9_10_0); | ||
| public static final IndexVersion DOC_VALUES_FOR_IGNORED_META_FIELD = def(8_505_00_1, Version.LUCENE_9_10_0); |
There was a problem hiding this comment.
@eyalkoren I've just spotted this - this should have incremented the NNN version of the version id, not the patch version. Check the comment right below the version constants for more info.
There was a problem hiding this comment.
@salvatore-campagna is the one two brought this to completion, thanks @thecoop for raising it.
Closes #59946
The actual implementation needs to take into account how we decide to implement #101153.
One of the original requirements was to stop making the
_ignoredfield stored while addingdoc_valuesto it. However, the current proposal of #101153 assumes that we can access the original order of the field's content, which means we must keep it stored as well. The eventual approach will determine the implementation of this change.