Keep track of timestamp_field mapping as part of a data stream#58096
Keep track of timestamp_field mapping as part of a data stream#58096martijnvg merged 28 commits intoelastic:masterfrom
Conversation
|
Pinging @elastic/es-core-features (:Core/Features/Data streams) |
| // Index doc that triggers creation of a data stream | ||
| IndexRequest indexRequest = new IndexRequest("logs-foobar").source("{}", XContentType.JSON).opType("create"); | ||
| IndexResponse indexResponse = client().index(indexRequest).actionGet(); | ||
| assertThat(indexResponse.getIndex(), equalTo(".ds-logs-foobar-000001")); |
There was a problem hiding this comment.
Where possible, could we use the DataStream::getDefaultBackingIndexName method so that any changes there are automatically picked up at least in Java classes?
| return Objects.hash(name, timeStampField, indices, generation); | ||
| } | ||
|
|
||
| public static final class TimestampField implements Writeable, ToXContentObject { |
There was a problem hiding this comment.
Do we want to allow this field to contain any mapping properties or just the field's data type? Also, should we require that it contains a data type property of either date or date_nanos?
There was a problem hiding this comment.
This class should contain the mapping attributes. Part of the mapping attributes is the type attribute, so that data type is kept track of.
There was a problem hiding this comment.
Ok, should we permit not specifying a data type? A data type of either date or date_nanos is required on the composable template, so it seems like maybe it should be required on the data stream, too?
There was a problem hiding this comment.
It isn't possible that any other data type than date and date_nanos is being included the fieldMapping map, because no composable index template with a data stream definition can be created with timestamp field mapping that is not of type date or date_nanos.
When creating a data stream the contents of the field mapping is copied and this includes a type attribute, so that we know whether either date or date_nanos is specified.
I can add an assert here that checks that value for the type key either returns date or date_nanos for sanity checking.
henningandersen
left a comment
There was a problem hiding this comment.
I left a comment to consider about where the mapping for timestamp field is specified. Otherwise looking good.
| new LinkedHashSet<>(List.of(DateFieldMapper.CONTENT_TYPE, DateFieldMapper.DATE_NANOS_CONTENT_TYPE)); | ||
|
|
||
| private final ClusterService clusterService; | ||
| private final IndicesService indicesService; |
There was a problem hiding this comment.
I think this is unused?
There was a problem hiding this comment.
A left over from an iteration that never made it into the pr.
| assert firstBackingIndex.mapping() != null : "no mapping found for backing index [" + firstBackingIndexName + "]"; | ||
|
|
||
| String fieldName = template.getDataStreamTemplate().getTimestampField(); | ||
| Map<String, Object> mapping = firstBackingIndex.mapping().getSourceAsMap(); |
There was a problem hiding this comment.
I wonder if it was more intuitive to require that the mapping be specified on the timestamp field in the template rather than mixing it in with the index mappings. In particular, it will be more clear that those are special, not modifiable compared to the others which can change on rollover.
There was a problem hiding this comment.
I think requiring that timestamp field mapping the be specified inside the timestamp_field object in a composable template makes sense. It helps highlighting that the timestamp_field and its mapping are special.
However I do think it should be made modifiable? So that new data streams being created can use an updated field mapping? This was my motivation for keeping track of the timestamp field mapping on the DataStream instance, so that new backing indices would use the same valid timestamp field mapping, but when new data streams are being created from the same composable template than an updated timestamp field mapping could be used.
There was a problem hiding this comment.
++, agreed, I meant unmodifiable in the sense that template changes will not reflect on already created data-streams.
There was a problem hiding this comment.
👍 I think we can change the data_stream definition in a composable template to include the mapping for the timestamp_field mapping in a seperate change?
There was a problem hiding this comment.
Yes, that can work out too.
henningandersen
left a comment
There was a problem hiding this comment.
LGTM. I left a number of smaller comments to consider.
| name: "*" | ||
| - match: { 0.name: simple-data-stream1 } | ||
| - match: { 0.timestamp_field: '@timestamp' } | ||
| - match: { 0.timestamp_field.field_name: '@timestamp' } |
There was a problem hiding this comment.
I wonder if this should be just timestamp_field.name?
I would like to also see an assert on the mapping being included and having the right content.
| assertThat(indexResponse.getIndex(), equalTo(DataStream.getDefaultBackingIndexName("logs-foobar", 2))); | ||
|
|
||
| // Change the template to have a different timestamp field | ||
| createIndexTemplate("id1", "logs-foo*", "@timestamp2"); |
There was a problem hiding this comment.
I think we should rename createIndexTemplate to putIndexTemplate to signal that it also updates existing.
| } | ||
|
|
||
| public void testChangeTimestampFieldInComposableTemplatePriorToRollOver() throws Exception { | ||
| createIndexTemplate("id1", "logs-foo*", "@timestamp"); |
There was a problem hiding this comment.
Should we also include tests of any mapping specialization/params? AFAICS, the implementation could keep only the mapping type now and all tests would succeed.
server/src/main/java/org/elasticsearch/cluster/metadata/MetadataCreateIndexService.java
Outdated
Show resolved
Hide resolved
|
|
||
| public TimestampField(String fieldName, Map<String, Object> fieldMapping) { | ||
| this.fieldName = fieldName; | ||
| this.fieldMapping = fieldMapping; |
There was a problem hiding this comment.
Should we assert that the fieldMapping contains ("type")?
There was a problem hiding this comment.
I added that assert here: https://github.com/elastic/elasticsearch/pull/58096/files#diff-82f348b800c60fabd3a34ff574e248a5R159
The reason is that in production, this is where new data stream instances are initiated.
Ideally this assertion would be added here, but then all tests need to be changes to supply: Map.of("type, "date") whereas currently tests supply: Map.of(). Maybe it makes sense to add an new constructor that just sets fieldMapping to Map.of("type, "date") and then delegates to this constructor where we can assert whether type key contains a valid value?
There was a problem hiding this comment.
Could we not just supply DataStreamTestHelper.defaultMapping() (new method) instead? A few more bytes of source code, but I would prefer asserting this here to avoid something creating this in a wrong way somewhere or under some condition.
If the timestamp field path consisted out of object fields and if the final mapping did not contain the parent field then an error occurred, because the prior logic assumed that the object field existed.
…ic#58096) Backporting elastic#58096 to 7.x branch. Relates to elastic#53100 * use mapping source direcly instead of using mapper service to extract the relevant mapping details * moved assertion to TimestampField class and added helper method for tests * Improved logic that inserts timestamp field mapping into an mapping. If the timestamp field path consisted out of object fields and if the final mapping did not contain the parent field then an error occurred, because the prior logic assumed that the object field existed.
Backporting #58096 to 7.x branch. Relates to #53100 * use mapping source direcly instead of using mapper service to extract the relevant mapping details * moved assertion to TimestampField class and added helper method for tests * Improved logic that inserts timestamp field mapping into an mapping. If the timestamp field path consisted out of object fields and if the final mapping did not contain the parent field then an error occurred, because the prior logic assumed that the object field existed.
This change adds logic that keeps track of a data steam's field mapping of the timestamp_field.
When a data stream is created the mapping for the timestamp_field is extracted and stored with the data stream. Each time a rollover happens this field mapping is applied on top of the final mappings when the new backing index is created.
Having this logic is important, because the mapping inside a composable index template can change and so the mapping for the timestamp field can be changed too or removed. With this change the field mapping for the timestamp_field will stay the same for each new backing index.