Skip to content

Keep track of timestamp_field mapping as part of a data stream#58096

Merged
martijnvg merged 28 commits intoelastic:masterfrom
martijnvg:keep_track__of_ds_timestamp_field
Jun 22, 2020
Merged

Keep track of timestamp_field mapping as part of a data stream#58096
martijnvg merged 28 commits intoelastic:masterfrom
martijnvg:keep_track__of_ds_timestamp_field

Conversation

@martijnvg
Copy link
Copy Markdown
Member

This change adds logic that keeps track of a data steam's field mapping of the timestamp_field.
When a data stream is created the mapping for the timestamp_field is extracted and stored with the data stream. Each time a rollover happens this field mapping is applied on top of the final mappings when the new backing index is created.

Having this logic is important, because the mapping inside a composable index template can change and so the mapping for the timestamp field can be changed too or removed. With this change the field mapping for the timestamp_field will stay the same for each new backing index.

@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-core-features (:Core/Features/Data streams)

@elasticmachine elasticmachine added the Team:Data Management (obsolete) DO NOT USE. This team no longer exists. label Jun 15, 2020
// Index doc that triggers creation of a data stream
IndexRequest indexRequest = new IndexRequest("logs-foobar").source("{}", XContentType.JSON).opType("create");
IndexResponse indexResponse = client().index(indexRequest).actionGet();
assertThat(indexResponse.getIndex(), equalTo(".ds-logs-foobar-000001"));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where possible, could we use the DataStream::getDefaultBackingIndexName method so that any changes there are automatically picked up at least in Java classes?

return Objects.hash(name, timeStampField, indices, generation);
}

public static final class TimestampField implements Writeable, ToXContentObject {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to allow this field to contain any mapping properties or just the field's data type? Also, should we require that it contains a data type property of either date or date_nanos?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class should contain the mapping attributes. Part of the mapping attributes is the type attribute, so that data type is kept track of.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, should we permit not specifying a data type? A data type of either date or date_nanos is required on the composable template, so it seems like maybe it should be required on the data stream, too?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't possible that any other data type than date and date_nanos is being included the fieldMapping map, because no composable index template with a data stream definition can be created with timestamp field mapping that is not of type date or date_nanos.

When creating a data stream the contents of the field mapping is copied and this includes a type attribute, so that we know whether either date or date_nanos is specified.

I can add an assert here that checks that value for the type key either returns date or date_nanos for sanity checking.

Copy link
Copy Markdown
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a comment to consider about where the mapping for timestamp field is specified. Otherwise looking good.

new LinkedHashSet<>(List.of(DateFieldMapper.CONTENT_TYPE, DateFieldMapper.DATE_NANOS_CONTENT_TYPE));

private final ClusterService clusterService;
private final IndicesService indicesService;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is unused?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A left over from an iteration that never made it into the pr.

assert firstBackingIndex.mapping() != null : "no mapping found for backing index [" + firstBackingIndexName + "]";

String fieldName = template.getDataStreamTemplate().getTimestampField();
Map<String, Object> mapping = firstBackingIndex.mapping().getSourceAsMap();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it was more intuitive to require that the mapping be specified on the timestamp field in the template rather than mixing it in with the index mappings. In particular, it will be more clear that those are special, not modifiable compared to the others which can change on rollover.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think requiring that timestamp field mapping the be specified inside the timestamp_field object in a composable template makes sense. It helps highlighting that the timestamp_field and its mapping are special.

However I do think it should be made modifiable? So that new data streams being created can use an updated field mapping? This was my motivation for keeping track of the timestamp field mapping on the DataStream instance, so that new backing indices would use the same valid timestamp field mapping, but when new data streams are being created from the same composable template than an updated timestamp field mapping could be used.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++, agreed, I meant unmodifiable in the sense that template changes will not reflect on already created data-streams.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I think we can change the data_stream definition in a composable template to include the mapping for the timestamp_field mapping in a seperate change?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that can work out too.

Copy link
Copy Markdown
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I left a number of smaller comments to consider.

name: "*"
- match: { 0.name: simple-data-stream1 }
- match: { 0.timestamp_field: '@timestamp' }
- match: { 0.timestamp_field.field_name: '@timestamp' }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should be just timestamp_field.name?

I would like to also see an assert on the mapping being included and having the right content.

assertThat(indexResponse.getIndex(), equalTo(DataStream.getDefaultBackingIndexName("logs-foobar", 2)));

// Change the template to have a different timestamp field
createIndexTemplate("id1", "logs-foo*", "@timestamp2");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should rename createIndexTemplate to putIndexTemplate to signal that it also updates existing.

}

public void testChangeTimestampFieldInComposableTemplatePriorToRollOver() throws Exception {
createIndexTemplate("id1", "logs-foo*", "@timestamp");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also include tests of any mapping specialization/params? AFAICS, the implementation could keep only the mapping type now and all tests would succeed.


public TimestampField(String fieldName, Map<String, Object> fieldMapping) {
this.fieldName = fieldName;
this.fieldMapping = fieldMapping;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we assert that the fieldMapping contains ("type")?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added that assert here: https://github.com/elastic/elasticsearch/pull/58096/files#diff-82f348b800c60fabd3a34ff574e248a5R159

The reason is that in production, this is where new data stream instances are initiated.
Ideally this assertion would be added here, but then all tests need to be changes to supply: Map.of("type, "date") whereas currently tests supply: Map.of(). Maybe it makes sense to add an new constructor that just sets fieldMapping to Map.of("type, "date") and then delegates to this constructor where we can assert whether type key contains a valid value?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we not just supply DataStreamTestHelper.defaultMapping() (new method) instead? A few more bytes of source code, but I would prefer asserting this here to avoid something creating this in a wrong way somewhere or under some condition.

@martijnvg martijnvg merged commit 085ba99 into elastic:master Jun 22, 2020
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Jun 22, 2020
…ic#58096)

Backporting elastic#58096 to 7.x branch.
Relates to elastic#53100

* use mapping source direcly instead of using mapper service to extract the relevant mapping details
* moved assertion to TimestampField class and added helper method for tests
* Improved logic that inserts timestamp field mapping into an mapping.
If the timestamp field path consisted out of object fields and
if the final mapping did not contain the parent field then an error
occurred, because the prior logic assumed that the object field existed.
martijnvg added a commit that referenced this pull request Jun 22, 2020
Backporting #58096 to 7.x branch.
Relates to #53100

* use mapping source direcly instead of using mapper service to extract the relevant mapping details
* moved assertion to TimestampField class and added helper method for tests
* Improved logic that inserts timestamp field mapping into an mapping.
If the timestamp field path consisted out of object fields and
if the final mapping did not contain the parent field then an error
occurred, because the prior logic assumed that the object field existed.
jrodewig added a commit that referenced this pull request Jun 24, 2020
…58444)

With #58096, data streams now track the timestamp field mapping outside
of the template associated with the stream. This means you can no longer
update the timestamp field mapping using template changes.

This updates the associated data stream docs.
jrodewig added a commit that referenced this pull request Jun 24, 2020
…58444) (#58517)

With #58096, data streams now track the timestamp field mapping outside
of the template associated with the stream. This means you can no longer
update the timestamp field mapping using template changes.

This updates the associated data stream docs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>non-issue :StorageEngine/Data streams Data streams and their lifecycles Team:Data Management (obsolete) DO NOT USE. This team no longer exists. v7.9.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants