Use @timestamp field to route documents to a backing index of a data stream #82079
martijnvg merged 12 commits into elastic:master
Conversation
ensureExpectedToken(XContentParser.Token.FIELD_NAME, parser.nextToken(), parser);
ensureExpectedToken(XContentParser.Token.VALUE_STRING, parser.nextToken(), parser);
String timestampAsString = parser.text();
// TODO: deal with nanos too here.
We could add something (a format string) to MappingMetadata that indicates how the @timestamp field should be parsed. We would need to fetch this from the latest backing index of a data stream.
Alternatively, we can add the information about how the @timestamp field should be parsed to the DataStream class, which feels like a better place, since we know this prior to selecting the right backing index based on the @timestamp field here.
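A minimal sketch of the second option, assuming the data stream itself carries the timestamp format (the `TimestampFormatHolder` name and the ISO-8601 default are hypothetical; the real `DataStream` class looks different):

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;

// Hypothetical sketch: a data-stream-level timestamp format, so the coordinating
// node can parse @timestamp without consulting backing-index mappings.
public class TimestampFormatHolder {
    private final DateTimeFormatter format;

    public TimestampFormatHolder(DateTimeFormatter format) {
        this.format = format;
    }

    // Parse a raw @timestamp value using the data stream's configured format.
    public Instant parseTimestamp(String raw) {
        TemporalAccessor accessor = format.parse(raw);
        return Instant.from(accessor);
    }

    public static void main(String[] args) {
        TimestampFormatHolder holder = new TimestampFormatHolder(DateTimeFormatter.ISO_INSTANT);
        System.out.println(holder.parseTimestamp("2022-01-07T19:04:41Z")); // prints 2022-01-07T19:04:41Z
    }
}
```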
Pinging @elastic/es-analytics-geo (Team:Analytics)
Pinging @elastic/es-data-management (Team:Data Management)
imotov left a comment
Looks good to me in general from the TSDB perspective. Left a couple of suggestions.
Index result = dataStream.selectWriteIndex(timestamp, metadata);
if (result == null) {
    throw new IllegalArgumentException("no index available for a document with an @timestamp of [" + timestampAsString + "]");
It would be great if we could add a bit more useful information here. I think I would rephrase it as "the document timestamp [2022-01-07T19:04:41Z] is outside of ranges of currently writable indices: [[2022-01-06T00:00:00.000Z-2022-01-06T16:02:12.251Z], [2022-01-06T16:02:12.251Z-2022-01-06T20:02:12.251Z]]" or something like this.
}
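For illustration, a rough sketch of how the suggested message could be assembled (`IndexRange` is a hypothetical stand-in; the real code would read the start/end time settings of each writable backing index):

```java
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

public class TimestampErrorMessage {
    // Hypothetical stand-in for a backing index's start/end time settings.
    public record IndexRange(Instant start, Instant end) {}

    // Build the more informative error message suggested in the review.
    public static String message(Instant timestamp, List<IndexRange> writableRanges) {
        String ranges = writableRanges.stream()
            .map(r -> "[" + r.start() + "-" + r.end() + "]")
            .collect(Collectors.joining(", ", "[", "]"));
        return "the document timestamp [" + timestamp
            + "] is outside of ranges of currently writable indices: " + ranges;
    }

    public static void main(String[] args) {
        System.out.println(message(
            Instant.parse("2022-01-07T19:04:41Z"),
            List.of(new IndexRange(
                Instant.parse("2022-01-06T00:00:00Z"),
                Instant.parse("2022-01-06T16:02:12.251Z")))));
    }
}
```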
@Override
public Index getConcreteWriteIndex(IndexAbstraction ia, Metadata metadata) {
It feels like a bit too much logic for a data class, which the IndexRequest essentially is. I wonder if it makes more sense as part of IndexAbstraction instead.
dakrone left a comment
I left a few really minor comments, but otherwise this looks good to me.
Temporal slices of backing indices never overlap within a data stream, so either one backing index is selected or none.
Can you point me to where we do this validation? I wanted to get more familiar with it.
*/
int route(IndexRouting indexRouting);
default Index getConcreteWriteIndex(IndexAbstraction ia, Metadata metadata) {
try (XContentParser parser = contentType.xContent().createParser(TS_EXTRACT_CONFIG, source().streamInput())) {
    ensureExpectedToken(XContentParser.Token.START_OBJECT, parser.nextToken(), parser);
    ensureExpectedToken(XContentParser.Token.FIELD_NAME, parser.nextToken(), parser);
    ensureExpectedToken(XContentParser.Token.VALUE_STRING, parser.nextToken(), parser);
I think we need to support epoch millis here also, correct? I tested it locally and it works, but I'm not sure why this doesn't blow up, since I would expect it to fail when a document with "@timestamp": 12309123 is indexed.
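A sketch of the branching this implies; in the actual parser the distinction is made on VALUE_STRING vs VALUE_NUMBER tokens, but the idea reduces to roughly this (class and method names are illustrative):

```java
import java.time.Instant;

public class FlexibleTimestampParser {
    // Accept either an epoch-millis number or an ISO-8601 string for @timestamp.
    public static Instant parse(Object rawTimestamp) {
        if (rawTimestamp instanceof Number n) {
            // e.g. "@timestamp": 12309123
            return Instant.ofEpochMilli(n.longValue());
        }
        // e.g. "@timestamp": "2022-01-07T19:04:41Z"
        return Instant.parse(rawTimestamp.toString());
    }

    public static void main(String[] args) {
        System.out.println(parse(12309123L));              // epoch millis
        System.out.println(parse("2022-01-07T19:04:41Z")); // ISO string
    }
}
```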
return indices.get(indices.size() - 1);
}
public Index selectWriteIndex(Instant timestamp, Metadata metadata)
Can you rename this to be something TSDB specific, like selectTimeseriesWriteIndex (since we may end up adding different selection criteria in the future) and add javadocs?
assertThat(validate.getMessage(), containsString("pipeline cannot be an empty string"));
}
public void testGetConcreteWriteIndex()
Can you add a test that uses epoch millis for the @timestamp instead of a string to ensure that it picks the right backing index also?
I added a test for this and also added logic for this. The case of providing the timestamp as a number representing millis since the epoch failed. I fixed this via: d70aa28
The validation that validates the start and end time settings across backing indices doesn't yet exist. But maybe we can do this differently, now that we are going to make time series typed data streams. So I'd like to add this validation after we've made data streams aware of index modes.
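For illustration, a hedged sketch of what such a non-overlap validation across backing indices might look like (the `Slice` record and method name are hypothetical, not the eventual Elasticsearch implementation):

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class TimeSliceValidator {
    // Hypothetical [start, end) time slice of one backing index.
    public record Slice(String index, Instant start, Instant end) {}

    // Reject configurations where two backing indices' time ranges overlap,
    // so that any timestamp can match at most one writable backing index.
    public static void validateNoOverlap(List<Slice> slices) {
        List<Slice> sorted = slices.stream()
            .sorted(Comparator.comparing(Slice::start))
            .collect(Collectors.toList());
        for (int i = 1; i < sorted.size(); i++) {
            Slice previous = sorted.get(i - 1);
            Slice current = sorted.get(i);
            if (current.start().isBefore(previous.end())) {
                throw new IllegalArgumentException("backing indices [" + previous.index()
                    + "] and [" + current.index() + "] have overlapping time ranges");
            }
        }
    }
}
```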
dakrone left a comment
LGTM also!
So I'd like to add this validation after we've made data streams aware of index modes.
Makes sense, thanks for the heads up
Currently, documents that target data streams are resolved to the write index of the targeted data stream.
This change adjusts this logic in the bulk API to first parse the @timestamp field and then select the right backing index based on this timestamp. If the parsed timestamp of a document falls between a backing index's start_time and end_time, then this backing index is used as the write index. Note that this logic is only enabled for TSDB data streams. Temporal slices of backing indices never overlap within a data stream, so either one backing index is selected or none.
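The selection described above can be sketched roughly as follows (`BackingIndex` and the `String` return type are simplified stand-ins for the real DataStream/Index classes):

```java
import java.time.Instant;
import java.util.List;

public class WriteIndexSelector {
    // Simplified stand-in for a backing index with its start_time/end_time settings.
    public record BackingIndex(String name, Instant startTime, Instant endTime) {}

    // Returns the single backing index whose [start_time, end_time) slice contains
    // the timestamp, or null when no writable index covers it. Slices never
    // overlap within a data stream, so at most one index can match.
    public static String selectWriteIndex(Instant timestamp, List<BackingIndex> backingIndices) {
        for (BackingIndex index : backingIndices) {
            boolean inRange = timestamp.isBefore(index.startTime()) == false
                && timestamp.isBefore(index.endTime());
            if (inRange) {
                return index.name();
            }
        }
        return null;
    }
}
```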
Relates #74660