TSDB: Add _tsid field to time_series indices#80276
Conversation
|
Pinging @elastic/es-analytics-geo (Team:Analytics) |
nik9000
left a comment
There was a problem hiding this comment.
We talked on video a bit but I had already started typing stuff. This makes public what we talked about.
server/src/main/java/org/elasticsearch/index/IndexSettings.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/IpFieldMapper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/MapperService.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/Mapping.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/Mapping.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesIdFieldMapper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesIdFieldMapper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesIdFieldMapper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesIdFieldMapper.java
Outdated
Show resolved
Hide resolved
|
Will index sorting settings about dealing _tsid added in this PR? Or create another PR do the feature? |
|
@weizijun this PR is work in progress. I need to add more tests and complete the index sorting part before it is ready to be merged. |
so that they are added explicitly in the field mapper
server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesIdFieldMapper.java
Outdated
Show resolved
Hide resolved
to IndexMode so that it is only created in time_series mode
nik9000
left a comment
There was a problem hiding this comment.
I realized a thing about encoding these even if we're not using them. I think it's rare that a field will be a dimension outside of time series mode but it's possible. We should probably come up with a scheme to skip those fields....
qa/rolling-upgrade/src/test/java/org/elasticsearch/upgrades/IndexingIT.java
Show resolved
Hide resolved
| new FieldSortSpec(TimeSeriesIdFieldMapper.NAME), | ||
| new FieldSortSpec(DataStreamTimestampFieldMapper.DEFAULT_PATH) }; | ||
| return; | ||
| } |
There was a problem hiding this comment.
This could probably use the same "move it into a method on IndexMode" treatment that the builder used. But that can wait for a follow up I think.
| if (dimension) { | ||
| // Extract the tsid part of the dimension field | ||
| BytesReference bytes = TimeSeriesIdFieldMapper.extractTsidValue(NetworkAddress.format(address)); | ||
| context.doc().addDimensionBytes(fieldType().name(), bytes); |
There was a problem hiding this comment.
Now that I think about.... Can't you declare something a dimension without enabling the tsid field? I wonder if we should avoid encoding it if the field isn't present.
There was a problem hiding this comment.
For ip and numeric field types, we can skip the encodeTsidValue part if _tsid field is not going to be generated.
However, for keyword fields we still must do the encoding, because must validate the string length.
server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesIdFieldMapper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesIdFieldMapper.java
Outdated
Show resolved
Hide resolved
|
Good call!
…On Thu, Nov 25, 2021, 10:21 AM Christos Soulios ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In server/src/main/java/org/elasticsearch/index/mapper/IpFieldMapper.java
<#80276 (comment)>
:
> @@ -497,18 +499,14 @@ private static InetAddress value(XContentParser parser, InetAddress nullValue) t
}
private void indexValue(DocumentParserContext context, InetAddress address) {
+ if (dimension) {
+ // Extract the tsid part of the dimension field
+ BytesReference bytes = TimeSeriesIdFieldMapper.extractTsidValue(NetworkAddress.format(address));
+ context.doc().addDimensionBytes(fieldType().name(), bytes);
For ip and numeric field types, we can skip the encodeTsidValue part if
_tsid field is not going to be generated.
However, for keyword fields we still must do the encoding, because must
validate the string length.
—
You are receiving this because you are on a team that was mentioned.
Reply to this email directly, view it on GitHub
<#80276 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABUXIUOG6MA3K3XFT6H7P3UNZIANANCNFSM5HI45LVA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
|
|
@elasticmachine run elasticsearch-ci/eql-correctness |
|
@elasticmachine run elasticsearch-ci/part-2 |
This PR builds on the work added in #80276 that generates the _tsid field for keyword, ip and number dimension fields. It adds support for unsigned_long dimension fields.
Since _tsid cannot be a multi-value field, this PR modifies the TimeSeriesIdFieldMapper so that _tsid is added as a SortedDocValuesField (instead of a SortedSetDocValuesField) Relates to #80276
This PR adds support for a field named
_tsidthat uniquely identifies the time series a document belongs to.When a document is indexed in a time series index (
IndexMode.TIME_SERIES),_tsidfield is generated from the values of all dimension fields.